Sunday, October 16, 2011

Running Galacticus in Parallel

Galacticus is very well suited to the "embarrassingly parallel" approach of dividing the work of computing a model among multiple CPUs. Typically, each merger tree in a Galacticus model evolves independently of all others, so each available CPU can work on a separate merger tree, with the results being combined once all trees have been computed. In this tutorial, we describe two ways in which you can run Galacticus in parallel.


OpenMP

Under v0.9.1 of Galacticus, you can use OpenMP parallelism to automatically run merger trees on all available cores of a shared-memory machine. Galacticus v0.9.1 is built with OpenMP parallelism by default, so simply run it as usual. Each available OpenMP thread requests a tree to process and, once it has finished with that tree, requests another. This continues until all trees have been processed. The result is a single Galacticus output file, which can be used just like any other. When using merger trees built with Press-Schechter techniques under OpenMP, it is a good idea to set the input variable [mergerTreeBuildTreesProcessDescending] to true. This causes the most massive trees, which typically take the longest to process, to be assigned to threads first, which results in better load balancing.
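To see why handing out the most massive trees first helps, here is a minimal sketch in Python. It is not Galacticus code: it simply simulates threads that each take the next tree as soon as they are free. The function name makespan and the per-tree run times are made up for illustration.

```python
import heapq


def makespan(tree_costs, thread_count):
    """Simulate threads that each grab the next tree as soon as they are free.

    tree_costs lists the run time of each tree in the order the trees are
    handed out. Returns the time at which the last thread finishes.
    """
    # Each heap entry is the time at which one thread next becomes free.
    free_at = [0.0] * thread_count
    heapq.heapify(free_at)
    finish = 0.0
    for cost in tree_costs:
        start = heapq.heappop(free_at)  # the earliest free thread takes the tree
        end = start + cost
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Hypothetical run times (arbitrary units), dominated by one massive tree.
costs = [1, 1, 1, 1, 2, 2, 3, 10]

ascending = makespan(sorted(costs), thread_count=2)
descending = makespan(sorted(costs, reverse=True), thread_count=2)

print("least massive first:", ascending)
print("most massive first: ", descending)
```

With these made-up numbers, starting with the smallest trees leaves the most expensive tree until last, so one thread sits idle while the other finishes it. Starting with the largest tree lets the small trees fill in the gaps.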

Multiple Worker Tasks

It is possible to split a single Galacticus model across CPUs on multiple machines. Two input parameters, [treeEvolveWorkerCount] and [treeEvolveWorkerNumber], are used to perform the split. [treeEvolveWorkerCount] specifies how many workers the model should be split among, while [treeEvolveWorkerNumber] identifies which of those workers a particular input file is for.

For example, to split a model between four machines, set [treeEvolveWorkerCount]=4 in each of four otherwise identical input files. In each input file, set [treeEvolveWorkerNumber] to a different one of 1, 2, 3, and 4. Finally, invoke one instance of Galacticus on each machine, each using a different input file. Merger trees will be distributed between workers, and each worker will create its own output file.
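The idea behind the two parameters can be sketched in a few lines of Python. This is an illustration only: the round-robin rule used here is an assumption, not necessarily how Galacticus assigns trees, and trees_for_worker is a hypothetical helper. The point is that each tree is handled by exactly one worker.

```python
def trees_for_worker(tree_indices, worker_count, worker_number):
    """Return the trees handled by one worker.

    worker_number runs from 1 to worker_count, mirroring the values
    given to [treeEvolveWorkerNumber] and [treeEvolveWorkerCount].
    """
    if not 1 <= worker_number <= worker_count:
        raise ValueError("worker_number must lie between 1 and worker_count")
    # Assumed round-robin rule: tree i goes to worker (i mod worker_count) + 1.
    return [i for i in tree_indices if i % worker_count == worker_number - 1]


trees = list(range(10))  # ten hypothetical tree indices
shares = {n: trees_for_worker(trees, 4, n) for n in (1, 2, 3, 4)}

for number, share in shares.items():
    print("worker", number, "processes trees", share)
```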

Two scripts are available to help with this process. The first, scripts/aux/, will take a Galacticus input file and generate a set of input files which divide this model between a given number of workers. To use it, create your input file and then use, for example:

scripts/aux/ inputParameters.xml 4

to divide the model between four workers. The script will create a set of input parameter files, named inputParameters_1.xml, inputParameters_2.xml, inputParameters_3.xml, and inputParameters_4.xml. The output file name for each input file will be the same as in the original inputParameters.xml file but with a numerical suffix appended.
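As a rough illustration of the bookkeeping involved, the following Python sketch lists what differs between the generated files. It is not the script itself. The helper worker_settings and the dictionary keys inputFile and outputFile are hypothetical, and the output naming (model1.hdf5 and so on) is an assumption inferred from the merge example in this tutorial.

```python
from pathlib import Path


def worker_settings(input_file, output_file, worker_count):
    """List the per-worker values that distinguish the generated input files."""
    inp = Path(input_file)
    out = Path(output_file)
    settings = []
    for number in range(1, worker_count + 1):
        settings.append({
            # Input files get an underscore and the worker number.
            "inputFile": str(inp.with_name(f"{inp.stem}_{number}{inp.suffix}")),
            "treeEvolveWorkerCount": worker_count,
            "treeEvolveWorkerNumber": number,
            # Assumed output naming: the worker number is appended to the stem.
            "outputFile": str(out.with_name(f"{out.stem}{number}{out.suffix}")),
        })
    return settings


settings = worker_settings("inputParameters.xml", "model.hdf5", 4)
for entry in settings:
    print(entry)
```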

The second script, scripts/aux/, will merge together the outputs from all workers. Simply use:

scripts/aux/ model1.hdf5 model2.hdf5 model3.hdf5 model4.hdf5 model.hdf5

for example, to merge the four outputs model1.hdf5, model2.hdf5, model3.hdf5, and model4.hdf5 into a single Galacticus file, model.hdf5, which can then be analyzed as normal. You can also use wildcards, e.g.:

scripts/aux/ model?.hdf5 model.hdf5

would achieve the same merge.
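If you are unsure which files a wildcard will pick up, you can check the pattern before merging. The sketch below uses Python's fnmatch module, which follows the same rules as the shell for the ? wildcard. The list of file names is made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory contents, including the merged file from an earlier run.
names = [
    "model.hdf5",
    "model1.hdf5",
    "model2.hdf5",
    "model3.hdf5",
    "model4.hdf5",
    "model10.hdf5",
]

# "?" matches exactly one character, so neither the merged file model.hdf5
# nor a two-digit worker such as model10.hdf5 is selected.
matched = [name for name in names if fnmatchcase(name, "model?.hdf5")]
print(matched)
```

Because ? matches exactly one character, the merged file model.hdf5 is never mistaken for an input. For the same reason, a model split between ten or more workers would need a different pattern to catch the two-digit suffixes.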
