Wednesday, August 24, 2011

The Tortoise and the Hare

It's not uncommon for me to hear the criticism that "Galacticus is too slow...". I can see this point of view, but I think it misses something really, really crucial - something that Aesop understood in his fable of "The Tortoise and the Hare". The bottom line: it's better to get there slowly than not at all. Or, more specifically, it's better to get the right answer slowly than the wrong answer quickly.

First of all, we have to define what we mean by "slow" in this case. Typically, it's used to mean that Galacticus is slow compared to other, comparable models of galaxy formation. That's correct. But the comparison is only really fair if the models being compared are solving the same physics, at the same level of complexity. Galacticus typically runs more complicated models of black hole evolution and star formation than other models, for example. Fortunately, Galacticus can mimic the implementations of other models, as I discussed here. So, let's take the Baugh et al. (2005) model and run it within Galacticus, then compare it to the same model run with its original code.

It's already been demonstrated (see this article) that Galacticus produces results which are converged to the level of a percent or better due to the way that it constructs galaxy merger trees and solves the differential equations that describe galaxy formation. To do a fair comparison we need to run the Baugh et al. (2005) model using its original code to a comparable level of accuracy. The accuracy in that case is controlled by the number of steps, Nstep, used in solving the baryonic physics - this is a fixed number in the original Baugh et al. (2005) code, whereas in Galacticus steps are chosen adaptively to preserve a specified tolerance.
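
To make that difference concrete, here's a toy sketch (in Python, and nothing to do with the actual Galacticus internals) of fixed-step versus adaptive-step integration for a simple test equation, dy/dt = -y. The fixed-step approach just takes Nstep equal steps, whatever the accuracy; the adaptive approach shrinks or grows the step to keep a specified tolerance satisfied:

def fixed_step(y0, t_end, n_step):
    """Integrate dy/dt = -y using a fixed number of Euler steps."""
    dt = t_end / n_step
    y = y0
    for _ in range(n_step):
        y += dt * (-y)
    return y

def adaptive_step(y0, t_end, tol=1.0e-6):
    """Integrate the same equation, halving the step whenever one full
    step and two half steps disagree by more than the tolerance."""
    t, y, dt = 0.0, y0, t_end
    while t < t_end:
        dt = min(dt, t_end - t)
        y_full = y + dt * (-y)                # one Euler step of size dt
        y_half = y + 0.5 * dt * (-y)          # two Euler steps of size dt/2
        y_two  = y_half + 0.5 * dt * (-y_half)
        if abs(y_two - y_full) <= tol * abs(y_two):
            t, y = t + dt, y_two              # accept the step...
            dt *= 2.0                         # ...and try a larger one next
        else:
            dt *= 0.5                         # reject; retry with a smaller step
    return y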

A typical number used is Nstep=100. So, we'll compare results for Nstep=100 and Nstep=300. As you can see in the plot below (which shows the mean masses of hot gas, cold gas, and stars in dark matter halos of different masses at z=0), 100 steps is clearly not enough to reach a comparable level of convergence. For example, the mean stellar mass in high-mass halos is systematically reduced by around 5% when using 300 steps.

OK, so let's try Nstep=1000:
It's an improvement (at least for high-mass halos - I'm not sure what's going on in the low-mass systems...) but still not perfect.

So, how long do these calculations take? Here are benchmarks for running these models (same physics, same implementation, same set of merger trees, same computer - so they're directly comparable):

Galacticus           :  31208 seconds
Original (Nstep= 100):   2070 seconds
Original (Nstep= 300):  22270 seconds
Original (Nstep=1000): 262866 seconds

So, Galacticus is certainly slower than the original model run with Nstep=100 (by a factor of about 15), but it's within about 40% of the Nstep=300 case and more than eight times faster than Nstep=1000.

So, is Galacticus slow? Well, it depends what you want! It's certainly slower than some other models when they're run at their usual level of accuracy. Match accuracies, though, and it's as fast, if not faster.

Should we care about 5% accuracy, though? After all, much of the physics going into the models is uncertain by significant amounts (certainly much larger than 5%). I think the answer is that we should care. First, in any kind of computational physics we should always do convergence tests and push our results to whatever accuracy we require - I'd aim for 1% in these models. Second, Galacticus and other models in its class are being used in large-scale, Bayesian searches of the model parameter space to generate constraints on parameters by fitting to observational datasets. Many of those observational data have errors that are much smaller than 5% - we really want the model to be numerically accurate at a higher level than the data. Otherwise, the derived constraints will be subject to biases that arise purely from numerical artifacts that we could have removed by performing a more accurate calculation.
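
For what it's worth, a convergence test doesn't need to be complicated. Here's a minimal sketch of the idea - keep increasing the control parameter (Nstep, in the discussion above) until successive results agree to within the target fractional accuracy:

def converge(model, n_initial=100, target=0.01, growth=2):
    """Run model(n) with increasing n until successive results agree to
    within the target fractional accuracy; return the result and n."""
    n = n_initial
    previous = model(n)
    while True:
        n *= growth
        current = model(n)
        if abs(current - previous) <= target * abs(current):
            return current, n
        previous = current

# For example, using the toy fixed-step integrator from the earlier sketch:
# result, n_needed = converge(lambda n: fixed_step(1.0, 1.0, n), target=0.01)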

I'm hoping that we'll see more emphasis placed on assessing the accuracy of results from galaxy formation models in the future...

Tuesday, August 23, 2011

Galacticus v0.9.0 Released!

After a little over one year of development, I'm happy to announce that Galacticus v0.9.0 is now the official, stable version of Galacticus. This means that v0.0.1 gets retired (so it won't get further bug fixes) and development shifts to v0.9.1. More importantly, it means that v0.9.0 is the version you should use by default - no new features will be added (making it stable), but bug fixes will be made where necessary.

While the code itself is now frozen, there are a couple of things still to come: an automated install script, and some new parameter sets which should give good matches to observational data with v0.9.0.

There's a lot that's new in v0.9.0 - the full list of changes can be found here - but here are some of the highlights:

Black Holes

Since the standard black hole implementation has grown in complexity, I introduced a much simpler version which can be used when you don't care about having such a detailed treatment of black holes. In the simple black hole implementation the black hole mass grows at a rate proportional to the spheroid star formation rate and feedback physics is implemented in a simple way with fixed efficiencies.
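
As a rough sketch of what that amounts to - with illustrative parameter names and values, not the actual Galacticus parameters, and with my reading of "fixed efficiencies" as a constant efficiency applied to the rest-mass accretion rate:

def black_hole_growth_rate(spheroid_sfr, growth_ratio=0.005):
    """Black hole accretion rate, taken to be a fixed fraction of the
    spheroid star formation rate (both in the same mass/time units)."""
    return growth_ratio * spheroid_sfr

def feedback_power(accretion_rate, efficiency=0.01):
    """Feedback energy input rate: a fixed efficiency times the rest-mass
    energy accretion rate (units are schematic here)."""
    c_light = 2.998e5  # speed of light [km/s]
    return efficiency * accretion_rate * c_light**2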

Star Formation Rates

I've added four new star formation timescale algorithms for disks, all based on the surface density profile. These are the classic Kennicutt-Schmidt law, the extended Schmidt law, the Blitz & Rosolowsky method, and the Krumholz-McKee-Tumlinson method. These provide a better match to observed galaxies, either because they're empirically calibrated to do so, or because they actually include the relevant physics.
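
For reference, the classic Kennicutt-Schmidt law looks like this. The normalization and slope below are the commonly quoted Kennicutt (1998) values; Galacticus' own parameterization may differ in detail:

def kennicutt_schmidt(sigma_gas, a=2.5e-4, n=1.4):
    """Star formation rate surface density [Msun/yr/kpc^2] given a gas
    surface density sigma_gas [Msun/pc^2] (Kennicutt 1998 values).
    Integrating this over the disk surface density profile gives the
    total star formation rate."""
    return a * sigma_gas**n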

Chemistry and Molecular Hydrogen Cooling

Galacticus now has quite general support for tracking different chemical species and their reactions. This is used to follow a network of molecular hydrogen reactions for early Universe physics which is exploited in a molecular hydrogen cooling function.
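
Schematically, evolving such a network just means integrating a rate equation for each species, built from the reactions that create and destroy it. Here's a sketch using the (real) H⁻ channel for molecular hydrogen formation, but with placeholder rate coefficients rather than the temperature-dependent fits a real calculation would use:

#   H  + e- -> H- + photon   (rate coefficient k1)
#   H- + H  -> H2 + e-       (rate coefficient k2)

def hminus_channel_rates(n_h, n_e, n_hminus, k1=1.0e-16, k2=1.3e-9):
    """Return (dn_Hminus/dt, dn_H2/dt); k1 and k2 are placeholder rate
    coefficients [cm^3/s], not proper temperature-dependent fits."""
    formation_hminus = k1 * n_h * n_e         # H- created by radiative attachment
    formation_h2     = k2 * n_hminus * n_h    # H- destroyed forming H2
    return formation_hminus - formation_h2, formation_h2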

Galaxy Clustering and the Halo Model

Halo model-based calculations of galaxy clustering can now be easily made using Galacticus. Biases and Fourier transformed dark matter halo density profiles can be output, and scripts which use these to construct power spectra and correlation functions are provided. When merger trees from N-body simulations are used, positional information can be output directly, permitting direct calculation of galaxy clustering statistics.
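
To give a flavor of what those scripts do, here's a sketch of the standard two-halo term of the halo model power spectrum, built from exactly the quantities mentioned above (halo biases, mean galaxy occupations, and Fourier-transformed halo profiles). The function signature here is illustrative, not the actual script interface:

import numpy as np

def two_halo_power(masses, dn_dm, bias, mean_occupation, u_profile, p_lin):
    """P_2h(k) = P_lin(k) [ (1/n_gal) Int dM (dn/dM) b(M) <N|M> u(k|M) ]^2,
    with all mass-dependent quantities tabulated on the 'masses' grid and
    u_profile evaluated at the wavenumber of interest."""
    n_gal = np.trapz(dn_dm * mean_occupation, masses)  # mean galaxy number density
    integrand = dn_dm * bias * mean_occupation * u_profile
    return p_lin * (np.trapz(integrand, masses) / n_gal)**2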

Redshift Surveys

Simple support is provided for constructing redshift surveys of galaxies directly from Galacticus outputs, including redshifts and comoving and luminosity distances.
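
For a flat ΛCDM cosmology these distances are straightforward: the comoving distance is D_C = c ∫ dz'/H(z') and the luminosity distance is D_L = (1+z) D_C. Here's a sketch, with illustrative cosmological parameter values:

import numpy as np

def comoving_distance(z, h0=70.0, omega_m=0.3, n_points=1000):
    """Comoving distance [Mpc]: D_C = c Int_0^z dz'/H(z'), flat LCDM."""
    c_kms = 2.998e5
    zs = np.linspace(0.0, z, n_points)
    hz = h0 * np.sqrt(omega_m * (1.0 + zs)**3 + (1.0 - omega_m))
    return c_kms * np.trapz(1.0 / hz, zs)

def luminosity_distance(z, **cosmology):
    """Luminosity distance [Mpc]: D_L = (1 + z) D_C in a flat universe."""
    return (1.0 + z) * comoving_distance(z, **cosmology)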

Dust Absorption/Emission with Grasil

I've added a module which automates processing of Galacticus galaxies through the Grasil dust radiative transfer code, allowing calculation of SEDs from UV through radio wavelengths.

Running Grids of Models

The Run_Galacticus.pl script now allows nested grids of parameters to be run - this makes it easy to explore model parameter space.
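
I won't reproduce the script's actual grid syntax here, but the underlying idea is just a Cartesian product over the values given for each parameter (the parameter names below are hypothetical):

from itertools import product

grid = {                                   # hypothetical parameter names
    "starFormationEfficiency": [0.01, 0.03, 0.1],
    "outflowVelocity":         [100.0, 200.0],
}

for values in product(*grid.values()):
    parameters = dict(zip(grid.keys(), values))
    print("would run model with:", parameters)  # 3 x 2 = 6 models in total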

What's Next?

Work is already underway on v0.9.1 of Galacticus. Besides a significant restructuring of Galacticus' internal data structures (to make it easier to expand and modify components) and updated physics, expect more extensive support for C/C++ interoperability and both OpenMP and MPI parallelism. Also, I have a few optimization ideas that could potentially speed up the code significantly...