Those bastards!!

Those bastards!!

I ran into more problems trying to run OpenBUGS on a Bayesian spatial regression model today. The problems are laid out pretty simply in the picture above: I ran into an illegal memory write. I know this is not the fault of my model, as it runs smoothly for 50, 500 or 1000 iterations. This illegal memory write happens somewhere between 6000 and 6100 iterations.

This model is a simple version of a larger model I’m building. It has data on about 1800 small areas collected over four time points, with age and sex covariates, with a Poisson distribution for the outcome and an adjacency matrix for the 1800 areas. The data are laid out with the time points and age- and sex-values as separate records, so for one municipality I have 4*6*2=24 observations. I can’t use other packages for this because they assume one observation per municipality (i.e a very simple relationship between data structure and adjacency matrix). So I’m using OpenBUGS or WinBUGS. The ultimate goal is to extend to 30 time points and 18 age categories, with a more complex time and age structure, but I want to get this simple model running first because I’m running into major computational issues.

Today I identified that these computational issues can’t be solved. The reason for the illegal memory write is that OpenBUGS has a 1.6Gb RAM limit. People say this but it’s hard to confirm; however, in digging through critiques of BUGS I discovered it is written in Component Pascal and compiled in BlackBox, and a bit more digging reveals BlackBox has a 1.6Gb RAM limit.

This is frankly ridiculous. It’s understandable that there would be a RAM limit in a 32-bit system, but OpenBUGS has been around for about 9 years now and no one has upgraded it to handle either multi-core computers or 64 bit architecture. Given that OpenBUGS is designed to handle complex, processor-demanding simulation-heavy Bayesian models, the decision to write it in such a restrictive framework is ridiculous. I’m a strong critic of open source systems but if it had been written for R it would at least be able to use the 64-bit architecture, even though R has been late to the multi-core party. I bought a PC with 128 Gb of RAM specifically for this big project, and I might as well have bought a laptop with the minimum RAM. For the model I ultimately aim to build I expect I will need about 100,000 iterations to get stability, which means that OpenBUGS will never get there. The only way to run this model without a workaround is to either 1) write the full adjacency matrix by hand and implement it directly [something I am considering] or 2) recompile the source code (which I think might not be available) in a different Pascal compiler. I have no idea how to even start with that.

I have considered two workarounds, however, though I don’t know whether either of them might work.

  1. Save and rerun: the reason that OpenBUGS hits its RAM limit is that it saves all iterations in RAM, but I think this is possibly bad design. So one option could be to run the model to 5000 iterations, then save the results, shut down OpenBUGS, reopen OpenBUGS, load the model file, and then use the update functions to run another 5000 iterations on what has already been run. I think this can be done (running additional updates on a past model) but I’m not sure. If this works I just need to run one set of runs a night for about 3 weeks, and I’ll get my model.
  2. Rerun with sequential initial values: If method 1 doesn’t work, another option that I am very sure will work is to run the model for 5000 iterations, extract all estimated values from the model, then start a new model of 5000 runs with the past estimated values as the initial values for the next model. I’m pretty sure it will start off where it left off, assuming I correctly specify them all, although there might be small jumps in the trace, but ultimately it’ll get where it needs to go. But restarting the model is going to take a lot of time unless I can find a way to loop through and extract values automatically (I think I can’t). So probably a month to run the model.

Ridiculous! And even if I had a supercomputer I couldn’t speed it up …

Today Stata 14 was released, and it finally has built in functionality for a wide range of Bayesian statistical tasks. I’m hoping that they’ll introduce conditional autoregression and the BYM model in version 15. Really, if you have a grant that can afford a single Stata license, it’s really worth getting. You just can’t rely on open source statistical packages. Only use them when you must!