Recently a major economics paper was found to contain basic excel errors, among other problems, and an amusing storm of controversy is growing around the paper. The controversy arises because the paper was apparently quite influential in promoting the most recent round of austerity politics in the western world, and the authors themselves used it actively in this way. The authors even managed to find a magic number – 90% – at which government debt (as a proportion of GDP) throttles growth, a threshold that many small government activists and “sustainable deficit” types have been seeking for years. It’s like manna from heaven!
There’s been a lot of hilarity arising from this, about how woeful the economics field is and about how vulnerable policy-makers on crucial issues like government spending can be to even quite terrible research that supports their agenda. But there has also been some criticism on statistics and academic blogs about the use of excel for advanced analysis, and what a bad idea it is. Andrew Gelman has a thread debating how terrible excel is as a computational tool, and Crooked Timber has a post (with excellent graphic!) expressing incredulity that anyone would do advanced stats in excel. While I agree in general, I feel an urgent need to jump to the defense of MS Excel.
MS Excel is great. It’s much, much more convenient than a calculator, and it can be used to do quite complex calculations (as in sums and multiplications) in multiple directions that would take ages on a calculator. On most computers now the calculator is buried or, if you’re a windows user, crap, and if you need anything more than addition it’s much more convenient to drag out excel. Sure, it takes a moment to load compared to your calculator function, but it is so much easier to compare numbers, to calculate exponents and logs, and to present simple results in excel than in a calculator. As a simple case in point: if you get regression coefficients from Stata you can copy and paste them into excel and exponentiate to get relative risks, etc.; then you copy the formulas down, run a new regression model (with, e.g., random effects that weren’t in the previous one) and paste in the new results, so you can compare between models quickly and easily. Similarly, if you’re checking a paper to see if they calculated odds ratios or relative risks, you can chuck those numbers into excel and do the comparisons with the contingency table right there in front of you. It offers a simple, graphically convenient way to visualize numbers. This is especially useful when the task is conceptually very simple (like a contingency table) but takes a bit of time on a hand calculator, and a bit of time to convert to the file formats required in Stata or R. In the time it takes me to think about how to structure the problem, input four lines of data to R, and then write the code to calculate the odds ratios, I can do the whole thing in excel, have the contingency table in front of me to check I’ve made no transcription errors from the paper, and fiddle quickly with changing numbers.
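To make the contingency-table case concrete, here is a minimal sketch in R of the same check (the numbers are invented for illustration; the two formulas are exactly what you would drop into excel cells):

```r
# Hypothetical 2x2 table (invented numbers for illustration).
exp_event <- 20;   exp_noevent <- 80     # exposed row
unexp_event <- 10; unexp_noevent <- 90   # unexposed row

odds_ratio    <- (exp_event / exp_noevent) / (unexp_event / unexp_noevent)   # = 2.25
relative_risk <- (exp_event / (exp_event + exp_noevent)) /
                 (unexp_event / (unexp_event + unexp_noevent))               # = 2.0
odds_ratio; relative_risk

# And the Stata trick mentioned above: exponentiating a logistic-regression
# coefficient gives an odds ratio, e.g. a coefficient of 0.81 is an OR of about 2.25.
exp(0.81)
```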
If you’re doing cost-effectiveness analysis in TreeAge (shudder) or R, excel is a really useful tool both for outputting results to something that is vaguely attractive to use, and for doing ballpark calculations to check that your models are behaving reasonably. This is especially useful if you’re doing stochastic Markov models, which can take hours or days to run in TreeAge, because you can’t trust software like that to give you the correct answer if you try to treat your stochastic model as a simple decision tree (because of the way that TreeAge faffs around with probability distributions, which is non-intuitive). Make a few simple assumptions, and you can do approximate calculations yourself in excel, and fiddle with key numbers – cohort size or a few different parameters – and see what effect they have.
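For the kind of ballpark check I mean, something as simple as the following is usually enough – here in R rather than excel, with a made-up death probability, cohort size and horizon rather than any particular published model:

```r
# Minimal deterministic cohort trace for a two-state (alive/dead) Markov model.
p_die  <- 0.05    # assumed annual probability of death
cohort <- 1000    # assumed starting cohort size
cycles <- 20      # number of annual cycles

alive <- numeric(cycles + 1)
alive[1] <- cohort
for (t in 1:cycles) alive[t + 1] <- alive[t] * (1 - p_die)

sum(alive[-1])    # rough total life-years lived over the horizon
```

If the full stochastic model gives you something wildly different from this sort of back-of-the-envelope trace, that’s your cue to go looking for the problem.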
Recently I was helping someone with survival analysis and she was concerned that her definition of time to drop out was affecting her results. She conducted a sensitivity analysis in Stata to see what effect it was having, and although with correct programming she could have produced all the material she needed in Stata, writing and debugging that code can be time-consuming if you aren’t a natural. It’s much easier with modern machines to just run the regression 10 times with different values of drop-out time and plot the output hazard ratios in excel.
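If you did want to script it, the loop is short. Here is a rough sketch in R using the survival package’s built-in lung data as a stand-in – the cutoffs and the covariate are arbitrary choices for illustration, not my colleague’s actual analysis:

```r
library(survival)

cutoffs <- seq(200, 650, by = 50)   # ten candidate drop-out times, in days
hr <- sapply(cutoffs, function(cut) {
  d <- lung
  censored <- d$time > cut
  d$status[censored] <- 1           # administratively censor everyone past the cutoff
  d$time[censored]   <- cut
  fit <- coxph(Surv(time, status) ~ sex, data = d)
  exp(coef(fit))                    # hazard ratio for sex
})

plot(cutoffs, hr, type = "b",
     xlab = "drop-out time cutoff (days)", ylab = "hazard ratio (sex)")
```

Plotting the hazard ratio against the cutoff shows at a glance whether the result is stable, which is exactly what you would otherwise eyeball in excel.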
So, I think excel is a very useful tool to have alongside advanced modeling, precisely because of its ease of use and its natural, intuitive feel – the properties that recent excel bashers claim make it such a terrible device. While I definitely think it should not be used for the advanced models themselves, I find it a hugely valuable addition to the model-building process. Reproducible code and standardized tools are essential for publishable work, but unless you are one of those people who never does any fiddling in the background to work out what’s going on in your model, excel will turn out to be your go-to tool for a wide range of support tasks.
In any case, the bigger problem with Rogoff and Reinhart’s work was not the excel error. Even if they had got the excel code right, their results would still have been wrong because their modeling method was absolutely appalling, and should never have seen the light of day, even at a conference. The main flaws in their work were twofold:
- They binned years together, essentially giving different years different weights in the final model
- They stripped the years out of their time series context, so crucial information contained in the time ordering of deficits and growth was lost
I think the second flaw is the more serious of the two. By using this method they essentially guaranteed that they would be unable to show that Keynesian policies work, and they stripped the cause-effect relationship from all data collected in the Keynesian era (which lasted from the start of their data series to about 1970). In the Keynesian era, we would expect to see a sequence in which deficit increases follow negative growth, so unless the negative growth periods are very short and random, Reinhart and Rogoff’s method guarantees that this looks like an association between negative growth and higher deficits. If Keynesian policies actually work, then we would subsequently see an increase in growth and a reduction in deficits – something that, by the design of Reinhart and Rogoff’s model, would be read as evidence that higher debt causes lower growth.
In short, no matter what package they used, and no matter how sophisticated and reproducible their methods, Reinhart and Rogoff’s study was designed[1] to show the effect it did. The correct way to analyze these data was as a time series, probably using generalized least squares with a random effect for country, or something similar. Using annual data I think it would probably be impossible to show the relationship between debt and growth clearly, because recessions can happen within a year. But you could probably achieve better, more robust results in excel using proper time series data than you could get in R from Reinhart and Rogoff’s original method.
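For what it’s worth, here is a rough sketch of the kind of model I mean, in R with the nlme package; the panel data frame is simulated purely so the snippet runs and has nothing to do with the Reinhart and Rogoff dataset:

```r
library(nlme)

# Simulated long-format panel: 10 countries x 60 years (illustration only,
# growth is generated independently of debt).
set.seed(1)
panel <- expand.grid(country = paste0("c", 1:10), year = 1950:2009)
panel$debt_gdp <- runif(nrow(panel), 20, 120)
panel$growth   <- 3 + rnorm(nrow(panel))

fit <- lme(growth ~ debt_gdp,
           random = ~ 1 | country,                        # random intercept per country
           correlation = corAR1(form = ~ year | country), # respect the time ordering
           data = panel)
summary(fit)$tTable
```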
The problem here was the operator, not the machine – something which should always be remembered in statistics!
—
fn1: I use the term “was designed” here without any intention to imply malfeasance on the part of the authors. It’s a passive “was designed”.
April 24, 2013 at 6:51 pm
“Using annual data I think it would probably be impossible to show the relationship between debt and growth clearly, because recessions can happen within a year.”
If we accept the “technical” definition of a recession [1], this is true. But for the purposes of a Keynesian/monetarist comparison it’s irrelevant. The annual data should be good enough, as the only way a partial-year result matters is if the first part of the year is terrible but is made up for by a fantastic second part of the year.
For example, if Keynesian or monetarist policies reliably resulted in -6% growth in the first two quarters followed by two quarters of 8+% growth, then it’d be valid to argue that the short term pain is irrelevant – you just strap yourself in for the bumpy ride to avoid a multi-year recession.
I’m not sure how your time series would handle it, but I’d have to say that 1) the long term growth trends and 2) the results over multiple years after a recession are more important. This is because I personally would prefer a short sharp recession followed by a strong recovery to a decade of crappy growth that ruins the dreams of a generation. [3]
[1] Two consecutive quarters of negative growth [2]
[2] I’ve seen some opinions sneering at this definition as one used by journalists who want something simple.
[3] Note that no practical theory I’m aware of delivers what I’d prefer, primarily due to political constraints. Some purist theories promise it, but generally along the lines of “And now we swing around the black hole to speed up, don’t worry about falling in.”
April 24, 2013 at 8:32 pm
Paul, I think that in reality any policy response (whether Keynesian, monetarist, fascist or unicornist) will tend to be delayed by the political process, budget timetables and suchlike, so one would expect a lag between e.g. negative growth and a deficit pump-priming response. This lag could be quite long, depending on the nation, which opens a nasty challenge for analysts – you need to model a different lag for each country. This could be done but would demand very careful drilling into the data (examining each country’s policy statements over the whole series, checking that statements were enacted, looking at budgetary documents, etc). If you didn’t do this you would see a massive washout of the effects of the policy. I have read that this paper was dashed off for a conference, so it’s highly unlikely that such a process would have been possible – it’s a huge amount of work[1]. But the method used in this paper would destroy these effects even if they were all strictly regimented to occur at a specified time lag from the negative growth transition.
It’s really disturbing that this paper was a) published and b) lauded. But I think we need to be careful of ascribing too much influence to it: the advocates of austerity would use pigs’ guts to justify their methods if necessary, because austerity vs. spending is, in essence, a moral rather than a scientific debate. I think there is an economic theory that properly analyzes the role of deficits and growth (modern monetary theory with some moderators) but it’s very conveniently excluded from the halls of power, which leaves an empty and dry debate between two sets of moralists (Keynesians and monetarists). Plus of course, the concept of growth needs a little more serious criticism than it has received to date, on environmental grounds, but its critics are either consistently wrong (e.g. Ehrlich), fringe environmentalists, or completely excluded from the debate; and their opponents, the Simonists, are equally crazy. So we’re left with a “technical” debate being conducted on purely moral grounds. It’s depressing, and in my opinion it is something the economics profession should be ashamed of.
—
fn1: this is what we have grants for!
April 24, 2013 at 8:37 pm
re: my first paragraph. Actually I wrote a paper on how to do that. Maybe economists should pay more attention to the epidemiological literature!
April 25, 2013 at 7:03 am
“This could be done but would demand very careful drilling into the data”
Hmm. I can see something of what you suggest here, but my instinct would be to simplify the data and drop it all into buckets, then do a second pass to look for outliers.
For example, call 1945 to 1970 Keynesian and 1970 to 2000 monetarist, then simply examine the average annual growth rate. The second pass to look for outliers would ensure you don’t have a policy delivering -10%, -10%, -10%, +100% results, as that sort of wild variation would be massively disruptive, even if it averages to a better result.
Once you’ve accepted that higher average growth [1] and avoidance of prolonged depressions are the targets, trying to draw direct correlations would seem to be highly risky due to the fact that all this stuff turns on animal spirits and the political process.
Another thing you could do with this process is say 1970 to 1975 was a transitional period, so let’s just drop the data for those years. And of course the entire thing would need to be done on a country by country basis to properly align years and policies.
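In R terms, the sort of first pass I have in mind is nothing fancier than the sketch below (the growth figures are simulated stand-ins, not real data):

```r
# First pass: bucket years into eras and compare average growth.
# Second pass: eyeball the spread for wild outliers.
set.seed(3)
year   <- 1945:2000
growth <- rnorm(length(year), mean = 3, sd = 2)   # stand-in annual growth rates
era    <- ifelse(year <= 1970, "Keynesian", "monetarist")

tapply(growth, era, mean)   # average annual growth per era
boxplot(growth ~ era)       # quick check for -10%, -10%, +100% style wildness
```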
My initial look at a story on this suggested they were trying to do something like what I described and they simply dropped a bunch of data.
[1] I do agree that a blind adherence to a target of a number going up is bad news in the long term. I’ll also point out that I instinctively reject Gross Domestic Happiness as a measure too.
April 25, 2013 at 8:10 am
If you’re going to look at annual average growth rates in eras of Keynesianism and monetarism, then within those eras you are essentially arguing for a linear regression (to measure the growth rates) and a model for a change in growth rate at the junction (to model the overall policy effect). So even if you aggregate the data, within that aggregation you’re going to need data on individual years to get the information you will analyze. What you’re suggesting is a very crude form of a difference-in-difference model, but without using all the variability of the annual data. This would be preferable to the more detailed analysis I proposed only if you were very confident that a detailed record search would not produce any information better than “animal spirits and the political process.” I’m going to go out on an optimistic limb here and claim that a well-designed research strategy (for the records and political data) would produce better outcomes than this!
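To show what I mean by a regression with a change at the junction, here’s a toy version in R set up as an interrupted time-series model; the growth series is simulated purely so the snippet runs, and 1970 is just the assumed policy junction from your example:

```r
set.seed(2)
year   <- 1950:2000
era    <- ifelse(year < 1970, 0, 1)   # 0 = Keynesian era, 1 = monetarist era
growth <- 4 - 0.02 * (year - 1950) - 1 * era + rnorm(length(year), sd = 1)

# Slope before the junction, level shift at 1970, and change in slope after it.
fit <- lm(growth ~ I(year - 1970) * era)
summary(fit)$coefficients
```

Even this toy version uses the year-by-year variation that simple era averages throw away.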
Note though that even though I strongly disapprove of the aggregation method you suggest, it would still be more rigorous than the Reinhart/Rogoff paper, because it retains some sense of the temporal order of the data. From what I understand of the Reinhart/Rogoff paper, they dumped years in bins without regard for their sequential order, which means any time-dependent factor (such as post-war growth, or short-term declines due to exogenous shocks) would not be caught in the data. I just don’t understand how they could have got away with such a heavy-handed approach.
I think it might be better, instead of growth, to look at the purpose of the economic policies. (IMHO) the purpose of economic policy should not be to secure growth, but to secure employment, good health, and other such indicators of human wellbeing. So for example, you could model 20 years of zero growth in Japan or you could model 20 years of continuing increases in life expectancy, low unemployment, high savings, etc. Perhaps this could be joined together into an index of some kind, or the analysis restricted to specific questions. It seems a little less fetishistic than just analyzing “growth.”
Incidentally, yesterday I saw a presentation by an insurance guy, in which he observed that for the last 100 years (? don’t quote me ?) we have been “sleeping for free”: every morning when we wake up average human life expectancy has increased by 6 hours. It’s a completely fallacious use of stats but a very cute description of the trend in human welfare.
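The arithmetic behind the line, for what it’s worth, taking the 6-hours figure at face value:

```r
# 6 extra hours of life expectancy per day works out to roughly three months
# of extra life expectancy per calendar year.
6 / 24 * 365   # ~91 days gained per year
```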
April 26, 2013 at 8:25 am
“I think it might be better, instead of growth, to look at the purpose of the economic policies. (IMHO) the purpose of economic policy should not be to secure growth, but to secure employment, good health, and other such indicators of human wellbeing. So for example, you could model 20 years of zero growth in Japan or you could model 20 years of continuing increases in life expectancy, low unemployment, high savings, etc.”
Hmm. I can understand where you’re coming from and empathise with it, but I think you’re probably building unwanted assumptions about people’s preferred outcomes/spending into the measure.
As a simple example, we know that people who smoke and drink will die younger and that they know this. But they still smoke and drink [1].
Selecting a set of measures and then managing to them will drive a focus on those at the expense of measures not taken [2]. The use of “real growth” (which is dead easy to obtain from growth and inflation [3]) then lets us see if people get “more of the stuff they choose to want”.
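By way of illustration, the back-of-the-envelope conversion from nominal growth and inflation (the numbers here are invented):

```r
nominal   <- 0.05   # assumed nominal growth
inflation <- 0.03   # assumed inflation
(1 + nominal) / (1 + inflation) - 1   # ~1.9% real growth; the quick 5% - 3% = 2% is close
```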
Thus real growth can be viewed as “enabling people to have more of their dreams come true” [4]. It reflects that everyone gets to make their own choice and that their choice is what drives actual happiness – not what you or I think should make them happy.
NOTE: Real growth needs to have some element of universal distribution to allow “more of their dreams come true”. Screw the idea that only a small portion of the populace gets better off and the rest get the shaft. But on this topic we start needing to discuss degrees of relative growth that are acceptable.
[1] And don’t go to the gym and eat crappy food and play computer games for way too long and worship Satan due to role playing games. In short, every option that no sane person would select is selected by a wide variety of sane people.
[2] What gets measured being what gets managed is an elementary management principle, leading to all sorts of fun and games in the workplace.
[3] The inflation measure is going to have the same distortions I noted for your choice of metrics, but hopefully it reflects people’s actual spending habits better.
[4] Even when that dream is a better dishwasher or another pack of smokes.