Uhtred son of Uhtred, regular ale drinker, who I predict will die of injury (but will go to Valhalla, unlike you, you ale-sodden wretch)

There has been some fuss in the media recently about a new study showing that no level of alcohol use is safe. It received a lot of media attention (for example here), reversed a generally held belief that moderate consumption of alcohol improves health (a belief even enshrined in the Greek food pyramid, which has a separate category for wine and olive oil[1]), and led to angsty editorials about “what is to be done” about alcohol. Although there are definitely things that need to be done about alcohol, prohibition is an incredibly stupid and dangerous policy, and so are some of its less odious cousins, so before we go full Leroy Jenkins on alcohol policy it might be a good idea to ask whether this study really is the bee’s knees, and whether it really shows what it says it does.

This study is a product of the Global Burden of Disease (GBD) project at the Institute for Health Metrics and Evaluation (IHME). I’m intimately acquainted with this group because I made the mistake of getting involved with them a few years ago (I’m not involved now), so I saw how their sausage is made, and I learnt about a few of their key techniques. In fact I supervised a student who, to the best of my knowledge, remains the only person on earth (i.e. the only person in a population of 7 billion, outside of two people at IHME) who was able to install a fundamental software package they use. So I think I know something about how this institution does its analyses. I think it’s safe to say that they aren’t all they’re cracked up to be, and I want to explain in this post why their paper is a disaster for public health.

The way the IHME works in these papers is always pretty similar, and this paper is no exception. First they identify a set of diseases and health conditions related to their chosen risk (in this case, alcohol). Then they run through a bunch of previously published studies to identify the numerical magnitude of the increased risk of these diseases associated with exposure to the risk. Then they estimate the level of exposure in every country on earth (a very difficult task, which they complete using dodgy methods). Then they calculate the number of deaths due to the conditions associated with this risk (also an incredibly difficult task, to which they apply a set of poorly validated methods). Finally, they use a method called comparative risk assessment (CRA) to calculate the proportion of deaths due to the exposure. CRA is in principle an excellent technique, but there are certain aspects of their application of it that are particularly shonky, which we probably don’t need to touch on here.
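For readers who haven’t met it, the core quantity in CRA is the population attributable fraction (PAF): the share of deaths from a condition that would disappear if nobody were exposed. Here is a minimal sketch of the textbook categorical formula, with invented prevalence and risk numbers; it is not IHME’s code, and their actual version works with continuous exposure distributions and counterfactual minimum-risk levels.

```python
# Sketch of the core comparative risk assessment (CRA) quantity: the
# population attributable fraction (PAF), using the textbook formula
# for categorical exposure. All numbers below are invented for
# illustration; this is not IHME's implementation.

def paf(prevalence, relative_risk):
    """PAF = sum(p_i * (RR_i - 1)) / (sum(p_i * (RR_i - 1)) + 1),
    where the unexposed category has RR = 1.0."""
    excess = sum(p * (rr - 1.0) for p, rr in zip(prevalence, relative_risk))
    return excess / (excess + 1.0)

# Hypothetical exposure distribution: none / moderate / heavy drinkers
prevalence = [0.40, 0.45, 0.15]
relative_risk = [1.0, 1.1, 2.5]  # invented RRs for some outcome

fraction = paf(prevalence, relative_risk)
print(f"PAF = {fraction:.3f}")  # ~0.213
print(f"of 100,000 deaths, ~{fraction * 100_000:.0f} attributed to alcohol")
```

Everything downstream of this formula depends on the relative risks, the exposure distribution, and the death counts being right – which is exactly where the trouble starts.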

So in assessing this paper we need to consider three main issues: how they assess risk, how they assess exposure, and how they assess deaths. We will look at these three parts of their method and see that they are fundamentally flawed.

Problems with risk assessment

To assess the risk associated with alcohol consumption the IHME used a standard technique called meta-analysis. In essence, a meta-analysis collects all the studies that relate an exposure (such as alcohol consumption) to an outcome (any health condition, though death is common), and then combines them to obtain a single final estimate of the numerical risk. Typically a meta-analysis weights the estimate from each study according to its precision (roughly, its sample size), so that, for example, a small study finding that banging your head on a wall reduces your risk of brain damage is given less weight than a very large study of banging your head on a wall. Meta-analysis isn’t easy, for a lot of reasons to do with the practical details of studies (if two groups study banging your head on a wall, do they use the same definition of brain damage and the same definition of banging?), but once you iron out all the issues it’s the only method we have for coming to comprehensive decisions about all the studies available. It’s important because the research literature on any issue typically includes a bunch of small shitty studies and a few high-quality studies, and we need to balance them all out when we assess the outcome.

As an example, consider football and concussion. A good study would follow NFL players for several seasons, taking into account their position, the number of games they played, and the team they were in, and compare them against players of a concussion-free sport like tennis, matched for age, race, socioeconomic background and so on. Many studies don’t do this – for example, a study might take 20 NFL players who died of brain injuries and compare them with 40 non-NFL players who died of a heart attack. A good meta-analysis handles these issues of quality and combines multiple studies to calculate a final estimate of risk, as in the sketch below.
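To make the weighting concrete, here is a minimal fixed-effect, inverse-variance meta-analysis on the log relative risk scale, with three invented studies. Real analyses would normally use a random-effects model and a dedicated package (metafor in R, for instance), but the core idea – precise studies get big weights – is all here.

```python
import math

# Minimal fixed-effect, inverse-variance meta-analysis on the log
# relative risk scale. The three studies are invented: a small noisy
# one, a medium one, and a large precise one.

studies = [
    # (relative risk, 95% CI lower bound, 95% CI upper bound)
    (1.80, 0.90, 3.60),   # small study, wide interval
    (1.10, 0.80, 1.50),   # medium study
    (1.05, 0.95, 1.16),   # large study, narrow interval
]

total_weight = 0.0
weighted_sum = 0.0
for rr, lo, hi in studies:
    log_rr = math.log(rr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE recovered from the CI
    weight = 1.0 / se ** 2                           # inverse-variance weight
    total_weight += weight
    weighted_sum += weight * log_rr

pooled = weighted_sum / total_weight
pooled_se = math.sqrt(1.0 / total_weight)
print(f"pooled RR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled - 1.96 * pooled_se):.2f}-"
      f"{math.exp(pooled + 1.96 * pooled_se):.2f})")
```

The pooled estimate lands close to the big precise study, as it should; the small noisy study barely moves it.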

The IHME study provides a meta-analysis of all the relationships between alcohol consumption and disease outcomes, described as follows[2]:

we performed a systematic review of literature published between January 1st, 1950 and Dec 31st 2016 using Pubmed and the GHDx. Studies were included if the following conditions were met. Studies were excluded if any of the following conditions were met:

1. The study did not report on the association between alcohol use and one of the included outcomes.

2. The study design was not either a cohort, case-control, or case-crossover.

3. The study did not report a relative measure of risk (either relative risk, risk ratio, odds-ratio, or hazard ratio) and did not report cases and non-cases among those exposed and un-exposed.

4. The study did not report dose-response amounts on alcohol use.

5. The study endpoint did not meet the case definition used in GBD 2016.

There are many, many problems with this description of the meta-analysis. First of all, they seem not to have described the inclusion criteria at all (they say “Studies were included if the following conditions were met” but never say what those conditions were). More importantly, their conditions for exclusion are very weak. We do not, usually, include case-control and case-crossover studies in a meta-analysis, because these studies are, frankly, terrible. The standard approach is to assess each candidate study with a risk of bias instrument, such as the Cochrane Risk of Bias Tool, and dump it if it is highly biased. For example, should we include a study that is not a randomized controlled trial? Should we include studies where subjects know their assignment? The meta-analysis community have developed a set of tools for deciding which studies to include, and the IHME crew haven’t used them.

This got me thinking that perhaps the IHME crew have been, shall we say, a little sloppy in how they include studies, so I had a bit of a look. On pages 53-55 of the appendix they report the results of their meta-analysis of the relationship between atrial fibrillation and alcohol consumption, and the results are telling. They found nine studies to include in their meta-analysis, but there are many problems with these studies. One (Cohen 1988) is a cross-sectional study and should not be included, according to the IHME’s own exclusion criteria. Six of the remaining studies assess fibrillation only, while two assess fibrillation and atrial flutter, a precursor of fibrillation. Most tellingly, though, all of these studies find no relationship between alcohol consumption and fibrillation at almost all levels of consumption, yet their chart on page 54 shows that their meta-analysis found an almost exponential relationship between alcohol consumption and fibrillation. This finding is simply impossible given the observed studies. All nine studies found no relationship between moderate alcohol consumption and fibrillation, and several found no relationship even for extreme levels of consumption, but somehow the IHME found a clear relationship. How is this possible?

Problems with exposure assessment

This problem arises because they applied a tool called DISMOD to the data to estimate the relationship between alcohol exposure and fibrillation. DISMOD is an interesting tool, but it has many flaws. Its main benefit is that it enables the user to combine studies whose exposure categories don’t match, and turn them into a single risk curve. So for example if one study group has recorded the relative risk of death for 2-5 drinks, and another group has recorded the risk for 1-12 drinks, DISMOD offers a method to turn this into a single curve representing the risk per additional drink (a crude sketch of this kind of problem is below). This is nice, and it produces the curve on page 54 (and all the subsequent curves). It’s also bullshit. I have worked with DISMOD and it has many, many problems. It is incomprehensible to everyone except the two guys who programmed it, who are nice guys but can’t give decent support or explanations of what it does. It has a very strange response distribution, doesn’t appear to handle other distributions well, and has some really kooky Bayesian machinery built in. It is also completely inscrutable to 99.99% of the people who use it, including the people at IHME. It should not be used until it has been peer reviewed and exposed to a proper independent assessment. It is the application of DISMOD to data that obviously shows no relationship between alcohol consumption and fibrillation that produced the bullshit curve on page 54 of the appendix, a curve that bears no relationship to the observed data in the collected studies.
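To give a sense of the kind of problem DISMOD is meant to solve, here is the crudest possible toy version: fitting a single per-drink log-linear slope through relative risks reported over mismatched drinking ranges, using the midpoint of each range. This is my own stand-in with invented numbers, nothing like DISMOD’s actual Bayesian machinery.

```python
import math

# Toy version of the exposure-harmonisation problem: studies report
# RRs over mismatched ranges of drinks/day, and we want one per-drink
# curve. Fit log(RR) = beta * dose through each range's midpoint,
# by least squares through the origin. All numbers are invented.

studies = [
    # (range low, range high, reported relative risk)
    (2.0, 5.0, 1.30),
    (1.0, 12.0, 1.60),
    (0.5, 2.0, 1.05),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in studies]
log_rrs = [math.log(rr) for _, _, rr in studies]

# Least-squares slope through the origin: beta = sum(x*y) / sum(x*x)
beta = (sum(x * y for x, y in zip(midpoints, log_rrs))
        / sum(x * x for x in midpoints))

for drinks in (1, 3, 6):
    print(f"fitted RR at {drinks} drinks/day: {math.exp(beta * drinks):.2f}")
```

Notice that the functional form guarantees a smooth exponential curve through RR = 1 at zero drinks, whatever the individual studies found – one plausible way a stack of null results can come out the other end looking like a ‘clear’ dose-response relationship.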

The same concern applies to the assessment of exposure to alcohol. The study used DISMOD to calculate each country’s level of individual alcohol consumption, which means the same dodgy technique was applied to national alcohol consumption data. But let’s not get hung up on DISMOD. What data were they using? The maps in the Lancet paper show estimates of risk for every African and south-east Asian country, which suggests that they have data on these countries, but do you think they do? Do you think Niger has accurate estimates of alcohol consumption within its borders? No, it doesn’t. A few countries in Africa do, and the IHME crew used some spatial smoothing techniques (never clearly explained) to estimate consumption rates in the other countries. This is a massive dodge that the IHME apply, which they call “borrowing strength” (a crude illustration is below). At its most egregious this is close to simply inventing data – in an earlier paper (perhaps in 2012) they were able to estimate rates of depression and depression-related conditions for 183 (I think) countries using data from 97 countries. No prizes to you, my astute reader, if you guess that all the missing data was in Africa. The same applies to the risk exposure estimates in this paper – they’re a complete fiction. Sure, for the UK and Australia, where alcohol is basically a controlled drug, they are super accurate. But in the rest of the world, not so much.
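For a sense of what “borrowing strength” means in practice, here is the crudest version: imputing a missing country’s consumption from the mean of whatever neighbours happened to be measured. IHME’s actual spatial smoothing is far more elaborate (and, as noted, never clearly explained); the country names and figures below are invented for illustration.

```python
# Crude "borrowing strength": impute a missing country's alcohol
# consumption from the regional mean of observed countries. All
# figures are invented; real spatial smoothing is more elaborate,
# but the underlying move -- manufacturing a number where there is
# no data -- is the same.

observed = {
    # country: (region, litres of pure alcohol per capita per year)
    "CountryA": ("West Africa", 9.1),
    "CountryB": ("West Africa", 2.7),
    "CountryC": ("West Africa", 0.6),
}
missing = {"Niger": "West Africa"}

by_region = {}
for country, (region, litres) in observed.items():
    by_region.setdefault(region, []).append(litres)

for country, region in missing.items():
    values = by_region[region]
    estimate = sum(values) / len(values)
    print(f"{country}: imputed {estimate:.1f} L/capita "
          f"from {len(values)} observed neighbours")
```

With observed values ranging from 0.6 to 9.1 litres, the imputed 4.1 is barely better than a guess – and the published maps then display that guess with the same apparent confidence as measured data.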

Problems with mortality assessment

The IHME has a particularly nasty and tricky method for calculating the burden of disease, based around a thing called the year of life lost (YLL). Instead of simply counting deaths, they measure the years of your life that you lost when you died, compared to an objective global standard of the life you could have achieved: they take the age at which you died, subtract it from the life expectancy of an Icelandic or Japanese woman, and that’s the number of YLLs you suffered. Add that up for every death and you have your burden of disease (there’s a minimal sketch after the list below). It’s a nice idea, except that there are two huge problems:

  • It gives massive weight to deaths at young ages
  • They never incorporate uncertainty in the ideal life expectancy of an Icelandic or Japanese woman
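Here’s a minimal sketch of the YLL calculation as described above, with invented death counts. GBD actually uses a full standard life table rather than a single number, but a single number is enough to show the age-weighting problem in the first bullet.

```python
# Minimal years-of-life-lost (YLL) sketch. The reference value is a
# stand-in for GBD's ideal life table, and the death counts are
# invented.

IDEAL_LIFE_EXPECTANCY = 86.0  # stand-in for the ideal life table

deaths = [
    # (age at death, number of deaths)
    (2, 50),      # a small number of child deaths
    (45, 300),    # middle-aged deaths
    (80, 1000),   # a large number of elderly deaths
]

total_yll = sum(max(IDEAL_LIFE_EXPECTANCY - age, 0) * n for age, n in deaths)
print(f"total YLLs = {total_yll:,.0f}")

# The 50 child deaths contribute 50 * 84 = 4,200 YLLs, while the
# 1,000 deaths at age 80 contribute only 6,000: twenty times as many
# deaths, less than 1.5 times the burden.
```

Note also that IDEAL_LIFE_EXPECTANCY appears here as a bare constant with no uncertainty attached, which is exactly the second bullet’s complaint.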

There is an additional problem in the assessment of mortality, which the IHME crew always gloss over, called “garbage code redistribution.” About 30% of every country’s death records are bullshit, and don’t correspond to any meaningful cause of death. The IHME has a complicated, proprietary system, which they cannot and will not explain, that redistributes these garbage codes into other meaningful categories. What they should do is treat these redistributed deaths as a source of error (e.g. we have 100,000 deaths due to cancer and 5,000 garbage deaths that might belong to cancer, so we report 102,500 plus or minus 2,500 deaths – somewhere between 100,000 and 105,000), but they don’t; they just add them on. So when they calculate burden of disease they use the following four steps:

  • Calculate the raw number of deaths, with an estimate of error
  • Reassign dodgy deaths in an arbitrary way, without counting this reassignment as a source of uncertainty
  • Estimate an ideal life expectancy without applying any measure of error or uncertainty to it
  • Calculate the years of life lost relative to this ideal life expectancy and add them up

So here there are three sources of uncertainty (deaths, redistribution, ideal life expectancy) and only one is counted; and then all these uncertain deaths are multiplied by the number of years lost relative to the ideal life expectancy.
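For what it’s worth, propagating all three sources is not hard in principle. Here is a minimal Monte Carlo sketch using the cancer example above; every distribution in it is an illustrative assumption of mine, not anything GBD actually uses.

```python
import random

# Monte Carlo propagation of the three uncertainty sources named
# above: raw death counts, garbage-code redistribution, and the ideal
# life expectancy. All distributions are illustrative assumptions.

random.seed(1)
MEAN_YEARS_LOST = 15.0  # assumed average years lost per cancer death

samples = []
for _ in range(10_000):
    raw_deaths = random.gauss(100_000, 1_000)   # counting error
    redistributed = random.uniform(0, 5_000)    # none..all garbage deaths
    le_shift = random.gauss(0, 1.0)             # ideal life expectancy error
    samples.append((raw_deaths + redistributed) * (MEAN_YEARS_LOST + le_shift))

samples.sort()
low, high = samples[250], samples[9_750]  # central 95% of the draws
print(f"YLL burden: 95% interval {low:,.0f} to {high:,.0f}")
```

Run this and the interval is far wider than what you get from counting error alone, which is the whole point: leave two of the three sources out and your estimate looks much more certain than it is.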

The result is a dog’s breakfast of mortality estimates that don’t come close to representing the truth about the burden of disease in any country due to any condition.

Also, the IHME apply the same dodgy modeling methods to deaths (using a method that they (used to?) call CoDMoD) before they calculate YLLs, so there’s another layer of arbitrary model decisions and error in their assessments.

Putting all these errors together

This means that the IHME process works like this:

  • An incredibly dodgy form of meta-analysis that includes dodgy studies and miscalculates levels of risk
  • Applied to a really shonky estimate of the level of exposure to alcohol, built with a computer program no one understands applied to a substandard data set
  • Applied to a dodgy death model that leaves out several sources of uncertainty, and is thus spuriously precise

The result is that at every stage of the process the IHME is unreasonably confident about the quality of their estimates, produces excessive estimates of risk and inaccurate measures of exposure, and is too precise in its calculations of how many people died. This means that all their conclusions about the actual risk of alcohol, the level of exposure, and the magnitude of disease burden due to the conditions they describe cannot be trusted. As a result, neither can their estimates of the proportion of mortality due to alcohol.

Conclusion

There is still no evidence that moderate alcohol consumption is bad for you, and solid meta-analyses of the available studies support the conclusion that moderate alcohol consumption is not harmful. This study should not be believed, and although the IHME has good press contacts, you should ignore all the media coverage of it. As a former insider in the GBD process I can also suggest that in future you ignore all work from the Global Burden of Disease project. They have a preferential publishing deal with the Lancet, which means they aren’t properly peer reviewed, and their work is so massive that it’s hard for most academics to provide adequate peer review anyway. Their methods haven’t been subjected to proper external assessment, and my judgement, based on having visited them and worked with their statisticians and their software, is that their methods are not assessable. Their data is certainly dubious at times, but most importantly their analysis approach is not correct, and the Lancet doesn’t subject it to proper peer review. This is going to have long-term consequences for global health, and at some point the people who continue to put their names to the IHME’s papers (which have hundreds or even thousands of co-authors) will regret that association. I stopped collaborating with this project, and so should you. If you aren’t sure why, this paper on alcohol is a good example.

So chill, have another drink, and worry about whether it’s making you fat.


fn1: There is no reason not to love Greek food; no wonder these people conquered the Mediterranean and developed philosophy and democracy!

fn2: This is in the appendix to their study