Today’s looming disaster

Where I live in Japan mask-wearing is now pretty much universal – almost no one goes out in public without one, and to see someone without a mask on is a kind of shock. The economy reopened after lockdown, in Tokyo, on 23rd May, on which date the number of cases had dropped to 5. Today the Tokyo Governor’s office released the daily update on COVID-19 (pictured above), and we have now returned to 107 cases, with the 7-day smoothed average hitting 65. Depending on how charitable you’re feeling that’s either a 21-fold or 13-fold increase in cases in 5-6 weeks. At its most charitable then we can say that cases have been doubling roughly every 11 days. Today’s peak of 107 cases comes pretty much 5 days after the Tokyo government allowed bars and night clubs to reopen. All of the personal measures we have been asked to adopt – maintaining social distancing, wearing masks in public, and reducing our social interactions – have amounted to a hill of beans. In particular I think mask-wearing has been a completely useless strategy, and worse than that, I think the misguided belief that widespread mask use will prevent transmission has led many countries to take unnecessary and stupid risks with reopening their economies. This is particularly tragic in the case of Tokyo, because Japan had a very good early response to the epidemic and Tokyo was down to just 5 cases when the government ended the lockdown early. One or two more weeks of actually effective strategies would have ended the epidemic in Japan but instead the government chose to begin reopening the economy early and rely on personal behavior change to prevent its spread.
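
For anyone who wants to check the arithmetic behind fold increases and doubling times, here is a minimal sketch. It assumes steady exponential growth over the whole period, which is a simplification, and it reads "5-6 weeks" as 35 to 42 days. The function name is my own, chosen for illustration only.

```python
import math


def doubling_time_days(fold_increase: float, elapsed_days: float) -> float:
    """Days per doubling, assuming constant exponential growth."""
    return elapsed_days * math.log(2) / math.log(fold_increase)


# Figures from the paragraph above: 5 cases at reopening, 107 cases today,
# a 7-day smoothed average of 65, over "5-6 weeks" (taken as 35 to 42 days).
for label, latest in [("daily count", 107), ("7-day average", 65)]:
    fold = latest / 5
    for days in (35, 42):
        t = doubling_time_days(fold, days)
        print(f"{label}: {fold:.1f}-fold in {days} days, "
              f"doubling every {t:.1f} days")
```

The smoothed average over the longer window gives the slowest (most charitable) doubling time, and the raw daily count over the shorter window gives the fastest.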

This was a disaster, and anyone who understands public health should have seen how disastrous this idea is. Infectious diseases are never stopped by individual behavioral change or personal responsibility: they are only ever affected by social changes and policy. We know this from 40 years of responding to HIV, and in this blog post I want to explain how the terrible failures of the early response to HIV should have served as a warning about relying on barrier methods and personal responsibility for preventing the spread of the disease. What is happening in America was entirely predictable based on 70 years of public health knowledge, and it’s a depressing indictment of public health policy-makers that they did not do more to stop it.

The narrative of mask use and economic reopening

First let us examine the history of moves to reopen economies from lockdown and the heavy dependence on mask use to achieve this reopening. Some academics at Stanford University recommended mask use as a way to prevent further shutdowns after reopening in late April. In an April 22 news report the governor of Louisiana made clear that mask use was a key part of his reopening strategy:

It’s just like opening a door for them, or saying good morning or whatever it’s being kind and being courteous, and when others wear masks they protect you. So we’re all in this together. When we all wear masks we’ll effectively protect one another which is why I’m calling upon Louisiana to mask-up.

The governor of Georgia suggested mask use could help with reopening that state in mid-May. The governing.com website lists individual states’ reopening plans and makes clear that almost every state mandated, requested or advised face covering and mask use as a form of protection in sites that were considered high risk but were now slated for reopening. For example California has moved to Stage 2 of its resilience roadmap, and recommends:

Crowded settings increase your risk of exposure to COVID-19. Wear a face covering or cloth mask, stay 6 feet away from others, avoid touching your face, and wash your hands when you get home.

Rather than limit access to crowded settings, the government simply advises people to cover themselves and take individual actions to protect themselves and others.

On 1st July Louisiana saw 2,083 cases, a 5-fold increase on the number it saw on April 22nd; Georgia saw 2,946, probably a 4-fold increase on mid-May; and California saw 6,497, a 3-fold increase over the number it saw when it moved to stage 2 of its “resilience roadmap”. All these states are now at the inflection point of a major upward surge in cases. All the personal responsibility and individual actions they advised to prevent the spread of the virus have done very little to protect their citizens from this epidemic.

The scientific evidence for masks and social distancing

On 1st June the Lancet published a systematic review of the evidence for face masks as a protection against coronaviruses. It found only 3 studies with quantifiable evidence of the effect of masks in non-health-care settings, and pooling the results of these studies found a 44% reduction in risk, which is shown in the figure above. While mask use in health care settings has a very large protective effect (70% reduction in infection, with a narrow range of effect from 57 – 78%), it is nowhere near as effective in non-healthcare settings, and there is little evidence to support it. This is why at the time of writing the CDC still does not suggest there is any evidence for the effectiveness of surgical masks, and why the WHO was unwilling to recommend their use during the early stages of the epidemic.

Why is there so little evidence and why would masks not work in public when they’re so effective in hospitals? The lack of evidence is because most countries don’t use masks in any disease-prevention way in public, and so it is very hard to conduct studies. The lack of effectiveness probably arises from the fact we aren’t trained to use them: we don’t know how to take them off properly or even which side to place on our face, we don’t treat them as single-use items, we often don’t carry spare ones so we need to lower them in public to eat and drink and then raise them again, they get damp and become ineffective because we wear them too long, we wear the wrong masks for settings with high infection risk, and we don’t combine their use with the regular, intensive and disciplined hand hygiene that medical personnel use. I have recently spent a week in hospital during lockdown for surgery, and the aggressive and disciplined pursuit of hand hygiene was noticeable and completely different to community life. If you don’t know how to use a mask and don’t practice proper hand hygiene it is not much use. Here are some examples of mask use I have seen in Japan, when commuting or wandering my suburb (in a mask):

  • A man pulling his mask down on the train so he can pick his nose and wipe it on the poles people hold
  • People wearing their mask pulled down so their nose is uncovered (so common)
  • People folding their mask up and putting it in their pocket or a bag
  • People putting their mask on a table or other unwashed surface and then putting it back on again
  • People putting their mask on backwards
  • People taking their mask off to use a shared microphone in a public meeting
  • People wearing masks to karaoke and taking them off to sing

It is of course also impossible to maintain social distance on commuter trains in Japan. I have also noticed that everyone complains that when they wear a mask their breath steams up their glasses, which means constantly fiddling with the mask and wearing it too loose. If your breath is getting out of your mask rather than through it, you are not protecting anyone and you aren’t protected.

Even if masks were 90-100% effective though, we still know that a strategy of mask wearing will not work. We know this because we tried the exact same strategy for HIV and failed.

The failure of barrier methods for HIV prevention

HIV first entered western consciousness in the early 1980s. It was initially identified in men who have sex with men (MSM) in America but the pandemic really took off in heterosexual people in sub-Saharan Africa, probably because it was already widespread by the 1980s. The first treatment was introduced in 1987 but the first really effective treatments, highly active antiretroviral therapy (HAART), were only introduced in 1997. In the early 2000s HAART was discovered to reduce the transmissibility of HIV, meaning that people taking HAART were less likely to pass the infection to others even if they were having unprotected sex. This discovery came at about the same time as George W Bush introduced PEPFAR, a massive program of HIV testing and treatment in sub-Saharan Africa, and this widespread testing plus availability of a treatment that could render people non-infectious led to some gains in the battle against HIV.

Now that HAART is available the fight against HIV is almost exclusively based on testing and treatment, but until the mid 1990s the only effective strategy we had for prevention was condom use. Condoms are 90-100% effective in preventing the spread of HIV, and we ran aggressive condom promotion and distribution schemes in the 1980s and 1990s to encourage safer sex and prevention of HIV. Despite dumping huge amounts of money and resources into these programs in the 1980s and 1990s HIV continued to spread rapidly in both heterosexual communities in Africa and MSM and some other at-risk communities in the rest of the world. Condom promotion strategies did not work to prevent the spread of HIV even though we knew that they were highly effective tools for prevention. Barrier methods were all we had – our entire strategy was based on behavioral change and personal actions – and it failed miserably.

The same is also true of all the other STIs: gonorrhea, chlamydia, and syphilis are all still widespread in heterosexual and MSM communities despite the sure knowledge that they are easily prevented by condoms. Indeed, these diseases are much more prevalent in communities that have easy access to condoms but poor access to testing and rapid treatment, such as indigenous populations in Australia or very poor communities in the USA. It is the structural factors of access to testing and treatment that determine the spread of these diseases, not the ability of individuals to take individual action to protect themselves or others.

Why is this possible? How did this program fail so monumentally when the individual preventive action it was based on is so well known to be highly effective? The reason is that sex is a social act, and social acts are mediated by complex social forces that it is difficult for us to navigate and control on our own. When people have sex they choose to flout social rules, they don’t always plan ahead, they are sometimes under the influence of drugs or alcohol or in a rush or not quite sure of exactly what is safe. Power relations are common in sex and can lead to people not being able or willing to negotiate condom use. Just as masks interfere with the ease and enjoyment of basic social interactions, so condoms interfere with the ease and enjoyment of sex, and people sometimes choose not to use them for this and other personal reasons. People also often make judgments about who and what is “safe”, and make these decisions with partial information in very emotionally fraught circumstances. And of course if you want children – a fundamental consequence of and reason for this social interaction – you can’t wear a condom. And so HIV spreads.

There are communities where condom distribution has worked but this is rare. It was probably partially successful among MSM in Australia, but probably because the campaign to use protection and beat HIV was explicitly tied in with the campaign for rights for MSM. It has been successful among sex workers, but this is because sex workers have no social incentive not to use condoms and have powerful tools at their disposal to enforce their own protection, and this is only true in some communities of sex workers who are strongly protected by cultural, social and legal norms that give them the social power to control their sexual interactions. There are many communities of sex workers in the world who cannot negotiate condom use precisely because these structural factors are aligned against their personal protective choices.

In contrast, we can identify a group of people who are at very high risk of HIV but have very low rates and among whom outbreaks of HIV are quickly identified and shut down: porn actors. Porn actors have large amounts of completely unprotected and often high-risk sex with multiple partners regularly, but have low risk of HIV. This is because they work in an industry with rigorous, regular testing policies that ensure that HIV cases are caught before they can become widespread. This is an example of how high-risk behavior can be safe if it is regularly tested and treated, but low risk behavior (for example among heterosexual people in Africa) can be dangerous if it is forced to rely on personal protective actions without the support of a health infrastructure.

Against infectious diseases, social and policy actions are always more powerful than individual actions, because infectious diseases are a consequence of our social interactions, not our personal decisions.

The difference between strategies and individual actions

Public health strategies obviously always rely on individual actions: we need people to report symptoms, to attend clinics for medical care, to comply with test and trace strategies, and to cooperate with the health system. Many of these actions can be guaranteed to happen under the right circumstances because they benefit the individual: if you can afford care, getting care is good for you, so you are likely to do it. But any policy which requires people to do the right thing in a burdensome way runs up against a huge problem: many people do not want to, or are not able to, do the right thing. This is why states have to mandate seatbelt wearing and introduce random breath testing to prevent drunk driving: the action they request of individuals is burdensome and unpleasant, so people won’t do it if they aren’t forced. The same is true of mask-wearing and social distancing, which are fundamentally against all of our social and cultural norms and obviously, objectively make social interactions worse. Any policy based on requiring (or expecting) people to perform these actions is bound to fail, especially if no one is trained in how to do these actions safely or given the correct equipment. The policy is particularly likely to fail because the people who don’t conform will spread their virus in ways that people who are conforming cannot see and prevent (such as touching surfaces that mask-wearers touch).

A good public health strategy needs to take into account what people are willing and able to do, and not assume everyone will act correctly and in good faith. A policy which plans to increase risk in other ways – by reopening the economy – while relying on people doing these difficult and unpleasant individual actions to offset the risk is guaranteed to fail. And as we see in America, and now increasingly in Japan, that is exactly what has happened.

What does this say about the future of COVID-19 policy?

There is only one safe and reliable way to control this epidemic: lock down your cities until there are 0 cases, then reopen slowly and carefully with immediate and aggressive lockdowns as soon as outbreaks happen. Coupled with rigorous control of national (and sometimes sub-national) borders, this will ensure that states can get to 0 cases and stay there with minimal future risk. If every country proceeds on this basis we can slowly reconnect countries that have eliminated the virus, and reopen the global economy. But so long as governments think they can reopen the economy provided that individual citizens take reasonable actions to protect themselves in the presence of remnant cases, the epidemic will restart and countries will continually bounce between lockdown and tragic, fatal reopening. This does not mean that you should not wear a mask – as we saw above, they probably have some mild protective effect. But you should not – and your government should not expect you to – use it as the only defense against this virus just so that economies can reopen. In the face of a virus this transmissible and deadly, there is no way your individual actions will make any difference. We need to work together through collective action to destroy this thing. Until a vaccine comes along, our individual effort is meaningless: we rely on policy and social action to end this scourge. Whenever a government asks you to wear a mask to protect yourself and your friends, that government is asking you to take the blame for its failures. Don’t let it happen. Demand real collective action to end this epidemic and restart our lives.

 

On Tuesday 26th May Japan’s COVID-19 state of emergency ended, five days earlier than expected and with deaths down to low double digits every day. The state of emergency was accompanied by a voluntary lockdown that started on 8th April for Tokyo and six other prefectures, extending to the rest of Japan a week later and ending in the rest of Japan a week before the lockdown ended in Tokyo. This means that the lockdown affected Tokyo for just 7.5 weeks, and the rest of Japan for about 6 weeks. At its peak the epidemic generated about 1200 cases in one day (on 17th April), dropping from 1200 to 30 in just 5 weeks.

In contrast, the UK essentially introduced its lockdown on 23rd March and is still slowly relaxing the lockdown. The UK lockdown was stricter than that in Japan, with enforceable restrictions on movement and activities[1], it involved the complete closure of many businesses, and it effectively lasted 3 weeks longer than Japan’s. At its peak the UK saw 8700 cases in one day (on 10th April, a week before Japan’s peak) and dropped much more slowly, only going below 2000 cases on 25th May – the same day Japan reached 30 cases. This is quite a remarkable difference in pace of decline: dropping by 97.5% in 5 weeks for Japan, compared to roughly 77% in 6 weeks for the UK. These differences show very starkly when plotted, as I have done in Figure 1. This figure shows daily new cases in the two countries by day since the 10th confirmed case, using data obtained from the Johns Hopkins School of Public Health coronavirus tracker[2]. From this figure it is clear that Japan saw its 10th case much earlier than the UK (on 30th January compared to 24th February) yet experienced a much more gradual increase and a much more rapid decline than did the UK.
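
The pace of decline can also be expressed as a halving time, which makes the two countries easier to compare. The sketch below uses only the peak and end-point case counts quoted above and assumes a steady exponential decline over 5 weeks (Japan) and 6 weeks (the UK). Both the steady-decline assumption and the function names are mine, for illustration only.

```python
import math


def percent_decline(peak: float, later: float) -> float:
    """Percentage drop from the peak daily count to a later daily count."""
    return 100 * (1 - later / peak)


def halving_time_days(peak: float, later: float, elapsed_days: float) -> float:
    """Days for daily cases to halve, assuming constant exponential decline."""
    return elapsed_days * math.log(2) / math.log(peak / later)


# (peak daily cases, later daily cases, elapsed days) as quoted in the text
countries = {
    "Japan": (1200, 30, 5 * 7),
    "UK": (8700, 2000, 6 * 7),
}

for name, (peak, later, days) in countries.items():
    print(f"{name}: {percent_decline(peak, later):.1f}% decline, "
          f"halving every {halving_time_days(peak, later, days):.1f} days")
```

Under these assumptions Japan's daily cases halved several times faster than the UK's, which is the same contrast Figure 1 shows visually.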

Figure 1: Daily new COVID-19 cases in the UK and Japan by day since the 10th confirmed case

Why was Japan’s response to the coronavirus so much more effective than that of so many other high-income countries? In this post I will explore a little the key factors that affected the Japanese response, what made the numbers grow so slowly and why the lockdown was more effective than in many other countries. In particular I will compare Japan with the UK, as a model of the differences between an effective and an ineffective response.

Figure 2: Health education materials are essential to good pandemic prevention

A timeline of interventions

Japan saw its first case on the 16th January, compared to 31st January in the UK. However, Japan took action sooner and more aggressively. Here are some key actions and when they were taken by each country.

The difference in public response to the issue of mass events is a key example of the quality of the response in the two countries. While the UK was faffing about with discussion about which responses to take, Japan was already canceling and closing events. My own work events began to be postponed in the last week of February, but so did major public events:

  • J league (soccer) halted all games on 25th February (170 cases)
  • Japan National Pro Baseball league held all preseason games without an audience from 26th February (189 cases)
  • Japan boxing commission and pro-boxing association canceled or postponed all bouts from 26th February
  • Rise kickboxing was canceled on 26th February
  • Sumo was held without an audience from 8th March (502 cases), 5 days after Boris Johnson bragged about “shaking hands with everybody” (51 cases)

In contrast in the UK:

  • An England-Wales rugby match was held on 7th March with a live audience and the PM in attendance (206 cases)
  • Premier League events were held on 8th March with a live audience (283 cases)
  • Cheltenham races were held on 10th – 14th March (382 – 1140 cases)
  • League One games were held on 10th March (382 cases)
  • UEFA Champions League games were held on 12th March (in Scotland) (456 cases)

The UEFA Champions League match brought a large number of German fans to Scotland, and a week earlier I think Liverpool visited Spain and another team visited Italy, where the epidemic was already booming. These events had huge numbers of fans – 81,000 people attended the England-Wales rugby match, and many soccer games host tens of thousands of fans. In contrast, the only major event to be held in March in Japan that I know of, with an audience, was K1 on 22nd March, which attracted 6500 fans who were all given a mask at the door (and this event still attracted huge controversy and anger in Japan).

Because of the slow growth of the epidemic the lockdowns also happened at different stages of the epidemic. Japan’s lockdown came on 8th April, when there were 5120 cases; the UK’s, on the 23rd March, when the UK had reached 6600 cases and was already on a much more rapid upward trajectory. It took 4 days from the announcement of lockdown for the UK’s case load to double, whereas it took Japan 8 days. The next doubling took the UK another 4 days, and never happened for Japan.
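
To put those doubling times in more familiar terms, a doubling time can be converted into a daily growth rate. This is a minimal sketch which assumes constant exponential growth between the quoted points. The function names are hypothetical, and the starting case counts are the cumulative figures given above.

```python
def daily_growth_rate(doubling_days: float) -> float:
    """Fractional growth per day implied by a given doubling time."""
    return 2 ** (1 / doubling_days) - 1


def projected_cases(start: float, doubling_days: float, days: float) -> float:
    """Cases after `days`, assuming the doubling time stays constant."""
    return start * 2 ** (days / doubling_days)


# Doubling times after the lockdown announcements, as quoted in the text
for name, start, doubling in [("UK", 6600, 4), ("Japan", 5120, 8)]:
    rate = daily_growth_rate(doubling)
    after_8_days = projected_cases(start, doubling, 8)
    print(f"{name}: {rate:.1%} growth per day, "
          f"about {after_8_days:,.0f} cases after 8 days")
```

Eight days is long enough for two doublings at the UK's pace but only one at Japan's, which is why the gap between the two epidemics opened up so quickly.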

Finally of course there is the attitude of the leadership: on 3rd March Sadiq Khan announced that there was no risk of catching coronavirus on the London Underground, the same day that Boris Johnson was bragging about shaking everyone’s hand at a hospital (and thus caught coronavirus himself).

It should be clear from this that while in some cases the UK government acted with about the same speed as the Japanese government, in general the Japanese government acted when it had much lower numbers of cases than the UK, and implemented more far-reaching and aggressive strategies that were likely to have greater impact. But beyond basic actions on mass events and action plans, there was one additional major difference in the Japanese government’s response: case isolation.

Contact tracing and case isolation

From the very beginning of the epidemic, Japan introduced a system of “test, trace and isolate” that follows WHO guidelines for emerging infectious diseases. Under this system, once someone was identified as a likely COVID-19 case and tested positive, they were immediately moved to a nominated hospital into a special management ward designed for highly infectious diseases, to have their condition managed by specialist medical teams. This case isolation reduces the risk that they will infect their family, and prevents them from spreading the disease through basic daily functions like shopping if they live alone and cannot be helped by others. This strategy was also used in China and Vietnam, and it is a core part of the reason why the lockdowns in these countries were so much more effective than they were in the UK, USA or much of Europe. When a confirmed case of COVID-19 self-isolates at home they are highly likely to infect family or housemates, who will then continue to spread the virus amongst themselves and to others. This is particularly bad in cities with high levels of inequality like London, where essential workers live in cramped share houses and lack the resources to stop working even if infected. These people infect their housemates, who must continue working as bus drivers, cleaners, care workers or shop assistants, and cannot help but infect others. If the first case is quickly isolated, this reduces the risk that subsequent cases will be infected. As stressed by the WHO, case isolation is key to cracking this highly infectious virus. Case isolation early in the epidemic slows the growth of the epidemic and buys more time to scale up testing and other responses, while case isolation once the lockdown is in place helps to push down the number of infections more rapidly, reducing both the severity and length of the lockdown.

Case isolation was key to Japan’s successful management of this epidemic, but many people have suggested that the epidemic was controlled also because of cultural and social factors that make Japan more successful at managing infectious diseases. I do not think these played a major role in Japan’s response.

Japan’s “unique” social and cultural factors

Some have suggested that Japan’s culture of hygiene, its long-standing mask-wearing habits, and high quality public infrastructure might have played a role in slowing the growth of the epidemic. It is certainly true that Japanese people have a tradition of washing their hands when they get home (and gargling), wear masks when they are sick, and have remarkably clean and hygienic public spaces, with readily available public toilets throughout the country. The trains are super clean and stations are also very hygienic, and it is never difficult to find somewhere to wash your hands. Japanese people also don’t wear shoes in the house (and in some workplaces!) and often have a habit of changing out of “outside clothes” when they come home. But I think these cultural benefits need to be stacked against the many disadvantages of Japanese life: Japan’s trains are incredibly crowded, and everyone has to use them (unlike say California, which was much worse hit than Japan); Japanese shops and public accommodations in general are very cramped and crowded, so it is not possible to socially distance in e.g. supermarkets or public facilities; because Japan’s weather is generally awful and its insects are the worst things you have seen outside of anime specials, most of Japan’s restaurants and bars are highly enclosed and poorly ventilated; and Japanese homes are often very cramped and small. When viewed like this, Japan is a disease breeding facility, a veritable petri dish for a rapidly spreading and easily-transmissible disease. Japan’s population is also very much older than the UK’s, which should suggest further high rates of transmission, and from mid-February we have terrible hay fever which turns half the country into snot cannons. Not to mention the huge outdoor party that is held at the end of March, where everyone gets drunk and nobody socially distances. 
Japan’s work culture also does not support home working, in general, and everyone has to stamp documents by the hour and we still use fax machines, so I really don’t think that this is a strong environment to resist the disease. I think these social and cultural factors balance out to nothing in the end.

Differences in Personal Protective Equipment

I do not know what the general situation for PPE was in Japan, but certainly the hospital attached to my university, which is a major nominated infectious disease hospital, sent around a circular in mid-February describing our state of readiness, and at that time we had 230 days’ supply of COVID-rated gowns at the current infection rate, as well as ample stocks of all other PPE and plans in place to secure more. There was a shortage of masks for public use in March, which was over by April, but I do not get the impression that there was such a shortage in the designated hospitals. Japan also has a very large number of hospital beds per capita compared to other high-income countries, but this figure is misleading: most of these beds are for elderly care and not ICU, and in fact its ICU capacity is not particularly large. However, by keeping the new cases low and moving isolated patients to hotels once the hospitals became full, Japan managed to mostly avoid shortages of ICU beds (though it was touch and go for a week or two in Tokyo). I think in the Japanese hospital system the lack of ventilators and ICU beds would have become a major problem long before the country ran out of PPE.

Inequality and disease transmission

One way that Japan differs from a lot of other high-income countries is its relatively low levels of inequality. In particular it is possible for young people to live alone in Tokyo even if they do not have high incomes, which means share housing does not really exist here, and all the young people who move to the big cities for work mostly live by themselves where they cannot infect anyone. Although it is a very densely-populated country and houses are much smaller than in the UK, there is less overcrowding because housing is affordable and there is a lot of it. Most people can afford health care and have ready access to it (waiting times are not a thing here). This low inequality plays an important role in elderly care homes, where staff are better paid and treated than in the UK care sector, and less likely to move between facilities on zero-hour contracts as they do in the UK. There is a higher level of care paid to basic public facilities like hospitals, railway stations, public toilets and other facilities which ensures they are relatively hygienic, and cleaning staff here tend to be paid as part of a standard company structure rather than through zero-hours contracts, with good equipment and basic working rights. Also there is a much lower level of obesity here, and obesity is not as class-based, so there is less risk of transmission and serious illness through this risk factor. There is a very high level of smoking, which is a major risk factor for serious illness and death from COVID-19, but it is the only risk factor that is comparable to or higher than those in the UK. In general I think Japan’s low level of inequality helped in the battle against this disease, by preventing the country from developing communities where the disease would spread like wildfire, or having strata of the population (like young renters) at increased risk, or forcing increased risk onto the poor elderly as we saw in the UK.

A note on masks

I think masks are a distraction in the battle against this disease. I think most people don’t know how to wear them properly and use them in risky ways – touching them a lot, reusing them, wearing them too long, storing them unsafely, and generally treating them as part of their face rather than a protective barrier. I think that this can create a false sense of security which leads people to think that opening up the economy and dropping lockdown can be safely done because everyone is protected by masks. This is a dangerous mistake. That is not to say one shouldn’t wear them, but one should not see them as a solution to the more basic responsibility of social distancing and isolation, and one definitely should not drop one’s hand hygiene just because one is wearing a mask: hand hygiene is much more important for protecting against this disease. It’s worth remembering that on the days that Japan was seeing 300 or 500 or 1000 cases a day everyone was wearing masks, but somehow the disease was still spreading. They are not a panacea, and if treated as an alternative to really effective social measures they may even be dangerously misleading.

Conclusion: Early, sensible action and strong case isolation are the key

Japan took an early, rapid response to the virus which saw it screening people at airports, educating the population, and implementing sensible measures early on in the epidemic to prevent the spread of the disease. The first measures at airports and in case isolation were taken early in February, major events were cancelled and gatherings suspended from mid- to late-February, and additional social distancing measures introduced in March. Throughout the growth of the epidemic the Japanese response focused on the WHO guideline of testing, tracing, and isolating, with case isolation a routine strategy when cases were confirmed. This case isolation slowed the growth of the epidemic and once lockdown was in place helped to crush it quickly. This is in clear contrast to the countries experiencing a larger epidemic, which typically reacted slowly, introduced weak measures, and did not implement case isolation at all or until it was too late. Lockdowns with self-isolation will work, but as Figure 1 shows, they are much less effective, causing more economic damage and much slower epidemic decline, than lockdowns with case isolation.

Finally I should say I think Japan ended its lockdown a week early, when cases in Tokyo were still in the 10s, and we should have waited another week. I fear we will see a resurgence over the next month, and another lockdown required by summer if our contact tracing is not perfect. But it is much better to end your lockdown prematurely on 10 cases a day than on 2000 a day, which is where the UK is now!


fn1: With certain notably rare exceptions, of course…

fn2: I have had to do a little cleaning with the data, which contains some errors, and I think the JHSPH data doesn’t quite match that of national health bodies, but it is much more easily accessible, so that is the data I have used here. All case numbers are taken from that dataset, unless otherwise stated.

As I write this many countries are beginning to end their lockdowns and make plans to reopen. The UK has already begun to reopen, the US is opening state by state and much of Europe is beginning to return to work and play. Japan has ended its state of emergency in 40 prefectures, leaving the 7 hardest hit prefectures another two weeks of lockdown before they can resume normal activity. Different countries and states have different guidelines and rules about how to reopen, and are reopening at different stages of the epidemic. Let’s look at the circumstances in some of them.

  • United Kingdom: 2,400 new cases on 19th May, down from a peak of about 6,000 a day. A major epidemic still seems to be raging in elderly care homes, but people have begun returning to work. There is debate about whether to reopen schools, but some universities have decided to conduct the entire 2020/21 academic year online. Quarantine rules will be introduced for inbound overseas travelers from early June. Still recruiting staff to do contact tracing.
  • Germany: 513 daily cases on 19th May, down from a peak of 6,000 a day. Shops have reopened, the Bundesliga has restarted without crowds and schools will soon reopen. The end of lockdown began on about May 10th, when there were about 670 cases a day.
  • USA: 19,662 daily cases on 19th May, down from a peak of about 35,000 a day. States are reopening at their own pace with some being strict and some being very relaxed. Most states have ongoing daily cases in the hundreds, and there are signs that the decline in daily cases has stopped in states like New Jersey and Washington, or that case numbers are rising in states like Maryland, after seeming to plateau. In some states like Texas the number has been constantly increasing and the state is reopening after completely failing to stop the growth of the virus. Major problems with the testing infrastructure and large state-by-state differences in public health infrastructure.
  • Japan: 31 daily cases on 19th May, down from a peak of about 700 cases, with 5 in Tokyo. Only some prefectures are reopening, rules remain regarding mass events, schools have not yet reopened, and things aren’t going straight back to normal. Full reopening of the country is currently planned for 31st May but could be postponed if the trajectory changes.

New Zealand, of course, began to reopen only when there were 0 cases. These countries seem to have starkly different ideas about when and how to reopen, with the USA and UK really nowhere near the bottom of their incidence curves, and still huge numbers of cases being discovered every day. Most of these countries claim to have pushed the reproduction number of the virus below 1, which means that they think the epidemic is under control. But what is the best metric for determining when to end a lockdown?

Metric 1: Daily number of cases

One way to judge whether to exit lockdown is the daily number of cases. You can calculate this as a percentage of your total active cases and from that estimate the amount of time it takes to double the number of cases, and if you think this is low enough you can reopen. Under this metric New York is ready to reopen, since it saw 1,474 new cases yesterday out of 353,000 total cases, which suggests a growth rate of 0.4%, which in theory should mean it will take another 100 days or more for case numbers to double.  By this metric Arkansas should be okay too – it had 110 new cases yesterday out of 4,923 existing cases, giving a 2% growth rate that suggests about a month or more to double. You need to show a little caution with this calculation though, because many states that have experienced slow growth in long epidemics have a large number of recovered cases. In fact in Arkansas there are only 1,184 active cases, so basically yesterday it saw a 10% increase in case numbers, which means the number of cases will double in a week. It should probably stay closed by that metric! But a lot of states don’t seem to be recording or reporting recovered cases. Also if we use the metric of not opening if your cases will take a week or less to double (say, a 10% increase per day), then New York now could open even if it had 30,000 daily cases, since that is less than 10% increase a day. But I think everyone would agree a single city opening when it still has 30,000 cases a day would be a bit silly.
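The doubling-time arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes cases keep growing at a constant daily rate equal to new cases divided by whichever base you choose (total or active cases), and the function name is mine, not part of any official method.

```python
import math

def doubling_time_days(new_cases, base_cases):
    """Days for cases to double if they keep growing at new_cases / base_cases per day.

    Assumes constant exponential growth, which is a simplification.
    """
    daily_growth = new_cases / base_cases
    return math.log(2) / math.log(1 + daily_growth)

# Figures quoted in the text above.
print(doubling_time_days(1474, 353_000))  # New York, against total cases
print(doubling_time_days(110, 4923))      # Arkansas, against total cases
print(doubling_time_days(110, 1184))      # Arkansas, against active cases only
```

The choice of denominator matters far more than the formula: the same 110 new cases give a doubling time of about a month against total cases, but about a week against active cases.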

Metric 2: Reproduction number

Everyone is becoming familiar with the effective reproduction number, Rt, now that the epidemic is all the news we can read about. Rt is the number of cases that will be generated by a single infected person. Rt measures this number over time, so it can change as policies change, and is slightly different to R0, the basic reproduction number. R0 measures Rt at the beginning of the outbreak, when there is only 1 new case and the population has no special measures in place. I estimated R0 for COVID-19 to be 4.4, meaning that each case will generate 4.4 new cases. Because the disease has an incubation period when people are asymptomatic of about 4-5 days, we can expect those 4.4 new cases to occur between 4 days and two weeks after the initial infection, so we might expect that an approximate rule for this virus is that 100 cases today will generate 400 cases after a week, suggesting that unrestrained it doubles every 3-4 days. That’s nasty! But after policies are put in place we can drive Rt down to 1, and once it’s below 1 we should expect that the epidemic will begin to die out. This seems to be the primary metric the UK government is using – their politicians are always on TV talking about “the R number” and everyone is eager to get it below 1. The big problem with using Rt is that if you have enough daily cases, an Rt below 1 will still mean you generate a lot of new cases. For example, the US has 20,000 cases a day and most Rt values are near 1. If Rt is 0.8 then given the incubation time we should expect 16,000 daily cases after a week, 12,800 after two weeks, and so on. That suggests a half-life for the disease of about 3 weeks, and after two months there would still be around 3,000 cases a day. A lot of deaths will happen in that time.
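To see how slowly a large epidemic shrinks when Rt is only a little below 1, here is a rough sketch. It assumes one generation of infection per week, as in the 20,000 to 16,000 to 12,800 example above; that generation time is a simplifying assumption, and the function names are just for illustration.

```python
def weekly_cases(start_cases, rt, weeks):
    """Daily case counts at the end of each week, assuming one generation per week."""
    return [start_cases * rt ** week for week in range(weeks + 1)]

def weeks_to_fall_below(start_cases, rt, target):
    """Number of weekly generations until daily cases drop below target."""
    if not 0 < rt < 1:
        raise ValueError("rt must be between 0 and 1 for cases to decline")
    weeks = 0
    cases = start_cases
    while cases >= target:
        cases *= rt
        weeks += 1
    return weeks

# The US example from the text: 20,000 daily cases and Rt = 0.8.
print([round(c) for c in weekly_cases(20_000, 0.8, 8)])
print(weeks_to_fall_below(20_000, 0.8, 10_000))  # weeks until cases have halved
print(weeks_to_fall_below(20_000, 0.8, 100))     # weeks until under 100 a day
```

Trying values of Rt closer to 1 (say 0.9 or 0.95) shows how quickly the timescale stretches out, which is why the starting number of cases matters as much as Rt itself.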

Another problem with Rt is that once the economy opens we should expect it will go up. If Rt keeps fluctuating above and below 1, are we to keep closing and opening the economy? What if it’s 0.8 for a week, then goes up to 1.2? Do we close down? Or wait as the epidemic begins to spread again? If it is fluctuating like this we may end up with an epidemic that is constantly varying around 20,000 cases a day: one week it’s 15,000 a day, then we loosen our measures and it’s 22,000 a day, and so on. Also there is a lot of uncertainty in estimates of Rt – if it’s 0.9 then in theory we are in epidemic elimination territory, but actually if the confidence interval is 0.7 to 1.1 there’s some chance we aren’t there at all.

Metric 3: Health system capacity

Unless we do as NZ has done and exterminate the virus completely before we reopen, we can be confident there will still be some cases when we reopen. In this case we will need to deal with them by testing, contact tracing, and if possible isolating the cases. Contact tracing one case when they’re in lockdown is easy – you just test the people they live with. But once they’re working and socializing one positive case will likely mean tracking down and testing 5 or 10 more people. This is hard work and it needs to be done quickly with a disease like this, especially if even a small number of people are asymptomatic but able to spread. Basically you need to find and test all 10 contacts and get their results back to them – and if necessary isolate them – within 4 days of the onset of symptoms in the index case, and even less time if the index case delayed presentation to hospital. This means if you have 500 cases a day you need to track 2500 to 5000 people daily, and potentially have to isolate 2000 of them. To do this requires a lot of boots on the ground and a lot of hotel rooms. Furthermore, the more cases you have the less room there is for error. If you have 5 cases a day and a 10% error rate in contact tracing you’ll miss 5 people, 1 of whom might be infected. With 500 cases a day you’ll miss potentially 500 people, of whom 100 might be infected. Those slip ups will help the virus continue to spread until it finds a super spreader like the Korean bar scenario (or in America, a meat packing plant).
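The logistics here are simple multiplication, and a small sketch makes it easy to try other scenarios. The defaults come from the example above (10 contacts per case, a 10% miss rate, and 1 in 5 missed contacts being infected); they are illustrative guesses rather than measured values.

```python
def tracing_load(daily_cases, contacts_per_case=10, miss_rate=0.10, infected_share=0.20):
    """Rough daily contact-tracing workload and the leakage from imperfect tracing.

    All default parameters are illustrative assumptions taken from the example in the text.
    """
    contacts = daily_cases * contacts_per_case
    missed = contacts * miss_rate
    return {
        "contacts_to_trace": contacts,
        "missed_contacts": missed,
        "missed_infected": missed * infected_share,
    }

print(tracing_load(5))    # a small outbreak
print(tracing_load(500))  # a hundred times the cases, a hundred times the leakage
```

The error rate is the same in both scenarios; only the volume changes, which is the point of judging readiness by capacity rather than by rates.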

To me this is the best guide for when to open: do you have the logistics to cope with cases as people begin to socialize and spread the disease again? If you have 50 cases a day and 500 contact tracers then you can probably handle it; if you have 500 cases a day and 500 contact tracers then it’s not going to work, and you’re going to lose control of the epidemic. Rather than judging by the rate at which the virus might double, or the reproduction number, you should look at whether you can rigorously and effectively stamp out every single case that could be generated after you reopen, and not ease your lockdown until you’re well within the logistic capacity to do so. That means looking at testing capacity, the number of people able to contact trace, your population’s willingness to share contacts and engage with health workers, your hotel capacity for case isolation, and your hospital bed capacity (and in-hospital infection risk!) for those you miss. If any aspect of that process could break, you need to wait.

Unfortunately, a lot of policy makers and politicians have been focused on the reproduction number, as if crossing the reproduction threshold will automatically end the epidemic. It’s an easy number to focus on and gives an easy story to tell the press and the public, and it’s nice to have a target to aim for, but although it is a scientifically valid measure of the epidemic’s dynamics it is of little use in deciding how to deal with the epidemic. Much more important is the ability to control the cases you have, and a long term plan for getting rid of them, than a spot judgment about whether you “have the epidemic under control” based on a number that is both uncertain and ultimately not very practically informative.

The consequences of losing control a second time

The big problem with losing control of the epidemic a second time is that you have a lot more cases floating around than the first time it happened. It took the UK two weeks to rise from 152 cases a day to 4,500 cases[1], so if the UK opens up on 2,500 cases and loses control the consequences will be dire. If the week after opening up there are 2,000 cases, and the contact tracing misses 152 of them (<10%!), then in theory within two weeks the UK will be back to 4,500 cases a day. Furthermore, it will be much harder to go back into lockdown a second time, because the population will no longer see it as an effective strategy and it will be political suicide for any government contemplating it. Socially and politically, you can’t let this genie back out of its bottle. And although we like to hope that the population will observe social distancing rules and other niceties, in reality this will slide quickly, and if the cases aren’t under control by the time people return to their normal ways, another explosion will follow. This is without considering unknown and potentially catastrophic risks, such as school openings. The UK government is pushing to reopen schools because they say there is little risk of spread among children, but the ONS survey found much higher proportions of young people with antibodies in the community than are recorded in confirmed hospital cases. If the virus was quietly spreading in young people when it started at 1 case, how explosive will its growth in this cohort be if it starts from 2000 cases? These low-risk groups are highly likely to have many social contacts and to be an excellent infection vector for high-risk groups such as their parents and teachers.

Watching the data from the USA, I think this is already happening in some states. Texas, Maryland, Minnesota, maybe New Jersey, North Carolina, and maybe Tennessee are already beginning to see either growth or a distinct flattening of previous downward curves, and other states that are reopening like Florida and Wisconsin will likely see this in a week or two. I don’t think any of these states have the contact tracing capacity for the cases they are currently seeing, and they don’t have any plan to isolate cases, nor do they have well-functioning or affordable health systems. The same is true in the UK, which is nowhere near having its contact tracing infrastructure in place, and is playing with all kinds of deadly scenarios (like reopening schools and soccer games). I think this is partly because they’re fixated on Rt as the metric for reopening, partly because they’re incompetent, and partly because of political and economic pressure, but regardless, a disaster is in their near future if their health system capacity is not ready – and I think it’s not. In two weeks we are going to see the second wave hit these unready countries, and it’s going to make the first wave seem like a bad cold.


fn1: But it has taken 5 weeks to get from 4,500 back to below 4,000. This shows the incredible urgency of stopping this epidemic during its upward rise, not once it has really spread. The government’s faffing in the early days has made every subsequent decision harder, less effective and more deadly. The entire crew should resign immediately and hand government over to some adults to manage the place properly.

In early March, when COVID-19 was starting to spread in the UK, the government announced a strategy of “herd immunity” in which they would shield vulnerable people (such as older people and people with pre-existing conditions) from the disease, and aim to slowly allow the rest of the country to be infected up to some proportion of the population. This policy was based on the idea that once the disease had infected a certain proportion of the population then this would mean it had naturally been able to achieve herd immunity, and after that would die out. The basics of the strategy and its timeline are summarized here. This strategy was an incredibly dangerous, stupid and reckless strategy that was built on a fundamental failure to understand what herd immunity is, and some really bad misconceptions about the dynamics of this epidemic. Had they followed this policy the entire UK population would have been infected, and everyone in the UK would have lost at least one of their grandparents. Here I want to explain why this policy is incredibly stupid, and make a desperate plea for people to stop talking about achieving herd immunity by enabling a certain portion of the population to become infected. This idea is a terrible misunderstanding of the way infectious diseases work, and if it takes hold in the public discourse we are in big trouble next time an epidemic happens.

I will explain here what herd immunity is, and follow this with an explanation of what the UK’s “herd immunity” strategy is and why it is bad. I will call this “herd immunity” strategy “Johnson immunity”, because it is fundamentally not herd immunity. I will then present a simple model which shows how incredibly stupid this policy is. After this I will explain what other misconceptions the government had that would have made their Johnson Immunity strategy even more dangerous. Finally I will present a technical note explaining some details about reproduction numbers (the “R” being bandied about by know-nothing journalists at the moment). There is necessarily some technical detail in here but I’ll try to keep it as simple as possible.

What is herd immunity?

Herd immunity is a fundamental concept in infectious disease epidemiology that has always been applied to vaccination programs. Herd immunity occurs when so many people in the population are immune to a disease that were a case of the disease to arise in the population, it would not be able to infect anyone else and so would die out before it could become an epidemic. Herd immunity is linked to the concept of the Basic Reproduction Number, R0. R0 tells us the number of cases that will be generated from a single case of a disease, so for example if R0 is 2 then every person who has the disease will infect 2 other people. Common basic reproduction numbers range from 1.3 (influenza) to about 18 (measles). The basic reproduction number of COVID-19 is probably 4.5, and definitely above 3.

There is a simple relationship between the basic reproduction number and the proportion of the population that need to be vaccinated to ensure herd immunity. This proportion, p, is related to the basic reproduction number by the formula p=1-1/R0. For smallpox (R0~5) we need 80% of the population to be vaccinated to stop it spreading; for measles (R0~18) it is safest to aim for 95%. The reason this works is that the fundamental driver of disease transmission is contact with vulnerable people. If the disease has a basic reproduction number of 5, each case would normally infect 5 people; but if 4 of every 5 people the infected person meets are immune, then the person will only likely infect 1 person before they recover or die (or get isolated). For more infectious diseases we need to massively increase the number of people who are immune in order to ensure that the infection doesn’t spread.

If we vaccinate the correct proportion of the population, then when the first case of a disease enters the population, its chances of meeting an infectable person will be so low that it won’t spread – effectively by vaccinating a proportion 1-1/R0 of the population we have reduced its effective reproduction number to 1, at which point each case will only produce 1 new case, and the virus will not spread fast enough to matter. This is the essence of herd immunity, but note that the theory applies when we vaccinate a population before a case enters the population.
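The threshold 1-1/R0 is easy to compute, and it can also be inverted to find the R0 implied by a stated threshold. A minimal sketch, with function names of my own choosing:

```python
def herd_immunity_threshold(r0):
    """Proportion that must be immune for each case to infect at most 1 other person."""
    if r0 <= 1:
        return 0.0  # a disease with R0 of 1 or less cannot sustain an epidemic anyway
    return 1 - 1 / r0

def implied_r0(threshold):
    """The R0 that corresponds to a given herd immunity threshold (0 <= threshold < 1)."""
    return 1 / (1 - threshold)

for r0 in (1.3, 2.5, 5, 18):
    print(r0, round(herd_immunity_threshold(r0), 3))

print(implied_r0(0.6))  # the R0 someone is assuming if they quote a 60% threshold
```

With R0 of 5 this gives 80%, and with R0 of 18 it gives a little over 94%, which is why a 95% vaccination target is the safe choice for measles.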

What is Johnson Immunity?

There is a related concept to the basic reproduction number, the effective reproduction number Rt, which tells us how infectious the virus currently is. This tells us how many people each case is infecting at the current state of the epidemic. Obviously as the proportion of the population who have been infected and recovered (and become immune) increases, Rt must drop, since the chance that each case will have contact with a susceptible person goes down. Eventually the proportion of the population infected will become so large that Rt will hit 1, meaning that now each case is only infecting one other case. The idea of Johnson Immunity was that we would allow the virus to spread among only the low-risk population until it naturally reached the proportion of the population required to achieve an Rt value of 1. Then, the virus would be stifled and the epidemic would begin to die. If the required proportion to achieve Rt=1 is low enough, and we can shield vulnerable people, then we can allow the virus to spread until it burns out. This idea is related to the classic charts we see of influenza season, where the number of new infections grows to a certain point and then begins to go down again, even in the absence of a vaccine.

This idea is reckless, stupid and dangerous for several reasons. The first and most serious reason it is dangerous is that the number of daily new infections will rise as we head towards Rt=1, and by the time we reach the point where, say, 60% of the population is infected, the number of daily cases will be huge. At this point Rt=1, so each case is only infecting 1 other case. But if we have 100,000 daily new cases at this point, then the following generation of infections will spawn 100,000 new infections, and so on. If, for example, the virus has an R0 of 2, and takes 5 days to infect the next generation, then the number of new cases doubles every 5 days. After a month we have 64 cases, after two months we have 4100 cases, and so on. By the time we get to 30 million cases, we’ll likely be seeing 100,000 cases in one generation. So yes, now the virus is going to start to slow its spread, but the following generation will still generate 100,000 cases, and the generation after that 90,000, and so on. This is an incredible burden on the health system, and even if death rates are very low – say 0.01% – we are still going to be seeing a huge mortality rate.

The second reason this idea is reckless and stupid is that it is basically allowing the disease to follow its natural course, and for any disease with an R0 above about 1.5, this means it will go on infecting a large share of the population even after it has achieved its Rt of 1. This happens because the number of daily cases at this point is so large that even if each case only infects 1 additional case, the disease will still spread at a horrific rate. There is an equation, called the final size equation, which links R0 to the proportion of the population that will be infected by the disease by the time it has run its course, and basically for any R0 above 2 the final size equation tells us it will infect the great majority of the population (about 80% at an R0 of 2, and about 98% at an R0 of 4) if left unchecked. In practice this means that yes, after a certain period of time the number of new cases will reach a peak and begin to go down, but by the time it finishes its downward path it will have infected nearly the entire population.
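For readers who want to see the final size equation in action, here is a small sketch. It assumes the standard form from the simple SIR model, z = 1 - exp(-R0*z), where z is the proportion of the population ever infected; the equation is not written out in this post, so treat that form as my assumption. It is solved by simple repeated substitution.

```python
import math

def final_size(r0, iterations=1000):
    """Solve z = 1 - exp(-r0 * z) by fixed-point iteration.

    z is the fraction of a closed, fully susceptible population that is
    eventually infected in the simple SIR model (an idealized assumption).
    """
    if r0 <= 1:
        return 0.0  # no epidemic takes off
    z = 0.5
    for _ in range(iterations):
        z = 1 - math.exp(-r0 * z)
    return z

for r0 in (1.3, 1.5, 2, 2.5, 4):
    threshold = 1 - 1 / r0
    print(r0, round(threshold, 3), round(final_size(r0), 3))
```

In this idealized form the final size climbs steeply towards 100% as R0 rises, and for every R0 above 1 it overshoots the 1-1/R0 threshold by a wide margin, which is the problem with letting the disease run to its peak.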

A simple model of Johnson Immunity

I built a very simple model in Excel to show how this works. I imagined a disease that lasts two days. People are infected from the previous generation on day 1, infect the next generation and then recover by the end of day 2. This means that if I introduce 1 case on day 1, it will infect R0 cases on day 2, R0*R0 cases on day 3, and so on. This is easy to model in Excel, which is why I did it. Most actual diseases have incubation periods and delayed infection, but modeling these requires more than 2 minutes’ work in a real stats program, and this is a blog post, so I didn’t bother with such nuance. Nonetheless, my simple disease shows the dynamics of infection. I recalculated Rt each day for the disease, so that it was reduced by the proportion currently infected or immune, so that for example once 100,000 people are infected and recovered, in a population of 1 million people, the value of Rt becomes 90% of the value of R0. This means that when it reaches its Johnson Immunity threshold the value of Rt will go below 1 and the number of cases will begin to decline. This enables us to see how the disease will look when it reaches the Johnson Immunity threshold, so we can see what horrors we are facing. I assumed no deaths and no births, so I ran the model in a closed population of 1 million people. I ran it for a disease with an R0 of 1.3, 1.7, and 2.5, to show some common possible scenarios. Figure 1 shows the results. Here the x-axis is the number of days since the first case was introduced, and the y-axis is the number of daily new cases. The vertical lines show the day at which the proportion of the population infected, Pi, crosses the threshold 1-1/R0. I put this in on the assumption that the Johnson Immunity threshold will be close to the classical herd immunity threshold (it turns out it’s off by a day or two). The number above the line shows the final proportion of the population that will be infected for this particular value of R0.
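For anyone who would rather not use Excel, the model described above can be sketched in a few lines of Python. This is a reconstruction from the description, not the original spreadsheet: details such as exactly when Rt is updated and how the last few susceptible people are handled are my assumptions, so it will not necessarily reproduce the exact numbers in Figure 1.

```python
def simulate(r0, population=1_000_000, days=200):
    """Daily new cases for a disease where each generation lasts one day.

    Rt is recalculated each day as R0 times the proportion still susceptible.
    Closed population, no births or deaths (assumptions from the description).
    """
    new_cases = [1.0]   # a single case introduced on day 1
    cumulative = 1.0    # everyone infected so far (infected or immune)
    for _ in range(days - 1):
        rt = r0 * (1 - cumulative / population)
        # Cannot infect more people than remain susceptible.
        next_cases = min(new_cases[-1] * rt, population - cumulative)
        new_cases.append(next_cases)
        cumulative += next_cases
    return new_cases

for r0 in (1.3, 1.7, 2.5):
    cases = simulate(r0)
    peak_day = cases.index(max(cases)) + 1
    final_share = sum(cases) / 1_000_000
    print(r0, "peak day", peak_day, "peak cases", round(max(cases)),
          "final share infected", round(final_share, 3))
```

Whatever the exact bookkeeping, the qualitative picture is the same: the epidemic peaks around the 1-1/R0 threshold with a very large number of daily cases, and the final share infected ends up well above that threshold.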

Figure 1: Epidemic paths for three different reproduction numbers, with Johnson Immunity threshold

As you can see, when R0 is 1.3 (approximately seasonal influenza), we cross the approximate Johnson Immunity threshold at 44 days after the first case, and at this point we have a daily number of cases of about 40,000 people. This disease will ultimately infect 49% of the population. Note how slowly it goes down – for about a week after we hit the Johnson Immunity threshold we are seeing 40,000 or so cases a day.

For a virus with an R0 of 1.7 the situation is drastically worse. We hit the Johnson Immunity threshold after 23 days, and at this point about 140,000 people a day are being infected. Three days later the peak is achieved, with nearly 200,000 people a day being infected, before the disease begins a rapid crash. It dies out within a week of hitting the Johnson Immunity threshold, but by the time it disappears it has infected 94.6% of the population. That means most of our grandparents!

For a disease with an R0 of 2.5 we hit the Johnson Immunity threshold at day 13, with about 140,000 cases a day, and the disease peaks two days later with 450,000 cases a day. It crashes after that, hitting 0 a day later because it has infected everyone in the population and has no one left to infect.

This shows that for any kind of R0 bigger than influenza, when you reach the Johnson Immunity threshold your disease is infecting a huge number of people every day and is completely out of control. We have shown this for a disease with an R0 of 2.5. The R0 of COVID-19 is probably bigger than 4. In a population of 60 million where we are aiming for a herd immunity threshold of 36 million we should expect to be seeing a million new cases a week at the point where we hit the Johnson Immunity threshold.

This is an incredibly stupid policy!

Other misconceptions in the policy

The government stated that its Johnson Immunity threshold was about 60% of the population. From this we can infer that they thought the R0 of this disease was about 2.5. However, the actual R0 of this disease is probably bigger than 4. This means that the government was working from some very optimistic – and ultimately wrong – assumptions about the virus, which would have been catastrophic had they seen this policy through.

Another terrible mistake the government made was to assume that rates of hospitalization for this disease would be the same as for standard pneumonia, a mistake that was apparently made by the Imperial College modeling team whose work they seem to primarily rely upon. This mistake was tragic, because there was lots of evidence coming out of China that this disease did not behave like classic pneumonia, but for some reason the British ignored Chinese data. They only changed their modeling when they were presented with Italian data on the proportion of serious cases. This is an incredibly bad mistake, and I can only see one reason for it – they either didn’t know, or didn’t care about, the situation in China. Given how bad this disease is, this is an incredible dereliction of duty. I think this may have happened because the Imperial College team have no Chinese members or connections to China, which is really a very good example of how important diversity is when you’re doing policy.

Conclusion

The government’s “herd immunity” strategy was based on a terrible misunderstanding of how infectious disease dynamics work, and was compounded by significantly underestimating the virulence and deadliness of the disease. Had they pursued the “herd immunity” strategy they would have reached a point where millions of people were being infected daily, because the point in an epidemic’s growth where it reaches Rt=1 is usually the point where it is at its most rapidly spreading, and also its most dangerous. It was an incredibly reckless and stupid policy and it is amazing to me that anyone with any scientific background supported it, let alone the chief scientific adviser. Britain is facing its biggest crisis in generations, and is being led by people who are simply not competent to manage it in any way.

Sadly, this language of “herd immunity” has begun to spread through the pundit class and is now used routinely by people talking about the potential peak of the epidemic. It is not true herd immunity, and there is no sense in which getting to the peak of the epidemic to “immunize” the population is a good idea, because getting to the peak of the epidemic means getting to a situation where hundreds of thousands or millions of people are being infected every week.

The only solution we have for this virus is to lock down communities, test widely, and isolate anyone who tests positive. This is being done successfully in China, Vietnam, Japan, Australia and New Zealand. Any strategy based on controlled spread will be a disaster, and anyone recommending it should be removed from any decision-making position immediately.

Appendix: Brief technical note

R0 (and Rt) are very important numerical quantities of an infectious disease but they are not easily calculated. They are numbers that emerge from the differential equations we use to describe the disease, and not something we know in advance. There are two ways to calculate them: empirically from data on the course of disease in individuals, or through dynamic analysis of disease models.

To estimate R0 empirically we obtain data on individuals infected with the disease, so we know when they were infected and when they recovered down to the narrowest possible time point. We then use some statistical techniques related to survival analysis to assess the rate of transmission and obtain statistical estimates for R0.

To estimate R0 from the equations describing the disease, we first establish a set of ordinary differential equations that describe the rates of change of uninfected, infected, and recovered populations. From this system of equations we can obtain a matrix called the Next Generation Matrix, which describes all the flows in and out of the disease states, and from this we can obtain the value of R0 through a method called spectral analysis (basically it is the dominant eigenvalue of this matrix). In this case we will have an equation which describes R0 in terms of the primary parameters in the differential equations, and in particular in terms of the number of daily contacts, the specific infectiousness of the disease when a contact occurs, and the recovery time. We can use this equation to fiddle with some parameters to see how R0 will change. For example, if we reduce the recovery time through treatment, will R0 drop? If we reduce the infectiousness by mask wearing, how will R0 drop? Or if we reduce the number of contacts by lockdowns, how will R0 drop? This gives us tools to assess the impact of various policies.
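As a concrete illustration, in the simplest SIR model the Next Generation Matrix has a single entry, and R0 reduces to the product of daily contacts, the probability of transmission per contact, and the number of days a person is infectious. The sketch below uses that simplest form, and the parameter values are hypothetical, chosen only so that R0 comes out at 4; they are not estimates for COVID-19.

```python
def basic_reproduction_number(daily_contacts, transmission_prob, infectious_days):
    """R0 for the simplest SIR model: contacts x infectiousness x duration."""
    return daily_contacts * transmission_prob * infectious_days

# Hypothetical baseline values, for illustration only.
baseline = basic_reproduction_number(daily_contacts=10, transmission_prob=0.05, infectious_days=8)
lockdown = basic_reproduction_number(daily_contacts=5, transmission_prob=0.05, infectious_days=8)
masks = basic_reproduction_number(daily_contacts=10, transmission_prob=0.04, infectious_days=8)
isolation = basic_reproduction_number(daily_contacts=10, transmission_prob=0.05, infectious_days=4)

print(baseline, lockdown, masks, isolation)
```

Because the three parameters multiply, halving contacts or halving the infectious period (for example by isolating cases quickly) each halves R0, and the effects of combined policies multiply together. More realistic models with several disease states need the full matrix and its dominant eigenvalue.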

In the early period of a new infectious disease people try to do rough and ready calculations of R0 based on the data series of infection numbers in the first few weeks of the disease. During this period the disease is still very vulnerable to random fluctuation, and is best described as a stochastic process. It is my opinion that in this early stage all diseases look like they have an R0 of 1.5 or 2, even if they are ultimately going to explode into something far bigger. In this outbreak, I think a lot of early estimates fell into this problem, and multiple papers were published showing that R0 was 2 or so, because the disease was still in its stochastic stage. But once it breaks out and begins infecting people with its full force, it becomes deterministic and only then can we truly understand its infectious potential. I think this means that early estimates of R0 are unreliable, and the UK government was relying on these early estimates. I think Asian governments were more sensible, possibly because they were in closer contact with China or possibly because they had experience with SARS, and were much more wary about under-estimating R0. I think this epidemic shows that it is wise to err on the side of over-estimation, because once the outbreak hits its stride any policies built on low R0 estimates will be either ineffective or, as we saw here, catastrophic.

But whatever the estimate of R0, any assumption that herd immunity can be achieved by allowing controlled infection of the population is an incredibly stupid, reckless, dangerous policy, and anyone advocating it should not be allowed near government!

There is a lot of pressure at present for the expansion of testing for COVID-19 to enable better understanding of the spread of the virus and possibly to help with reopening of the economy. Random population surveys have also been conducted in many countries, with a recent antibody survey in California, for example, finding 50 times more people infected than official estimates report. The WHO recognizes testing as a key part of the coronavirus response, and some countries are beginning to discuss the idea of “immunity passports”, in which people are given an antibody test and enabled to return to work if they test positive to antibodies and are well (since this indicates that they have been infected and gained immunity). The WHO advises against this approach because there is no evidence yet that people who have experienced COVID-19 and recovered are actually immune. But in addition to this virological concern, there is a larger, statistical concern about COVID-19 tests (especially antibody tests) and the consequence of widespread use of these tests as a policy guide: how reliable are they, and what are the consequences of deploying poor-quality tests?

My reader(s) may be familiar with my post on the use of Bayesian statistics to assess the impact of anti-trans bathroom laws on natal women. This study found that, since being transgender is a very low prevalence phenomenon, if we tried to actually enforce birth-gender bathroom laws almost everyone we kicked out of a woman’s toilet would actually be a cis woman. This is a consequence of Bayes’ Law, which basically tells us that when a condition has very low prevalence, any attempt to test for that condition will largely produce false positives unless the test is a very very accurate test. This applies to any attempt to discriminate between two classes of things (e.g. trans women vs. natal women, or coronavirus vs. no coronavirus). It is a universal mathematical theory, and there is no escaping it.
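
The low-prevalence effect described above can be sketched directly from Bayes' rule. The function name is hypothetical, and the test in the example (95% sensitive, 95% specific) is made up purely for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition | positive test), computed with Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# A hypothetical test that is 95% sensitive and 95% specific.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    ppv = positive_predictive_value(prevalence, 0.95, 0.95)
    print(f"prevalence {prevalence:.1%}: {ppv:.1%} of positives are true positives")
```

The same test goes from almost useless to quite reliable as prevalence rises, which is why accuracy figures quoted without a prevalence tell you very little.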

So what happens with testing for coronavirus? There are a couple of possible policies that can be enacted based on the result of testing:

  1. People testing positive are isolated from the rest of the community in special hospitals or accommodation, to be treated and managed until they recover
  2. People testing positive self-isolate and all their potential contacts are traced and tested, self-isolating as necessary
  3. People testing negative are allowed to return to ordinary life, working and traveling as normal
  4. People testing positive to antibodies with no illness are issued an “immunity passport” and allowed to take up essential work
  5. Health workers testing negative are allowed to return to hospital

Obviously, depending on the policy, mistakes in testing can have significant consequences. This is why the WHO has quite strict diagnostic criteria for the use of testing, which requires multiple tests at different specified time points with rules about test comparison and cautionary notes about low-prevalence areas[1]. Now that some antibody tests have achieved marketing status, I thought I would do a few brief calculations using Bayes’ rule to see how good they are and what the consequences will be. In particular let’s consider policy options 1, 3 and 4. I found a list of antibody tests currently being marketed or used in the USA here, and information on one PCR test, from Quantivirus. I assumed a testing program applied to a million people, and for each test under this program I calculated the following information:

  • The number of people testing positive and the number who are actually negative
  • The proportion of positive tests that are actually positive
  • The number of people testing negative and the number who are actually positive
  • The estimated prevalence of COVID-19 obtained from each of these tests

I used the current number of cases in the USA on 24th April (870,000), multiplied by 10 to include asymptomatic/untested cases, and a US population of 330 million to estimate the true prevalence of coronavirus in the USA at 2.6%. Note that with 2.6% prevalence the true situation is 26,000 cases of COVID-19 and 974,000 people negative. I then compared the estimated prevalence for each test against this. Here are the results:

Becton-Dickinson/Biomedomics Covid-19 IgM/IgG Rapid Test

This test has 88.7% sensitivity and 90.6% specificity, and has been given emergency use authorization by the FDA. If used to test a million people in the context of disease prevalence of 2.6%, we would find the following results:

  • 114,906 people testing positive of whom 91,521 are actually negative
  • Only 20.4% of positive tests are actually positive
  • 885,094 people testing negative, of whom 2,979 are positive
  • An estimated coronavirus prevalence of 11.4%

This would mean that under policy 1 (isolation of all positive cases) we would probably increase prevalence by a factor of 5, since 80% of the people we put into isolation with positive cases would be negative (and would then be infected). If we followed policy 3 or 4, we would be releasing 2,979 people into the community to work, get on trains etc., and infect others. We would also recalculate the case fatality rate of the virus to be roughly 44 times lower than the actual observed estimate, because we had observed deaths among 870,000 cases (prevalence 0.26%) but were now dividing the confirmed deaths by a prevalence of 11.4%. This would make us think the disease is not much worse than influenza, while we were spreading it to five times as many people. Not good! Curing that epidemic is going to need a lot of bleach injections.

Cellex qSars-CoV-2 IgG/IgM Cassette Rapid Test

This test has also received emergency use authorization, and has 93.8% sensitivity and 95.6% specificity, which sounds good (very big numbers! Almost as good as Trump’s approval rating!) But if used to test 1,000,000 Americans with prevalence of 2.6% it still performs very poorly:

  • 67,569 people testing positive of whom 42,840 are actually negative
  • Only 36.5% of positive tests are actually positive
  • 932,430 people testing negative, of whom 1,635 are positive
  • An estimated coronavirus prevalence of 6.8%

This is still completely terrible. Isolating all the positive people (policy 1) would likely increase prevalence by a factor of 3, and we would allow 1,635 people to run around infecting others blithely assuming they were negative. Not a good outcome.

CTK Biotech OnSite Covid-19 IgG/IgM Rapid Test

This test has not yet received emergency use authorization, but has 96.9% sensitivity and 99.4% specificity. With this test:

  • 31,388 people test positive of whom 5,841 are actually negative
  • About 81% of tests are actually positive
  • 968,611 people test negative, of whom 817 are positive
  • An estimated coronavirus prevalence of 3.1%

This is much better – most people testing positive are actually positive, we aren’t releasing so many people into the wild to infect others, and our prevalence estimate is close to the true prevalence. But it still means a lot of people are being given incorrect information about their status, and are taking risks as a result.
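
For anyone who wants to check the figures for the three tests above, here is a minimal sketch of the calculation. It assumes a single test per person and uses the unrounded prevalence of 8.7 million out of 330 million (an assumption on my part; it seems to match the quoted counts more closely than a flat 2.6%). The function and dictionary names are hypothetical.

```python
def screen_population(n, prevalence, sensitivity, specificity):
    """Expected outcomes of testing n people once with an imperfect test."""
    infected = n * prevalence
    uninfected = n - infected
    true_pos = infected * sensitivity
    false_neg = infected - true_pos
    false_pos = uninfected * (1.0 - specificity)
    test_pos = true_pos + false_pos
    return {
        "test_positive": test_pos,
        "false_positive": false_pos,
        "ppv": true_pos / test_pos,           # share of positives that are real
        "test_negative": n - test_pos,
        "false_negative": false_neg,
        "apparent_prevalence": test_pos / n,  # what a naive survey would report
    }


# Assumption: unrounded prevalence, about 2.6%.
PREVALENCE = 8_700_000 / 330_000_000

# (sensitivity, specificity) as quoted in the text for each test.
TESTS = {
    "BD/Biomedomics": (0.887, 0.906),
    "Cellex": (0.938, 0.956),
    "CTK Biotech": (0.969, 0.994),
}

for name, (sens, spec) in TESTS.items():
    r = screen_population(1_000_000, PREVALENCE, sens, spec)
    print(f"{name}: {r['test_positive']:,.0f} positive "
          f"({r['false_positive']:,.0f} false), PPV {r['ppv']:.1%}, "
          f"{r['false_negative']:,.0f} false negatives, "
          f"apparent prevalence {r['apparent_prevalence']:.1%}")
```

Results can differ from the quoted figures by a person or two depending on where rounding is applied.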

Conclusion

Even slightly inaccurate tests have terrible consequences in epidemiology. As testing expands the ability to conduct it carefully and thoroughly – with multiple tests, sequenced tests, and clinical confirmation – drops, and the impact of even small imperfections in the testing regime grows rapidly. In the case of a highly contagious virus like COVID19 this can be catastrophic. It will expose uninfected people to increased risk of infection through hospitalization or isolation alongside positives, and if used for immunity passports significantly raises the risk of positive people returning to work in places where they can infect others. In comparison to widespread testing with low-quality tests, non-pharmaceutical interventions (e.g. lockdowns and social distancing) are far more effective, cheaper and less dangerous. It is very important that in our desire to reopen economies and restart our social lives we do not rush to use unreliable tests that will increase, rather than reduce, the risk to the community of social interactions. While testing early and often is a good, strong policy for this pandemic, this is only true when testing is conducted rigorously and using good quality tests, and not used recklessly to end social interventions that, while painful, are guaranteed to work.

 


fn1: It’s almost as if they know what they’re doing, and we should listen to them!

Tokyo Zombie Movie

The novel coronavirus (COVID-19) continues to spread globally, and at this point in its progress very few high-income countries have escaped its grip. On a per-capita basis Spain has 38 times the rate of infection of China, the US 10 times and Australia 3 times, but plucky Japan has only 0.3 times the infection rate of China. Until now the rate of growth has been low, with only tens of cases per day being recorded over much of February and March, but since last week the alarm has been sounding, and the government is beginning to worry. We had our first lockdown on the weekend, a voluntary two days of 自粛 (jishuku, self-restraint) in which everyone was supposed to stay inside, and this week discussion of lockdown began. This is because the previous weekend was bright and sunny with the cherry blossoms blooming, and all of Tokyo turned out to see them despite the Governor’s request for everyone to be cautious. Over the two weeks leading up to that weekend, and for perhaps two days afterwards, the train system returned to normal and Tokyo was being its normal bustling, busy uncaring self. But then in the week after that event the numbers began to climb, and now the government is worried as it begins to watch the numbers slide out of control. I am also now hearing for the first time stories of doctors having to find alternative ICU beds for COVID patients – still not a huge deal, because any one hospital does not have a large supply, but enough cases are now appearing to force doctors to seek empty hospitals elsewhere.

It is possible to see the effect of this party atmosphere in the data, and it offers a strong example of how important social distancing is. Using the data from the Johns Hopkins Coronavirus tracker (and making a few tiny adjustments for missing data in their downloadable file), I obtained and plotted the number of new cases each day, shown in Figure 1 below. Here the x axis is the number of days since the first infection was identified, and the y-axis is the number of new cases. Day 70 is the 1st April. The red line is a basic lowess smooth, not a fancy model.

Figure 1: Daily new cases by time since the first case

It is clear from this figure that things changed perhaps a week ago. New case numbers were up and down a lot but generally clustered together, representing slow growth, but since about a week ago the gaps between each dot are growing, and more dots are above than below the line. This is cause for concern.

However, it is worth remembering that each day the total number of cases is increasing, which means also that if you add the same number of new cases on any day, it will have a proportionately smaller effect on the total. We can estimate this by calculating the percentage change each day due to the new cases added on that day. So for example if there are 10 cases in total and 10 new cases are detected we see a 100% change; but 10 new cases with 100 existing cases will lead to only a 10% change. From this we can calculate the daily doubling time: the time required for the number of cases to double if we keep adding cases at the same percentage increase that we saw today. So, for example, if there are 100 cases on day 9 and on day 10 there are 10 more cases, the percentage change is 10%, and from that I can estimate that the number of cases will double after about 7.3 days if that 10% daily change continues. This gives a natural estimate of the rate at which the disease is growing, adjusting for its current size. Figure 2 shows the doubling time each day for Tokyo, again with the number of days since the first infection on the x-axis. I have trimmed the doubling time at 20 days, so a few early points are missing because they had unrealistically high doubling times, and added a lowess smooth to make the overall pattern stand out. The vertical red line corresponds with Friday March 20th, a national holiday and the first day of the long weekend where everyone went cherry blossom viewing.
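
The doubling time calculation described above can be sketched in a few lines. This assumes compound daily growth, so the doubling time t solves (1 + g)^t = 2, where g is the day's fractional increase; the function name is hypothetical.

```python
import math


def doubling_time(total_before, new_cases):
    """Days to double if today's percentage growth continued unchanged."""
    if new_cases == 0:
        return math.inf                        # no growth, so never doubles
    growth = new_cases / total_before          # e.g. 10 / 100 = 0.10
    return math.log(2) / math.log(1 + growth)  # solve (1 + g)**t = 2 for t


print(round(doubling_time(100, 10), 2))  # 10% daily growth
print(round(doubling_time(10, 10), 2))   # 100% daily growth doubles in one day
```

For 10% daily growth the exact value is about 7.27 days; the quick "rule of 72" shortcut (72 divided by the percentage growth) gives 7.2.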

Figure 2: Daily time required for case numbers to double in Japan

Since the infection hit Japan the doubling time has been growing slowly, so that in February it would take almost two weeks for the number of cases to double. The doubling time dropped in March[1], which was also the time that the government began putting in its first social distancing guidelines (probably about late February); work events were being canceled or postponed by early March, probably in response to government concern about the growing number of cases, and this appears after two weeks to have worked, bringing the doubling times back up to more than two weeks. And that was when the sunny weather came and everyone went to hanami, marked on the red line, at which point the doubling time dropped like a stone. Back in the middle of March we were seeing between 10 and 40 cases a day, slow changes; but then after that weekend the number of cases exploded, to 100 or 200 a day, pretty much 4-6 days after the long weekend started. The following weekend was when the government demanded everyone stay in, and the city shut up shop; but we won’t begin to see the effect of those measures until tomorrow or this weekend, and right now the number of new cases is still hovering around 200 a day.

It’s worth noting that not all of these cases are community transmission. About 10% are without symptoms, and another 20% are having symptoms confirmed (probably because they’re very mild), which indicates the effectiveness of contact tracing in tracking down asymptomatic contacts. A lot of these cases are foreigners (something like 20-25%), and this is likely because they’re residents returning from overseas, and likely identified during quarantine/self-isolation (so not especially risky to the community). But still, even 70% of 200 is a lot of cases.

It’s instructive to compare this doubling time with some heavily-affected countries. Figure 3 shows the smoothed doubling times for Japan, the US, Italy and Australia. It has the same axes, but I have dropped the data points for clarity (I make no promises about the quality of these hideous smooths). The legend shows which country has which colour. Italy and Australia start slightly later in this data because their first imported case was not at day 0.

Figure 3: Doubling times for four affected countries

As you can see, Italy’s doubling time was almost daily in the first week of its epidemic, but has been climbing rapidly since they introduced social distancing. Australia’s doubling time was consistently a week, but began to increase in the last two weeks as people locked in. The US tracked Japan for a couple of weeks and then took a nose dive, so that at one point the daily doubling time was 3 days. Italy provides a really instructive example of the power of social distancing, which was introduced in some areas on February 28th and nationally in increasingly serious steps from 1st March to 9th March. Figure 4 shows Italy’s doubling time over the epidemic.

Figure 4: Doubling time for Italy

 

It is very clear that as measures stepped up the doubling time gradually increased. In this figure day 40 is the first of March, the first day that national measures were announced. Despite this, we can see from Figure 3 that it took Italy about a month and a half from the first case to slow the spread enough that further doubling might take a week, and early inaction meant that a month of intensely aggressive measures was needed to slow the epidemic, at huge cost.

It is my hope that Japan’s early measures, and aggressive investigation of clusters at the beginning of the outbreak, will mean that we don’t need to go into a month-long lockdown. But if Japan’s population – and especially Tokyo’s – don’t take it seriously now, this week and this weekend, Tokyo will go the same way as London and Italy. It’s time for Tokyo to make a two week sacrifice for its own good. Let’s hope we can do it!


fn1: Which the smooth doesn’t show, by the way, it’s an awful smooth and I couldn’t improve it by fiddling with the bandwidth[2]

fn2: A better model would be a slowly increasing straight line with a peak at the hanami event and then a rapid drop, but I couldn’t get that to work and gave up[3].

fn3: Shoddy jobs done fast is my motto!

The 2019 novel coronavirus (COVID-19) has now escaped China and taken a firm grip on the rest of the world, with Italy in a complete lockdown, most of Europe shuttered and the UK and the US spaffing their response up a wall. A few weeks ago I wrote a short post assessing the case fatality rate of the disease and assessing whether it is a global threat, and I think now is time to write an update on the virus. In this post I will address the mortality rate, some ways of looking at the total disease burden, discuss its infectiousness, and talk about what might be coming if we don’t get a grip on this. In the past few weeks I have been working with Chinese collaborators on this virus so I am going to take the unusual step of referencing some of my meat life work, though as always I won’t name collaborators, so as to avoid their names being associated with a blog that sometimes involves human sacrifice.

As always, what COVID-19 is doing can be understood in terms of infectious disease epidemiology and the mathematics that underlies it, but only to the extent that we have good quality data. Fortunately we now do have some decent data, so we can begin to make some strong judgments – and the conclusions we will draw are not pretty.

How deadly is this disease?

The deadliness of an infectious disease can be assessed in terms of its case fatality ratio (CFR), which is the proportion of affected cases who die. In my last post I estimated the CFR for COVID-19 to be about 0.4% (uncertainty range 0.22 – 1.7%), and suggested it was between 2 and 10 times as deadly as influenza. The official CFR in China has hovered around 2%, but we know that many mild cases were not diagnosed, and the true CFR must be lower. Since then, however, the Diamond Princess cruise ship hove into view, was quarantined off Yokohama, and carefully monitored. This is a very serendipitous event (for those not on the ship, obviously) since it means we have a complete case record – every case on that ship was diagnosed, symptomatic or not. On that ship we saw 700 people infected and 7 deaths, so a CFR of 1%. I used a simple Bayesian method to use that confirmed mortality rate, updated by the deaths in China, to estimate the under reporting rate in China to be at least 50%, work which is currently available as a preprint at the WHO’s COVID-19 preprint archive. I think a decent estimate of the under reporting rate is 90%, indicating that there are 10 times as many cases as are being reported, and the true CFR is therefore 10 times lower. That puts the CFR in China at 0.2%, or probably twice as deadly as the seasonal flu. However, we also have data from South Korea, where an extensive testing regime was put in place, that suggests a CFR more in the range of 1%.

It’s worth noting that the CFR depends on the age distribution of affected people, and the age distribution in the cruise ship was skewed to very old. This suggests that in a younger population the CFR would be lower. There is also likely to be a differential rate of underreporting, with probably a lower percentage of children being reported than elderly people. It is noteworthy that only 1% of confirmed cases in China were children, which is very different to influenza. As quarantine measures get harsher and health systems struggle, it is likely that people will choose to risk not reporting their virus, and this will lead to over estimates of mortality and underestimates of total cases. But it certainly appears this disease is at least twice as dangerous as influenza.

CFRs also seem to be very different in the west, where testing coverage has been poor in some countries. Today California reported 675 cases and 16 deaths, a CFR of about 2.4%, or roughly 2.4 times the CFR on the Diamond Princess, in probably a younger population. Until countries like the US and UK expand their testing, we won’t know exactly how bad it is in those countries but we should expect a large number of infected people to die.

On the internet and in some opinion pieces, and from the mouths of some conservative politicians, you will hear people say that it “only” kills 1% of people and so you don’t need to worry too much. This is highly misleading, because it does not take into account that in a normal year less than 1% of the population dies, and a disease that kills 1% of people will double your nation’s total death rate if it is allowed to spread uncontrolled. It is important to understand what the background risk is before you assess small numbers as “low risk”!

What is the burden of the disease?

The CFR tells you how likely an affected person is to die, but an important question is what is the burden of the disease? Burden means the total number of patients who need to be hospitalized, and the final mortality rate as a proportion of the population. While the CFR tells us what to expect for those infected, estimates of burden tell us what society can expect this disease to do.

First, let us establish a simple baseline: Japan, with 120 million people, experiences 1 million deaths a year. This is the burden of mortality in a peaceful, well-functioning society with a standard pattern of infectious disease and an elderly population. We can apply this approximately to other countries to see what is going on, on the safe assumption that any estimates we get will be conservative estimates because Japan has one of the highest mortality rates in the world[1]. Consider Wuhan, population 12 million. It should expect 100,000 deaths a year, or about 8,000 a month. Over two months it experienced about 3000 COVID-19 deaths, when it should have seen about 15,000 deaths normally. So the virus caused about 20% excess mortality. This is a very large excess mortality. Now consider Italy, which has seen 3500 deaths in about one month. Italy has a population of 60 million so should see 500,000 deaths a year, or about 40,000 a month. So it has seen about 10% excess mortality. However, those 3500 deaths have been clustered in just the Northern region, which likely only has a population similar to Wuhan – so more likely it has seen 40% excess mortality. That is a very high burden, which is reflected in obituaries in the affected towns.
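
The back-of-envelope arithmetic above can be sketched as follows. It assumes, as the text does, that Japan's baseline of 1 million deaths a year in 120 million people applies everywhere, and it takes the population of the affected part of northern Italy to be about 12 million (the text only says it is similar to Wuhan). The function name is hypothetical.

```python
# Baseline from the text: 1 million deaths a year among 120 million people.
JAPAN_DEATHS_PER_PERSON_YEAR = 1_000_000 / 120_000_000


def excess_mortality(covid_deaths, population, months):
    """COVID-19 deaths as a fraction of the deaths expected over the same period."""
    expected = population * JAPAN_DEATHS_PER_PERSON_YEAR * months / 12
    return covid_deaths / expected


print(f"Wuhan: {excess_mortality(3000, 12_000_000, 2):.0%}")
print(f"Italy (national): {excess_mortality(3500, 60_000_000, 1):.0%}")
# Assumption: affected northern region has roughly Wuhan's population.
print(f"Italy (north): {excess_mortality(3500, 12_000_000, 1):.0%}")
```

Without rounding the monthly baselines, the three results come out slightly below the rounded 20%, 10% and 40% quoted in the text, but the picture is the same.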

Reports are also beginning to spread on both social media and in the news about the impact on hospitals in Italy and the US. In particular in Northern Italy, doctors are having to make very hard decisions about access to equipment, with new guidance likening the situation to medical decisions made after disasters. Something like 5% of affected people in Wuhan needed to be admitted to intensive care, and it appears that the symptoms of COVID-19 last longer than influenza. It also appears that mortality rates are high, and there are already predictions that Italy will run out of intensive care facilities rapidly. The situation in northern Italy is probably exacerbated by the age of the population and the rapid growth of the disease there, but it shows that there is a lot of potential for this virus to rapidly overwhelm health systems, and when it does you can expect mortality rates to sky-rocket.

This is why the UK government talked about “flattening the curve”, because even if the same total number of people are affected, the more slowly they are affected the less risk that the care system breaks down. This is particularly true in systems like the US, where hospitals maintain lean operating structures, or the UK where the health system has been stripped of all its resources by years of Tory mismanagement.

Who does it affect?

The first Chinese study of the epidemiology of this disease suggested that the mortality rate increases steeply, from 0% in children to 15% in the very elderly. It also suggested that only a very small number of confirmed cases are young people, but this is likely due to underreporting. This excellent medium post uses data from an Italian media report to compare the age distribution of cases in Italy with those in South Korea, and shows that in South Korea 30% of cases were in people aged 20-29, versus just 4% in Italy. This discrepancy arises because South Korea did extensive population-level testing, while Italy is just doing testing in severe cases (or was, at the time the report was written). Most of those young people will experience COVID-19 as a simple influenza-like illness, rather than the devastating respiratory disease that affects elderly people, and if we standardize the Chinese CFR to this Korean population we would likely see it drop from 2% to 1%, as the Koreans are experiencing. This South Korean age distribution contains some important information:

  • The disease does not seem to affect children much, and doesn’t harm them, which is good
  • Young people aged 20-39 are likely to be very efficient carriers and spreaders of the disease
  • Elderly people are at lower risk of getting the disease than younger people but for them it is very dangerous

This makes very clear the importance of social distancing and lockdowns for preventing the spread of the disease. Those young people will be spreading it to each other and their family members, while not feeling that it is very bad. If you saturate that young population with messages that people are overreacting and that there is not a serious risk and that “only” the elderly and the sick will die, you will spread this disease very effectively to their parents and grandparents – who will die.

It’s worth noting that a small proportion of those young people do experience severe symptoms and require hospitalization and ventilation. Among health workers in China the death rate was about 0.2%, and we could probably take that as the likely CFR in young people with good access to care. If the disease spreads fast enough and overwhelms health systems, we can expect to see not insignificant mortality in people aged 20-39, as their access to intensive care breaks down. This is especially likely in populations with high prevalence of asthma (Australia) or diabetes (the US and the UK) or smoking (Italy, and some parts of eastern Europe). So it is not at this stage a good idea for young people to be complacent about their own risk, and if you have any sense of social solidarity you should be being very careful about the risk you pose to others.

How fast does it spread?

The speed at which an infectious disease spreads can be summarized by two numbers: the generation time and the basic reproduction number (R0). Generation time is the time it takes for symptoms to appear in a second case after infection by the first case, and the basic reproduction number is the number of additional cases that will be caused by one infection. For influenza the generation time is typically 2-4 days, while for COVID-19 it is probably 4-6 days. The basic reproduction number of influenza is between 1.3 and 1.5, while the initial estimates for COVID-19 were 2.5, meaning that each case of COVID-19 will infect 2.5 people on average. Unfortunately I think these early estimates were very wrong, and my own research suggests the number is more likely between 4 and 5. This means that each case will infect 4-5 other people before it resolves. This is a very fast-spreading disease, much more effective at spreading than influenza, and this high R0 explains why it was able to suddenly explode in Italy and the US. A disease with an R0 over 2 is scary and requires special efforts to control.

Those early estimates of R0 at 2 to 2.5 had a significant negative impact on assessment of the global threat of this disease. I believe they led the scientific community to be slightly complacent, and to think that the disease would be relatively easy to contain and would not be as destructive as it has become. In my research our figures for projected infection numbers show clearly that these models with lower R0 simply cannot predict the future trend of the virus – they undershoot it significantly and fit the epidemic curve poorly. Sadly governments are still acting on the basis of these estimates: the UK government’s estimate that the disease will stop spreading once 60% of people are affected is based on an R0 of 2.5, when an R0 of 4 suggests 75% of people need to be infected. An early R0 estimate of 4 would have rung alarm bells throughout the world, and would have been much more consistent with the disaster we saw unfolding in Hubei. Fortunately the Chinese medical establishment were not so complacent, and worked hard to buy the world time to prepare for this virus’s escape. Sadly many western countries did not take advantage of that extra month, and are paying the price now as they see what this disease really is like.
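
The 60% and 75% figures above follow from the standard herd immunity threshold, 1 - 1/R0, which assumes a homogeneously mixing population with lasting immunity. A minimal sketch (the function name is hypothetical):

```python
def herd_immunity_threshold(r0):
    """Fraction immune needed so each case infects fewer than one other: 1 - 1/R0."""
    if r0 <= 1:
        return 0.0  # an outbreak with R0 <= 1 dies out on its own
    return 1.0 - 1.0 / r0


for r0 in (1.3, 2.5, 4.0, 5.0):
    print(f"R0 = {r0}: {herd_immunity_threshold(r0):.0%} must be immune")
```

Note that this is the point at which spread starts to slow, not the point at which infections stop: an uncontrolled epidemic overshoots the threshold.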

Because this disease is so highly infectious, special measures are needed to contain it. For a mildly dangerous disease with an R0 of 1.3 (like influenza), vaccination of the very vulnerable and sensible social distancing among infected people is sufficient to contain it without major economic disruption. Above 2, however, things get dicey, and at 4 we need to consider major measures – social distancing, canceling mass gatherings, quarantining affected individuals and cities, and travel restrictions. This is everything that China did in the second month of the outbreak once they understood what they were dealing with, and is also the key to South Korea, Japan and Singapore’s success. Because some western governments did not take this seriously, they are now going to have to take extreme measures to stop this.

How many people will be infected?

The total proportion of the population that will be affected is called the final size of the epidemic, and there is an equation linking the final size to the basic reproduction number. This equation tells us that for influenza probably 40% of the population will be affected, but it also tells us that for epidemics with basic reproduction number over 2 basically the entire population will be affected. In the case of Japan that will mean 120 million people affected with a mortality rate of probably 0.4% (assuming the health care system handles such a ridiculous scenario), or about 500,000 deaths – 50% of the total number of deaths that occur in one year. The Great East Japan Earthquake and tsunami killed 16,000 people and was considered a major disaster. It’s also worth considering that those 500,000 deaths would probably occur over 3-4 months, so over the time period they would be equivalent to probably doubling or tripling the normal mortality rate. That is a catastrophe by any measure, and although at the end of the epidemic “only” half a percent of the population will be dead, the entire population will be traumatized by it.
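
The text does not spell out the final size equation, so as an assumption this sketch uses the classic one for a simple SIR epidemic, z = 1 - exp(-R0 z), where z is the fraction of the population eventually infected. It has no closed-form solution, so the code solves it by fixed-point iteration. The function name is hypothetical.

```python
import math


def final_size(r0, tol=1e-10, max_iter=10_000):
    """Solve z = 1 - exp(-r0 * z) for the final attack rate z."""
    if r0 <= 1:
        return 0.0  # no large epidemic when R0 <= 1
    z = 0.5  # any starting guess in (0, 1] converges to the positive root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


for r0 in (1.3, 2.0, 2.5, 4.0):
    print(f"R0 = {r0}: final size {final_size(r0):.0%}")

# Deaths in Japan under the text's scenario (120 million people, 0.4% mortality)
print(f"{120_000_000 * final_size(4.0) * 0.004:,.0f} deaths at R0 = 4")
```

Under this simple model an R0 of 1.3 gives a final size a little over 40%, an R0 of 2 gives about 80%, and an R0 of 4 gives about 98%, so the death toll at R0 = 4 is only slightly below the round 500,000 figure.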

For a virus of this epidemicity with this kind of fatality rate, we need to take extreme measures to control it, and we need to take it very seriously as soon as it arrives in our communities. This virus cannot be contained by business as usual.

Essential supplies ready

What’s going on in Japan?

The number of cases and deaths in Japan remains quite small, and there has been some discussion overseas that Japan’s response has been poor and it is hiding the true extent of the problem. I don’t think this is entirely correct. Japan introduced basic counter-measures early on, when China was struggling and well before other countries, including cancelling events, delaying the start of the school year, introducing screening at airports and testing at designated facilities, working from home and staggering commuter trips to reduce crowding on trains. For example, work events I was planning to attend were cancelled 2-3 weeks ago, and many meetings moved online back then. Japan has a long history of hygiene measures during winter, and influenza strategies are in place at most major companies to reduce infection risk. Most museums, aquariums and shopping malls have always had hand sanitizer at the entrance, and Japan has an excellent network of public toilets that make hand washing easy. Many Japanese have always maintained a practice of hand-washing and gargling upon returning home from any outside trip, and mask wearing is quite common. Japan’s health system also has a fair amount of excess capacity, so it is in a position to handle the initial cases, isolate them and manage them. This has meant that the growth of the epidemic was slow here and well contained, although it was a little out of control in Hokkaido, where the governor declared a state of emergency (now ended). It is true that many cases are not being tested – hospitals recommend that mild cases stay home and self-isolate rather than attend for treatment, and it is likely that mild cases will not be tested – but this is not a cover-up situation, rather an attempt to ration tests (which are not being fully utilized at the moment). There are not yet reports of emergency rooms or hospitals being overwhelmed, and things are going quite smoothly. I expect at some point the government will need to introduce stricter laws, but because of that early intervention with basic measures the epidemic appears to be under control here.

My self-isolation plan was kind of forced on me at the end of February, because I dislocated my kneecap at kickboxing in a sadly age-related way, will probably require reconstruction surgery, and am spending a lot of time trapped at home as a result. Actually that was the day that everyone else was panic buying toilet paper and so I was stuck at home with a dwindling supply of the stuff until my friends stepped up. I think most people in Japan have reduced their social activities (probably not as much as me!), and are spending less time in gatherings and events (almost all of which are canceled now), and so through that reduction in contacts plus aggressive contact tracing, the disease is largely controlled here.

Is the world over-reacting?

No. No doubt you will have heard various conservatives on Fox News and in some print outlets complaining that the world has over-reacted and we should all just be going to the pub; perhaps you’ve seen some Twitter bullshit where a MAGA person proudly declares that they ate out in a crowded restaurant and they’ll do whatever they want because Freedumb. Those people are stupid and you shouldn’t trust them. This virus spreads easily and kills easily, and if it gets a stranglehold on your health system it will be an order of magnitude more deadly than it is right now. If you live in a sensible country (i.e. not the UK or the USA) your government will have consulted with experts and developed a plan and you should follow their recommendations and guidelines, because they have a sense of what is coming down the pipeline and what you need to do to stop it. Do the minimum you are asked to do, and perhaps prepare for being asked to do more. Don’t panic buy, but if you feel like strict isolation is coming you should start laying in supplies. Trust your friends and neighbours to help you, and don’t assume your government is bullshitting you (unless you’re in the UK or the USA, obviously). This is serious, and needs to be taken seriously.

When HIV hit the world our need to wear a condom was presented to us as a self-preserving mechanism. If you choose to circumcise your baby boy you’re probably doing so as a service to future him, not to all the women or men he might spread STIs to. But this virus isn’t like HIV. Your responsibility here isn’t to yourself, it’s to the older, frailer and less healthy members of your community who are going to die – and die horribly, I might add, suffocating with a tube in their throat after days of awful, stifled struggle – if this disease is allowed to spread. We all need to work together to protect the more vulnerable members of our community, and if we don’t react now we will lose a lot of the older people we grew up with and love.

So let’s all hunker down and get rid of this virus together!


fn1: This is a weird and counter-intuitive aspect of demography. Japan has the longest life expectancy in the world’s healthiest population, and one of the world’s highest mortality rates. Iraq, in contrast, would see half as many deaths in a normal year (without American, ah, visitors). This is because healthy populations grow old, and then die in huge numbers.

Uhtred son of Uhtred, regular ale drinker, who I predict will die of injury (but will go to Valhalla, unlike you you ale-sodden wretch)

There has been some fuss in the media recently about a new study showing no level of alcohol use is safe. It received a lot of media attention (for example here), reversed a generally held belief that moderate consumption of alcohol improves health (this is even enshrined in the Greek food pyramid, which has a separate category for wine and olive oil[1]), and led to angsty editorials about “what is to be done” about alcohol. Although there are definitely things that need to be done about alcohol, prohibition is an incredibly stupid and dangerous policy, and so are some of its less odious cousins, so before we go full Leeroy Jenkins on alcohol policy it might be a good idea to ask whether this study is really the bee’s knees, and whether it really shows what it says it does.

This study is a product of the Global Burden of Disease (GBD) project, at the Institute for Health Metrics and Evaluation (IHME). I’m intimately acquainted with this group because I made the mistake of getting involved with them a few years ago (I’m not now) so I saw how their sausage is made, and I learnt about a few of their key techniques. In fact I supervised a student who, to the best of my knowledge, remains the only person on earth (i.e. the only person in a population of 7 billion people, outside of two people at IHME) who was able to install a fundamental software package they use. So I think I know something about how this institution does its analyses. I think it’s safe to say that they aren’t all they’re cracked up to be, and I want to explain in this post how their paper is a disaster for public health.

The way that the IHME works in these papers is always pretty similar, and this paper is no exception. First they identify a set of diseases and health conditions related to their chosen risk (in this case the chosen risk is alcohol). Then they run through a bunch of previously published studies to identify the numerical magnitude of increased risk of these diseases associated with exposure to the risk. Then they estimate the level of exposure in every country on earth (this is a very difficult task which they use dodgy methods to complete). Then they calculate the number of deaths due to the conditions associated with this risk (this is also an incredibly difficult task to which they apply a set of poorly-accredited methods). Finally they use a method called comparative risk assessment (CRA) to calculate the proportion of deaths due to the exposure. CRA is in principle an excellent technique, but certain aspects of their application of it are particularly shonky; we probably don’t need to touch on those here.

So in assessing this paper we need to consider three main issues: how they assess risk, how they assess exposure, and how they assess deaths. We will look at these three parts of their method and see that they are fundamentally flawed.

Problems with risk assessment

To assess the risk associated with alcohol consumption the IHME used a standard technique called meta-analysis. In essence a meta-analysis collects all the studies that relate an exposure (such as alcohol consumption) to an outcome (any health condition, but death is common), and then combines them to obtain a single final estimate of what the numerical risk is. Typically a meta-analysis will weight all the risks from all the studies according to the sample size of the study, so that for example a small study that finds banging your head on a wall reduces your risk of brain damage is given less weight in the meta-analysis than a very large study of banging your head on a wall. Meta-analysis isn’t easy for a lot of reasons to do with the practical details of studies (for example if two groups study banging your head on a wall do they use the same definition of brain damage and the same definition of banging?), but once you iron out all the issues it’s the only method we have for coming to comprehensive decisions about all the studies available. It’s important because the research literature on any issue typically includes a bunch of small shitty studies, and a few high quality studies, and we need to balance them all out when we assess the outcome. As an example, consider football and concussion. A good study would follow NFL players for several seasons, taking into account their position, the number of games they played, and the team they were in, and compare them against a concussion free sport like tennis, but matching them to players of similar age, race, socioeconomic background etc. Many studies might not do this – for example a study might take 20 NFL players who died of brain injuries and compare them with 40 non-NFL players who died of a heart attack. A good meta-analysis handles these issues of quality and combines multiple studies together to calculate a final estimate of risk.
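To make the weighting idea concrete, here is a minimal sketch of the simplest kind of pooling, a fixed-effect inverse-variance meta-analysis of relative risks. This is a generic textbook approach, not the IHME’s method, and the two studies in it are invented numbers for illustration only. The weight of each study comes from the width of its confidence interval, which in practice tracks sample size: a small study has a wide interval and so a small weight.

```python
import math

def pool_fixed_effect(studies):
    """Pool relative risks by inverse-variance weighting on the log scale.

    studies: list of (relative_risk, ci_lower, ci_upper), with 95% intervals.
    Returns (pooled_rr, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lower, upper in studies:
        log_rr = math.log(rr)
        # Recover the standard error from the width of the 95% interval.
        se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
        weight = 1.0 / se**2
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - 1.96 * pooled_se),
        math.exp(pooled_log + 1.96 * pooled_se),
    )

# Invented example: a tiny study claiming a protective effect, and a large
# study finding increased risk.
small_study = (0.5, 0.1, 2.5)
large_study = (1.4, 1.2, 1.6)
pooled, lower, upper = pool_fixed_effect([small_study, large_study])
print(f"pooled RR = {pooled:.2f} ({lower:.2f} to {upper:.2f})")
```

The pooled estimate sits almost on top of the large study, which is the behaviour described above. Note that this sketch does nothing about study quality or bias; that part of a meta-analysis is a judgement about which studies go into the list in the first place.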

The IHME study provides a meta-analysis of all the relationships between alcohol consumption and disease outcomes, described as follows[2]:

we performed a systematic review of literature published between January 1st, 1950 and Dec 31st 2016 using Pubmed and the GHDx. Studies were included if the following conditions were met. Studies were excluded if any of the following conditions were met:

1. The study did not report on the association between alcohol use and one of the included outcomes.

2. The study design was not either a cohort, case-control, or case-crossover.

3. The study did not report a relative measure of risk (either relative risk, risk ratio, odds-ratio, or hazard ratio) and did not report cases and non-cases among those exposed and un-exposed.

4. The study did not report dose-response amounts on alcohol use.

5. The study endpoint did not meet the case definition used in GBD 2016.

There are many, many problems with this description of the meta-analysis. First of all they seem not to have described the inclusion criteria (they say “Studies were included if the following conditions were met” but don’t say what those conditions were). But more importantly their conditions for exclusion are very weak. We do not, usually, include case-control and case-crossover studies in a meta-analysis because these studies are, frankly, terrible. The standard method for including a study in a meta-analysis is to assess it according to the Risk of Bias Tool and dump it if it is highly biased. For example, should we include a study that is not a randomized controlled trial? Should we include studies where subjects know their assignment? The meta-analysis community have developed a set of tools for deciding which studies to include, and the IHME crew haven’t used them.

This got me thinking that perhaps the IHME crew have been, shall we say, a little sloppy in how they include studies, so I had a bit of a look. On pages 53-55 of the appendix they report the results of their meta-analysis of the relationship between atrial fibrillation and alcohol consumption, and the results are telling. They found 9 studies to include in their meta-analysis but there are many problems with these studies. One (Cohen 1988) is a cross-sectional study and should not be included, according to the IHME’s own exclusion criteria. 6 of the remaining studies assess fibrillation only, while 2 assess fibrillation and atrial flutter, a precursor of fibrillation. However most tellingly, all of these studies find no relationship between alcohol consumption and fibrillation at almost all levels of consumption, but their chart on page 54 shows that their meta-analysis found an almost exponential relationship between alcohol consumption and fibrillation. This finding is simply impossible given the observed studies. All 9 studies found no relationship between moderate alcohol consumption and fibrillation, and several found no relationship even for extreme levels of consumption, but somehow the IHME found a clear relationship. How is this possible?

Problems with exposure assessment

This problem happened because they applied a tool called DISMOD to the data to estimate the relationship between alcohol exposure and fibrillation. DISMOD is an interesting tool but it has many flaws. Its main benefit is that it enables the user to incorporate exposures that have many different categories of exposure definition that don’t match, and turn them into a single risk curve. So for example if one study group has recorded the relative risk of death for 2-5 drinks, and another group has recorded the risk for 1-12 drinks, DISMOD offers a method to turn this into a single curve that will represent the risk relationship per additional drink. This is nice, and it produces the curve on page 54 (and all the subsequent curves). It’s also bullshit. I have worked with DISMOD and it has many, many problems. It is incomprehensible to everyone except the two guys who programmed it, who are nice guys but can’t give decent support or explanations of what it does. It has a very strange response distribution and doesn’t appear to apply other distributions well, and it has some really kooky Bayesian applications built in. It is also completely inscrutable to 99.99% of people who use it, including the people at IHME. It should not be used until it is peer reviewed and exposed to a proper independent assessment. It is the application of DISMOD to data that obviously shows no relationship between alcohol consumption and fibrillation that led to the bullshit curve on page 54 of the appendix, which bears no relationship to the observed data in the collected studies.

This also applies to the assessment of exposure to alcohol. The study used DISMOD to calculate each country’s level of individual alcohol consumption, which means that the same dodgy technique was applied to national alcohol consumption data. But let’s not get hung up on DISMOD. What data were they using? The maps in the Lancet paper show estimates of risk for every African and south east Asian country, which suggests that they have data on these countries, but do you think they do? Do you think Niger has accurate estimates of alcohol consumption in its borders? No, it doesn’t. A few countries in Africa do and the IHME crew used some spatial smoothing techniques (never clearly explained) to estimate the consumption rates in other countries. This is a massive dodge that the IHME apply, which they call “borrowing strength.” At its most egregious this is close to simply inventing data – in an earlier paper (perhaps in 2012) they were able to estimate rates of depression and depression-related conditions for 183 (I think) countries using data from 97 countries. No prizes to you, my astute reader, if you guess that all the missing data was in Africa. The same applies to the risk exposure estimates in this paper – they’re a complete fiction. Sure for the UK and Australia, where alcohol is basically a controlled drug, they are super accurate. But in the rest of the world, not so much.

Problems with mortality assessment

The IHME has a particularly nasty and tricky method for calculating the burden of disease, based around a thing called the year of life lost (YLL). Basically instead of measuring deaths they measure the years of your life that you lost when you died, compared to an objective global standard of life you could achieve. Basically they get the age you died, subtract it from the life expectancy of an Icelandic or Japanese woman, and that’s the number of YLLs you suffered. Add that up for every death and you have your burden of disease. It’s a nice idea except that there are two huge problems:

  • It weights death at young ages massively
  • They never incorporate uncertainty in the ideal life expectancy of an Icelandic or Japanese woman
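A minimal sketch of the calculation as described above shows both problems. It assumes a single reference life expectancy of 86 years, which is a number I have picked for illustration and not the value the IHME uses, and the function name is also my own.

```python
def years_of_life_lost(ages_at_death, reference_life_expectancy):
    """Sum, over all deaths, the years remaining up to the reference age."""
    return sum(
        max(reference_life_expectancy - age, 0) for age in ages_at_death
    )

# One death at age 5 and one at age 80, against an assumed reference of 86.
child_yll = years_of_life_lost([5], 86)
elder_yll = years_of_life_lost([80], 86)
print(child_yll, elder_yll)

# Moving the reference by two years changes the total, but the calculation
# as described attaches no uncertainty to that choice.
total_at_86 = years_of_life_lost([5, 80], 86)
total_at_84 = years_of_life_lost([5, 80], 84)
print(total_at_86, total_at_84)
```

In this example the single childhood death counts for more than thirteen times the death at 80, and the total moves whenever the reference age moves.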

There is an additional problem in the assessment of mortality, which the IHME crew always gloss over, which is called “garbage code redistribution.” Basically, about 30% of every country’s death records are bullshit, and don’t correspond with any meaningful cause of death. The IHME has a complicated, proprietary system that they cannot and will not explain that redistributes these garbage codes into other meaningful categories. What they should do is treat these redistributed deaths as a source of error (e.g. we have 100,000 deaths due to cancer and 5,000 redistributed deaths, so we actually have 102,500 plus or minus 2,500 deaths), but they don’t, they just add them on. So when they calculate burden of disease they use the following four steps:

  • Calculate the raw number of deaths, with an estimate of error
  • Reassign dodgy deaths in an arbitrary way, without counting these deaths as any form of uncertainty
  • Estimate an ideal life expectancy without applying any measure of error or uncertainty to it
  • Calculate the years of life lost relative to this ideal life expectancy and add them up

So here there are three sources of uncertainty (deaths, redistribution, ideal life expectancy) and only one is counted; and then all these uncertain deaths are multiplied by the number of years lost relative to the ideal life expectancy.
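The difference between adding redistributed deaths on and treating them as uncertainty can be sketched with the cancer example above. The function names are hypothetical, and treating the redistributed deaths as a simple range from "none belong here" to "all belong here" is my own simplification for illustration.

```python
def deaths_added_on(coded_deaths, redistributed_deaths):
    """Point estimate when redistributed deaths are simply added."""
    return coded_deaths + redistributed_deaths

def deaths_with_uncertainty(coded_deaths, redistributed_deaths):
    """Midpoint and half-width when redistributed deaths are treated as a range.

    The true count lies somewhere between the coded deaths alone and the
    coded deaths plus every redistributed death.
    """
    half_width = redistributed_deaths / 2
    return coded_deaths + half_width, half_width

point = deaths_added_on(100_000, 5_000)
midpoint, half_width = deaths_with_uncertainty(100_000, 5_000)
print(f"added on: {point:,}")
print(f"as uncertainty: {midpoint:,.0f} plus or minus {half_width:,.0f}")
```

The first number looks precise and the second does not, and every later step (the years of life lost, the share attributed to alcohol) inherits whichever one you chose.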

The result is a dog’s breakfast of mortality estimates, that don’t come even close to representing the truth about the burden of disease in any country due to any condition.

Also, the IHME apply the same dodgy modeling methods to deaths (using a method that they (used to?) call CoDMoD) before they calculate YLLs, so there’s another form of arbitrary model decisions and error in their assessments.

Putting all these errors together

This means that the IHME process works like this:

  • An incredibly dodgy form of meta-analysis that includes dodgy studies and miscalculates levels of risk
  • Applied to a really shonky estimate of the level of exposure to alcohol, that uses a computer program no one understands applied to a substandard data set
  • Applied to a dodgy death model that doesn’t include a lot of measures of uncertainty, and is thus spuriously accurate

The result is that at every stage of the process the IHME is unreasonably confident about the quality of their estimates, produces excessive estimates of risk and inaccurate measures of exposure, and is too precise in its calculations of how many people died. This means that all their conclusions about the actual risk of alcohol, the level of exposure, and the magnitude of disease burden due to the conditions they describe cannot be trusted. As a result, neither can their estimates of the proportion of mortality due to alcohol.

Conclusion

There is still no evidence that moderate alcohol consumption is bad for you, and solid meta-analyses of available studies support the conclusion that moderate alcohol consumption is not harmful. This study should not be believed and although the IHME has good press contacts, you should ignore all the media on this. As a former insider in the GBD process I can also suggest that in future you ignore all work from the Global Burden of Disease project. They have a preferential publishing deal with the Lancet, which means they aren’t properly peer reviewed, and their work is so massive that it’s hard for most academics to provide adequate peer review. Their methods haven’t been subjected to proper external assessment and my judgement, based on having visited them and worked with their statisticians and their software, is that their methods are not assessable. Their data is certainly dubious at times but most importantly their analysis approach is not correct and the Lancet doesn’t subject it to proper peer review. This is going to have long term consequences for global health, and at some point the people who continue to associate with the IHME’s papers (they have hundreds or even thousands of co-authors) will regret that association. I stopped collaborating with this project, and so should you. If you aren’t sure why, this paper on alcohol is a good example.

So chill, have another drink, and worry about whether it’s making you fat.


fn1: There are no reasons not to love Greek food, no wonder these people conquered the Mediterranean and developed philosophy and democracy!

fn2: This is in the appendix to their study

No this really is not “the healthy one”

Today’s Guardian has a column by George Monbiot discussing the issue of obesity in modern England, that I think fundamentally misunderstands the causes of obesity and paints a dangerously rosy picture of Britain’s dietary situation. The column was spurred by a picture of Brighton beach in 1976, in which everyone was thin, and a subsequent debate on social media about the causes of the changes in British rates of overweight and obesity in the succeeding four decades. Monbiot’s column dismisses the possibility that the growth in obesity could be caused by an increase in the amount we eat, by a reduction in the amount of physical activity, or by a change in rates of manual labour. He seems to finish the column by suggesting it is all the food industry’s fault, but having dismissed the idea that the food industry has convinced us to eat more, he is left with the idea that the real cause of obesity is changes in the patterns of what we eat – from complex carbohydrates and proteins to sugar. This is a bugbear of certain anti-obesity campaigners, and it’s wrong, as is the idea that obesity is all about willpower, which Monbiot also attacks. The problem here though is that Monbiot misunderstands the statistics badly, and as a result dismisses the obvious possibility that British people eat too much. He commits two mistakes in his article: first he misunderstands the statistics on British food consumption, and secondly he misunderstands the difference between a rate and a budget, which is ironic given he understands these things perfectly well when he comments on global warming. Let’s consider each of these issues in turn.

Misreading the statistics

Admirably, Monbiot digs up some stats from 1976 and compares them with statistics from 2018, and comments:

So here’s the first big surprise: we ate more in 1976. According to government figures, we currently consume an average of 2,130 kilocalories a day, a figure that appears to include sweets and alcohol. But in 1976, we consumed 2,280 kcal excluding alcohol and sweets, or 2,590 kcal when they’re included. I have found no reason to disbelieve the figures.

This is wrong. Using the 1976 data, Monbiot appears to be referring to Table 20 on page 77, which indicates a yearly average of 2280 kCal. But this is the average per household member, and does not account for whether or not a household member is a child. If we refer to Table 24 on page 87, we find that a single adult in 1976 ate an average of 2670 kCal; similar figures apply for two adult households with no children (2610 kCal). Using the more recent data Monbiot links to, we can see that he got his 2,130 kCal from the file of “Household and Eating Out Nutrient Intakes”. But if we use the file “HC – Household nutrient intakes” and look at 2016/17 for households with one adult and no children, we find 2291 kCal, and about 2400 as recently as 10 years ago. These are large differences when they accrue over years.

This is further compounded by the age issue. When we look at individual intake we need to consider how old the family members are. If an average individual intake is 2590 kCal in 1976 including alcohol and sweets, as Monbiot suggests, we need to rebalance it for adults and children. In a household with three people we have about 7770 kCal, which if the child is eating 1500 kCal means that the adults are eating close to 3100 kCal each. That’s too much food for everyone in the house, even using the ridiculously excessive nutrient standards provided by the ONS. It’s also worth remembering that the age of adults in 1976 was on average much younger than now, and an intake of 2590 might be okay for a young adult but it’s not okay for a 40-plus adult, of which there are many more now than there were then. This affects obesity statistics.
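The rebalancing is simple arithmetic, sketched here with the numbers from the paragraph above. The function name is my own, and the 1500 kCal child intake is the illustrative figure used in the text.

```python
def adult_intake(avg_per_person, household_size, n_children, child_intake):
    """Back out the per-adult intake from a per-household-member average."""
    household_total = avg_per_person * household_size
    n_adults = household_size - n_children
    return (household_total - n_children * child_intake) / n_adults

# Two adults and one child, averaging 2590 kCal per household member.
per_adult = adult_intake(2590, household_size=3, n_children=1, child_intake=1500)
print(f"{per_adult:.0f} kCal per adult")
```

With these inputs each adult comes out at a little over 3100 kCal, well above the per-member average.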

Finally it’s also worth remembering that obesity is not evenly distributed, and an average intake of 2100 kCal could correspond to an average of 2500 in the poorest 20% of the population (where obesity is common) and 1700 kCal in the richest, which is older and thinner. An evenly distributed 2100 kCal will lead to zero obesity over the whole population, but an unevenly distributed 2100 kCal will not. It’s important to look carefully at the variation in the datasets before deciding the average is okay.

Misunderstanding budgets and rates

Let’s consider the 2590 kCal that Monbiot finds as the average intake of adults in 1976, including alcohol and sweets. This is likely wrong, and the average is probably more like 3000 kCal including alcohol and sweets, but let’s go with it for now. Monbiot is looking to see what has changed in our diet over the past 40 years to lead to current rates of obesity, because he is looking for a change in the rate of consumption. But he doesn’t consider that all humans have a budget, and that a small excess of that budget over a long period is what drives obesity. The reality is that today’s obesity rates do not reflect today’s consumption rates, but the steady pattern of consumption over the past 40 years. What made a 55 year old obese today is what they ate in 1976 – when they were 15 – not what the average person eats today. So rather than saying “we eat less today than we did 40 years ago so that can’t be the cause of obesity”, what really matters is what people have been eating for the past 40 years. And the stats Monbiot uses suggest that women, at least, have been eating too much – a healthy adult woman should eat about 2100 kCal, and if the average is 2590 then a woman in 1976 has been at or above her energy requirement every year for the past 40 years. It doesn’t matter that a woman’s intake declined to 2100 kCal in 2016, because she has been eating too much for the past 35 years anyway. It’s this budget, not changes over time, which determines the obesity rate now, and Monbiot is wrong to argue that it’s not overeating that has caused the obesity epidemic. Unless he accepts that a woman can eat 2590 kCal every year for 40 years and stay thin, he needs to accept that the problem of obesity is one of British food culture over half a century.
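The difference between a rate and a budget can be sketched in a few lines. This assumes, purely for illustration, that intake fell in a straight line from 2590 kCal a day in 1976 to 2100 kCal a day in 2016, against a fixed requirement of 2100 kCal; only the two endpoints come from the figures above, and the path between them is invented. It also ignores everything about how the body adapts, so it says nothing about actual weight gain, only about the accumulated excess.

```python
def cumulative_excess(daily_intakes_by_year, daily_requirement):
    """Running total of kCal eaten above requirement, one entry per year."""
    running_total = 0.0
    totals = []
    for daily_intake in daily_intakes_by_year:
        running_total += (daily_intake - daily_requirement) * 365
        totals.append(running_total)
    return totals

# Hypothetical path: a straight-line decline from 2590 to 2100 kCal/day.
years = list(range(1976, 2017))
intakes = [2590 - 490 * (i / 40) for i in range(len(years))]
totals = cumulative_excess(intakes, daily_requirement=2100)

print(f"daily excess in 1976: {intakes[0] - 2100:.0f} kCal")
print(f"daily excess in 2016: {intakes[-1] - 2100:.0f} kCal")
print(f"cumulative excess by 2016: {totals[-1]:,.0f} kCal")
```

The rate of excess falls to zero by the end of the series, but the cumulative total never goes down; it only stops growing. That accumulated total is the budget, and it is the quantity that matters for who is obese today.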

What this means for obesity policy

Somewhat disappointingly and unusually for a Monbiot article, there are no sensible policy prescriptions at the end except “stop shaming fat people.” This isn’t very helpful, and neither is it helpful to dismiss overeating as a cause, since everyone in public health knows that overeating is the cause of obesity. For example, Public Health England wants to reduce British calorie intake, and the figures on why make for disturbing reading. Reducing calorie intake doesn’t require shaming fat people but it does require acknowledgement that British people eat too much. This comes down not to individual willpower but to the food environment in which we all make choices about what to eat. The simplest way, for example, to reduce the amount that people eat is not to give them too much food. But there is simply no way in Britain that you can eat out or buy packaged food products without buying too much food. It is patently obvious that British restaurants serve too much food, that British supermarkets sell food in packages that are too large, and that as a result the only way for British people not to eat too much is through constant acts of will – leaving half the food you paid for, buying only fresh food in small amounts every day (which is only possible in certain wealthy inner city suburbs), and carefully controlling where, when and how you eat. This is possible but it requires either that you move in a very wealthy cultural circle where the environment supports this kind of thing, or that you personally exert constant control over your life. And that latter choice will inevitably end in failure, because constantly controlling every aspect of your food intake in opposition to the environment where you purchase, prepare and consume food is very very difficult.

When you live in Japan you live in a different food environment, which encourages small serving sizes, fresh and raw foods, and low fat and low sugar foods. In Japan you live in a food environment where you are always close to a small local supermarket with convenient opening hours and fresh foods, and where convenience stores sell healthy food in small serving sizes. This means that you can choose to buy small amounts of fresh food as and when you need them, and avoid buying in bulk in a pattern that encourages overconsumption. When your food choices fail (for example you have to eat out, or buy junk food) you will have access to a small, healthy serving. If you are a woman you will likely have access to a “woman’s size” or “princess size” that means you can eat the smaller calorific food that your smaller calorific requirements suggest is wisest. It is easy to be thin in Japan, and so most people are thin. Overeating in Japan really genuinely is a choice that you have to choose to make, rather than the default setting. This difference in food environment is simple, obvious and especially noticeable when (as I just did) you hop on a plane to the UK and suddenly find yourself confronted with double helpings of everything, and supermarkets where everything is “family sized”. The change of food environment forces you to eat more. It’s as simple as that.

What Britain needs is a change in the food environment. And achieving a change in food environment requires first of all recognizing that British people eat too much, and have been eating too much for way too long. Monbiot’s article is an exercise in denialism of that simple fact, and he should change it or retract it.

The journal Molecular Autism this week published an article about the links between Hans Asperger and the Nazis in World War 2 Vienna, Austria. Hans Asperger is the paediatric psychiatrist on whose work Asperger’s syndrome is based, and after whom the syndrome is named. Until recently Asperger was believed to have been an anti-Nazi, someone who resisted the Nazis and risked his own career to protect some of his developmentally delayed patients from the Nazi “euthanasia” program, which killed or sterilized people with certain developmental disabilities for eugenics reasons.

The article, entitled Hans Asperger, National Socialism, and “race hygiene” in Nazi-era Vienna, is a thorough, well-researched and extensively documented piece of work, which I think is based on several years of detailed examination of primary sources, often in their original German. It uses these sources – often previously untouched – to explore and rebut several claims Asperger made about himself, and also to examine the nature of his diagnostic work during the Nazi era to see whether he was resisting or aiding the Nazis in their racial hygiene goals. In this post I want to talk a little about the background of the paper, and ask a few questions about the implications of these findings for our understanding of autism, and also for our practice as public health workers in the modern era. I want to make clear that I do not know much if anything about Asperger’s syndrome or autism, so my questions are questions, not statements of opinion disguised as questions.

What was known about Asperger

Most of Asperger’s history under the Nazis was not known in the English-language press, and when his name was attached to the condition of Asperger’s syndrome he was presented as a valiant defender of his patients against Nazi racial hygiene, and as a conscientious objector to Nazi ideology. This view of his life was based on some speeches and written articles translated into English during the post-war years, in particular a 1974 interview in which he claimed to have defended his patients and to have twice been saved from arrest by the Gestapo by his boss, Dr. Hamburger. Although some German-language publications were more critical, in general Asperger’s statements about his own life’s work were taken at face value, and seminal works in 1981 and 1991 that introduced him to the medical fraternity did not include any particular reference to his activities in the Nazi era.

What Asperger actually did

Investigation of the original documents shows a different picture, however. Before Anschluss (the German occupation of Austria in 1938), Asperger was a member of several far-right Catholic political organizations that were known to be anti-semitic and anti-democratic. After Anschluss he joined several organizations affiliated with the Nazi party. His boss at the clinic where he worked was Dr. Hamburger, who, he claimed, saved him twice from the Gestapo. In fact Hamburger was an avowed Nazi, probably an entryist to these Catholic social movements during the period when Nazism was outlawed in Vienna, and a virulent anti-semite. He drove Jews out of the clinic even before Anschluss, and after 1938 all Jews were purged from the clinic, leaving openings that enabled Asperger to get promoted. Given the power structures at the time, it is almost impossible that Asperger could have been promoted if he disagreed strongly with Hamburger’s politics, but we have more than circumstantial evidence that they agreed: the author of the article, Herwig Czech, uncovered the annual political reports submitted concerning Asperger by the Gestapo, and they consistently agreed that he was either neutral or positive towards Nazism. Over time these reports became more positive and confident. Also during the war era Asperger gained new roles in organizations outside his clinic, taking on greater responsibility for public health in Vienna, which would have been impossible if he were politically suspect, and his 1944 PhD thesis was approved by the Nazis.

A review of Asperger’s notes also finds that he did send at least some of his patients to the “euthanasia” program, and in at least one case records a conversation with a parent in which the child’s fate is pretty much accepted by both of them. The head of the institution that did the “euthanasia” killings was a former colleague of Asperger’s, and the author presents pretty damning evidence that Asperger must have known what would happen to the children he referred to the clinic. It is clear from his speeches and writings in the Nazi era that Asperger was not a rabid killer of children with developmental disabilities: he believed in rehabilitating children and finding ways to make them productive members of society, only sending the most “ineducable” children to institutional care and not always to the institution that killed them. But it is also clear that he accepted the importance of “euthanasia” in some instances. In one particularly compelling situation, he was put in charge – along with a group of his peers – of deciding the fate of some 200 “ineducable” children in an institution for the severely mentally disabled, and 35 of those ended up being murdered. It seems unlikely that he did not participate in this process.

The author also notes that in some cases Asperger’s prognoses were more severe than those of the doctors at the institute that ran the “euthanasia” program, suggesting that he wasn’t just a fair-weather friend of these racial hygiene ideals. The author also makes the point that, because Asperger remained in charge of the clinic in the post-war years, he was in a very good position to sanitize his case notes of any connection with Nazis and especially with the murder of Jews. Certainly, the author does not credit Asperger’s claims that he was saved from the Gestapo by Hamburger, and suggests that these are straight-up fabrications intended to sanitize Asperger’s role in the wartime public health field.

Was Asperger’s treatment and research ethical in any way?

Reading the article, one question that occurred to me immediately was whether any of his treatments could be ethical, given the context, and also whether his research could possibly have been unbiased. The “euthanasia” program was actually well known in Austria at the time – so well known in fact that at one point Allied bombers dropped leaflets about it on the town, and there were demonstrations against it at public buildings. So put yourself in the shoes of a parent of a child with a developmental disability, bringing your child to the clinic for an assessment. You know that if your child gets an unfavourable assessment there is a good chance that he or she will be sterilized or taken away and murdered. Asperger offers you a treatment that may rehabilitate the child. Obviously, with the threat of “euthanasia” hanging over your child, you will say yes to this treatment. But in modern medicine there is no way that we could consider that to be willing consent. The parent might actually not care about “rehabilitating” their child, and might be perfectly happy for the child to grow up and be loved within the bounds of what their developmental disability allows them; it may be that rehabilitation is difficult and challenging for the child, and not in the child’s best emotional interests. But faced with that threat of a racial hygiene-based intervention, as a parent you have to say yes. Which means that in a great many cases I suspect that Asperger’s treatments were not ethical from any post-war perspective.

In addition, I also suspect that the research he conducted for his 1944 PhD thesis, in addition to being unethical, was highly biased, because the parents of these children were lying through their teeth to him. Again, consider yourself as the parent of such a child, under threat of sterilization or murder. You “consent” to your child’s treatment regardless of what might be in the child’s best developmental and emotional interests, and also allow the child to be enrolled in Asperger’s study[1]. Then your child will be subjected to various rehabilitation strategies, what Asperger called pedagogical therapy. You will bring your child into the clinic every week or every day for assessments and tests. Presumably the doctor or his staff will ask you questions about the child’s progress: does he or she engage with strangers? How is his or her behavior in this or that situation? In every situation where you can, you will lie and tell them whatever you think is most likely to make them think that your child is progressing. Once you know what the tests at the clinic involve, you will coach your child to make sure he or she performs well in them. You will game every test, lie at every assessment, and scam your way into a rehabilitation even if your child is gaining nothing from the program. So all the results on rehabilitation and the nature of the condition that Asperger documents in his 1944 PhD thesis must be based on extremely dubious research data. You simply cannot believe that the research data you obtained from your subjects is accurate when some of them know that their responses decide whether their child lives or dies. 
Note that this problem with his research exists regardless of whether Asperger was an active Nazi – it’s a consequence of the times, not the doctor – but it is partially ameliorated if Asperger actually was an active resister to Nazi ideology, since it’s conceivable in that case that the first thing he did was give the parent an assurance that he wasn’t going to ship their kid off to die no matter what his diagnosis was. But since we now know he did ship kids off to die, that possibility is off the table. Asperger’s research subjects were consenting to a research study and providing subjective data on the assumption that the study investigator was a murderer with the power to kill their child. This means Asperger’s 1944 work probably needs to be ditched from the medical canon, simply on the basis of the poor quality of the data. It also has implications, I think, for some of his conclusions and their influence on how we view Asperger’s syndrome.

What does this mean for the concept of the autism spectrum?

Asperger introduced the idea of a spectrum of autism, with some of the children he called “autistic psychopaths” being high functioning, and some being low functioning, with a spectrum of disorder. This idea seems to be an important part of modern discussion of autism as well. But from my reading of the paper [again I stress I am not an expert] it seems that this definition was at least partly informed by the child’s response to therapy. That is, if a child responded to therapy and was able to be “rehabilitated”, they were deemed high functioning, while those who did not were considered low functioning. We have seen that it is likely that some of the parents of these children were lying about their children’s functional level, so probably his research results on this topic are unreliable, but there is a deeper problem with this definition, I think. The author implies that Asperger was quite an arrogant and overbearing character, and it seems possible to me that he was deeply flawed in assuming that his therapy would always work and that, if it failed, the problem was with the child’s level of function. What if his treatment only worked 50% of the time, randomly? Then the 50% of children who failed are not “low-functioning”, they’re just unlucky. If we compare with a pharmaceutical treatment, it simply is not the case that when your drugs fail your doctor deems this to be because you are “low functioning”, and ships you off to the “euthanasia” clinic. They assume the drugs didn’t work and give you better, stronger, or more experimental drugs. Only when all the possible treatments have failed do they finally deem your condition to be incurable. But there is no evidence that Asperger considered the possibility that his treatment was the problem, and because the treatment was entirely subjective – the parameters decided on a case-by-case basis – there is no way to know whether the problem was the children or the treatment.
So to the extent that this concept of a spectrum is determined by Asperger’s judgment of how the child responded to his entirely subjective treatment, maybe the spectrum doesn’t exist?
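
To make the 50% thought experiment concrete, here is a minimal simulation sketch. Everything in it is a hypothetical assumption of mine rather than anything from Czech’s paper or Asperger’s data: each child gets an unobserved “underlying ability” drawn uniformly between 0 and 1, and the treatment works with probability 0.5 no matter what that ability is. Children are then labelled by outcome, the way the text suggests Asperger may have done.

```python
import random
from statistics import mean


def simulate_labels(n_children=10_000, success_rate=0.5, seed=0):
    """Label children by treatment outcome when success is pure chance.

    Hypothetical model (an assumption for illustration only): each
    child's underlying ability is a uniform random number in [0, 1],
    and the treatment works with probability `success_rate`
    regardless of that ability.
    """
    rng = random.Random(seed)
    high, low = [], []
    for _ in range(n_children):
        ability = rng.random()                   # underlying function, never observed
        responded = rng.random() < success_rate  # outcome ignores ability entirely
        (high if responded else low).append(ability)
    return high, low


high, low = simulate_labels()
share_low = len(low) / (len(high) + len(low))
gap = abs(mean(high) - mean(low))
print(f"labelled low functioning: {share_low:.1%}")
print(f"difference in mean underlying ability: {gap:.3f}")
```

Under these assumptions roughly half the children end up labelled “low functioning”, yet the two groups are almost identical in underlying ability: the label records the luck of the treatment, not anything about the child.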

This is particularly a problem because the concept of “functioning” was deeply important to the Nazis and had a large connection to who got selected for murder. In the Nazi era, to quote Negan, “people were a resource”, and everyone was expected to be functioning. Asperger’s interest in this spectrum and the diagnosis of children along it wasn’t just, or even primarily, driven by a desire to understand the condition of “autistic psychopathy”; it was integral to his racial hygiene conception of what to do with these children. In determining where on the spectrum they lay he was providing a social and public health diagnosis, not a personal diagnosis. His concern here was not with the child’s health or wellbeing or even an accurate assessment of the depth and nature of their disability – he and his colleagues were interested in deciding whether to kill them or not. Given the likely biases in his research, the dubious link between the definition of the spectrum and his own highly subjective treatment strategy, and the real reasons for defining this spectrum, is it a good idea to keep it as a concept in the handling of autism in the modern medical world? Should we revisit this concept, if not to throw it away at least to reconsider how we define the spectrum and why we define it? Is it in the best interests of the child and/or their family to apply this concept?

How much did Asperger’s racial hygiene influence ideas about autism’s heritability?

Again, I want to stress that I know little about autism and it is not my goal here to dissect the details of this condition. However, from what I have seen of the autism advocacy movement, there does seem to be a strong desire to find some deep biological cause of the condition. I think parents want – rightly – to believe that it is not their fault that their child is autistic, and that the condition is not caused by environmental factors that might somehow be associated with their pre- or post-natal behaviors. Although the causes of autism are not clear, there seems to be a strong desire among some in the autism community to see it as biological or inherited. I think this is part of the reason that Andrew Wakefield’s scam linking autism to MMR vaccines remains successful despite his being struck off the medical register in the UK and his exile to America. Parents want to think that they did not cause this condition, and blaming a pharmaceutical company is an easy alternative to this possibility. Heritability is another alternative explanation to behavioral or environmental causes. Asperger of course thought that autism was entirely inherited, blaming it – and its severity – on the child’s “constitution”, which was his phrase for their genetic inheritance. This is natural for a Nazi, of course – Nazis believe everything is inherited. Asperger also believed that sexual abuse was due to genetic causes (some children had a genetic property that led them to “seduce” adults!). Given Asperger’s influence on the definition of autism, I think it would be a good idea to assess how much his ideas also influence the idea that autism is inherited or biologically determined, and to question the extent to which this is just received knowledge from the original researcher. On a broader level, I wonder how many conditions identified during the war era and immediately afterwards were influenced by racial hygiene ideals, and how much the Nazi medical establishment left a taint on European medical research generally.

What lessons can we learn about public health practice from this case?

It seems pretty clear that some mistakes were made in the decision to assign Asperger’s name to this condition, given what we now know about his past. It also seems clear that Asperger was able to whitewash his reputation and bury his responsibilities for many years, including potentially avoiding being held accountable as an accessory to murder. How many other medical doctors, social scientists and public health workers from this time were also able to launder their history and reinvent themselves in the post-war era as good Germans who resisted the Nazis, rather than active accomplices of a murderous and cruel regime? What is the impact of their rehabilitation on the ethics and practice of medicine or public health in the post-war era? If someone was a Nazi who believed that murdering the sick, disabled and certain races for the good of the race was a good thing, then when they laundered their history there is no reason to think they actually laundered their beliefs as well. Instead they carried these beliefs into the post-war era, and presumably quietly continued acting on them in the institutions they now occupied and corrupted. How much of European public health practice still bears the taint of these people? It’s worth bearing in mind that in the post-war era many European countries continued to run a variety of programs that we now consider to have been rife with human rights abuse, in particular the way institutions for the mentally ill were run, the treatment of the Roma people (which often maintained racial-hygiene elements even decades after the war), treatment of “promiscuous” women and single mothers, and management of orphanages. How much of this is due to the ideas of people like Asperger, propagating slyly through the post-war public health institutional framework and carefully hidden from view by people like Asperger, who were assiduously purging past evidence of their criminal actions and building a public reputation for purity and good ethics?
I hope that medical historians like Czech will in future investigate these questions.

This is not just a historical matter, either. I have colleagues and collaborators who work in countries experiencing various degrees of authoritarianism and/or racism – countries like China, Vietnam, Singapore, the USA – who are presumably vulnerable to the same kinds of institutional pressures at work in Nazi Germany. There have been cases, for example, of studies published from China that were likely done using organs harvested from prisoners. Presumably the authors of those studies thought this practice was okay? If China goes down a racial hygiene path, will public health workers who are currently doing good, solid work on improving the public health of the population start shifting their ideals towards murderous extermination? Again, this is not an academic question: after 9/11, the USA’s despicable regime of torture was developed by two psychologists, who presumably were well aware of the ethical standards their discipline is supposed to maintain, and just ignored them. The American Psychological Association had to amend its code in 2016 to include an explicit statement about avoiding harm, but I can’t find any evidence that either the APA or the universities the psychologists graduated from took any disciplinary action over their involvement in this shocking scheme. So it is not just in dictatorships that public policy pressure can lead to doctors adopting highly unethical standards. Medical, psychological and public health communities need to take much stronger action to make sure that our members aren’t allowed to give in to their worst impulses when political and social pressure comes to bear on them.

These ideas are still with us

As a final point, I want to note that the ideas that motivated Asperger are not all dead, and the battle against the pernicious influence of racial hygiene was not won in 1945. Here is Asperger in 1952, talking about “feeblemindedness”:

Multiple studies, above all in Germany, have shown that these families procreate in numbers clearly above the average, especially in the cities. [They] live without inhibitions, and rely without scruples on public welfare to raise or help raise their children. It is clear that this fact presents a very serious eugenic problem, a solution to which is far off—all the more, since the eugenic policies of the recent past have turned out to be unacceptable from a human standpoint

And here is Charles Murray in 1994:

We are silent partly because we are as apprehensive as most other people about what might happen when a government decides to social-engineer who has babies and who doesn’t. We can imagine no recommendation for using the government to manipulate fertility that does not have dangers. But this highlights the problem: The United States already has policies that inadvertently social-engineer who has babies, and it is encouraging the wrong women. If the United States did as much to encourage high-IQ women to have babies as it now does to encourage low-IQ women, it would rightly be described as engaging in aggressive manipulation of fertility. The technically precise description of America’s fertility policy is that it subsidizes births among poor women, who are also disproportionately at the low end of the intelligence distribution. We urge generally that these policies, represented by the extensive network of cash and services for low-income women who have babies, be ended. [Emphasis in the Vox original]

There is an effort in Trump’s America to rehabilitate Murray’s reputation, long after his policy prescriptions were enacted during the 1990s. There isn’t any real difference between Murray in 1994, Murray’s defenders in 2018, or Asperger in 1952. We now know what the basis for Asperger’s beliefs was. Sixty years later they’re still there in polite society, almost getting to broadcast themselves through the opinion pages of a major centrist magazine. Racial hygiene didn’t die with the Nazis, and we need to redouble our efforts now to get this pernicious ideology out of public health, medicine, and public policy. I expect that in the next few months this will include some uncomfortable discussions about Asperger’s legacy, and I hope a reassessment of the entire definition of autism, Asperger’s syndrome and its management. But we should all be aware that in these troubled times, the ideals that motivated Asperger did not die with him, and our fields are still vulnerable to their evil influence.

fn1: Note that you consent to this study regardless of your actual views on its merits, whether it will cause harm to your child, etc. because this doctor is going to decide whether your child “rehabilitates” or slides out of view and into the T4 program where they will die of “pneumonia” within 6 months, and so you are going to do everything this doctor asks. This is not consent.