The UK General Election has just finished, with Labour losing badly to Boris Johnson's Brexit-fetishizing Tories. The media are describing Labour's loss as the worst since 1935, which is true if you look at seats lost but not at the vote: with 32.2% of the vote share it is the best result for a losing Labour party since 1992. The swing against Labour was very large, at 7.2%, but this is partly because Labour's previous vote share of 40% was phenomenally high. Blair himself only cleared 40% twice, and many post-war Labour governments have governed with a majority on a much lower share of the vote than Corbyn achieved in 2017.

In the early post-election recriminations people are laying the fault entirely at the feet of Corbyn and the Labour manifesto, but I'm not convinced that a different leader or a less radical manifesto would have helped. The 2019 election was a historic election, similar to the 1945 election: a huge decision about the future of the UK had to be made, with a major recent event hanging over the campaign. In 1945 the decision was the establishment of the modern welfare state, and the recent event was the war. In 2019 the decision was Brexit, and the EU referendum was the event overshadowing the election.

This Brexit issue overshadowed the whole election, and in this blog post I will show that it had a huge impact on the Labour vote, which made this election almost impossible for the Labour party to win. I will show this using a statistical analysis of 2019 election results.

Methods

I obtained 2017 election results and 2016 EU Referendum results from the UK Parliamentary Research Briefings website. I merged these data sets together using the ONS ID (the unique number that identifies parliamentary constituencies) so that I had the percentage of each constituency that voted leave in 2016, and their 2017 election results, in one dataset. I then took a semi-random sample of the 2019 General Election results using the BBC Election results website. The sample was semi-random because there is no publicly available official dataset at this stage, so I had to enter the results by hand, looking at each constituency in turn on the BBC website[1]. I started by ordering my dataset by constituency name from A-Z and working sequentially through them, on the assumption that I have limitless patience and 10 hours of my life to give to this, but gave up somewhere around "D" and took a random sample of another 100 or so constituencies. Because constituency names are approximately random, this means I have 200 or so approximately random samples from the first stage, and another 100 or so genuinely random samples from the second stage. I may have had a hangover, but there are limits to how much time and effort I am willing to put into rescuing the UK Labour party from bad analysis!
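The merging step is trivial in any stats package. Here is a minimal sketch in Python/pandas; the file names and column headers are hypothetical stand-ins, not the actual headers in the source files.

```python
import pandas as pd

# Hypothetical file and column names; the real headers will differ.
ge2017 = pd.read_csv("ge2017_results.csv")       # 2017 General Election results by constituency
ref2016 = pd.read_csv("eu_ref_2016.csv")         # 2016 EU referendum leave share by constituency
ge2019 = pd.read_csv("ge2019_hand_entered.csv")  # hand-entered 2019 sample from the BBC website

# Merge everything on the ONS constituency identifier
df = (
    ge2017
    .merge(ref2016[["ons_id", "leave_share"]], on="ons_id", how="left")
    .merge(ge2019, on="ons_id", how="inner", suffixes=("_2017", "_2019"))
)
```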

I dropped Northern Ireland from my analysis because a) I don't understand their political parties, b) Sinn Fein's decision not to enter parliament is weird, and c) Northern Ireland should be part of Ireland, not the UK. I kept Scotland but excluded it from some of my figures (see specific figures for more details). I excluded the Speaker's seat (which was Labour) from analyses of the Labour swing because there was no opponent, so the swing was weird; I also excluded another Labour seat with a very high positive swing from these analyses, and dropped one Conservative seat (Buckingham) with the same problem.

Once I had done this I calculated the swing for each of Labour, the Lib Dems and the Tories by subtracting their 2017 vote share from their 2019 vote share. I confirmed this works by comparing the calculated Labour swing with the actual Labour swing reported on the BBC website (which I entered as I went through my semi-random sampling). I obtained Brexit party vote shares from the BBC website, leaving this field blank if the Brexit party did not stand a candidate[2].
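In code this is just a column subtraction, again with hypothetical column names:

```python
# Swing = 2019 vote share minus 2017 vote share, in percentage points
for party in ["lab", "con", "ld"]:
    df[f"{party}_swing"] = df[f"{party}_share_2019"] - df[f"{party}_share_2017"]

# The Brexit party share stays missing (NaN) where no candidate stood,
# which also gives us the challenger indicator used later
df["brexit_challenger"] = df["brexit_share"].notna().astype(int)
```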

I then conducted several linear regressions of the swing:

  • A linear regression of Conservative party swing as a function of leave vote in the EU Referendum
  • A linear regression of Labour party swing as a function of leave vote in the EU Referendum
  • A linear regression of Lib Dem swing as a function of leave vote in the EU Referendum
  • A linear regression of Labour party swing as a function of Brexit party vote

For all regressions I tested a quadratic term in leave vote, and I included a term for whether the constituency was in Scotland or Wales. I included a term for whether or not the constituency experienced a Brexit party challenge in the first three regressions, and tested an interaction with leave vote. I dropped any non-significant terms in order of their non-significance to get the best model. I also centered the EU referendum vote at its median (53.5% of people voting to leave), so that the constant term in all linear regressions measured the swing against the party in question in the median leave-voting seat.
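As an illustration of the model specification, here is a hedged sketch of the Labour swing model in Python with statsmodels; the variable names are hypothetical stand-ins for the columns described above.

```python
import statsmodels.formula.api as smf

# Centre the leave vote at its median (53.5%) so the intercept is the swing
# in the median leave-voting seat
df["leave_c"] = df["leave_share"] - 53.5

# Starting model: quadratic leave term, Scotland and Wales indicators, and a
# Brexit party challenge indicator with its leave-vote interaction.
# Non-significant terms are then dropped one at a time.
model = smf.ols(
    "lab_swing ~ leave_c + I(leave_c**2) + scotland + wales"
    " + brexit_challenger + brexit_challenger:leave_c",
    data=df,
).fit()
print(model.summary())
```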

I then obtained predicted values from all regressions to include in the plots of the swing against the leave vote or the Brexit party vote. Brexit party vote is effectively being used here as a proxy for Labour voters' decision to abandon Labour over Brexit. I modelled the relationship between the swing against Labour and the Brexit party vote not because I think this swing is the Brexit party's fault, but because I expect it represents the likelihood that Labour voters abandoned Labour over Brexit. One might suppose they abandoned Labour for the Tories over general policy, or because they respect BoJo, but the only reason for abandoning Labour for the Brexit party is Brexit, and so this acts as a proxy for the possibility that they also jumped ship to the Tory party over Brexit. Because the Brexit party did not stand candidates in Tory-held constituencies, it is impossible to test what might have happened if the Brexit party had stood against a Tory incumbent.

Results

I had data on 341 constituencies, just over half of all eligible constituencies. Among these 341 constituencies, 146 (43%) had a Brexit party challenger. Of the 142 seats that were Labour-held in 2017, 112 (79%) survived to be Labour-held in 2019. None of the 199 non-Labour-held seats in my dataset switched to Labour in 2019. The mean swing against Labour in seats it held was 8.7%, and the mean swing against it in seats it did not hold was 7.5%.

Let us consider the relationship between the swing against Labour and the Brexit party vote in seats where Labour was challenged by this party. Figure 1 shows the swing against Labour in England plotted against the proportion of the vote that the Brexit party won, with the predicted trend in the swing from my final regression model. The final regression model explained 54% of the variation in the swing against Labour, included a quadratic term for the Brexit party vote, and included significant terms for Scotland (a 3.5% larger swing against Labour) and Wales (a 2.2% smaller swing against Labour). The intercept term in this model was -3.8, which indicates that in the absence of a Brexit party challenge these seats would have seen a mean swing against Labour of about 3.8% (95% confidence interval 2.6% to 5.0%). In this counter-factual[3], most of these seats would not have changed hands.

Figure 1: Russian Ratfuckery, in its most exquisite form

It is very clear from Figure 1 that the Brexit party had a massive impact on the Labour vote, pulling it down by a huge amount in the seats where they ran a candidate. The Brexit party did not win a single seat in this election, but they cost Labour a lot of seats. Once again, Farage had a huge impact on British politics without ever sitting in parliament. In some of the northern seats the Brexit party got a huge share of the vote, and it is very likely that almost all of it came from Labour. In the seats with a middling Brexit party vote, between perhaps 5 and 15% of the total, Labour lost between 10 and 20% of the vote share. I think this is a strong indicator that Labour was bleeding votes due to Brexit.

We can confirm this by examining the relationship between the swing against Labour and the proportion of the electorate who voted for leave in the 2016 EU referendum. Figure 2 shows the swing against Labour in England and Wales plotted against the leave vote, separately for constituencies with a Brexit party challenger (red) and those without a Brexit party challenger (blue). Most seats with a Brexit party challenger were Labour seats, while those without a challenger were mostly Tory. The blue and red lines show the predicted swing against Labour from my linear regression model, which explains 39% of the variation in the swing against Labour. This model had a term for Scotland (which had a 2.4% larger swing against Labour), a quadratic term for the leave vote, and an additional effect on the swing due to the leave vote in areas with a Brexit challenger. In the median 2016 EU referendum leave-voting constituency, the swing against Labour was 6.5%, and this was 2.1% higher in seats with a Brexit challenger.

Figure 2: Relationship between the swing against Labour and the leave vote

It is clear from Figure 2 that the swing against Labour was smaller in seats with a Brexit party challenger that voted to remain in the EU. In seats not held by Labour, the swing against Labour was larger in both strong leave-voting seats and strong remain-voting seats. In these seats (the seats Labour had to win to win government) Labour was being squeezed at both ends of the leave spectrum. In the seats Labour already held (mostly with a Brexit party challenger), or that it did not hold but where it faced a Brexit party challenger and a Lib Dem or SNP incumbent, the party faced intense pressure due to Brexit. In seats it held that had a large leave vote Labour was completely smashed. These seats are mostly in the famed "red wall", the northern seats that Labour has always been able to rely on. Note that the largest positive swings to Labour occurred at about the median leave vote, between 45% and 55%.

An interesting phenomenon in this election is the failure of the Conservatives to gain a large swing from Labour. The national swing against Labour was 7.9%, but the Conservatives only gained a 1.2% swing. The primary beneficiaries of the swing against Labour were the Lib Dems and the Brexit party. Of course, these national figures hide major variations between constituencies, which are easy to see if we look at the swing to the Tories at the constituency level. Figure 3 shows the swing to or against the Conservatives in England, plotted against the leave vote in the 2016 EU referendum. Red points are seats with a Brexit party challenger, and blue points are those without a Brexit party challenger (mostly Tory-held seats). At the median leave vote my model estimated a swing to the Tories of 1.3%, with a further swing to them of 1.4% in Wales. This model included a quadratic term in the leave vote, and explained 57% of the variance in the Tory swing.

Figure 3: Observed and predicted swing to the conservative party, by EU referendum leave vote

It is noteworthy that in the seats that voted to remain the Tories experienced a swing against them of as much as 10%, but in the strong leave-voting seats they experienced a huge swing to them. This swing was larger in seats without a Brexit party challenger, presumably because there was no Brexit party to absorb the leave sentiment, but even in pro-leave constituencies with a Brexit party challenger the Tories gained a very large swing. Note, however, that in some pro-leave seats there was a swing against the Tories where there was a Brexit party challenger. These were Labour seats that saw all their pro-leave vote go to the Brexit party. But in pro-leave seats with no Brexit party challenger – the seats that Labour needed to win to form government – there was a consistent large swing to the Tories. We again see here the value of Farage’s decision to stand candidates only in Labour seats.

Finally let us consider the role of the Liberal Democrats, the greatest frauds in modern politics, in destroying the UK. Figure 4 shows the swing to or against the Lib Dems in England, plotted against the 2016 EU referendum leave vote, with the predicted swing from my regression model. Again, red points are for seats with a Brexit party challenger (Labour- or Lib Dem-held seats) and blue points are for seats without a Brexit party challenger (mostly Tory-held seats). This model had no quadratic term for the leave vote: in seats without a Brexit party challenger every percentage point increase in the leave vote was associated with a swing against the Lib Dems of 0.2%, while in seats with a Brexit party challenger this swing was only 0.1%. At the median leave vote the Lib Dems experienced a swing towards them of 6.0%, reducing to 2.8% in seats without a Brexit party challenger. Scotland and Wales saw large reductions in this swing to the Lib Dems, of 5.3% in Scotland and 2.1% in Wales. Basically, the Lib Dems performed best in Labour seats in England that voted to remain in 2016. In these seats the swing against the Labour party was often almost entirely towards the Lib Dems. This model explained only 29% of the variance in the swing, probably because the Lib Dems win through highly localized campaigns that are less strongly affected by national factors.

Figure 4: Look at these arseholes spoiling the Labour vote

Note that in some Tory-held remain seats (the blue dots to the left of the figure) the Lib Dems had huge swings to them, but in many seats they did not win. A good example of this is Cities of London and Westminster, which was Tory before this election and did not have a Brexit Party challenger. The Lib Dems fielded Chuka Umunna, a class traitor who abandoned Labour to join TIG, then jumped ship from them to join the Lib Dems, natural home of fickle and untrustworthy people. He won 30.7% of the vote, scoring a swing to the Lib Dems of 19.6%. This enabled the Tories to hold this seat with just 39.9% of the vote, against Labour’s 27.2%. Had he not stood, it is possible that a large proportion of that vote might have gone to Labour. In the seat he used to represent for Labour, Streatham, Labour held on despite a surge of 17.0% in the Lib Dem vote (this seat is not in my data set so you can’t find it in Figure 4). Cities of London and Westminster voted 28.1% leave in the EU referendum, making it one of the least leave-voting seats in the country; Streatham voted 20.5% leave, making it the second least Brexity in the country. Thanks to Chuka’s “efforts”, the citizens of both these seats will now have to leave the EU.

What it all means

These figures and the associated regression models should make very clear that Labour was screwed by Brexit. The Tories scored huge swings in pro-leave seats, which shored up their vote in seats that Labour had to win and forced Labour to defend seats it could normally rely on. Worse still, Farage’s decision to stand Brexit party candidates only in Labour seats meant that Labour lost large numbers of voters to this no-good Russian con-job, while also facing defection to the Tories. At the remain end the Lib Dems were stealing their votes, so they were bleeding votes at both ends of the leave spectrum. The only way they could have averted this problem would have been to go to the election with a full-throated Brexit strategy – a Lexit manifesto – which would have shored up the red wall and ensured they didn’t lose many of those seats. However, even if this had been successful in the North, it would have cost seats in the cities, where the Lib Dems would have stolen many seats. This is worse than useless, since we know from experience that if they have the choice the Lib Dems will betray the country to the Tories, and will never form a government with Labour.

I don’t think a Lexit strategy would even have been that successful. Just as when Labour goes full racist, the people they’re trying to win back just don’t believe it, and vote for the Tories anyway. Had Corbyn gone to the election with a full-throated Lexit manifesto a lot of the people he was trying to convince would have assumed he was lying, and he would have lost the northern voters anyway, at the cost of the cities and the youth vote. Jo Swinson truly could have become PM!

Given this squeeze I think Corbyn made a sensible decision to run on a big left-wing manifesto and try to make the election about something other than Brexit. This was especially important given the Labour position on Brexit was consistently misrepresented by the media. I saw multiple media figures on Twitter claiming Labour did not support free movement (they did; it was in the manifesto) and saying their position on Brexit was “too complicated” (it wasn’t: they were going to negotiate a good deal and put it to a referendum). Given this their best bet was to try and turn the debate to one on honesty, the NHS, poverty and inequality. I think this is wise messaging and important: the UK is heading into the abyss, and at some point the Labour party is going to have to save the UK from ruin, so why not make this point at a time when you can’t win the Brexit debate?

I think it's also important to consider what would have happened if the party had chosen to go full Remain or full Lexit. In the former case they would be abandoning their northern seats, telling them that they don't care about their concerns and won't listen to their democratic voice. In the latter case they would be abandoning young people, who are much more likely to be Labour supporters, and telling them they will destroy their future. Given that the future is all that young people in the UK have, this is political suicide. The only way to square this circle is to present a policy that offers hope to both these core groups. Labour is the party of the urban poor, industrial labour and young people, but when these three constituencies have radically different demands on the overwhelming issue of the time it's impossible for Labour to win.

If Labour failed in this election I think it was in failing to convince the electorate of the value of their Brexit policy. But given they weren’t able to express it without the media mangling it and misrepresenting it, and given how dishonest and vicious the campaign was, I don’t see how any other leader could have done better. Even if you credit the notion that Corbyn is hugely unpopular, and assume some part of the swing was hatred of this genuinely decent guy, it makes no difference: the figures I’ve shown here make clear that Labour were fucked no matter who their leader was and what their manifesto was. This was a Brexit election, and the Tories are obviously the party of Brexit.

Three years after he exploded the Brexit bomb, and 30 years after he face-fucked a dead pig so he could win Johnson’s approval, David Cameron has achieved what he originally intended: the destruction of the Labour party by unleashing a racist monster in the UK. History will not judge any of these awful men well.

Where to now for the Labour party?

I think the Labour party should keep Corbyn and keep his manifesto. They aren't going to win with another Blairist monstrosity – Ed Miliband tried that in 2015 and was sunk by a viciously anti-semitic media campaign that portrayed him as a Jewish communist with dual loyalties[4] who can't eat a bacon sandwich. By the time the next election comes around the world is going to be desperate, trapped in the throes of global warming and looking for new ways out. Why throw away what the country needs? This election Labour's manifesto was the best and most inspiring left-wing project in the UK for 30 years, and it was right. Jeremy Corbyn is right – he won the arguments. He just couldn't beat Brexit.

I have seen rumours that some on the Labour right were cheering when MPs lost their seats. I have seen in the media and on Twitter Corbyn’s old enemies in the Labour party gloating over the Tory victory, laughing at the Labour movement’s disappointment and salivating at their chance to retake control of the party. Perhaps they envisage another illegal war, where they can kill another million muslims? Or perhaps they look forward to palling around with rich non-doms, being “intensely relaxed about people being filthy rich”. Oh, the larks! These people are not part of the labour movement. They’re scabs, and their obvious joy at this defeat is disgusting. They need to leave the movement, and leave it to those British people who actually want to save the country from ruin. During this period of reflection, we should be clear: it was Brexit that defeated Labour at this election, and the direction it was headed under Corbyn is the only future for Britain other than ruin. So these scabs need to get out of the party and leave it for people who actually care about the future of the UK and the future of the world.

Once Brexit is past, and these class traitors are out of the labour movement, we can hold the Tories responsible for what they have done. We couldn’t beat Brexit, but we can hold its architects responsible for the great evil they have perpetrated on ordinary British people.


fn1: It’s okay, I had a hangover and nothing better to do on Saturday

fn2: Or “chump”, to use the preferred terminology for these sad-sacks

fn3: Which is bullshit

fn4: Oh the irony …

Two days ago I wrote a scathing review of Star Wars: The Last Jedi, and since then I have been digging around for others' views on the matter. The Guardian has an article giving some fans' reviews, and the below-the-line comments are suitably critical of this awful movie. Meanwhile Vox has a pathetic, self-serving article by a film critic attempting to explain why so many people have such different views to the critics. This article includes such great insights as "critics don't really care about plot", which is dismissed as a "nitty-gritty detail" of a movie – they're more interested in themes and emotional struggles, apparently, which suggests they'd be more at home at a My Chemical Romance gig than a decent movie. How did they get the job?

In amongst the complaints on the Guardian's article, and at the centre of the Vox piece, is a particularly vicious little dismissive claim: that a lot of the negative reaction to the movie arises from long-term fans[1], who cannot handle what Rian Johnson did with their cherished childhood movie, and are unrepresentative of the broader movie-going public. In the more vernacular form of some of the BTL comments on the Guardian article, fanboys are pissed off because Rian Johnson didn't make the movie exactly the way they wanted. This, apparently, explains the difference between the critics' view of the movie and the people giving reviews on the Rotten Tomatoes website.

I thought this sounded fishy, so I decided to collect a little bit of data from the Rotten Tomatoes website and have a look at just how far fanboys typically deviate from critics. I figured that if fanboys’ disappointment with not getting a movie exactly as they wanted it was the driver of negative reactions to this movie, we should see it in other Star Wars movies. We should also see it in other movies with a strong fanboy following, and maybe we wouldn’t see it in movies that don’t have strong preconceptions. I collected data on critics’ and fans’ aggregated review statistics for 35 movies from the Rotten Tomatoes website. For each movie I calculated a score, which I call the Odds Ratio of Critical Acceptance (ORCA). This is calculated as follows:

1. Calculate an odds for the critics’ aggregate score, O1, which is (score)/(1-score)

2. Calculate an odds for the viewers’ aggregate score, O2, which is (score)/(1-score)

3. Calculate their ratio, ORCA=O1/O2

I use this score because it accounts for the inherent limits on the value of a critical score. The Last Jedi got a critics’ score of 0.93, which is very close to the upper limit of 1. If the viewers’ score was, for example, 0.83, it is 0.1 lower than the critics’ score. But this 0.1 is a much larger gap than, say, the difference between a critics’ score of 0.55 and a viewers’ score of 0.45. Similarly, if critics give a movie a value of 0.1 and viewers a value of 0.2, this means viewers thought it was twice as good – whereas values of 0.45 and 0.55 are much less different. We use this kind of odds ratio in epidemiology a lot because it allows us to properly account for small differences when one score is close to 1, as (inexplicably) it is for this horrible movie. Note that ORCA scores above 1 indicate that the critics gave the movie a higher score than the viewers, and scores below 1 indicate that the viewers liked the movie more than the critics.
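For anyone who wants to check my numbers, the ORCA calculation is a one-liner; here is a minimal Python version:

```python
def orca(critic_score: float, viewer_score: float) -> float:
    """Odds Ratio of Critical Acceptance: the odds of the critics' score
    divided by the odds of the viewers' score (scores as proportions)."""
    critic_odds = critic_score / (1 - critic_score)
    viewer_odds = viewer_score / (1 - viewer_score)
    return critic_odds / viewer_odds

# The Last Jedi at the time of writing: critics 0.93, viewers 0.55
print(round(orca(0.93, 0.55), 1))  # 10.9
```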

I collected scores for all the Star Wars movies, all three Lord of the Rings movies, both Ghost in the Shell movies (the Japanese original and the western remake), both Blade Runners, Alien: Covenant, two Harry Potter movies, Fifty Shades of Grey, and Gedo Senki (the (filthy) Studio Ghibli version of A Wizard of Earthsea), as examples of movies with a fanboy following. As readers of my blog are no doubt very aware, the Lord of the Rings fanboys are absolutely filthy, and if anyone is going to sink a movie over trivial shit they will. Ghost in the Shell is a remake of a movie with a very strong otaku following of the worst kind, and also suffers from a huge controversy over whitewashing, and Gedo Senki is based on one of the world's most popular books, by a woman who has an intense generation-spanning cadre of fans who are obsessed with her work. Harry Potter fans are also notoriously committed. I also gathered a bunch of movies that I like or that I thought would be representative of the kinds of movies that did not have a following before they were released: Mad Max Fury Road, Brokeback Mountain, that new movie about a transgender bull[3], Ferdinand, things like that. I figured that these movies should not show a big divergence in ORCA if the fanboy theory were true.

Figure 1: ORCA Scores for a range of movies, none apparently as shit as The Last Jedi.

Results of my calculations are shown in Figure 1 (sorry about the fiddly size). The Last Jedi is on the far left, and is obviously a massive outlier, with an ORCA score of 10.9. This score arises because it has a critics' score of 93% but a score from fans of just 55%[4]. Next is Mad Max: Fury Road, which was not as successful with fans as with critics but still got a rating of 0.85 from fans. Note that several Star Wars movies lie to the right of the pale blue dividing line, indicating that fans liked them more than the critics did – this includes Rogue One and The Phantom Menace, showing that this phenomenon was not limited to the first generation of movies. Note also that Fellowship of the Ring, the LoTR movie most likely to disappoint fanboys under the theory that fanboys want the director to make the movie in their heads, had an ORCA value of 0.53, meaning fans had nearly twice the odds of liking it that critics did. Gedo Senki also did better with fans than critics despite being a terrible movie that completely pisses all over Ursula Le Guin's original book.

There's no evidence at all from this data that fanboys respond badly to movies based on not getting the movie in their head, and there's no evidence that Star Wars fanboys are particularly difficult to please. The ORCA score for The Last Jedi is at least 12 parsecs removed from the ORCA score for the next-worst movie in the series, which (despite that movie also being a pile of shit) is not that high – it's lower than Dunkirk's, in fact, and Dunkirk was an awesome movie with no pre-existing fanbase[5]. Based on this data it should be pretty clear that either the "toxic fandom" of Star Wars has been hiding for the past 10 years as repeated bad movies were made, or this movie is uniquely bad and the critics were uniquely stupid to give it a good score.

I’m going with the latter conclusion, and I want the movie critics to seriously re-evaluate how they approached this movie. Star Wars clearly gets a special pass from critics because it’s so special, and Star Wars directors can lay any stinking turd on the screen and get a pass from critics for some incomprehensible reason. Up your game, idiots.

A few minor side points about critical reviews of The Last Jedi

I've been generally shocked by the way in which this movie is being hailed as a critical masterpiece. I really can't see how this can be. Even if it's not as bad as I think, I can't understand how it can get similar scores to movies like Dunkirk, Mad Max: Fury Road, or Titanic. Those movies are infinitely better crafted than this pile of junk, with tight and carefully designed plots that clearly hold together under extensive criticism. There is nothing extraneous at all in Titanic or Dunkirk, not one moment that you could say isn't directly relevant to the unfolding story, and the acting in all three of these movies is exemplary. Worse still, the Guardian is now claiming that Star Wars is the most triumphantly feminist movie yet. This is utter bullshit on its face: the main male character, Poe Dameron, repeatedly undermines female leaders, and their attempts to discipline him are ignored, ultimately leading to the death of probably 200 people in a completely avoidable catastrophe, and he suffers no consequences for his dishonesty and treachery. Furthermore, he takes over the main role from Finn, the black character, and Rey is sidelined into a supplicant to an aging white man. As a moral story for entitled white men who can't bear to be told what to do by women it's exemplary. But this is even more horrific when you consider that Mad Max: Fury Road is a savage eco-feminist masterpiece, and undoubtedly the most triumphantly feminist movie ever made. This is another example of the weird special pass that Star Wars movies get: they make piss-poor tokenistic gestures towards diversity and the critics claim they're the most woke movie ever made.

There's a strange irony in this. Star Wars fanboys are being blamed for obstinately marking this movie down on the basis of silly stereotypes about nerds, when in fact it's the critics themselves who are acting like Star Wars sycophants, giving one of the worst movies of the millennium sterling marks for trying. Unless of course the conspiracy theories are true, and they're all paid by Disney.

I won’t be so cynical. They’re just stupid and wrong, and in future I recommend not listening to reviewers before going to see any movie. Trust the viewers, they have much better judgment!

UPDATE: I have swapped my shoddy figure with a figure supplied by reader frankelavsky, who apparently actually knows how to do visual stuff, so it’s now much easier to see how terribly wrong the reviewers were.


fn1: Which, inexplicably, the Vox article seems to view as Baby Boomers, which is weird since most people want to now pretend Star Wars is a kid’s movie (it’s not[2]). Many of the fans saw it as kids, it’s true, but that’s because we were Gen X, not baby boomers. More importantly, Star Wars fandom crosses three generations, and includes a lot of Generation Y. It’s just dumb to even hint that the themes in the movie pissed off the fans because baby boomers don’t like the idea of handing on the baton to a new, more diverse generation. Star Wars fans aren’t baby boomers, and why would baby boomers have a problem with this anyway?

fn2: How fucking stupid is modern pop cultural analysis of pop culture, and how far has it fallen, that people could say this?

fn3: This is a joke. See here for more details.

fn4: It was 56% yesterday. This movie is sinking by the day.

fn5: Barring UKIP, I guess

Recently I wrote a post criticizing an article at the National Bureau of Economic Research that found a legal ivory sale in 2008 led to an increase in elephant poaching. This paper used a measure of illegal poaching commonly called the PIKE (Proportion of Illegally Killed Elephants), which is measured by counting carcasses discovered in searches of designated areas. The PIKE measures the proportion of elephant deaths that are due to illegal poaching. I argued that the paper was wrong because the PIKE has a lot of zero or potentially missing values and is bounded between 0 and 1, but the authors used ordinary least squares (OLS) regression to model this bounded number, which creates huge problems. As a result, I wasn't convinced by their findings.

Since then I have read a few reports related to the problem. I downloaded the data and decided to have a look at an alternative, simple way of estimating the PIKE in which the estimate of the PIKE emerges naturally from a standard Poisson regression model of mortality. This analysis better handles the large number of zeros in the data, and also the non-linear nature of death counts. I estimated a very simple model, but I think a stronger model can be developed using all the standard properties of a well-designed Poisson regression model, rather than trying to massage a model misspecification with the wrong distributional assumptions to try and get the correct answer. This is an outline of my methods and results from the most basic model.

Methods

I obtained the data on legal and illegal elephant carcasses from the Monitoring the Illegal Killing of Elephants (MIKE) website. This data indicates the region, country code, site number, and number of natural deaths and illegally killed elephants at each site in each year from 2002 to 2015. I used the full data set, although for comparing with the NBER paper I also used the 2003-2013 data only. I didn't exclude any very large death numbers as the NBER authors did.

The data set contains a column for total deaths and a column for illegal deaths. I transformed the data into a data set with two records for each site-year, so that one row was illegal killings and one was total deaths. I then defined a covariate, we will call it X, that was 0 for illegal killings and 1 for total killings. We can then define a Poisson model as follows, where Y is the count of deaths in each row of the data set (each observation):

Y~Poisson(lambda)                               (1)

ln(lambda)=alpha+beta*X                   (2)

Here I have dropped various subscripts because I’m really bad at writing maths in wordpress, but there should be an i for each observation, obviously.

This simple model can appear confounding to people who’ve been raised on only OLS regression. It has two lines and no residuals, and it also introduces the quantity lambda and the natural logarithm, which in this case we call a link function. The twiddle in the first line indicates that Y is Poisson distributed. If you check the Poisson distribution entry in Wikipedia you will see that the parameter lambda gives the average of the distribution. In this case it measures the average number of deaths in the i-th row. When we solve the model, we use a method called maximum likelihood estimation to obtain the relationship between that average number of deaths and the parameters in the second line – but taking the natural log of that average first. The Poisson distribution is able to handle values of Y (observed deaths) that are 0, even though this average parameter lambda is not 0 – and the average parameter cannot be 0 for a Poisson distribution, which means the linear equation in the second line is always defined. If you look at the example probability distributions for the Poisson distribution at Wikipedia, you will note that for small values of lambda values of 0 and 1 are very common. The Poisson distribution is designed to handle data that gives large numbers of zeros, and carefully recalibrates the estimates of lambda to suit those zeros.

It’s not how Obama would do it, but it’s the correct method.
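As a quick illustration of how naturally the Poisson distribution accommodates zeros: the probability of a zero count is exp(-lambda), which is large whenever the average is small. A short, purely illustrative check in Python:

```python
import numpy as np

rng = np.random.default_rng(42)
for lam in (0.5, 1.0, 3.0):
    draws = rng.poisson(lam, size=100_000)
    print(lam, (draws == 0).mean(), np.exp(-lam))
# At lambda = 0.5 about 61% of draws are zero (exp(-0.5) ≈ 0.607);
# even at lambda = 3 roughly 5% of draws are still zero.
```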

The value X has only two values in the basic model shown above: 0 for illegal deaths, and 1 for total deaths. When X=0 equation (2) becomes

ln(lambda)=alpha              (3)

and when X=1 it becomes

ln(lambda)=alpha+beta    (4)

But when X=0 the deaths are illegal; let’s denote these by lambda_ill. When X=1 we are looking at total deaths, which we denote lambda_tot. Then for X=1 we have

ln(lambda_tot)=alpha+beta

ln(lambda_tot)-alpha=beta

ln(lambda_tot)-ln(lambda_ill)=beta     (5)

because from equation (3) we have alpha=ln(lambda_ill). Note that the average PIKE can be defined as

PIKE=lambda_ill/lambda_tot

Then, rearranging equation (5) slightly we have

ln(lambda_tot/lambda_ill)=beta

-ln(lambda_ill/lambda_tot)=beta

ln(PIKE)=-beta

So once we have used maximum likelihood estimation to solve for the value of beta, we can obtain

PIKE=exp(-beta)

as our estimate of the average PIKE, with confidence intervals.

Note that this estimate of PIKE wasn’t obtained directly by modeling it as data – it emerged organically through estimation of the average mortality counts. This method of estimating average mortality rates is absolutely 100% statistically robust, and so too is our estimate of PIKE, as it is simply an interpretation of a mathematical expression derived from a statistically robust estimation method.

We can expand this model to enable various additional estimations. We can add a time term to equation (2), but most importantly we can add a term for the ivory sale and its interaction with the kill-type indicator X, which enables us to get an estimate of the effect of the sale on the average number of illegal deaths and an estimate of the effect of the sale on the average value of the PIKE. This is the model I fitted, including a time term. To estimate the basic values of the PIKE in each year I fitted a model with dummy variables for each year, the illegal/total killing term (X), and no other terms. All models were fitted in Stata/MP 14.
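I fitted everything in Stata, but the same basic models are easy to sketch in Python with statsmodels for anyone who wants to try this with free software. The data frame and column names below are hypothetical stand-ins for the reshaped MIKE data described above.

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# long_df: one row per site-year for illegal deaths (total_kill = 0)
# and one row for total deaths (total_kill = 1)
basic = smf.glm("deaths ~ total_kill",
                data=long_df, family=sm.families.Poisson()).fit()

# From the derivation above, PIKE = exp(-beta), where beta is the
# coefficient on the total-kill indicator
beta = basic.params["total_kill"]
lo, hi = basic.conf_int().loc["total_kill"]
print("average PIKE:", np.exp(-beta),
      "95% CI:", np.exp(-hi), "to", np.exp(-lo))

# Adding a post-sale indicator and its interaction with the kill type gives
# the effect of the sale on illegal deaths and on the PIKE: exp(-interaction
# coefficient) is the ratio of the post-sale PIKE to the pre-sale PIKE
sale = smf.glm("deaths ~ total_kill * post_sale + year",
               data=long_df, family=sm.families.Poisson()).fit()
```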

Results

Figure 1 shows the average PIKE estimated from the mortality data with a simple model using dummy variables for each year and illegal/total killing term only.

Figure 1: Estimated Annual PIKE values

The average PIKE before the sale was 0.43, with 95% Confidence Interval (CI) 0.40 to 0.44. The average PIKE after the sale increased by 26%, to 0.54 (95% CI: 0.52 to 0.56). This isn't a very large increase, and it should be noted that the general theory is that poaching is unsustainable when the PIKE value exceeds 0.5. The nature of this small increase can be seen in Figure 2, which plots average total and illegal carcasses over time.

Figure 2: Trend in illegal killings and total deaths

Although numbers of illegal carcasses increased over time, so did the number of legal carcasses. After adjusting for the time trend, illegal killings appear to have increased by a factor of 2.05 (95% CI 1.95 – 2.16) after the sale, and total kills by 1.58 (95% CI 1.53 – 1.64). Note that there appears to have been a big temporary spike in 2011-2012, but my results don’t change much if the data is analyzed from only 2003-2013.

Conclusion

Using a properly-specified model to estimate the PIKE as a side-effect of a basic model of average carcass counts, I found only a 26% increase in the PIKE despite very large increases in the number of illegal carcasses observed. This increase doesn't appear to be just the effect of time, since after adjusting for time the post-2008 step is still significant, but there are a couple of possible explanations for it, including the drought effect mentioned by the authors of the NBER paper; some kind of relationship between illegal killing and overall mortality (for example because the oldest elephants are killed and this affects survival of the whole herd); or an increase in monitoring activity. The baseline CITES report identified a relationship between the number of person-hours devoted to searching and the number of carcasses found, and the PIKE is significantly higher in areas with easy monitoring access. It's also possible that it is related to elephant density.

I didn’t use exactly the same data as the NBER team for this model, so it may not be directly comparable – I built this model as an example of how to properly estimate the PIKE, not to get precise estimates of the PIKE. This model still has many flaws, and a better model would use the same structure but with the following adjustments:

  • Random effects for site
  • A random effect for country
  • Incorporation of elephant populations, so that we are comparing rates rather than average carcasses. Population numbers are available in the baseline CITES report but I’m not sure if they’re still available
  • Adjustment for age. Apparently baby elephant carcasses are hard to find but poachers only kill adults. Analyzing within age groups or adjusting for age might help to identify where the highest kill rates are and their possible effect on population models
  • Adjustment for search effort and some other site-specific data (accessibility and size, perhaps)
  • Addition of rhino data to enable a difference-in-difference analysis comparing with a population that wasn’t subject to the sale

These adjustments to the model would allow adjustment for the degree of searching and intervention involved in each country, which probably affect the ability to identify carcasses. It would also enable adjustment for possible contemporaneous changes in for example search effort. I don’t know how much of this data is available, but someone should use this model to estimate PIKE and changes in mortality rates using whatever is available.

Without the correct models, estimates of the PIKE and the associated implications are impossible. OLS models may be good enough for Obama, but they aren’t going to help any elephants. Through a properly constructed and adjusted Poisson regression model, we can get statistically robust estimates of the PIKE and trends in the PIKE, and we can better understand the real effect of interventions to control poaching.

Today the Guardian reported on a new study that claims a large sale of legal ivory in 2008 actually led to an increase in illegal elephant poaching. Basically in 2008 China and Japan were allowed to pay for a large stockpile of legally-obtained ivory, in the hopes that this would crash the market and drive ivory traders out of business. Instead, the study claims, the sale led to a big increase in poaching – approximately a 66% increase in elephants killed, according to the study. This is interesting because it appears to put a big dent in a common libertarian idea for preserving endangered species – that allowing a regulated trade in them would lead to their preservation. It is also one of those cute findings that puts a hole in the standard just-so story of “Economics 101” that everything is driven by supply and demand. We all know that in reality there are many factors which moderate the effect of supply and demand on crucial markets, and on the surface this study appears to suggest a quite contradictory supply and demand relationship in illegal poaching markets, in which increasing supply boosts poaching. But is it true?

The Guardian report links to the original study, which is held at the National Bureau of Economic Research behind a paywall, but which I managed to get a copy of through my work. I thought I would check the statistical methods and see if the study really did support this conclusion. My judgment is that this study is quite poor, and that the data doesn’t support that conclusion at all, due primarily to three causes:

  • A poor choice of measure for illegal poaching that doesn’t clearly measure illegal poaching
  • The wrong choice of statistical method to analyze this measure
  • The wrong experimental design

I will go through each of these reasons in turn. Where equations are needed, I have used screenshots from the original paper because I’m terrible at writing equations in html. Let’s get started.

The PIKE is a terrible measure of illegal poaching

The study is based around analysis of a data set of “legal” and “illegal” carcasses observed at search sites in 40 countries. Basically a “legal” carcass is an elephant that died on its own, while an illegal one is one that was shot and looted. Apparently poachers don’t bother to clean up the corpse, they just cut off the ivory and run, so it’s easy to see when an elephant has been poached. However, because no one knows the full details of elephant populations, the authors study an outcome variable called the PIKE, which is defined as the ratio of illegal carcasses to total carcasses. In their words (screenshot):

PIKE equation

They say that this enables them to remove the unknown population from the outcome by "normalizing" it out of the top and bottom of the ratio. They justify this with a little proof that I am not convinced by, since the proof assumes that the probability of discovering carcasses is independent of the number of carcasses, and that legal mortality and illegal mortality are not related in any way. But even if it factors out population, this PIKE measure doesn't tell you anything about illegal poaching. Consider the following hypothetical scenario, for example:

Imagine a population of elephants in which all the older elephants have been killed by poachers, so only the pre-adult elephants remain. Every time an elephant becomes mature enough to have decent tusks a poacher kills it and the corpse is found. Further, suppose that the population is not subject to predation or other causes of legal mortality – it is young, and the environment is in good shape so there are large stocks of easier prey animals for lions to target. This population is at high risk of collapse due to adults being killed as they mature; indeed, let’s suppose no babies are born because adults are poached as soon as they reach sexual maturity. Thus every time an elephant is killed, the population drops by one towards its inevitable crash.

In this case, at every time point the PIKE would be 1, because there are no legal carcasses. The PIKE will remain 1 until there are no elephants left to die, at which point it becomes undefined (zero carcasses divided by zero carcasses). It doesn't tell us anything about the impending population collapse.

Consider now a situation where there are a great many more legal deaths than illegal deaths. Denoting illegal carcasses by y and legal carcasses by x, we have y/(y+x) where y<<x. In this case we can approximate the PIKE by y/x, and if e.g. the number of illegal carcasses suddenly doubles we will see an approximate doubling in the PIKE. But suppose y is approximately the same as x. Then the PIKE is approximately 1/2. Now suppose that the number of illegal carcasses doubles; then the PIKE increases to 2/3, i.e. it nowhere near doubles. If the number of illegal carcasses doubles again, it increases to 4/5. And if all deaths drop to 0 the PIKE stops being defined at all. So the magnitude of the increase in PIKE is not a direct reflection of the size of the change in poaching, and in at least one case it has no meaningful value whatsoever. That is not a well-designed measure of poaching. It is also scale free, which in this case is a bad thing because it means we cannot tell whether a value of 1 indicates a single illegal carcass or 10 illegal carcasses. Similarly we don't know if a value of 1/2 corresponds to 1 or a million illegal carcasses; only that however many there are, they are half of the total.
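The non-linearity and the undefined case are easy to see numerically; a tiny illustrative snippet:

```python
def pike(illegal: int, legal: int) -> float:
    total = illegal + legal
    # The PIKE is undefined when no carcasses are found at all
    return illegal / total if total > 0 else float("nan")

legal = 10
for illegal in (10, 20, 40):
    print(illegal, pike(illegal, legal))  # 0.50, 0.67, 0.80: doubling kills never doubles the PIKE
print(pike(0, 0))                         # nan: no carcasses, no PIKE
```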

The authors say that this variable is constrained between 0 and 1, but this is not the whole story: it is undefined whenever no carcasses are found at all, so there is a non-zero probability that it takes no value whatsoever. This strange distribution of the variable has implications for model choice, which leads us to the second problem with their data.

All the models in this study were poorly chosen

The authors choose to model the PIKE using an ordinary least squares (OLS) model with fixed effects for country and a (separate) fixed effect for each year. An OLS model is only valid if the residuals of the model are normally distributed, which is a very strong assumption to make about a variable that has lots of values of 0 or 1. The authors claim their residuals are normally distributed, but only by pooling them across years; when you look at residuals within individual years you can see that many years have residuals that are far from normally distributed. They also don't show us the crucial plot of residuals against predicted values, which is where you get a real idea of whether the residuals are well-behaved.

An additional consequence of using an OLS model is that it is possible to predict values of the PIKE that are unphysical – values bigger than 1 or less than 0 – and indeed the authors report this in 5.6% of their data points. This is indicative of another problem – the PIKE shows a non-linear response to increased illegal kills (see my example from 1/2 to 2/3 to 4/5 above), so that for a fixed number of legal kills each additional illegal kill has a diminishing effect on the value of PIKE, but a linear OLS model assumes that the PIKE changes by a uniform amount across its range. Given that the goal here is to identify increases in the PIKE over time, this runs the risk of the model over- or under-estimating the true effect of the 2008 ivory sale, because it is not properly modeling the response of the PIKE score.

The authors try to test this by fitting a new model that regresses ln(illegal carcasses+1) against a function that includes ln(legal carcasses+1) like so:

PIKE alternative model

This introduces a new set of problems. The "+1" has been added to both variables here because there are many zero-valued observations, and ln(0) doesn't exist. But if there are lots of zero-valued observations, adding one to them is introducing a big bias – it's effectively saying there was an illegal carcass where previously there wasn't one. This distorts low numbers and changes the patterns in the data. The authors claim, furthermore, that "The coefficient on legal carcasses φ will be equal to unity if the ratio of illegal carcasses to legal carcasses is fixed", but this is nonsensical, and it also obscures the fact that this model is no longer testing the PIKE. It's nonsensical because that is not how we interpret φ. If φ=1, then we can rewrite their equation (8) so that the left hand side becomes the natural logarithm of (illegal carcasses+1)/(legal carcasses+1). Then we are fitting a linear model of a new variable that is not the PIKE. We are not, however, assuming the ratio of illegal carcasses to legal carcasses is fixed. If φ is not 1, we are modeling the natural logarithm of (illegal carcasses+1)/(legal carcasses+1)^φ. The ratio here is still fixed, but the denominator has been raised to the power φ. What does "fixed" even mean in such a context, and why would we want to model this particular strange construction?

The authors do, finally, propose one sensible model, which is similar to equation (8) (they say) but uses a Poisson distribution for the illegal carcasses, and still fits the same right hand side. This is better but it still distorts the relationship between illegal and legal carcasses by adding a 1 to all the legal (but not the illegal) carcasses. It also doesn’t properly account for elephant populations, which is really what the legal carcasses serve as a proxy for. There is a much better way to use the legal carcass data and this is not it.

Finally there are two other big problems with the model: It uses fixed rather than random effects for country and site, which reduces its power, and also it doesn’t include any covariates. The authors instead chose to model these covariates separately and look for similar spikes in specific possible predictors of ivory usage, such as Chinese affluence. The problem with this is that you might not see a strong spike in any single covariate, but multiple covariates could move together at the same time to cause a jump in poaching. It’s better to include them in the model and report adjusted poaching numbers.

The wrong experimental design

An expert cited in the original article noted this interesting fact:

The Cites spokesman also noted that there had never been a one-off sale of rhino horn: “However, the spike in the number of rhinos poached for horn largely mirrors what has been seen with ivory. The illegal killing of rhino for its horn in South Africa alone increased from 13 in 2007 to close to 1,200 last year.”

This suggests that there has been an upsurge in illegal poaching across Africa that is independent of the ivory sale, and could reflect changing economic conditions in Africa (though it could also reflect different markets for ivory and rhino horn). It’s possible to test this using a difference-in-difference approach, in which rhino poaching data is also modeled, but is treated as not having been exposed to an intervention. The correct model specification then enables the analyst to use the rhino data to estimate a general cross-species increase in poaching; the elephant data identifies an additional, elephant-specific increase that could be said to be due to the ivory sale. The authors chose not to do this, which means that they haven’t rigorously ruled out a common change in poaching practice across Africa. If the CITES spokesman’s point is correct, then I think it likely that we would conclude the opposite to what this study found: that compared to rhinos, elephant poaching did not increase nearly as much, and in fact the ivory sale protected them from the kind of increased poaching observed with rhinos.

Indeed, it’s possible that there were poachers flooding into the market at around that time for other reasons (probably connected to development and increasing demand in Asia), but after the ivory sale most of them switched to killing rhinos. That would suggest the sale was successful, provided you aren’t judging that success from the standpoint of a rhino.
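For concreteness, a difference-in-difference specification along these lines might look like the following sketch in Python with statsmodels; the data frame and variable names are hypothetical, and the population offset discussed in the next section is omitted for simplicity.

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# poach_df: one row per site-year-species, with species either "elephant"
# or "rhino" and post_sale = 1 for years after the 2008 ivory sale.
# The post_sale:species interaction is the difference-in-difference estimate:
# the extra (or reduced) change in illegal kills for elephants, over and
# above the common post-2008 change shared with rhinos.
did = smf.glm("illegal_kills ~ post_sale * species + C(country)",
              data=poach_df, family=sm.families.Poisson()).fit()
print(did.summary())
```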

A better model: Bayesian population estimation followed by Poisson regression

It’s possible to build a better model using this data, by putting the legal carcass data to proper use and then using a correctly-specified Poisson regression model on the illegal carcass data. To see how different the results might then look, consider Figure 1, taken from the Appendix of the paper, which shows the actual numbers of illegal carcasses in each year.

Figure 1: Distribution of illegal elephant kills, 2002 – 2013 (year is above its corresponding histogram)

Does it look to you like the number of elephants killed has increased? It certainly doesn't to me. Note that between 20 and 50% of observed data are 0 kills in all years except 2002 (which the authors say was the start year of the data, and exclude from their analysis). Can you strongly conclude any change from these figures? I haven't shown the legal kill data but it is broadly similar in scale. Certainly, if there is any upward step in illegal kills in 2008, it could potentially be explained simply by changes in populations of elephants – if even a small change in elephant density leads to an extra 1 or 2 kills per site per year, it would lead to distributions like those in Figure 1. To me it seems likely that the single biggest determinant of elephant kills will be the number of elephants and the number of poachers. Before we assume the number of poachers (or the pace of their activity) changed after 2008, surely we need to consider what happened to the population of elephants overall in 2008. If it declined, then poachers might catch the same number as in 2007; if it increased, they would catch more.

The best way to analyze this data is to directly adjust for the population of elephants. We can use the legal kill data to do this, assuming that it is mostly reflective of elephant population dynamics. It’s not easy, but if from published sources one can obtain some estimate of the mortality rate of wild elephants (or their life expectancy), a Bayesian model could be built to estimate total population of elephants from carcasses. This would give a credible interval for the population that could then be used as what is called an offset in a Poisson regression model that simply modeled counts of illegal kills directly against time. The advantage of this is that it uses all 0 count events, because a Poisson model allows for zeros, but it adjusts for the estimated population. I think the whole thing could be done in a single modeling process, but if not one could obtain first a distribution of the elephant population, then use this to simulate many different possible regression model coefficients for the effect of the ivory sale. In this model, the effect of the ivory sale would simply represent a direct estimate of the relative increase in mortality of elephants due to poaching.
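As a sketch of the offset idea (ignoring the Bayesian population step, and with hypothetical variable names): the log of the estimated elephant population enters the linear predictor with its coefficient fixed at 1, so the model describes kill rates per elephant rather than raw carcass counts.

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# site_df: one row per site-year, with est_population the estimated elephant
# population (e.g. the posterior mean from the Bayesian population model).
rate_model = smf.glm(
    "illegal_kills ~ post_sale + year",
    data=site_df,
    family=sm.families.Poisson(),
    offset=np.log(site_df["est_population"]),
).fit()

# exp(coefficient on post_sale) is the relative change in the illegal kill
# rate per elephant after the 2008 sale. To propagate the uncertainty in the
# population estimate, refit across posterior draws of est_population.
print(np.exp(rate_model.params["post_sale"]))
```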

Then, to complete the process, one would add in the rhino data and use a difference-in-difference approach to estimate the additional effect of the ivory sale on elephant mortality compared to rhinos. In this case one would find that the sale was protective for elephants, but potentially catastrophic for rhinos.

Conclusion

Based on this data and my critical review of the model, I cannot conclude that the ivory sale led to an increase in poaching. I think CITES should continue to consider ivory sales as a tool to reduce elephant poaching, though with caution and ongoing evaluation. In addition, based on the unnamed CITES spokesman cited above, the evidence from rhino poaching at the time suggests the sale may even have been protective of elephants during a period of increased poaching pressure; if so, a further big sale might actually crush the business, although there would be little benefit in this if it simply drove poachers to kill more rhinos.

With regard to the poor model design, this paper shows a lot of what I have come to expect from economics research: poor definition of an outcome variable that seems intuitive but is mathematically useless (in health economics, the incremental cost-effectiveness ratio shows a similar set of problems); over-reliance on OLS models when they are clearly inappropriate; poor model specification and covariate adjustment; and unwillingness to use Poisson or survival models when they are clearly better suited to the data.

I think there is lots of evidence that legal markets don't necessarily protect animals from over-exploitation (exhibit A: the fishing industry), but it is also obviously possible that the economic levers of supply and demand could be used to kill an illegal industry. I suspect that more effective, sustainable solutions to the poaching problem will involve proper enforcement of sales bans in China and Japan, development in the regions where poaching happens, and better monitoring and implementation of anti-poaching measures. If market-crushing strategies like the 2008 ivory sale are going to be deployed, development is needed to offer affected communities an opportunity to move into other industries. But I certainly don't think, on the evidence presented here, that such market-crushing strategies would have the exact opposite of the intended effect, and I hope this poor quality, non-peer-reviewed NBER article doesn't discourage CITES from deploying a potentially effective strategy against an industry that is destroying a majestic and beautiful wild animal.

Those bastards!!

I ran into more problems trying to run OpenBUGS on a Bayesian spatial regression model today. The problem is laid out pretty simply in the picture above: an illegal memory write. I know this is not the fault of my model, as it runs smoothly for 50, 500 or 1000 iterations; the illegal memory write happens somewhere between 6000 and 6100 iterations.

This model is a simple version of a larger model I'm building. It has data on about 1800 small areas collected over four time points, with age and sex covariates, a Poisson distribution for the outcome, and an adjacency matrix for the 1800 areas. The data are laid out with the time points and the age and sex values as separate records, so for one municipality I have 4*6*2=24 observations. I can't use other packages for this because they assume one observation per municipality (i.e. a very simple relationship between the data structure and the adjacency matrix), so I'm using OpenBUGS or WinBUGS. The ultimate goal is to extend to 30 time points and 18 age categories, with a more complex time and age structure, but I want to get this simple model running first because I'm running into major computational issues.

Today I identified that these computational issues can’t be solved. The reason for the illegal memory write is that OpenBUGS has a 1.6Gb RAM limit. People say this but it’s hard to confirm; however, in digging through critiques of BUGS I discovered it is written in Component Pascal and compiled in BlackBox, and a bit more digging reveals BlackBox has a 1.6Gb RAM limit.

This is frankly ridiculous. It's understandable that there would be a RAM limit in a 32-bit system, but OpenBUGS has been around for about 9 years now and no one has upgraded it to handle either multi-core computers or 64-bit architecture. Given that OpenBUGS is designed to handle complex, processor-demanding, simulation-heavy Bayesian models, the decision to write it in such a restrictive framework is ridiculous. I'm a strong critic of open source systems, but if it had been written for R it would at least be able to use a 64-bit architecture, even though R has been late to the multi-core party. I bought a PC with 128 Gb of RAM specifically for this big project, and I might as well have bought a laptop with the minimum RAM. For the model I ultimately aim to build I expect I will need about 100,000 iterations to get stability, which means that OpenBUGS will never get there. The only ways to run this model without a workaround are to 1) write the full adjacency matrix by hand and implement it directly [something I am considering] or 2) recompile the source code (which I think might not be available) in a different Pascal compiler. I have no idea how to even start with that.

I have considered two workarounds, though I don't know whether either of them will work.

  1. Save and rerun: the reason OpenBUGS hits its RAM limit is that it keeps all iterations in RAM, which strikes me as questionable design. One option, then, would be to run the model to 5000 iterations, save the results, shut down OpenBUGS, reopen OpenBUGS, load the model file, and then use the update functions to run another 5000 iterations on top of what has already been run. I think this can be done (running additional updates on a past model) but I'm not sure. If it works, I just need to run one set of runs a night for about 3 weeks, and I'll get my model.
  2. Rerun with sequential initial values: if method 1 doesn't work, another option that I am very sure will work is to run the model for 5000 iterations, extract all the estimated values from the model, then start a new run of 5000 iterations with those estimated values as the initial values for the next model. I'm pretty sure it will pick up where it left off, assuming I specify all the values correctly, although there might be small jumps in the trace; ultimately it will get where it needs to go. But restarting the model is going to take a lot of time unless I can find a way to loop through and extract the values automatically, so probably a month to run the model (the rough sketch below is my best guess at automating it, but I haven't tested it).
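
A rough, untested sketch of that automation via R2OpenBUGS: I'm assuming the returned bugs object really does carry a last.values component (it is documented for R2WinBUGS; if it doesn't, the same numbers can be dug out of sims.array by hand), and bugs_data, inits_list and params are placeholders for my actual data, initial values and monitored parameters.

library(R2OpenBUGS)

# First chunk of 5000 iterations
fit <- bugs(data = bugs_data, inits = inits_list,
            parameters.to.save = params,
            model.file = "spatial_model.txt",
            n.chains = 2, n.iter = 5000)

# Subsequent chunks, each restarted from the last saved values of the previous one
for (chunk in 1:19) {                # 20 x 5000 = 100,000 iterations in total
  inits_list <- fit$last.values      # assumed: one list of node values per chain
  fit <- bugs(data = bugs_data, inits = inits_list,
              parameters.to.save = params,
              model.file = "spatial_model.txt",
              n.chains = 2, n.iter = 5000)
}
# Caveat: only monitored nodes are carried over, so any unmonitored stochastic nodes
# get re-initialised each chunk, which is where small jumps in the trace would come from.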

Ridiculous! And even if I had a supercomputer I couldn’t speed it up …

Today Stata 14 was released, and it finally has built-in functionality for a wide range of Bayesian statistical tasks. I'm hoping they'll introduce conditional autoregression and the BYM model in version 15. If you have a grant that can afford a single Stata license, it's well worth getting. You just can't rely on open source statistical packages. Only use them when you must!

Today I’m trying to do some simple prediction in R. I have a simple linear model with a single missing data point, and I need to get the predicted values and standard errors for 14 data points including this one missing point. Because I find R’s data handling in linear prediction really hard to use, I decided to use a brutal, simple loop approach to get the job done. And in doing so I have uncovered a really weird piece of behavior.

My linear model predicts a proportion as a function of year, with year taking values from 0 to 13 and the observation at year 2 missing. My outputs are a matrix of years and predicted values (con.pred) and a matrix of years and standard errors (con.se). My inputs are the data (dataValues) and the resulting model (con.lm). So my model runs like this:

con.lm <- lm(y ~ year, data = dataValues)

for (j in 1:14) {
  year <- j - 1
  temp.pred <- predict(con.lm, newdata = year, se.fit = TRUE)
  con.pred[j, 2] <- temp.pred$fit
  con.se[j, 2] <- temp.pred$se
}

Now the weird thing here is that this breaks down and delivers the following error:

Error in eval(predvars, data, env) : not that many frames on the stack

However, it didn’t stop working immediately. The first 5 elements of the output con.pred have values – the process failed at j=6. Even weirder, when I ran it two hours ago it failed at j=10. However, it works fine if I make this tiny change:

for (j in 1:14) {
  year <- j - 1
  temp.pred <- predict(con.lm, newdata = data.frame(year), se.fit = TRUE)
  con.pred[j, 2] <- temp.pred$fit
  con.se[j, 2] <- temp.pred$se
}

How can it be that the prediction process works while year is 4 or less, but fails once year reaches 5? And why does it work for every value if I redefine the scalar year as a data frame? Does R treat small scalars as data frames, but not larger ones?

Also, if I type the following commands directly into the command line I get different predicted values:

temp.pred<-predict(con.lm,newdata=(year=1),se.fit=TRUE)

temp.pred<-predict(con.lm,newdata=1,se.fit=TRUE)

However, the latter command doesn't produce any warnings, and it doesn't do the default prediction either (which would return 69 fitted values); it returns a single value, which appears to correspond to a year value of about 9. Furthermore, while the first of these two commands works, this one does not, and brings up the same not-enough-frames error:

temp.pred<-predict(con.lm,newdata=(year=10),se.fit=TRUE)

So I have confirmed at the command line that the commands I used in the script break at some value of year greater than 1.

I don’t know what’s going on here, but it’s completely counter-intuitive. The only error returned is also completely meaningless. R really needs to improve its linear prediction and data handling, because there is something deeply wrong with the way it handles input data.

I’ve recently been building a fairly complex series of Bayesian spatial regression models in BUGS, and thought I’d share some tips based on hard won experience with the models. The various BUGS packages have the most cryptic and incoherent error messages of any stats software I have ever worked with, and although various Bayesian boosters claim that their modeling approach is intuitive, in my opinion it is the exact opposite of intuitive, and it is extremely hard to configure data for use in the packages. Furthermore, online help is hard to find – google an error message and you will find multiple websites with people asking questions that have never been answered, which is rare in the modern world. I take this as a sign that most people don’t understand the error message, and indeed the BUGS manual includes a list of errors with “possible interpretations” that reads more like the I Ching than a software guide. But Confucius say Enlightenment is not to be found in Black Box Pascal, so here is my experience of BUGS.

The models I’m running are complex, with nested conditional autoregressive structures and the higher level having more than 1000 areas with complex neighbour relationships, and millions of observations. I originally ran them on a completely hideous Hewlett Packard laptop, with 4 cores and 8Gb of RAM. I subsequently upgraded to a Dell Workstation (joy in comparison to HP’s clunky root-kitted horror) with 8 cores and 16Gb of RAM; I’m not sure that hardware is the main barrier to performance here though …

The HP machine had a secret administrator account (arseholes!), which meant I couldn't install winBUGS[1], so I started off running OpenBUGS called through R's R2OpenBUGS package in RStudio. I use R to set up the data and initial values, because I can't think of any other way to load millions of observations into a text file without going stir crazy. But when I call OpenBUGS it just hangs … no error messages or any other kind of indication of what is going on. I also can't tell whether it is happening at the data loading, compiling or inits stage.

Some digging around online and I found an old post by Andrew Gelman, observing that BUGS does not work well with “large datasets, multivariate structures, and regression coefficients.”

i.e. pretty much every statistical problem worth doing. Gelman also notes that “efficiently-programmed models can get really long, ugly, and bug-prone,” which seems like a contradiction in terms.

Anyway, noting that my data was large, with multivariate structures and regression coefficients, I thought maybe I should tone it down a bit, so I tried using a higher level of the spatial hierarchy, which reduces the adjacency matrix by an order of magnitude. Still no dice. It was at this point that I upgraded to the bigger computer.

On the bigger computer the smaller model actually worked! Not in the sense that anything meaningful came out of it, but in the sense that it reported a completely incomprehensible bug, something like a node having an invalid value. I tried multiple different values and nothing worked, but somewhere on the internet I found someone hinting that you should try running BUGS directly rather than calling it through R, so I tried this: having created the data in R, I killed OpenBUGS, then opened the OpenBUGS interface directly and input the model, then the data, using the text files created by R[2].

When I did this I could step through the process – the model was syntactically correct, then the model failed to compile! Given that loading inits comes after compilation, an error telling me that I had the wrong initial values seems a bit misleading… in fact I had an “index out of range” error, and when I investigated I found I had made a mistake preparing one part of the data. So while the actual error was “the model can't compile because you have provided the wrong data,” the error reported when calling through R was “you have the wrong initial values” (even though I hadn't actually loaded initial values yet).

WTF?! But let’s step back and look at this process for a moment, because it is seven shades of wrong. When you run R2OpenBUGS in R, it first turns the data and inits into a form that OpenBUGS can read; then it dumps these into a directory; then it opens OpenBUGS and gets OpenBUGS to access those files in a stepwise process – at least, that’s what I see R doing. If I decide to do the model directly in the OpenBUGS graphical interface, then what I do is I get R to make the data, then I use the task manager to kill OpenBUGS, then I call OpenBUGS directly, and get OpenBUGS to access the files R made in a stepwise process. i.e. I do exactly the same thing that R does, but I get completely different error messages.

There are various places on the internet where you might stumble on this advice, but I want to stress it: you get different error messages in OpenBUGS run natively than you do in OpenBUGS called through R. Those error messages are so different that you will get a completely different idea of what is wrong with your program.

Anyway, I fixed the index, but then I ran into problems when I tried to load my initial values. Nothing seemed to work, and the errors were really cryptic: “invalid initial value” is not very useful. Further digging on the internet showed me that OpenBUGS and WinBUGS have different approaches to initial values, and WinBUGS is not as strict about the values it accepts. Hmmm … so I installed WinBUGS, reran the model … and it worked! OpenBUGS apparently imposes a condition on certain variables that they must sum to 0, while WinBUGS doesn't check that condition. A free tip for beginners: setting your initial values so they sum to 0 doesn't help, but running the same model, unchanged, in WinBUGS works.

So either OpenBUGS is too strict, or winBUGS lets through a whole bunch of dodgy stuff. I am inclined to believe the former, because initial values shouldn’t be a major obstacle to a good model, but as others[3] have observed, BUGS is programmed in a completely opaque system so no one knows what it is doing.

So, multiple misleading errors, and a complex weirdness about calling external software through R, and I have a functioning model. Today I expanded that model back to the original order of magnitude of small areas, and it also worked, though there was an interesting weirdness here too. When I tried to compile the model it took about three hours, and produced a Trap. The weird thing is that the Trap contained no warnings about BUGS at all; they were all warnings about Windows (something called Windows.AddInteger or similar), and after I killed the Trap my model updated fine. So I think the compile problems I previously experienced may have had something to do with memory problems in Windows (I had no problems with badly designed adjacency matrices in the larger model), but OpenBUGS just doesn't tell you what's going on, so you have no idea …

I should also add, for intrepid readers who have got this far, that this dude provides an excellent catalogue of OpenBUGS errors with his plain English explanations of what they actually meant. He’s like this mystical interpreter of the I Ching for Bayesian spatial regressives. Also I want to add that I think the CAR spatial correlation model is super dodgy. I found this article (pdf) by Melanie Wall from the Journal of Statistical Planning and Inference (what a read!) that shows that the way we construct the spatial adjacency matrix is the primary determinant of the correlation structure, and that the correlation structure determined by this adjacency matrix is nothing like what we think we are getting. Today on my whiteboard and with the help of R I imagined a simple industrial process where each stage in the process is correlated with the one before and after it, and I showed very easily based on Wall’s work that the adjacency matrix required to describe this process is completely different to the one that you would naively set up under the framework described for CAR modeling. So I think most of the “spatial correlation” structures described using CAR models have no relationship to what the programmer thinks they’re entering into the model. But I have no proof of this, so I guess like everyone else I’ll just press on, using the adjacency matrix I think works …
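
For anyone who wants to repeat the whiteboard exercise, here is a minimal version of it in R. I'm assuming the “proper” CAR parameterisation, in which the precision matrix is proportional to (D - rho*W); the chain of ten stages and the value of rho are just illustrative choices.

n <- 10                                  # ten stages in the imaginary industrial process
W <- matrix(0, n, n)                     # adjacency: each stage neighbours the one before and after it
for (i in 1:(n - 1)) { W[i, i + 1] <- 1; W[i + 1, i] <- 1 }
D <- diag(rowSums(W))                    # diagonal matrix of neighbour counts
rho <- 0.9                               # spatial dependence parameter
Sigma <- solve(D - rho * W)              # implied covariance of the proper CAR model (up to scale)
round(cov2cor(Sigma), 2)                 # the implied correlations: not constant between neighbours,
                                         # and not a simple function of distance along the chain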

So there you have it. Next time you see an opinion formed on the basis of a spatial regression model built in BUGS, remember the problems I had getting to the output, and ask yourself – do you trust that model? Really?

fn1: Well, I could copy winBUGS into the program files folder but I couldn’t patch it or install the immortality key, which, wtf? When I bought Time Series Modelling and Forecasting by Brockwell and Davis, ITSM came as a free disk with the book. When I buy the BUGS book I get to install software that comes with a big message telling me to get stuffed, and years later they finally provide a key that enables you to use it for free …

fn2: If you aren’t aware of how this works, basically when you call OpenBUGS in R, providing data from inside R, R first dumps the data into text files in the directory of your choosing, then OpenBUGS opens those files. So if you aren’t comfortable preparing data for BUGS yourself, use the list and structure commands in R, then kill OpenBUGS and go to OpenBUGS directly … the text files will remain in the directory you chose.

fn3: Barry Rowlingson does a pretty good job of showing how interesting and useful spatial analysis can be: see e.g. his post on mapping the Zaatari refugee camp in Jordan.

I have a dataset of about 40 million records covering rare events over a 30-year period, with about 600,000 events in total. The data are clustered in about 50 regions, and in the later years of the file there are only about 4000 events a year (they are more frequent in the early years). This means that in one year there may be only 80 events in any one region. I want to conduct a multi-level Poisson regression of the events, with about four or five categorical covariates plus year and a random effect for region. I'm doing this using GLLAMM in Stata on an Apple computer with 32Gb of RAM, 12 cores and Stata/MP optimized for all 12 cores (this is not a cheap setup!). The raw dataset takes about 16Gb, but I can collapse it to a dataset of counts of events and populations at risk, which reduces it to (I think, from memory) about 170,000 records. I'm using a Poisson distribution, so essentially a loglinear model; ideally I should probably use a negative binomial model given how zero-heavy the data are, but unfortunately GLLAMM doesn't handle negative binomial data and I have read of problems with the xtnbreg procedure, so I just have to suck it up and use the Poisson family. I'm also not adjusting for serial dependence, because that would mean extra random effects and/or correlation structures and it would just get too nasty.

The problem here is that GLLAMM just won't converge. With relatively standard settings you can first run it with the noest option to get coefficients, then feed them as starting values into a model with adaptive quadrature and 12 integration points, but it just won't converge: after 3 or 4 days and 150 iterations (one estimation cycle takes about an hour) it still hasn't converged, and the log likelihood is not concave. The maximization process seems to be proceeding in some fashion, because every 10 iterations or so the log likelihood changes slightly, but it just won't seem to stop. Interestingly, you can't set the convergence tolerance in GLLAMM, so I can't stop it when the log likelihood is changing by less than 0.01 (which it is doing now) but more than 0.00001.

This non-convergence is not a data issue – it works fine in a standard Poisson regression, there are no warnings about combinations of covariates with all zeros or all 1s, and this also seems to be the case when the data is divided up into regions (as best I can tell). I think the problem is just that the events are rare, there are a lot of dimensions over which the likelihood needs to change, and it’s just a very tough optimization problem. I have run into this problem with non-linear multi-level models before, and it appears to be a problem of big data with rare events and large numbers of regions. So what to do?

I have noticed in the past that estimation in GLLAMM is more likely to succeed with more integration points, which slow the estimation down but improve stability. I've also noticed that bad initial estimates can really kill the model, and that a big part of the estimation procedure is about getting standard errors rather than point estimates. Finally, although you can't control the tolerance, you can limit the number of iterations (a rough version of the same thing). With each iteration taking an hour, if one wants actual results it helps to run as few iterations as possible[1]. So today I tried a new method for getting convergence, which works in stages like this:

  • Start with the NOEST option to get initial coefficients for starting values
  • Feed these into a model with 1 integration point (Laplace Approximation) and 1 iteration
  • Feed the coefficients from this as starting values to a model with 2 integration points and 2 iterations
  • Keep running new models, slowly increasing the integration points and iterations (the general pattern is sketched below)
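
GLLAMM is a Stata program, so I won't reproduce the exact commands here, but the warm-start pattern itself is generic. As a purely illustrative sketch of the same idea in R's lme4 (untested on this data, with placeholder variable names throughout):

library(lme4)

# Stage 1: cheap fit - Laplace approximation (nAGQ = 1) and a tight cap on optimiser steps
fit1 <- glmer(events ~ x1 + x2 + year + offset(log(pop)) + (1 | region),
              data = d, family = poisson, nAGQ = 1,
              control = glmerControl(optCtrl = list(maxfun = 200)))

# Stage 2: warm-start a more accurate fit from stage 1's estimates,
# with more quadrature points and a looser cap on optimiser steps
fit2 <- glmer(events ~ x1 + x2 + year + offset(log(pop)) + (1 | region),
              data = d, family = poisson, nAGQ = 12,
              start = list(theta = getME(fit1, "theta"), fixef = fixef(fit1)),
              control = glmerControl(optCtrl = list(maxfun = 2000)))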

Doing this a few times, I found that my models started converging after just two Newton-Raphson cycles. By the fifth model (with nip(12) and iterate(8)) the coefficients had all become stable to 4 decimal places and the variances had largely settled: all that remained were small changes in the variance of the random effect. At the fourth model some of the variances were still out of whack, but the coefficients were close to their final values. The fourth and fifth models converged on the second Newton-Raphson step. I set it running last night and it's basically now fine-tuning, using a model with 24 integration points and a maximum of 32 iterations. That will be the final model, and it should be done by tomorrow morning.

Because each stage of this process starts with the final coefficients from the previous model, it seems to bypass a lot of the convergence problems that arise from starting a model with coefficients very far from their final values and then waiting. Instead, you get rough coefficients halfway through the process and then restart. Since much of the challenge in the estimation process seems to be in calculating variances rather than coefficients, this saves you a lot of time waiting to get values, and it seems to have short-circuited a lot of convergence issues. I don't know if it would work for every problem (GLLAMM seems to be a temperamental beast, and every time I use it a new problem rears its ugly head), but if you are having convergence problems and you are confident they are not caused by bad data, then maybe give my method a try. And let me know if it works! (Or if it doesn't!)

fn1: Actually this model process takes so long that there are some sensitivity analyses we just have to skip. There is one method we use in preparing the data that I would like to check through resampling and sensitivity analysis, but if every model is going to take 2 days, then you can’t even do a paltry 100-sample bootstrap and get a meaningful answer in half a year. Crazy!

I’ve complained before about the reliability and quality of the open source statistics package, R. Sometimes I get pushback, with people suggesting that I just don’t understand what R is trying to do, or that there is an obvious way to do things differently that answers my complaints – that R is idiosyncratic but generally trustworthy.

Well, try this exercise, which I stumbled on today while trying to teach basic programming in R (a self-contained version is sketched after the list):

  • Run a logistic regression model with any reasonable data set, assign the output to an object (let’s call it logit1)
  • Extract the Akaike Information Criterion (AIC) from this object, using the command logit1$aic. What is the value of the AIC?
  • Now extract basic information from the logistic model by typing its name (logit1). What is the value of the AIC?
  • Now extract more detailed information from the logistic model by typing summary(logit1). What is the value of the AIC?
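
Here is the self-contained version, using a built-in dataset (any logistic regression will do):

logit1 <- glm(am ~ wt, data = mtcars, family = binomial)
logit1$aic          # the stored value, to full precision
logit1              # the basic print method: compare the AIC digits shown here ...
summary(logit1)     # ... with the digits shown here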

When I did this today my AIC value was 54720.95. From the summary function it was 54721; from the basic output option it was 54720.

That’s right, depending on how you extract the information, R rounds the value of the AIC up, or truncates it. R truncates a numerical value without telling you.

Do you trust this package to conduct a maximum likelihood estimation procedure, when its developers not only can’t adhere to standard practice in rounding, but can’t even be internally consistent in their errors? And how can you convince someone who needs reliability in their numerical algorithms that they should use R, when R can’t even round numbers consistently?

I should point out that a decision to truncate a number is not a trivial one. It isn't something that happens because you didn't change a default: someone actually, consciously programmed the basic output display method in R to truncate rather than round off. At some point they faced a choice between floor() and round() for a basic, essential part of the I/O of a statistics package, and they decided floor() was the better option, and no one has changed that decision since. I don't think it's a precision error either (the default precision of the summary function is 4 digits!), because the example I stumbled across today ended with the decimal digits .95. This was a conscious programming decision that no one has bothered to fix.
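
To spell out the distinction I'm complaining about, using the AIC value from my example:

round(54720.95)   # 54721, which is what summary() showed me
floor(54720.95)   # 54720, which is what the basic print display showed me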

The more I work with R, the more I believe that it is only good for automation, and all its output needs to be checked in a system with actual quality control. And that you should never, ever use it for any process anyone else is going to rely upon.

Everyone knows that Obama’s Democrats just got their arses kicked in the mid-terms. The dominant political narrative is that it was a disaster, right? Those stupid Dems, acting on political principle and bringing in a second rate universal healthcare system by running roughshod over the bipartisan Republicans to overrule democracy, no wonder they lost the mid-terms right? This is the dominant narrative amongst political pundits, and all debate seems to be focused on Obama’s repudiation, the coming destruction of Democrats, and the fundamental conservatism of the American electorate. This is how pundits speak.

I’m not American, so although I feel myself qualified to speak to some extent on the theory of American healthcare policy, I don’t feel like I really know anything about US political debates, so I just assumed that the story being presented by these pundits was some kind of biased but basically correct analysis of the situation, I think you know the kind of interpretation I mean, regardless of your politics: accepting the broad truth of the facts presented but maybe having a different interpretation of the implications and the long-term trends and outcomes for your preferred side of the debate. As a non-American who thinks Obama is too right-wing to make it in my own political culture, this is all just academic angels-on-heads-of-pins stuff. However, recently I stumbled on a counter-narrative to the standard view of Obama’s bloody nose, and it got me thinking about the broader issue of how political pundits analyse politics, and whether anything they say is worth anything while they remain ignorant of statistics.

The counter-narrative is that Obama did better than expected, and although the Democrats got a bloody nose in the elections this is normal, and they didn’t do badly at all. This counter-narrative is exemplified by this post I found on Greg Laden’s blog, which is so simple it’s ridiculous. Laden plotted number of seats lost vs. presidential approval rating for all mid-term elections since 1946, using Excel, and showed the line of best fit. The implications are pretty obvious – there is a direct relationship between presidential popularity and number of seats lost in a mid-term, but this mid-term election Barack Obama was not at the bottom of the range of approval ratings, and he did significantly better than presidents would usually do given his unpopularity. The Democrats should have lost about 40 seats in the House, but they actually only lost 10 – a huge victory. A weaker analysis of the Senate suggests he was facing a uniquely hostile senate election, and probably did well given the circumstances. These analyses are very rough and could be done better (e.g. logistic regression adjusted for number of seats, confounders for presidential party etc) but they probably wouldn’t improve on the basic findings: the Democrats actually did uniquely well given their circumstances.

My guess is if you extended these analyses to an analysis of results of the following presidential and congressional elections, you would find that the number of seats lost in the previous mid-term was a big predictor of success, with the implication that even though most political pundits are saying Obama has flumphed it, the data implies Clinton’s job is going to be extra easy thanks to the Big O, and the congressional elections will also be better than normal. But I suspect my reader(s) have not got that impression from the popular press. Where have pundits gone wrong?

If one thinks about the average pundit, it’s pretty obvious what the problem is: they’re a journalist, and journalists are generally thick as shit. Your average political pundit is going to be barely literate and completely innumerate, and they are going to be firmly of the belief that their experience, connections, colleagues and unique insight into the way people in the halls of power think gives them a unique power to analyze political events. I think this is not true, and that it’s physically impossible for modern political pundits to have a broad enough base of experience to provide any insight into the political process: instead, they are simply retailing gossip from their mates. I will prove this by comparing political pundits to actually intelligent sports journalists, and for my example I will choose that most reviled of sports commentators, the UK football (soccer) pundit.

Let us first consider the average political pundit in the USA. He's a man, obviously (this is serious stuff!), aged in his 40s, with a 20-year career. Given his age he probably graduated from a journalism degree (they were starting to take off 20 years ago) but he may be a graduate of some other field. He probably has some “Econ101” (in the derogatory sense), so he's swallowed all the typical rubbish about government debt and how the Chinese own America and so on. He reports deficit busses as if they're serious politics instead of theatre. Of course he's an idiot. But he has experience! He has 20 years of experience! But what does this 20 years of experience mean, actually? He has seen, at most, 11 congressional elections (they happen every two years). Of these, if he's really lucky, three might have been elections occurring in the second term of an incumbent president (e.g. Obama and Bush junior) and maybe six were mid-term elections. Precisely one occurred in the aftermath of a major terrorist event. No one he knows or has ever met has seen a black man elected president. At most 3 of those 11 elections happened against a backdrop of internet-based political activism. He has seen a maximum of six presidential elections, probably four, and has seen the presidency change parties at most three times. He has colleagues he might be able to discuss elections and politics with, but these colleagues have seen no more federal congressional elections or presidential elections than he has – they all talk about the same 11 events. This is his “experience.” His boss – let's imagine a man 20 years his senior – might have seen 20 congressional elections, 4-5 elections in the second term of a presidency, and maybe 10 mid-terms. This boss has had the remarkable experience of seeing a presidency retained by the incumbent party on a change of president – once. Once. The entire cadre of political journalists have this experience to their name. When it comes to second-term congressional elections, our putative political “expert” is like Lieutenant Gorman in Aliens. “How many drops have you done, Lieutenant?” “Three. Simulated.” What does Private Hudson think of that?

Now let us compare with an equivalent British football pundit. He has seen 20 FA Cups, and assuming he has done his job assiduously has watched every team in the Premier League play every other team every year for 20 years. Those games he couldn't watch he has had a chance to review when needed (e.g. before crucial matches), giving him experience of watching any one team play probably 15 or 20 times a year, for 20 years. Obviously he focuses on the top teams, but he is able to replay hundreds of games for any team that enters the FA Cup. He has only seen 5 World Cups, but he has also seen 5 European cups, and watched the players from the European cup in the Champions League, plus been able to watch international friendlies, African cups, etc. Where he is unfamiliar with a part of the football world, he will have access to multiple colleagues who are familiar with other areas (Europe, Asia, lower divisions, Americas, etc.) and where he has not been following a team he will have access to a huge compendium of commentary from those who have. There is no chance at all that his colleagues have been raised and trained on the same football experience as him, so when they meet to discuss the football world they will be able to bring potentially radically differing views and experiences to the meeting and, if they take the job seriously, will be able to take away as many different views as there were people at the table. This pundit's boss will have seen 40 FA Cups, has seen football powers come and go, was around before the era of the oligarchs, etc. This is why football critics are able to identify whole movements in football (like tiki taka) and also identify when they fade away. This is why football pundits can speak authoritatively about, for example, the transitions in the German national team.

The football pundit has experience, and collegiate connections. He or she has access to a broad and deep pool of knowledge. The American political pundit, on the other hand, has hearsay and gossip. The American political pundit cannot claim to have anything that rises to the level of “experience.” He and his mates have seen the same five presidential elections, the same 11 congressional elections, the same two second-term mid-terms. They go to the same cocktail parties and talk to the same media reps. They are peddling nothing better than the perceived wisdom of their five mates, and that wisdom is shallower than the gene pool that your average journalist is drawn from. They are in every way inferior to football commentators.

This problem of political punditry could be solved by application of a simple skill – basic data analysis. Greg Laden's posts show that, in the absence of experience, even basic data analysis skills can be enormously useful. But political pundits are ignorant of anything resembling data, and they will push back strongly against the idea that analysis can improve their work. As a classic example of this, consider the hostility Nate Silver received for his work predicting American elections, or the way the Poll Bludger was treated in Australia (outed by a major newspaper for disagreeing with their pundits). The problem is simple: 5 or 10 events are not sufficient to provide anyone with real experience, at least in a world as complex as politics. It's not that these commentators are incapable of analysing from experience – they just don't have the experience. In the modern world, we would never take dating advice from someone with 5 conquests, but we routinely listen to men and women who have seen no more than 5 or 10 elections. This is the level of experience sensible people dismiss as “anecdote.” And in such a circumstance – when you or I know we have insufficient experience to inform our decision – what do we do? We go to data. But political pundits can't go to data, because they are numerically illiterate. So instead they basically peddle gossip.

The problem with idiots peddling gossip from a politically and socially privileged pulpit is twofold. The first is obvious: they're wrong. Why should I listen to someone with 5 presidential elections under his belt? How can he possibly have enough knowledge of what goes on on the federal stage to be able to provide cogent commentary? But the second is more insidious: they're vulnerable to manipulation by powerful or moneyed forces. There are lots of think tanks out there in the wilderness of American politics, peddling lies that are paid for by various big companies. Why would someone who has only ever seen five presidential elections, and who doesn't have any ability to properly assess the history of elections, be able to properly analyze political theories put forward by bought-and-paid-for operatives of the big think tanks? How is experience going to protect someone against the lies these organizations put forth, when their experience is so limited it provides them no context for judging any analysis?

This is an example of the importance of data analysis to modern life. There are many problems where we as individuals lack anything resembling the required level of experience, but where data contains the knowledge of multiple generations, there for us to analyze. Even simple data analyses will take us somewhere, helping us to get a context for the problems we're trying to understand. Someone with hundreds of football games and 20 FA Cups under their belt may be wrong about their understanding of the subtle forces at work in football – they may need data analysis to help them see clearly what is going on in there – but they have the experience to draw from, and indeed if we did data analysis we might well be just looking for numerical patterns in the information the pundit already knows. This is an example in which data analysis can help an experienced expert to better understand their work. But in the case of political punditry it's different. There is no expert, because people die before they can become experts. Instead we have a gaggle of gossip-mongers. In this case data analysis doesn't complement their experience, because what they have is not experience, but a couple of moments. Data analysis should replace their work. They are irrelevant, and rather than analyzing the actual history of political movements in America, they are probably regurgitating some canned idea from a think tank. And how can they do anything else? They, personally, are ignorant and naive, but they believe themselves to be experts.

Which is why you shouldn’t believe anything political pundits tell you about elections. At best they’re inexperienced; at worst they’re inexperienced, ignorant, and retelling hearsay from someone who probably has a vested interest in crafting hearsay. Ignore them, and look to the data.