In amongst the horror and carnage and corruption that is modern Western Europe, the Guardian has a quaint report on the table soccer game of Subbuteo, which apparently is still going strong and has its own fanatical retro-games following. The world final concludes tonight in Palermo, Sicily, and the report includes reminiscences on the popularity of the game in the 80s and early 90s. Just like D&D, its success was driven by its huge popularity amongst primary-school boys. I remember playing in a league at my school (and not coming in the bottom), and arguments with friends at home – I hadn’t thought of this for years, until I read this report.

Note also the photo of the humble abode of a 70s football club manager. You wouldn’t see anything like that now!

The Nameless One has spoken, this time through a team of oracles at the German sea-life aquarium, and the cephalopod cabal predicted a win by Japan in the Women’s Soccer World Cup. The closeness of the decision within the tentacled tribunal led some to question whether the final match might be a closely-fought event, and indeed it was; but in the end Nadeshico Japan won! Ganbare Nippon! I wanted to watch this final match but sadly it was only available on pay TV, so I missed it. But I’m happy that Japan won a well-deserved victory after beating some tough teams (Germany and Sweden!) to get there.

Incidentally, Nadeshico in Japanese is a name taken to refer to a classical Japanese vision of femininity. To say someone is “a Nadeshico” is to compliment their femininity as both beautiful and traditional. I think this is an excellent name for a women’s national soccer team. Well done Nadeshico!

I’m not the first person to have considered the possibility that Paul the Octopus is the spawn of Cthulhu, based on his “remarkable” predictive powers. However, being unconvinced, I presented the possibility that he is not a normal octopus to my students last week, as an example of a basic non-parametric test (the runs test). I thought I’d present a couple of results here, and contemplate some of the complexities of hypothesis testing against a backdrop of crawling chaos.

Introduction

So the basic tale is that Paul predicted the outcome of 5 German games and then one Holland/Spain game successfully, and he had an 80% success rate in the European cup (4 games out of 5 predicted correctly). We will present some statistical tests of this situation, and finish up with a few discussion details.

Aim

To test whether or not the Oberhausen Sea Life aquarium is housing one of the gibbering dark ones from beyond time and space, or whether, in fact, Paul is just a normal octopus who happens to be lucky. Additionally, are the cult of the Ancient Ones who surround him actually a bunch of charlatans making money from our credulous belief in the crawling abominations of the netherworld? Should we sacrifice Paul, perhaps lightly-battered with a slice of lemon, for the good of all humanity; or should we accept his fundamental normality and get on with our lives safe in the knowledge that the Nameless Ones do not, in fact, inhabit our mortal realm?[1]

Method

We can posit the fundamental question as to Paul’s normality or infinite evil in terms of the null and alternative hypothesis of a non-parametric statistical test, as follows. Let the random variable X measure the outcome of Paul’s attempt to guess the result of the next Germany match. Then let X=1 if Paul is successful in his prediction, and X=0 if he fails. Define the probability that Paul successfully predicts a soccer match as p=P(X=1). Then, we can write the null and alternative hypotheses as:

H0: Paul is a normal octopus (p=1/2)

H1: Paul is a crawling abomination from the pits of hell (p>1/2)

In this case we can test the possibility that p>1/2 by means of a runs test. That is, under the null hypothesis, is the chance that Paul would predict 5 games correctly in a row unusually low, such that we might reject the null hypothesis with some confidence? We will choose a confidence of 95% and reject the null hypothesis if the probability of 5 games predicted correctly in a row is less than 5%. Note that we are using a runs test here, requiring sequential successes; we might want to allow the possibility that he can make a mistake at any point in the process, in which case we are interested in the probability that he gets 5 games out of 5 correct in any order.

This second test is important because in 2008 Paul predicted 4 games out of 5, for 80% accuracy. I’m not sure whether this happened sequentially or not, but it seems reasonable to suppose that his mistake could occur at any point in the chain of games, so then we need to calculate the probability of 4 games correct out of 5, in any order, and identify whether this is less than 5% (for a one-sided test), in order to reject the null hypothesis in favour of the terrible omens of destruction and chaos.

Results

So, the probability that he correctly predicts 5 games in a row under the null is (1/2)^5, because the predictions are independent events and the probability is thus the product of their separate probabilities. This gives a probability of 1/32=3%, or less than 5%. We reject the null hypothesis of normality, and conclude that in fact the Elder Gods stalk the (aquariums of the) Earth.

However, the probability of 4 out of 5 correct in any order is (5 4) (1/2)^5 under the null hypothesis, where (5 4) is my crappy non-latex way of writing “5 choose 4”. This gives us 5/32, or roughly 1/6 (approximately 16%), so we retain the null hypothesis, that Paul is a normal octopus. Note the probability of 4 predictions in a row is 1/16 (exactly) or 6%, so no dice…
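For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the probabilities quoted above. The function names are my own invention for illustration, and it assumes each prediction is an independent coin flip under the null. It also prints the probability of at least 4 correct out of 5, which is strictly what a one-sided test should use; it comes to 6/32 (about 19%), so the conclusion is unchanged.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k, n, p):
    """P(at least k successes in n trials), i.e. a one-sided p-value."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


half = Fraction(1, 2)  # null hypothesis: Paul is a normal octopus

p_five_in_a_row = half**5                    # the "runs" result for 2010
p_four_in_a_row = half**4                    # 4 sequential successes
p_exactly_4_of_5 = binom_pmf(4, 5, half)     # 4 of 5 in any order
p_at_least_4_of_5 = binom_tail(4, 5, half)   # one-sided tail

for label, value in [
    ("5 in a row", p_five_in_a_row),
    ("4 in a row", p_four_in_a_row),
    ("exactly 4 of 5", p_exactly_4_of_5),
    ("at least 4 of 5", p_at_least_4_of_5),
]:
    print(f"{label}: {value} = {float(value):.3f}")
```

Using exact fractions rather than floats keeps the results directly comparable with the hand calculations.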

So, we have contradictory results concerning the nature of evil. Having proven statistically that British people are idiots and the Australian government didn’t burn the house down, I’m a little disappointed at this mixed result. I’m sure no priest of Sigmar would accept such equivocation where the agents of chaos are concerned. What to do?

Discussion

We could combine the results of the two tournaments, to get a total of 10 games with 9 correct results, but we don’t really have 10 games, because the 5 predictions of each series are correlated – Paul was a younger, and presumably less infinitely evil, octopus 2 years ago, and maybe had a different predictive method/ritual, plus of course his cult followers were probably making different/smaller human sacrifices. So we need to consider the possibility that those 5 games are more similar to each other than they are to the next 5 games. Without any knowledge of the degree of correlation in the octopus’s predictions under the null hypothesis, we can’t make a judgement.

There is also a question of inter-rater agreement here. It’s possible that Paul always goes for the same box, and the staff don’t randomly assign flags to boxes, or just by luck the Germany box is more likely to be on the side Paul favours. We should probably consider the randomization sequence of the boxes in some way. A variable for the side on which the box is placed, or better still random assignment of the flags to the boxes, would have solved this problem.

But I think there is a more sinister trick at work here. We know that Germany are a strong team, and we know that Paul is lured into the boxes by mussels. So, since the staff can be confident that the German team will likely win most games, it is quite easy to rig the process by training Paul to prefer the German flag[2]. Remember that Octopi have strong colour vision and are very smart, so it could be possible to train a preference. Then, the probability of success in each predictive effort increases significantly. The probability of success is P(Paul picks Germany and Germany win) + P(Paul picks the opposition and Germany lose) = P(Paul picks Germany)*P(Germany win) + (1-P(Paul picks Germany))*P(Germany lose), by the independence of the prediction and the outcome. But if P(Paul picks Germany)>1/2 and P(Germany win)>1/2, the total probability rises above 1/2. We know Germany won 3 games out of 5 this time around, so we could estimate P(Germany win)=0.6; if P(Paul picks Germany)=0.8, then we have the total p=0.8*0.6+0.2*0.4=0.56, p>1/2. If Germany’s win probability is really 0.8 (because Serbia were a pack of cheating bastards), then the probability increases to p=0.68.

Of course, because Germany win most games and Paul predicts they win most games, the actual fact that Paul is going to pick Germany more often anyway gets missed.
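The trained-octopus calculation can be sketched in a few lines of Python. This is only an illustration: the function name p_correct is hypothetical, and it assumes (as the argument above does) that the pick and the result are independent and that every game ends in a win or a loss.

```python
def p_correct(p_pick_germany, p_germany_win):
    """Probability that a prediction is right when Paul picks Germany with
    probability p_pick_germany and Germany win with probability p_germany_win.

    Assumes the pick and the result are independent and that there are no draws.
    """
    p_pick_opposition = 1 - p_pick_germany
    p_germany_lose = 1 - p_germany_win
    return p_pick_germany * p_germany_win + p_pick_opposition * p_germany_lose


# The two scenarios discussed in the text
print(round(p_correct(0.8, 0.6), 2))  # Germany win 60% of games
print(round(p_correct(0.8, 0.8), 2))  # Germany win 80% of games

# Chance of 5 correct predictions in a row for a trained octopus
print(round(p_correct(0.8, 0.8) ** 5, 3))

# An octopus with no preference stays at 1/2 however strong Germany are
print(round(p_correct(0.5, 0.9), 2))
```

With p=0.68 the chance of 5 correct predictions in a row comes to about 0.15, compared with about 0.03 for a coin-flipping octopus, so a trained preference for the German flag makes a "significant" run far less surprising.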

A final couple of notes. First, in this analysis[3], I have ignored the Holland/Spain prediction, because I read somewhere that Paul used to only predict on games involving Germany. This means that the Holland/Spain game is well outside the range of data on which the predictive model is based, and we shouldn’t assume it represents the same underlying probability structure or process (or manifestation of ultimate evil). So I’ve excluded this observation from my data set.

Secondly, it’s worth bearing in mind that statisticians should never, ever use statistical tests to test theoretically implausible events[4]. Because there is a small chance of type 1 error (rejecting the null hypothesis when the null is true), as soon as you apply a statistical test to a ridiculously implausible theory, you open the risk that you will prove it to be “true” by mistake. So all that is required to prove the existence of God is for some nong to conduct a statistical test of an apparent “miracle” that is really just a carefully trained Octopus, get a spurious result, and before you know it you have people worshipping his tentacly appendages.
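To put a rough number on that risk, here is a small Python sketch. The scenario is hypothetical and not something from the data: it assumes several perfectly normal animals are each given the same 5-in-a-row test, independently, and asks how likely it is that at least one of them "proves" itself to be an Elder God. The function name and the choice of how many animals to try are mine.

```python
from fractions import Fraction


def p_any_false_positive(n_tests, alpha=Fraction(1, 32)):
    """Probability that at least one of n_tests independent tests rejects a
    true null hypothesis, when each test has type 1 error rate alpha.

    The default alpha of 1/32 is the chance of 5 correct guesses in a row
    by a normal octopus.
    """
    return 1 - (1 - alpha) ** n_tests


# Hypothetical numbers of normal animals being tested
for n in (1, 5, 10, 20):
    print(n, float(p_any_false_positive(n)))
```

The point is only that the chance of a spurious "miracle" grows with every implausible theory somebody decides to test.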

Conclusion

Two non-parametric statistical tests have produced inconclusive results as to whether or not the shambling horrors of Cthulhu walk among us, predicting our soccer matches. However, the test that rejected the null hypothesis was borderline, and consistent with the possibility that Paul has been trained to pick the German flag more often than other flags, thus ensuring increased predictive success and a high likelihood of a run of successful predictions, provided that Germany remain a strong team. This report concludes that Paul should probably not be burnt at the stake (or grilled) as a heretic, tentacled avatar of the brooding darkness; but it might be worthwhile to monitor him, his aquarium shrine, and the cult that surrounds him, for further signs of the manifestations of chaos and, if witnessed, liquidate them and extirpate their teachings from the annals of history in the interests of the human race.

Update: Looking at the Wikipedia entry on our dark and tentacled oppressor, I note that actually he got 7 out of 7 results correct in this world cup, and only 4 out of 6 in the European cup. This doesn’t change the conclusion of our runs test (which simply becomes an even more powerful indication of his brooding and ultimate evil), but it makes his success rate in the European cup look even more merely mortal. Also the Wikipedia entry correctly points out that in the group games there is a chance of a draw, so what we actually have here is a sequence of multinomial events with probability 1/3 of three outcomes in the first 3 tests, then 1/2 of two outcomes in the remainder (under the null). We would need to adjust the probabilities accordingly, for both the runs and the binomial test. This actually makes the binomial test a bit fiddly, but my guess is that it reduces the p-value slightly (due to the probabilities of success being lower). I think the Wikipedia entry is slightly wrong on the odds of “at least 12 successes in 14 trials” due to the issue of correlation (as mentioned above)[5].
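The fiddly version can be sketched in Python as below. This is a rough illustration rather than a definitive analysis: the function names are mine, and it assumes each tournament consists of 3 group games (success probability 1/3 under the null) followed by knockout games (1/2 each), with all predictions independent. Because the success probabilities differ between games, the count of successes is built up one game at a time instead of using the plain binomial formula.

```python
from fractions import Fraction


def success_count_distribution(probs):
    """Distribution of the number of successes across independent trials
    whose success probabilities may differ. dist[k] = P(exactly k successes)."""
    dist = [Fraction(1)]  # before any trial: 0 successes with certainty
    for p in probs:
        new = [Fraction(0)] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)   # this prediction fails
            new[k + 1] += mass * p     # this prediction succeeds
        dist = new
    return dist


def p_at_least(probs, k):
    """One-sided p-value: P(at least k successes)."""
    return sum(success_count_distribution(probs)[k:])


third, half = Fraction(1, 3), Fraction(1, 2)
world_cup = [third] * 3 + [half] * 4   # 7 games: 3 group, 4 knockout (assumed)
euro_cup = [third] * 3 + [half] * 3    # 6 games: 3 group, 3 knockout (assumed)

p_world_cup_7_of_7 = p_at_least(world_cup, 7)
p_euro_4_of_6 = p_at_least(euro_cup, 4)
p_euro_4_of_6_coin = p_at_least([half] * 6, 4)  # ignoring draws, for comparison

print(p_world_cup_7_of_7, float(p_world_cup_7_of_7))
print(p_euro_4_of_6, float(p_euro_4_of_6))
print(p_euro_4_of_6_coin, float(p_euro_4_of_6_coin))
```

Under these assumptions the 7-out-of-7 run has probability 1/432, and the European cup tail probability falls from 11/32 (about 0.34) to 43/216 (about 0.20) once draws are allowed for, which is still nowhere near 5%.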

fn1: yet

fn2: My suspicion is that they ran a series of dummy runs with Paul before the cup, and either gave him a second mussel when he picked Germany, and/or sacrificed a virgin and offered her blood to the elder gods to enhance his magical powers; statistical testing seems to suggest the former was the case, but we can never be sure…

fn3: and I do use the term loosely

fn4: this applies to the kids at home too, obviously

fn5: also, has anyone else noticed that the Wikipedia entry on the ecological fallacy confuses confounding and the ecological fallacy? At least, I thought it did last time I read it.

… and I will give you the world cup winning team[1]. This from the Spanish coach, in support of my comments about the demise of European soccer. I wonder if Holland has a similar approach? At the end of this article the Spanish coach mentions that the Germans developed the same approach to fostering young talent, but that Spain have been doing it longer.

This is similar in aspect to the remarkable phenomenon of the UK doing better than Australia in the 2008 Olympics. This was a direct result of money being poured into elite sports in preparation for 2012, and will undoubtedly be repeated in London.

But, lest one think that this makes for better sportspeople… the Guardian had a graphic showing the most successful teams by GDP, and they largely weren’t from rich nations. But I can’t find it anymore.

Incidentally, my kick-boxing gym is training children as young as (my guess) 5 years old, and it’s very, very cute (you can see them in the third picture)… the teacher was trained in Thailand, and I wonder if he’s thinking of a Thai model for developing fighters – get them at 5 and make it their life. There’s an 8 or 10 year old boy (on the right in the pic) who is ferociously good, though apparently he bottles it a bit during fights. But it will be interesting to see the results when they’re adults…

fn1: yes, yes, I know, it’s premature. But the Octopus said so.

So, that festival of the boot is on again, and although since I moved to Europe my interest in soccer has waned considerably[1], I still watch the World Cup quite avidly. Of the 6 European soccer giants – Spain, Italy, Germany, England, France and Holland – only 4 made it to the round of 16, and in that round already another – England – has been knocked out in a match they lost 4-1 to a German team that beat Australia 4-0. This is the same England team that struggled to get through the group stage. The two finalists from 2006 went out in the group stage, and in such an ignominious fashion as hardly befits European minnows, let alone France or Italy. Italy was beaten comprehensively by Slovakia and only drew with tiny New Zealand after pulling a penalty with traditional Italian diving methods[2].

I noticed that the three European giants who have bombed so far all have quite old players. Italy and England particularly, but even France still has players like Thierry Henry. Holland has also been playing a little poorly – they really struggled against Japan – and they also have quite a few holdouts from previous cups. On the other hand, Germany has a very young team. This article in the Guardian makes the point that this is not a coincidence, and that the Germans have been putting a lot of work into developing local talent. It’s also the first German team to be representative of Germany’s multicultural modernity, with 5 or 6 players being of Arab/Turkish/Eastern European/Latin American origin. I take this as a sign that the German FA has been searching far and wide for talent.

So what is with the old teams that bombed? I think that these three countries – the UK, France, Italy – have opened their football markets simultaneously[4] to easy foreign transfers and massive television marketing money in the last 20 years, and the consequence of this has been an easy-come-easy-go attitude by the clubs. Instead of doing the hard work of developing local talent, they’ve taken the low-risk approach of buying in talent from abroad. This makes FA Premier League games fun to watch, but it has had the dual effect of a) importing players from smaller countries and giving them exposure to world-class coaching and playing techniques and b) reducing the pool of talented local players. The consequence of this at the world cup is that these countries’ national teams not only have to select their line-up from a shallower pool of talent, and thus rely increasingly on has-beens like Rooney; but they also find themselves facing a wider pool of nations with quality players who have been groomed by these big football nations’ leagues. New Zealand, for example, has a line-up whose entire transfer value was a third that of one player on the Italian team (de Rossi, I think). They had one player from Blackburn in defense, another player from an English team in midfield, and another in offense, and they assembled around this spine a team that included several amateurs. In 1982 their team was entirely composed of amateurs. So while the available quality for NZ has increased considerably, England and Italy find themselves relying increasingly on old men, and in the washup of last night’s defeat the press are also claiming that the young players aren’t so great.

Make no mistake, this is good for football. Having an increasingly diverse pool of finals contenders, with 2 Asian teams through to the round of 16 (and one a favourite, I note, to go to the quarters!), an African team through to the quarters, and a selection of Latin American teams, is good. But from the point of view of the football giants of Europe, something has gone wrong. Compare the British approach to football with the Australian or NZ approach to rugby. If a NZ player ever plays for a foreign club, they can never again play for NZ. So even though the foreign clubs pay vast sums more than the local clubs, NZ players wait until their world cup hopes are over before heading overseas – after their (shameful) 2007 World Cup loss, a whole stack of players who knew they wouldn’t be selected again headed to French and British clubs to earn the real money. As a result of this the All Blacks have players lined up 3 deep for most positions, and the lead players can’t guarantee selection in the next game if they don’t keep their act together – and this is the stated policy of the NZRU[5].

This should also be the case for the European soccer giants. There is no way that in a nation obsessed with football, as England is, a 30-something second-rate striker like Rooney should be able to even get in the squad, let alone onto the pitch. There should be a 28 year old and a couple of youngsters ahead of him – the same for Lampard, Cole, etc. Beckham stayed in long past his prime, and was a crap captain to boot. I think this is a result of market forces operating in England, and although one should rightly observe that these market forces have had a good effect on the rest of the world game (and on the viewing public’s enjoyment of football), the British FA needs to think about some countervailing mechanisms to groom up a new generation of English players.

I suppose it could be argued that the Italian problem is not so much an effect of broadcast TV as the general corrupt and moribund nature of Italian institutions. But I think that Italy and France have similar broadcast models to the UK, and I wonder if the Northern European countries have (as is traditional up there) opted for a more genuinely social democratic approach to the game, that strikes a balance between the market model of “buy the best team you can” and the long-term good of the game. Because football is notable for its intense nationalism, I think that the long-term good of the game and national success are inextricably linked, as you can see from the excitement about soccer that is stirred up in rugby countries (like Australia) when we have international success. It strikes me as interesting that some of the European countries with the most intensely nationalistic fans – Italy and the UK – have managed to somehow water down their own national teams in a way that pours cold water on that nationalism. Transferring that national allegiance to clubs is not going to be a good thing for social order at local soccer grounds, and the game isn’t going to maintain its populist appeal if it loses its nationalist appeal (not that it will ever be unpopular – soccer is a very very good game to play and to watch). But associations like the FA have an important role to play in fostering local talent, otherwise why have them? And I’m sure there must be more than a few people in England and Italy and France this week thinking “why do we bother with an FA at all?” when their national teams perform so badly, their local leagues are essentially deregulated in every significant particular, and the FA doesn’t even properly monitor on-pitch referee or player behaviour.

The Italian captain made a comment last week to the effect that not beating NZ would be like the All Blacks failing to beat Italy in rugby. It’s noticeable that recently Italy have beaten England at Twickenham, and the IRB is moving to include Argentina in the Tri Nations. I wonder if this week a lot of Italian soccer fans are thinking of teaching themselves the rules of rugby, and diversifying their football interests? If Australians can do it[6], so can Italians.

fn1: Football culture in England (and probably much of Europe) is a horrible, macho and nationalist display of male tribal bonding that I just can’t get behind or support. From afar in Australia the Champions League was fun to watch, but in England it feels like you are participating in a form of ritualized abuse. The complete and total exclusion of women from all aspects of the sport, the hyper-macho posturing of the fans, their sudden exaggerated Englishness, it’s all horrible, as is the tense atmosphere in the football areas, the armies of police, the dogs, the chanting aggressive dimwits wandering around in dangerous gangs, the implicit acceptance of this phenomenon as a side-effect of the game that has to be tolerated in order to enjoy its limited benefits. And, of course, there is the gender divide – with women thoroughly and completely uninterested and excluded. If you’re wondering why British women are so thoroughly unsporty, you don’t need to look any further than the crowd of a British football match, completely and utterly devoid of women. To people from outside Europe – or people from a rugby tradition inside Britain, for that matter – this all looks very strange.

fn2: Note as well that Italy had a particularly easy run, being drawn in a weak group and being given amazing referee favouritism – in their final game against Slovakia with 10 minutes to go their two strikers attacked the Slovakian keeper, kicking him and punching him, and the Slovakian keeper received a yellow card. The whole thing was caught on camera too – if it were Aussie Rules Football or Rugby those two men would have been sent packing immediately; and this came after another unprovoked attack in the first half. Italy should have finished that game with an 8 man team and a much less flattering scoreline[3].

fn3: In case you hadn’t noticed, I really hate the Italian national team. I have done ever since they beat Australia in the 2006 round of 16 with a shocking piece of diving. The sooner FIFA accepts the inevitable and introduces video refereeing and summary execution for diving, the better.

fn4: After that Belgian player (Jean-Marc Bosman) won a case in the European court, a case which ended up not benefiting him at all but completely changed the face of European football.

fn5: On a side note, I don’t much go in for the complaints of some in the British press that the English players are paid so much that they don’t care whether they win or lose internationally – I think they care very much, although I do think that injury-wise they probably assign their first loyalty to the club that pays them so much. But Southern hemisphere codes have a salary cap, which I think does have the consequence of reducing the prima donna element of player behaviour, and preventing the players from influencing the selectors as much. I also wonder if the greater respect rugby players show the referee compared to soccer has anything to do with their relative pay grades. At a rough guess, an Aussie football player is paid maybe 5 times as much as a referee, while an English star would be paid 50 times as much as a referee. Obviously institutional factors are the main driver of this, particularly the post-match judgements made in rugby which mean that you can’t just argue your way out of trouble on-pitch. But surely that pay grade differential makes a difference to on-pitch behaviour. As an example of down-to-earthness, when I did weights at the University of New South Wales I spotted bench press for a professional rugby league player, who was doing rehabilitation weights during the summer break[7] in between contracts, before heading to Europe to play with a French team. I somehow doubt that your average premier league player ever has the misfortune of having to share training space with us mere mortals, let alone having a non-professional human being assist them with their weights.

fn6: Australia has 4 codes of football that we divide our attention between, and we’ve been world champions in three of them.

fn7: “rehabilitation weights” for a dislocated shoulder in this case meant doing 85-100kg bench press sets of 12, with clap push ups in between and 30 second rests; followed by dumbbell flies with 35 kg on each shoulder, and more clap push ups. The man himself probably weighed 100kg. That’s some rehabilitation!