Recently I wrote a post criticizing an article at the National Bureau of Economic Research that found a legal ivory sale in 2008 led to an increase in elephant poaching. This paper used a measure of illegal poaching called commonly the PIKE, which is measured through counting carcasses discovered in searches of designated areas. This PIKE measures the proportion of deaths that are due to illegal poaching. I argued that the paper was wrong because the PIKE has a lot of zero or potentially missing values, and is bounded between 0 to 1 but the authors used ordinary least squares (OLS) regression to model this bounded number, which creates huge problems. As a result, I wasn’t convinced by their findings.
Since then I have read a few reports related to the problem. I downloaded the data and decided to have a look at an alternative, simple way of estimating the PIKE in which the estimate of the PIKE emerges naturally from a standard Poisson regression model of mortality. This analysis better handles the large number of zeros in the data, and also the non-linear nature of death counts. I estimated a very simple model, but I think a stronger model can be developed using all the standard properties of a well-designed Poisson regression model, rather than trying to massage a model misspecification with the wrong distributional assumptions to try and get the correct answer. This is an outline of my methods and results from the most basic model.
Methods
I obtained the data of legal and illegal elephant carcasses from the Monitoring Illegal Killing of Elephants (MIKE) website. This data indicates the region, country code, site number, and number of natural deaths and illegally killed elephants at each site in each year from 2002 – 2015. I used the full data set, although for comparing with the NBER paper I also used the 2003-2013 data only. I didn’t exclude any very large death numbers as the NBER authors did.
The data set contains a column for total deaths and a column for illegal deaths. I transformed the data into a data set with two records for each site-year, so that one row was illegal killings and one was total deaths. I then defined a covariate, we will call it X, that was 0 for illegal killings and 1 for total killings. We can then define a Poisson model as follows, where Y is the count of deaths in each row of the data set (each observation):
Y~Poisson(lambda) (1)
ln(lambda)=alpha+beta*X (2)
Here I have dropped various subscripts because I’m really bad at writing maths in wordpress, but there should be an i for each observation, obviously.
This simple model can appear confounding to people who’ve been raised on only OLS regression. It has two lines and no residuals, and it also introduces the quantity lambda and the natural logarithm, which in this case we call a link function. The twiddle in the first line indicates that Y is Poisson distributed. If you check the Poisson distribution entry in Wikipedia you will see that the parameter lambda gives the average of the distribution. In this case it measures the average number of deaths in the i-th row. When we solve the model, we use a method called maximum likelihood estimation to obtain the relationship between that average number of deaths and the parameters in the second line – but taking the natural log of that average first. The Poisson distribution is able to handle values of Y (observed deaths) that are 0, even though this average parameter lambda is not 0 – and the average parameter cannot be 0 for a Poisson distribution, which means the linear equation in the second line is always defined. If you look at the example probability distributions for the Poisson distribution at Wikipedia, you will note that for small values of lambda values of 0 and 1 are very common. The Poisson distribution is designed to handle data that gives large numbers of zeros, and carefully recalibrates the estimates of lambda to suit those zeros.
It’s not how Obama would do it, but it’s the correct method.
The value X has only two values in the basic model shown above: 0 for illegal deaths, and 1 for total deaths. When X=0 equation (2) becomes
ln(lambda)=alpha (3)
and when X=1 it becomes
ln(lambda)=alpha+beta (4)
But when X=0 the deaths are illegal; let’s denote these by lambda_ill. When X=1 we are looking at total deaths, which we denote lambda_tot. Then for X=1 we have
ln(lambda_tot)=alpha+beta
ln(lambda_tot)-alpha=beta
ln(lambda_tot)-ln(lambda_ill)=beta (5)
because from equation (3) we have alpha=ln(lambda_ill). Note that the average PIKE can be defined as
PIKE=lambda_ill/lambda_tot
Then, rearranging equation (5) slightly we have
ln(lambda_tot/lambda_ill)=beta
-ln(lambda_ill/lambda_tot)=beta
ln(PIKE)=-beta
So once we have used maximum likelihood estimation to solve for the value of beta, we can obtain
PIKE=exp(-beta)
as our estimate of the average PIKE, with confidence intervals.
Note that this estimate of PIKE wasn’t obtained directly by modeling it as data – it emerged organically through estimation of the average mortality counts. This method of estimating average mortality rates is absolutely 100% statistically robust, and so too is our estimate of PIKE, as it is simply an interpretation of a mathematical expression derived from a statistically robust estimation method.
We can expand this model to enable various additional estimations. We can add a time term to equation (2), but most importantly we can add a term for the ivory sale and its interaction with poaching type, which enables us to get an estimate of the effect of the sale on the average number of illegal deaths and an estimate of the effect of the sale on the average value of the PIKE. This is the model I fitted, including a time term. To estimate the basic values of the PIKE in each year I fitted a model with dummy variables for each year, the illegal/legal killing term (X) and no other terms. All models were fitted in Stata/MP 14.
Results
Figure 1 shows the average PIKE estimated from the mortality data with a simple model using dummy variables for each year and illegal/total killing term only.

Figure 1: Estimated Annual PIKE values
The average PIKE before the sale was 0.43 with 95% Confidence Interval (CI) 0.40 to 0.44. Average PIKE after increased by 26%, to 0.54 (95% CI: 0.52 to 056). This isn’t a very large increase, and it should be noted that the general theory is that poaching is unsustainable when the PIKE value exceeds 0.5. The nature of this small increase can be seen in Figure 2, which plots average total and illegal carcasses over time.

Figure 2: Trend in illegal killings and total deaths
Although numbers of illegal carcasses increased over time, so did the number of legal carcasses. After adjusting for the time trend, illegal killings appear to have increased by a factor of 2.05 (95% CI 1.95 – 2.16) after the sale, and total kills by 1.58 (95% CI 1.53 – 1.64). Note that there appears to have been a big temporary spike in 2011-2012, but my results don’t change much if the data is analyzed from only 2003-2013.
Conclusion
Using a properly-specified model to estimate the PIKE as a side-effect of a basic model of average carcass counts, I found only a 25% increase in the PIKE despite very large increases in the number of illegal carcasses observed. This increase doesn’t appear to be just the effect of time, since after adjusting for time the post-2008 step is still significant, but there are a couple of possible explanations for it, including the drought effect mentioned by the authors of the NBER paper; some kind of relationship between illegal killing and overall mortality (for example because the oldest elephants are killed and this affects survival of the whole tribe); or an increase in monitoring activity. The baseline CITES report identified a relationship between the number of person-hours devoted to searching and the number of carcasses found, and the PIKE is significantly higher in areas with easy monitoring access. It’s also possible that it is related to elephant density.
I didn’t use exactly the same data as the NBER team for this model, so it may not be directly comparable – I built this model as an example of how to properly estimate the PIKE, not to get precise estimates of the PIKE. This model still has many flaws, and a better model would use the same structure but with the following adjustments:
- Random effects for site
- A random effect for country
- Incorporation of elephant populations, so that we are comparing rates rather than average carcasses. Population numbers are available in the baseline CITES report but I’m not sure if they’re still available
- Adjustment for age. Apparently baby elephant carcasses are hard to find but poachers only kill adults. Analyzing within age groups or adjusting for age might help to identify where the highest kill rates are and their possible effect on population models
- Adjustment for search effort and some other site-specific data (accessibility and size, perhaps)
- Addition of rhino data to enable a difference-in-difference analysis comparing with a population that wasn’t subject to the sale
These adjustments to the model would allow adjustment for the degree of searching and intervention involved in each country, which probably affect the ability to identify carcasses. It would also enable adjustment for possible contemporaneous changes in for example search effort. I don’t know how much of this data is available, but someone should use this model to estimate PIKE and changes in mortality rates using whatever is available.
Without the correct models, estimates of the PIKE and the associated implications are impossible. OLS models may be good enough for Obama, but they aren’t going to help any elephants. Through a properly constructed and adjusted Poisson regression model, we can get statistically robust estimates of the PIKE and trends in the PIKE, and we can better understand the real effect of interventions to control poaching.
August 19, 2016 at 6:36 pm
Hi I am a dunce at stats but once did some aerial wildlife counts in Africa. Linear strip transects are flown, divided into subunits of standard length (and area since the strip width is calibrated and maintained), within each of which we record the numbers of animals seen. This produces a population of subunit samples of animal density by species. Because the subunits are mapped it is easy to use them to produce density distribution maps of transects (as a graphical output of GIS etc). The well-known Mike Norton-Griffiths handbook “Counting Animals” explains all this. Total population sizes of animals in the area sampled (often the transects cover 10% or less of the entire survey zone) are estimated by assuming that density varies continuously and so the Normal distribution can be applied (it’s assumed that the sampling design ie the layout of the transects is close enough to random sampling; and the number of samples exceeds 30, and that there can be correction of any effects due to uneven sized sampling units). I think what happens next is that it is assumed that the population of subunit density samples can be assumed to have been drawn from an overall density distribution that is Normal, which enables one to apply Normal parameters to calculate the mean population density for each species with variances and then produce population estimates by pro-rating this to the total area form which the x% sample fraction of subunits was taken.
Since this is a standard method in wildlife biology I imagine the MIKE practitioners would probably have used some similar thinking (at least when they started) – imagining that dead animals will be distributed across landscapes in density distributions that are reminiscent of those of living animals – so a carcass count should provide density distributions that are continuous, expressed in carcasses per unit area – that being the case, I guess they would be justified in using parametric/continuous stat models, with two “species”: dead by natural causes; illegally killed?
But it seems the analysis has evolved to a more simplistic one that uses only carcass encounters and assignments to these of cause of mortality as an index of poaching intensity.
In the light of that it’s interesting that you suggest the PIKE model could be improved by, among other things, “Incorporation of elephant populations, so that we are comparing rates rather than average carcasses. Population numbers are available in the baseline CITES report but I’m not sure if they’re still available”. It seems that exactly that was part of the orginal design of the MIKE data analysis strategy eg section 4 in https://cites.org/sites/default/files/common/prog/mike/data/Data_Analysis_Strategy.pdf
The Paul Allen funded aerial surveys http://www.greatelephantcensus.com/ should give a robust estimate of population density and distribution.
It will be surprising if they show anything except that elephant range is being compressed in most range states, populations are shrinking and more poaching or retaliatory/problem animal killing is happening.
Have you had any comments from the NBER authors?
August 19, 2016 at 7:29 pm
Thanks for your comment Simon. Unfortunately I’ve got no understanding of wildlife counting – I did do some capture-recapture estimates of population size many years ago but I did it for injecting drug users, not elephants, and I think my knowledge would be very out of date. I have been looking around at other methods for estimating these quantities for mobile populations and found some interesting material about estimating survivorship for cetaceans (whose population counts are, I would guess, very uncertain) but I can’t say more about them yet because I haven’t tried them. I only really raised the population issue because in epidemiology this is a key element of analysis and in many cases you probably wouldn’t go further if you didn’t have it.
My concern with using carcass counts only is that we don’t know whether the increasing numbers reflect increasing search effort, increasing population (and thus increasing deaths) or increasing deaths (and thus reducing population). For example Japan has an increasing mortality count, but this is because everyone is living longer. If this were happening with elephants, this would be a great thing. But without at least one decent population estimate I guess you can’t tell.
How does the aerial wildlife sample handle populations that move around a lot, like elephants? I assume the 30 transects aren’t all performed on the same day?
If I find elephant populations I will definitely update the model. Unfortunately I only do these analyses as a hobby so I don’t exactly put a lot of effort into hunting out datasets. I haven’t heard anything from the NBER authors and I doubt I will, since I doubt they have noticed my humble blog …
August 22, 2016 at 7:39 pm
Hi – thanks for the reply. From my limited experience the aerial surveys are done across large areas with an emphasis on completing as fast as possible to try to avoid the sort of double counting risks that you have perceived. Looking back at the report from the survey I was involved in I see the transects were at 2.5km separations and we flew at a target height of 250 feet at a speed of 150 kph. We flew (I think) seven “sorties” on successive days to sample an area of ca 5600 sq km.
The area we surveyed was rectangular in shape with transects oriented across the shorter dimension, which I think was around 20km in length. This would mean that at a speed of 150kph it would take about 10 mins from the start of one transect and then get onto the start of the next (coming back in the oppostie direction). So if you see an elephant at the start of one transect the same elephant has about 18-20 mins to get to the nearest point on the subsequent transect, 2.5km distant. That would mean walking at 7.5kmh which I think is only a little bit more than normal walking pace for an elephant.
I’d guess that it was fairly unlikely that during a given session an elephant seen on one transect would move into the next transect to be double-counted within the time taken to move from one transect to the next, but an elephant moving 30-40km per day perhaps could manage to cross several transects in the course of a day and be counted on successive days! Our little report on our survey is at:
Click to access Ruaha%20LMGCA%20Rift%20Valley%20SRF%201994%20Dry%20Season%20REWMP.pdf
Well-funded and planned survey campaigns would probably use more than one plane with their flight planning seek to achieve maximum coverage in the shortest time possible while minimising double counting risks. Survey techniques may be tailored to try to reduce bias that may result from animals’ behaviours – an example I am aware of is that buffalo have highly aggregated distributions, so attempts may be made to undertake a total count when a herd is encountered. Modern techniques of survey also may employ video capture of what is in the transect beneath the aircraft and the enumeration of the samples can be done very meticulously post-survey, which probably improves accuracy a lot, maybe even to the extent that double counting can be almost eliminated.
September 2, 2016 at 8:36 pm
I see the great elephant census results are out now
https://peerj.com/articles/2354/#results
http://www.greatelephantcensus.com/final-report/
September 2, 2016 at 9:12 pm
I saw that in the news this morning, I am considering an updated model …