I’ve complained before about the reliability and quality of the open source statistics package, R. Sometimes I get pushback, with people suggesting that I just don’t understand what R is trying to do, or that there is an obvious way to do things differently that answers my complaints – that R is idiosyncratic but generally trustworthy.

Well, try this exercise, which I stumbled on today while trying to teach basic programming in R:

- Run a logistic regression model with any reasonable data set, and assign the output to an object (let's call it *logit1*).
- Extract the Akaike Information Criterion (AIC) from this object, using the command *logit1$aic*. What is the value of the AIC?
- Now extract basic information from the logistic model by typing its name (*logit1*). What is the value of the AIC?
- Now extract more detailed information from the logistic model by typing summary(*logit1*). What is the value of the AIC?
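The steps above can be sketched as follows. The data set here is simulated and purely illustrative (the original exercise says any reasonable data set will do); the object name *logit1* matches the post:

```r
# Simulate a simple binary outcome -- any reasonable data set works here.
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))

# Fit the logistic regression and store it as logit1.
logit1 <- glm(y ~ x, family = binomial)

logit1$aic       # step 2: the stored AIC, at full precision
logit1           # step 3: basic output -- note how the AIC is displayed
summary(logit1)  # step 4: detailed output -- note the AIC again
```

Comparing the number printed in the last two steps against the value of `logit1$aic` is the whole exercise.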

When I did this today, the AIC value from *logit1$aic* was 54720.95; from the summary function it was 54721; and from the basic output option it was 54720.

That’s right: depending on how you extract the information, R either rounds the value of the AIC or truncates it. R *truncates* a numerical value without telling you.

Do you trust this package to conduct a maximum likelihood estimation procedure, when its developers not only can’t adhere to *standard practice in rounding*, but can’t even be internally consistent in their errors? And how can you convince someone who needs reliability in their numerical algorithms that they should use R, when R can’t even round numbers consistently?

I should point out that a decision to truncate a number is not a trivial decision. That isn’t something that happens because you didn’t change the default. Someone actually consciously programmed the basic output display method in R to truncate rather than round off. At some point they faced a decision between *floor()* and *round()* for a basic, essential part of the I/O for a statistics package, and they decided *floor()* was the better option. And no one has changed that decision since. I don’t think it’s a precision error either (the default precision of the summary function is 4 digits!), because the example I stumbled across today ended with the decimal digits .95. This was a conscious programming decision that no one bothered to fix.
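The difference between the two functions is easy to see on the value from the example. (Whether the print method literally calls *floor()* internally is the conjecture above; this just shows why the two displayed values would differ.)

```r
# floor() truncates toward negative infinity; round() rounds to the
# nearest value. On an AIC ending in .95 the two disagree by 1.
aic <- 54720.95
floor(aic)  # truncated:  54720
round(aic)  # rounded:    54721
```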

The more I work with R, the more I believe that it is only good for automation, and all its output needs to be checked in a system with actual quality control. And that you should never, ever use it for any process anyone else is going to rely upon.