Today I’m trying to do some simple prediction in R. I have a simple linear model with a single missing data point, and I need to get the predicted values and standard errors for 14 data points including this one missing point. Because I find R’s data handling in linear prediction really hard to use, I decided to use a brutal, simple loop approach to get the job done. And in doing so I have uncovered a really weird piece of behavior.

My linear model predicts a proportion as a function of year, with year valued from 0 to 13 and 2 missing. My outputs are a matrix with the year and the predicted value (con.pred) and a matrix with the year and standard error (con.se). My input is the data (dataValues) and the resulting model (con.lm). So my model runs like this:

con.lm<-lm(y~year,data=dataValues)

for (j in 1:14){
year<-j-1
temp.pred<-predict(con.lm,newdata=year,se.fit=TRUE)
con.pred[j,2]<-temp.pred\$fit
con.se[j,2]<-temp.pred\$se
}

Now the weird thing here is that this breaks down and delivers the following error:

Error in eval(predvars, data, env) : not that many frames on the stack

However, it didn’t stop working immediately. The first 5 elements of the output con.pred have values – the process failed at j=6. Even weirder, when I ran it two hours ago it failed at j=10. However, it works fine if I make this tiny change:

for (j in 1:14){
year<-j-1
temp.pred<-predict(con.lm,newdata=data.frame(year),se.fit=TRUE)
con.pred[j,2]<-temp.pred\$fit
con.se[j,2]<-temp.pred\$se
}

How can it be that the prediction process works for the value year=5, but doesn’t work when year=6? But it works for all values if I redefine the scalar year to be a data frame? Does this mean that R treats scalars with a value below 6 as data frames, but not above 6?

Also, if I type the following commands directly into the command line I get different predicted values:

temp.pred<-predict(con.lm,newdata=(year=1),se.fit=TRUE)

temp.pred<-predict(con.lm,newdata=1,se.fit=TRUE)

However, the latter command doesn’t produce any warnings and does not do the default prediction (which would return 69 fitted values; it returns one value, that appears to correspond with a value of year of about 9. Furthermore, while the first of these two commands works, this command does not, and brings up the same not enough frames error:

temp.pred<-predict(con.lm,newdata=(year=10),se.fit=TRUE)

So I have confirmed in the command line that the commands I have given in script break at some value of year>1.

I don’t know what’s going on here, but it’s completely counter-intuitive. The only error returned is also completely meaningless. R really needs to improve its linear prediction and data handling, because there is something deeply wrong with the way it handles input data.