I know, I know, guaranteed flamewar title … but here goes. Recently I have read in quite a few locations (the BUGS Book, various papers) about Bayesian statistical models that use a non-informative prior. For example, a regression model with a prior on the slope that is N(0,10000) and a prior on the (log of) residual error that is U(-100,100). In the case of a simple mean-estimation model with normal likelihood this would lead to a normal posterior distribution for the mean. This leads me to two philosophical ponderings about Bayesian statistics that I am sure others must have answered.
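
To make that setup concrete, here is a minimal sketch of the conjugate normal-mean model: a normal likelihood with a known standard deviation and a very wide normal prior on the mean. Everything numerical in it is an illustrative assumption of mine (reading N(0,10000) as mean 0 and variance 10,000, i.e. a very flat prior; a known sigma; simulated data), not something taken from the BUGS Book or BDA3.

```python
# A minimal sketch of the conjugate normal-mean model described above:
# normal likelihood with known sigma and a very wide ("non-informative")
# normal prior on the mean. All numbers here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)

true_mu, sigma = 2.5, 3.0          # data-generating mean and (known) sd
y = rng.normal(true_mu, sigma, size=50)

mu0, tau2 = 0.0, 10_000.0          # vague prior: mean 0, variance 10,000
n, ybar = len(y), y.mean()

# Standard conjugate update: posterior precision is the sum of precisions,
# posterior mean is a precision-weighted average of prior mean and data mean.
post_var = 1.0 / (1.0 / tau2 + n / sigma**2)
post_mean = post_var * (mu0 / tau2 + n * ybar / sigma**2)

print(f"sample mean (MLE):   {ybar:.4f}")
print(f"posterior mean:      {post_mean:.4f}")
print(f"posterior sd:        {np.sqrt(post_var):.4f}")
print(f"likelihood-based se: {sigma / np.sqrt(n):.4f}")
```

With a prior variance that large, the posterior mean and sd come out essentially identical to the sample mean and its standard error, apart from a whisper of shrinkage towards 0 — which is exactly the behaviour at issue in point 1 below.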

  1. What is the point of a non-informative prior? Why not just use a classical non-Bayesian, likelihood-based estimation process? In Gelman’s Bayesian Data Analysis 3 (BDA3) he states that in the large-sample limit, estimators derived from these non-informative priors converge to the same values as the likelihood-based estimates, which seems entirely reasonable (how could they not? the sketch after this list puts numbers on it). But in that case, why bother? Surely all you are doing is slowing the rate of convergence, i.e. essentially saying that your uncertainty about the prior distribution of the mean adds uncertainty to your estimate of the mean. But this is tautological. We all know that there is uncertainty about the estimate of the mean, or we wouldn’t be estimating it. Saying (for example) that your prior is symmetrically distributed about the value 0, with a wide range of possible values, and that the mean is normally distributed, doesn’t seem to add anything when a likelihood-based estimation already starts from the null hypothesis that the data is scattered about the value 0 and that the mean is normally distributed. In the best possible case this assumption is true, in which case the “non-informative” prior adds no information, just uncertainty; in the worst possible case the data is not symmetrically distributed around 0, in which case your prior (which assumes it is) is actually an informative prior, since it forces the posterior to incorporate more symmetry than the analogous likelihood-based estimate of the mean would offer.
  2. Does the large-sample limiting behavior in this situation alter the philosophical interpretation of confidence intervals? One of the many arguments for Bayesian statistics is that the posterior distribution of the estimate (let’s say the mean in this case) is more intuitively interpretable, i.e. lies closer to our intuitive interpretation of confidence intervals than the classic likelihood-based “frequentist” estimator[1]. (I think I have read this several times in various textbooks, e.g. Gelman’s BDA3.) Specifically, we can interpret the Bayesian posterior interval as a distribution of mean values, so e.g. if it is normal we can say that values near the posterior mean are more likely; whereas with a classic 95% confidence interval we are only supposed to say that the interval was generated by a procedure that covers the true parameter value 95% of the time, not that the true value lies inside this particular interval with 95% probability. This makes posterior credible intervals more philosophically appealing than confidence intervals. However, in the large-sample limit the posterior credible interval and the classic confidence interval converge (see the numerical sketch after this list). In that case, can we say that the large-sample confidence interval behaves the same as a large-sample credible interval, and if so, can we interpret the confidence interval as also representing a distribution of possible values? If so, surely the same interpretation applies to the small-sample confidence interval? How do we reconcile these two conflicting interpretations of confidence intervals?
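
To put a number on the convergence in both points, here is a small sketch under the same illustrative assumptions as the earlier one (known sigma, the prior read as N(0, variance 10,000), simulated data): it compares the classic z-based 95% confidence interval with the 95% equal-tailed credible interval from the conjugate posterior, across increasing sample sizes.

```python
# Sketch of the large-sample convergence discussed in points 1 and 2:
# the classic 95% confidence interval vs the 95% credible interval from
# the conjugate normal posterior under a very vague prior.
# All numbers are illustrative assumptions, not from any book.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mu, sigma = 2.5, 3.0
mu0, tau2 = 0.0, 10_000.0          # the same vague prior as before
z = stats.norm.ppf(0.975)          # 1.96 for a two-sided 95% interval

for n in (5, 50, 500, 5000):
    y = rng.normal(true_mu, sigma, size=n)
    ybar = y.mean()

    # Classical interval: ybar +/- 1.96 * sigma / sqrt(n)
    ci = (ybar - z * sigma / np.sqrt(n), ybar + z * sigma / np.sqrt(n))

    # Bayesian interval: central 95% of the conjugate normal posterior
    post_var = 1.0 / (1.0 / tau2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau2 + n * ybar / sigma**2)
    cri = (post_mean - z * np.sqrt(post_var), post_mean + z * np.sqrt(post_var))

    print(f"n={n:5d}  CI=({ci[0]:.3f}, {ci[1]:.3f})  "
          f"credible=({cri[0]:.3f}, {cri[1]:.3f})")
```

Already at modest n the two intervals agree to several decimal places; the only thing that doesn’t converge is the philosophical story we are allowed to tell about them.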

I think that confidence intervals are one of the most misunderstood and mysterious pieces of common statistical methodology. While Bayesian credible intervals may be easier to interpret, I think they might just sow further confusion about how to interpret classical confidence intervals …

It seems to me that Bayesian statistics with a non-informative prior is a waste of time, just a fashionable way to do something that could be done more simply and accurately with a classical likelihood-based approach. At its worst it is misleading, if the implicit assumption of your “non-informative” prior is symmetry when the data suggests that there is no such thing. Where this is not an issue and the prior is genuinely non-informative, I think it just adds uncertainty to the estimation without offering any philosophical insight into why, in which case it is counter-productive. Should Bayesian statistics always use an informative prior, and where such a prior is not possible, should analysts revert to classical statistics?

fn1: this is the last time I am ever going to use the word “frequentist.”