*A friend from Elba surprised me by sending the interesting paper and discussion of Dennis Lindley (2000), “The Philosophy of Statistics,” which I hadn’t seen in years. She suggested, as especially apt, J. Nelder’s remarks; I recommend the full article and discussion:*

Recently (Nelder, 1999) I have argued that statistics should be called statistical science, and that probability theory should be called statistical mathematics (not mathematical statistics). I think that Professor Lindley’s paper should be called the philosophy of statistical mathematics, and within it there is little that I disagree with. However, my interest is in the philosophy of statistical science, which I regard as different. Statistical science is not just about the study of uncertainty but rather deals with inferences about scientific theories from uncertain data. An important quality about theories is that they are essentially open ended; at any time someone may come along and produce a new theory outside the current set. This contrasts with probability where to calculate a specific probability it is necessary to have a bounded universe of possibilities over which the probabilities are defined. When there is intrinsic open-endedness it is not enough to have a residual class of all the theories that I have not thought of yet. The best that we can do is to express relative likelihoods of different parameter values, without any implication that one of them is true. Although Lindley stresses that probabilities are conditional I do not think that this copes with the open-endedness problem.

I follow Fisher in distinguishing between inferences about specific events, such as that it will rain here tomorrow, and inferences about theories. …

General ideas like exchangeability and coherence are fine in themselves, but problems arise when we try to apply them to data from the real world. In particular when combining information from several data sets we can assume exchangeability, but the data themselves may strongly suggest that this assumption is not true. Similarly we can be coherent and wrong, because the world is not as assumed by Lindley. I find the procedures of scientific inference to be more complex than those defined in the paper. These latter fall into the class of ‘wouldn’t it be nice if’, i.e. would it not be nice if the philosophy of statistical mathematics sufficed for scientific inference. I do not think that it does. (325)

- Lindley, D. V. (2000), “The Philosophy of Statistics,” *Journal of the Royal Statistical Society*, Series D (*The Statistician*), Vol. 49, No. 3, 293-337.
- Nelder, J. A. (2000), Commentary on “The Philosophy of Statistics,” *Journal of the Royal Statistical Society*, Series D (*The Statistician*), Vol. 49, No. 3, 324-325.
- Nelder, J. A. (1999), “From Statistics to Statistical Science,” *The Statistician*, 48, 257-267.

I think Nelder makes a very important distinction between what we might do to organize our personal beliefs (ponder our personal priors) versus what type of data analysis is suitable for presentation to others. Others are interested in what a proper experiment or observational study has to say.

What I found most interesting was his use of “statistical science” for developing and appraising scientific theories. I’m sure many will consider the recommendation to “leave out the subjective” oversimplified (even though that’s not quite what he meant), but his emphasis on the open-endedness required for useful scientific theorizing is, I think, deeper and more significant.

My takeaways include: (1) one cannot legitimately assign a probability to a theory, because the probabilities of all competing theories must sum to 1 and we cannot identify all of the theories; and (2) statistical science is about probing aspects of theories, which is a complex (and perhaps messy) endeavor that cannot fit neatly into a tidy little package (or model).

True. What baffles me is why so many seem stuck in a kind of “statisticism” wherein it is imagined that all of the complex moves involved in planning, collecting, modeling, and drawing inferences from data must be formalized within a probability computation. Pearson said that he and Neyman were always thinking of contexts wherein the planning was closely connected to interpretation, and several piecemeal tests were envisioned, with lots of emphasis on how to model the phenomenon. (I think Fisher said it was an accident that experimental design was developed separately from inference.) Even where one gets to the “inferential” part, probability seems to get the logic wrong, at least for the appraisals that interest me, e.g., how good a job did this research do at probing and ruling out errors that could render it mistaken to infer some claim h? It might provide lousy grounds for both h and its denial. It’s a different mindset or task. But it is more general, so as to subsume the special cases wherein the “inference” properly takes the form of a probability assignment to events. Some kind of “reconciliation” might be found here, but not until people are pried away from the statisticism standpoint.

“In analyzing data relative to one or more scientific theories, I would wish to present what is objective and not to mix this with subjective probabilities which are derived from my priors.”

I couldn’t agree more.

The problem is that Bayes doesn’t go away because it’s inconvenient, and you can’t just ignore the prior if you want the false positive risk (FPR). And experimenters do want the FPR (in fact most still believe, mistakenly, that that is what the P value gives you).

The best way out of this dilemma seems to be to calculate the prior, rather than assume it. We made a web calculator to make it easy: http://fpr-calc.ucl.ac.uk/
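The p-less-than version of the false positive risk follows from Bayes’ rule alone. Here is a minimal sketch of that kind of calculation; the function name and the illustrative numbers are mine, not taken from the post or the calculator:

```python
def fpr_less_than(alpha, power, prior_real):
    """False positive risk in the 'p <= alpha' sense: among all
    significant results, the fraction that are truly null."""
    false_pos = alpha * (1 - prior_real)    # nulls that reach significance
    true_pos = power * prior_real           # real effects that do
    return false_pos / (false_pos + true_pos)

# With alpha = 0.05, power = 0.8, and even odds on a real effect,
# about 6% of significant results are false positives; a more
# pessimistic prior of 0.1 pushes that above a third.
print(round(fpr_less_than(0.05, 0.8, 0.5), 3))   # 0.059
print(round(fpr_less_than(0.05, 0.8, 0.1), 3))   # 0.36
```

The whole dispute in the thread below is, in effect, about what `prior_real` should be and whether the calculation is legitimate at all.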

David: Sorry for the delay. Things haven’t changed since Nelder just because you and some others insist on confusing the Type I error probability with a posterior probability based either on beliefs or on assumed high probabilities of true null hypotheses over the course of science, or some other unclear reference class. Most people say all nulls are false, whereas your argument depends on giving them a fairly high point prior. This has been deemed problematic since Edwards, Lindman and Savage and before. If the researchers are predicting the direction, they should be doing one-sided tests, and the mismatch between p-values and posteriors disappears (as Casella and R. Berger 1987 show). Of course, we’ve been through this many times.
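For what it’s worth, the Casella and Berger point can be checked numerically: in a one-sided normal test of H0: μ ≤ 0 with known σ, the p-value coincides with the posterior probability of H0 under a flat (improper) prior. A minimal sketch, with purely illustrative numbers:

```python
import math

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical data summary: n observations, known sigma, observed mean.
n, sigma, xbar = 25, 1.0, 0.4
z = xbar * math.sqrt(n) / sigma
p_value = 1 - phi(z)                 # one-sided p-value for H0: mu <= 0
# Under a flat (improper) prior, the posterior for mu is N(xbar, sigma^2/n),
# so the posterior probability that H0 is true is:
post_h0 = phi((0 - xbar) * math.sqrt(n) / sigma)
print(round(p_value, 4), round(post_h0, 4))   # the two agree: 0.0228 0.0228
```

The mismatch that drives the point-null arguments arises only when a lump of prior probability is placed on μ = 0 exactly.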

Thanks very much. I agree that the whole argument turns on whether or not you think that the point null is a sensible thing to test. It seems to me to be entirely reasonable, on the grounds that most bright ideas don’t work. Others think it’s unreasonable.

It’s hard to see how this disagreement will be resolved, but it is rather important for users that it should be.

David: I agree, but it should seem quite strange to those advocating a change of standards that it would turn on such a debatable point. I haven’t seen any of them take up the issue, have you?

Well, it certainly isn’t emphasised in recent discussion, though “point null” is right there in the title of Berger & Sellke (1987). But quite often it’s buried in the mathematics, so probably a lot of people don’t notice.

Another thing that’s been discussed remarkably little in the recent literature is the distinction between what I call the p-equals and the p-less-than cases. Despite the fact that it’s a distinction that goes back a long time, it’s now discussed so little that there doesn’t even seem to be a convenient name for it. It seems to me that, for the interpretation of a single test, what you need is the p-equals interpretation. This gives false positive risks that are quite a lot higher than those from the p-less-than interpretation. The latter is used by Ioannidis and many others (though without any explicit acknowledgment). So my estimates of false positive risks are even bigger than theirs.
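The distinction can be made concrete by simulation. Below is a rough sketch (not Colquhoun’s code, and with illustrative parameters of my choosing): half the simulated experiments carry a true effect sized to give roughly 80% power at α = 0.05, and among those landing with p just under 0.05 (the “p-equals” case) the proportion of nulls comes out near 25–30%, far above the ~6% given by the p-less-than calculation with the same inputs.

```python
import math
import random

def two_sided_p(z):
    # two-sided p-value for a z statistic
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
n = 16                 # observations per experiment (illustrative)
delta = 0.7            # true effect in SD units -> power ~0.80 at alpha = 0.05
prior_real = 0.5       # half of the tested hypotheses are real effects
hits_null = hits_real = 0
for _ in range(200_000):
    real = random.random() < prior_real
    mu = delta if real else 0.0
    xbar = random.gauss(mu, 1 / math.sqrt(n))   # simulate the sample mean
    p = two_sided_p(xbar * math.sqrt(n))
    if 0.04 < p < 0.05:                         # "p-equals": p close to 0.045
        if real:
            hits_real += 1
        else:
            hits_null += 1
fpr_equals = hits_null / (hits_null + hits_real)
print(round(fpr_equals, 2))                     # roughly 0.25-0.30
```

Conditioning on the whole tail p < 0.05 instead of the narrow band would reproduce the much smaller p-less-than figure.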

David: That’s a great point; I really think we should get to the bottom of this. Evidence policy “reforms” shouldn’t be based on questionable arguments, even in part. That’s what bothers me and many others about them. Many are in earnest, but some, I fear, are largely concerned with its ending up in their favor somehow, and the rewards that go with that.

I guess that we can all agree that declaring a discovery when you get P = 0.043 is not at all satisfactory. But it is almost universal at least in the biomedical literature. That means that something must be done, and it’s surely the job of statisticians to advise on what.