“If a statistical analysis is clearly shown to be effective … it gains nothing from being … principled,” according to Terry Speed in an interesting IMS article (2016) that Harry Crane tweeted about a couple of days ago [i]. Crane objects that you need principles to determine if it is effective, else it “seems that a method is effective (a la Speed) if it gives the answer you want/expect.” I suspected that what Speed was objecting to was an appeal to “principles of inference” of the type to which Neyman objected in my recent post. This turns out to be correct. Here are some excerpts from Speed’s article (emphasis is mine):

Not long ago I helped some people with the statistical analysis of their data. The approach I suggested worked reasonably well, somewhat better than the previously published approaches for dealing with that kind of data, and they were happy. But when they came to write it up, they wanted to describe our approach as principled, and I strongly objected. Why? Who doesn’t like to be seen as principled? I have several reasons for disliking this adjective, and not wanting to see it used to describe anything I do. My principal reason for feeling this way is that such statements carry the implication, typically implicit, but at times explicit, that any other approach to the task is unprincipled. I’ve had to grin and bear this slur on my integrity many times in the writings of Bayesians.

Not atypical is the following statement about probability theory in an article about Bayesian inference:

that it “furnishes us with a principled and consistent framework for meaningful reasoning in the presence of uncertainty.” Not a Bayesian? Then your reasoning is likely to be unprincipled, inconsistent, and meaningless.

Calling something one does “principled” makes me think of Hamlet’s mother Queen Gertrude’s comment, “The lady doth protest too much, methinks.”

If a statistical analysis is clearly shown to be effective at answering the questions of interest, it gains nothing from being described as principled. And if it hasn’t been shown so, fine words are a poor substitute. In the write-up mentioned at the beginning of this piece, we compared different analyses, and so had no need to tell the reader that we were principled: our approach was shown to be effective.

Of course there is the possibility that multiple approaches to the same problem are principled, and they just adhere to different principles. Indeed, one of the ironies in the fact that my collaborators want to describe our approach to the analysis of their data as principled, is that a Bayesian approach is one of the alternatives. And as we have seen, all Bayesian analyses are principled.

… I have another reason for feeling ambivalent about principles in statistics. Many years ago, people spent time debating philosophies of statistical inference; some still do. I got absorbed in it for a period in the 1970s. At that time, there was much discussion about the Sufficiency Principle and the Conditionality Principle (each coming in strong and weak versions), the Ancillarity Principle, not to mention the Weak Repeated Sampling Principle, the Invariance (Equivariance) Principle, and others, and the famous Likelihood Principle. There were examples, counter-examples, and theorems of the form “Principle A & Principle B implies Principle C”.

You might think that with so many principles of statistical inference, we’d always know what to do with the next set of data that walks in the door. But this is not so. The principles just mentioned all take as their starting point a statistical model, sometimes from a very restricted class of parametric models. Principles telling you how to get from the data and questions of interest to that point were prominent by their absence, and still are. Probability theory is little help to Bayesians when it comes to deciding on an initial probability model. Perhaps this is reasonable, as there is a difference between the philosophy of statistical inference and the art of making statistical inferences in a given context. We have lots of principles to guide us for dealing with the easy part of our analysis, but none for the hard part. While the younger me spent time on all those Principles over 40 years ago, I wouldn’t teach them today, or even recommend the discussions as worth reading.

Showing effectiveness, of course, relies on sound and warranted, that is, “principled”, reasons for taking the analysis to satisfy given goals (e.g., that flaws and fallacies have been probed and avoided or reported, and that the problems of interest are solved to a reasonable extent). That’s what is involved in showing it to be effective as compared to different analyses. But I agree that 1970s-style principles get the order wrong. Not only do they start with an assumed model; even within a statistical model they presuppose *a priori* standpoints about what “we really want”. Instead, any principles should be examined according to how well they promote our ability to carry out and check analyses on grounds of effectiveness.

Nowadays, even most Bayesians recommend (default, non-subjective) methods that violate principles thought at one time to be sacrosanct. However, some of these old-style principles still simmer below the surface in many of today’s statistical debates and proposed reforms, so there is still a need to shine a spotlight on their unexamined assumptions. We should explain why some principles are even at odds with effective methods to solve given problems, and show the circularity of one of the most famous “theorems” of all.[ii] It’s too bad that a certain old-fashioned conception of the philosophy of statistical inference gives “principles” a bad name [iii]. We needn’t be allergic to developing principles in the sense of effective strategies for successful inquiries. Speed seems to agree. Here’s the end of his article:

But I do think there is a demand for the principles of what I’ll call initial data analysis, an encapsulation of the knowledge and experience of statisticians in dealing with the early part of an analysis, before we fix on a model or class of models. I am often asked by non-statisticians engaged in data science how they can learn applied statistics, and I don’t have a long list of places to send them. Whether what they need can be expressed in principles is not clear, but I think it’s worth trying. My first step in this direction was taken 16 years ago, when I posted two “Hints & Prejudices” on our microarray analysis web site, namely “Always log” and “Avoid assuming normality”. I am not against principles, but I like to remember Oscar Wilde’s aphorism: “Lean on principles, one day they’ll end up giving way.” (p. 17)

[i] http://bulletin.imstat.org/wp-content/uploads/Bulletin45_6.pdf

[ii] Search this blog for quite a lot on the Likelihood Principle; it was one of the main topics in the first few years of this blog. Here’s a link to my article in Statistical Science.

[iii] I try to dispel this image of philosophy of statistics in my forthcoming *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars* (2018, CUP).

Crane thinks I shouldn’t read much of importance into Speed’s short article, as it’s merely a magazine piece. I disagree: there’s an important issue here, and the fact that his tweet was the origin of my discussing it is irrelevant.

Statsmartin: I agree, as you can tell if you read my post. On the other hand, I also reject the idea of a priori principles of inference of the sort he mentions. It is they that need to be scrutinized according to how well they promote, or stand in the way of, warranted inquiry. It’s too bad that people will tweet replies while not commenting on the blog itself. Twitter, much as I use it, has diminished blogging.

I overlapped with Terry Speed in Sheffield in 1973 and can remember some of the discussions on the principles of statistics he mentions, not that I took part; I had no idea what was going on. He has now abandoned this, and I agree with his present attitude, which is to ignore them, unless they irritate me to such an extent that I feel I have to say something. This is the case now.

Here are some comments on Crane and Martin.

4.1 The data must be relevant.

Mine are measurements of the quantity of copper in drinking water. The problem is to specify the actual amount of copper in the sample of drinking water. They clearly fulfill the demand of relevance.

4.2 The model must be sound.

The following models are all good approximations to the data: the normal, the Laplace, the log-normal, a $t_4$, the Cauchy, the comb distribution, etc. Which one do I choose and why?
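One way to make “good approximation” concrete is a distance between the data and a fitted model. As an assumption for illustration (the comment names the Kolmogorov metric further on but gives no code), the Python sketch below computes the Kolmogorov distance between a sample’s empirical distribution and fitted normal and Laplace models. The sample values are hypothetical, not the copper measurements, and the fitting rules (mean and standard deviation for the normal; median and mean absolute deviation about the median for the Laplace) are illustrative choices.

```python
import math
import statistics
from typing import Callable, Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of the normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def laplace_cdf(x: float, mu: float, b: float) -> float:
    """CDF of the Laplace distribution with location mu and scale b."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def kolmogorov_distance(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov distance sup_x |F_n(x) - F(x)| between the empirical CDF F_n
    of `data` and a continuous model CDF F.

    For continuous F the supremum is reached at an observation, either just
    before or just after a jump of F_n, so both sides are checked there.
    """
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical measurements, for illustration only (not real copper data).
sample = [2.03, 1.98, 2.10, 1.95, 2.01, 2.07, 1.99, 2.25]

mu, sd = statistics.mean(sample), statistics.stdev(sample)
med = statistics.median(sample)
# Laplace scale fitted as the mean absolute deviation about the median.
b = statistics.mean(abs(x - med) for x in sample)

fits = {
    "normal": lambda x: normal_cdf(x, mu, sd),
    "Laplace": lambda x: laplace_cdf(x, med, b),
}
for name, cdf in fits.items():
    print(f"{name:8s} Kolmogorov distance = {kolmogorov_distance(sample, cdf):.3f}")
```

If several candidate models land at comparably small distances, the distance by itself does not say which one to choose, which is exactly the question raised above.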

The model should generalize to other possible data sets obtained under the same scientific conditions. What does this mean for my copper example: always copper, always drinking water, always the same laboratory, always the same staff? What about sludge, cadmium, dust, air, nitrates? Always without outliers, always with outliers, with one outlier, with two outliers? Always symmetric, always skewed? A different model for each of these possibilities? What would the authors suggest and why?

The authors state two possible roles. One involves a ‘hypothetical population’. What is a hypothetical population: a real population or a creation of the mind? I fail to understand why one needs a hypothetical population. What is the hypothetical population of my copper example? I try to give an answer for the data at hand.

The second role is to describe the data-generating mechanism. In what sense is i.i.d. Bernoulli a description of the Newtonian chaos that generates the result of a coin toss? Actual data-generating mechanisms are in general of a complexity that can in no way be described by a simple probability model. This applies to the copper example and the models I suggested there.

4.3 The inference must be valid.

Setup: data $D$, a model $P_\theta$ with $\theta \in \Theta$, and a hypothesis $A \subset \Theta$. Then comes the statement ‘rarely is it possible to *prove* that $A$ is true or false on $D$ alone’. $P_\theta$, $\Theta$ and $A$ are all constructs of the mind. You can only talk about $A$ being true or false if the data come attached with the whole model $P_\theta$, $\Theta$ and a true value of $\theta$, which seems to me to be a very baroque ontology.

My approach to this sort of data is to use functionals: mean, median, M-functional, MAD, etc. There is now no model. What I offer is a procedure, or a set of procedures (Tukey, who seems to have been forgotten). I look at existence, breakdown points, boundedness, continuity, differentiability and equivariance over a full neighbourhood of the data. These latter concepts require a topology. I use the weak topology of the Kolmogorov metric, more generally a metric based on V-C classes, and so on.
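As a hedged illustration of working with functionals rather than a model, the Python sketch below evaluates the mean, the median, the MAD and a Huber M-functional of location on a hypothetical sample, and then on a copy in which a single value is replaced by a gross error. The Huber tuning constant k = 1.5, the MAD scaling factor 1.4826 and the data are assumptions chosen for illustration, not values taken from the comment.

```python
import statistics
from typing import Sequence


def mad(xs: Sequence[float]) -> float:
    """Median absolute deviation from the median (unscaled)."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)


def huber_location(xs: Sequence[float], k: float = 1.5,
                   tol: float = 1e-10, max_iter: int = 200) -> float:
    """Huber M-functional of location with the scale held fixed at 1.4826 * MAD,
    computed by iteratively reweighted means starting from the median."""
    mu = statistics.median(xs)
    s = 1.4826 * mad(xs)
    if s == 0.0:
        return mu  # degenerate scale: fall back to the median
    c = k * s
    for _ in range(max_iter):
        # Weight 1 for residuals within c, c/|r| beyond it (Huber's psi(r)/r).
        w = [1.0 if abs(x - mu) <= c else c / abs(x - mu) for x in xs]
        new_mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical data; the contaminated copy replaces one value by a gross error.
clean = [2.03, 1.98, 2.10, 1.95, 2.01, 2.07, 1.99, 2.05]
contaminated = clean[:-1] + [1000.0]

for name, f in [("mean", statistics.mean), ("median", statistics.median),
                ("MAD", mad), ("Huber", huber_location)]:
    print(f"{name:7s} clean = {f(clean):8.3f}   one gross error = {f(contaminated):8.3f}")
```

A single bad value can move the mean arbitrarily far, while the median, the MAD and the Huber functional with MAD scale stay close to the bulk of the data; this is the behaviour that breakdown points are meant to capture.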

It is possible to do covariate choice for linear least-squares regression without postulating or using the standard linear model, $y = \beta x + \text{noise}$, or indeed any model. It is possible to use the same idea to produce graphs for gene expression data.
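The comment does not spell out how model-free covariate choice works. Purely as an assumption for illustration, one criterion of this kind asks whether a candidate covariate reduces the residual sum of squares by more than a random Gaussian covariate typically would, with the reference taken from random noise covariates rather than from an error model for y. The Python sketch below (requires NumPy) estimates that probability by simulation; the function names, the simulation size and the simulated data are hypothetical, and this is not presented as the commenter’s own procedure.

```python
import numpy as np


def rss(X: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def random_covariate_pvalue(X: np.ndarray, y: np.ndarray, x_new: np.ndarray,
                            n_sim: int = 2000, seed=None) -> float:
    """Estimated probability that a standard Gaussian random covariate reduces
    the RSS at least as much as the candidate covariate x_new does."""
    rng = np.random.default_rng(seed)
    base = rss(X, y)
    gain = base - rss(np.column_stack([X, x_new]), y)
    hits = 0
    for _ in range(n_sim):
        z = rng.standard_normal(len(y))
        if base - rss(np.column_stack([X, z]), y) >= gain:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one so the estimate is never exactly 0


# Simulated data for illustration only: y depends on x1 (nonlinearly), not on x2.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
y = np.exp(x1) + 0.2 * rng.standard_normal(n)
X0 = np.ones((n, 1))  # start from the intercept only

print("x1:", random_covariate_pvalue(X0, y, x1, seed=1))
print("x2:", random_covariate_pvalue(X0, y, x2, seed=1))
```

A small value means the candidate does clearly better than noise; nothing in the calculation uses the form of the function that produced y.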

I avoid the word ‘true’ unless it is used in the sense of everyday truths or plain truths (Bernard Williams). Models are not true; they are approximations. The authors mentioned have no concept of approximation.