**28 November: (10 – 12 noon):**

**Mayo: “On Birnbaum’s argument for the Likelihood Principle: A 50-year old error and its influence on statistical foundations”**

**PH500 Seminar, Room: Lak 2.06 (Lakatos building).** **London School of Economics and Political Science (LSE)**

**Background reading: PAPER**

See general announcement here.

*Background to the Discussion:* Question: How did I get involved in disproving Birnbaum’s result in 2006?

Answer: The appeal to the “weak conditionality principle (WCP)” arose in avoiding a classic problem (arising from mixture tests) described by David Cox (1958), as discussed in our joint paper:

Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference,” in *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science* (D. Mayo & A. Spanos eds.), CUP, 276-304.

However, Birnbaum had argued (1962) that the WCP (together with other uncontroversial principles) entailed the “strong-likelihood principle” (SLP), from which it followed that frequentist sampling distributions were irrelevant for inference (once the data are available)! Moreover, Birnbaum’s result is given as uncontroversial in many textbooks:

It is not uncommon to see statistics texts argue that in frequentist theory one is faced with the following dilemma: either to deny the appropriateness of conditioning on the precision of the tool chosen by the toss of a coin, or else to embrace the strong likelihood principle, which entails that frequentist sampling distributions are irrelevant to inference once the data are obtained. This is a false dilemma. . . . The “dilemma” argument is therefore an illusion. (Cox and Mayo 2010, 298).
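The conditioning at issue in the Cox (1958) mixture example can be illustrated with a minimal simulation sketch. Everything specific here is an assumption made for illustration and is not taken from Cox and Mayo (2010): the two standard deviations (1 and 10), the true value `theta = 0`, the function name `mixture_coverage`, and the use of a single pooled standard deviation to stand in for an unconditional report. A fair coin picks one of two instruments; the sketch compares the coverage of a 1.96-sigma interval that uses the standard deviation of the instrument actually chosen with one that uses the pooled figure regardless of which instrument was used.

```python
import math
import random


def mixture_coverage(n_trials=200_000, sigmas=(1.0, 10.0), theta=0.0, seed=0):
    """Estimate interval coverage separately for each instrument.

    sigmas: hypothetical standard deviations of the precise and
    imprecise instruments (illustrative values only).
    """
    rng = random.Random(seed)
    z = 1.96
    # Standard deviation of the coin-toss mixture as a whole.
    pooled_sd = math.sqrt(sum(s * s for s in sigmas) / len(sigmas))

    counts = [0, 0]
    conditional_hits = [0, 0]  # half-width uses the instrument actually used
    pooled_hits = [0, 0]       # half-width ignores which instrument was used

    for _ in range(n_trials):
        k = rng.randrange(2)              # fair coin selects the instrument
        x = rng.gauss(theta, sigmas[k])   # one measurement of theta
        err = abs(x - theta)
        counts[k] += 1
        conditional_hits[k] += err <= z * sigmas[k]
        pooled_hits[k] += err <= z * pooled_sd

    return {
        "conditional": [conditional_hits[k] / counts[k] for k in range(2)],
        "pooled": [pooled_hits[k] / counts[k] for k in range(2)],
    }


result = mixture_coverage()
print("coverage by instrument (precise, imprecise):", result)
```

Under these assumptions the conditional interval covers about 95% of the time for either instrument, while the pooled interval is far too wide when the precise instrument was used and too narrow when the imprecise one was.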

This led to Mayo 2010, and now to the discussion for my upcoming seminar. Other links to the Strong Likelihood Principle (SLP): Cox & Mayo 2011 (appendix); Birnbaum 1970 Letter to Nature Editor; the current “U-Phil”; and “Breaking through the Breakthrough” posts on Dec 6 & Dec 7, 2011; and by further searching this blog*.

*Interested individuals who have not yet contacted me, write:* error@vt.edu.

On the two subsequent seminars: see here**.

*5 Dec (12 noon - 2 p.m.): Sir David Cox*

*12 Dec (10 a.m. - 12 noon): Dr. Stephen Senn*

**For updates, details, and associated readings: please check the LSE PH500 page on my blog, original announcement, or write to me.**

Blurb for the series of 5 seminars: “Contemporary problems in PhilStat”: Debates over the philosophical foundations of statistical science have a long and fascinating history marked by deep and passionate controversies that intertwine with fundamental notions of the nature of statistical inference and the role of probabilistic concepts in inductive learning. Progress in resolving decades-old controversies that still shake the foundations of statistics demands both philosophical and technical acumen, but gaining entry into the current state of play requires a road map that zeroes in on core themes and current standpoints. While the seminar will attempt to minimize technical details, it will be important to clarify key notions to fully contribute to the debates. Relevance for general philosophical problems will be emphasized. Because the contexts in which statistical methods are most needed are ones that compel us to be most aware of strategies scientists use to cope with threats to reliability, considering the nature of statistical method in the collection, modeling, and analysis of data is an effective way to articulate and warrant general principles of evidence and inference.

**Room 2.06** Lakatos Building; Centre for Philosophy of Natural and Social Science
London School of Economics
Houghton Street
London WC2A 2AE
Administrator: T. R. Chivers@lse.ac.uk

*Birnbaum posts on Dec 31, 2011 & Jan 8, 2012; Likelihood posts (e.g., Oct 7, 2011, Oct 20, 2011, Jan 3, 2012, Aug 31, 2012)

**I expect to be joined by Dr. C. Hennig on at least one of the days.

Would it be possible to “attend” via Skype or the like?

Greg: We can try if one of the attendees volunteers; I don’t usually have reliable internet at the LSE. Of course, you pretty much had a private class (for the Birnbaum seminar, I mean)… Anyway, I’m glad you’re interested, and I’ll see what attendees say.

Great, thanks!

Jaynes gives some of the history of the result from an O’Bayesian perspective: (page 250 PLOS)

“Alan Birnbaum (1962) gave the first attempted proof of the likelihood principle to be generally accepted by orthodox statisticians. From the enthusiastic discussion following the paper, we see that many regarded this as a major historical event in statistics. ….

But Birnbaum’s argument was not accepted by all orthodox statisticians, and Birnbaum himself seems to have had later doubts. …..

In any event, Kempthorne and Folks (1971) and Fraser (1980) continued to attack the likelihood principle and deny its validity.”

He then goes on to say

“Indeed, even coin flip arguments cannot be accepted unconditionally if they are to be taken literally, particularly by a physicist who is aware of all the complicated things that can happen in real coin flips … if there is any logical connection between theta and the coin, so that knowing theta would tell us anything about the coin flip, then knowing the result of the coin flip must tell us something about theta. For example, if we are measuring a gravitational field by the period of a pendulum, but the coin is tossed in that same gravitational field, there is a clear logical connection. Both Barnard’s argument and Birnbaum’s conditionality principle contain an implicit hidden assumption that this is not the case. Presumably, they would reply that, without saying so explicitly, they really meant ‘coin flip’ in a more abstract sense of some binary experiment totally detached from theta and the means of measuring it. But then, the onus was on them to define exactly what that binary experiment was, and they never did this.”

It’s an age-old problem for Frequentists. As long as you’re content to imagine a “random independent binary experiment” as an abstract mathematical object, everything is fine. But as soon as you try to identify such an experiment in the real world, you run into the following paradox: the more carefully you specify exactly what that experiment is, the more predictable the results become.

The more you define the physical setup which yields a random coin flip, the more predictable the coin flip becomes from the laws of physics. In practice, Frequentist statisticians never try to specify the details of a physical setup which would produce a random sequence of coin flips.

It’s almost enough to make one wonder if probability assignments are really reflections of our state of knowledge (or equivalently ignorance).

Anon: You may want to check (quite a few!) existing links on (or reachable from) this blog to the likelihood principle to see what is current, e.g., Mayo 2010, Cox & Mayo 2011 (appendix), and the current “U-Phil”.

Anon: Setting up a frequentist model certainly has something to do with our state of knowledge/ignorance and you’re right that this may involve, for example, treating some things as “identical repetitions” that are not *really* identical. You’re also right that something that may be treated as an i.i.d. sequence by a frequentist at some point can be modelled in a more sophisticated way later when some more information about the nature of the different events comes in. Probability (and mathematical) modelling generally requires idealisation and will necessarily ignore some details that are either seen as irrelevant or where it is not clear how to model them appropriately. That’s not different for Bayesians (of any kind) and Jaynes is happy to admit this in his book.

It doesn’t have much to do with Birnbaum, though.

It has nothing to do with Birnbaum or anything else. If we could model phenomena that precisely, we would not need probability or statistics. The need to make assumptions about random processes is commonplace. Error stats do this usefully.

John,

The roulette wheel is the epitome of a random process. If students of probability were playing the game, they would no doubt assign equal probability to each number and do so with considerable confidence. After all, significant effort was spent designing the wheels to achieve that outcome.

Yet it’s a historical fact that since about the early 1980s it has been practical for teams using “shoe computers” and other devices to predict the outcome well enough to beat roulette. Their predictions were not precise, so they still had to assign probabilities to every number, but they were improved enough to give the player an advantage over the casino.

Note the “shoe computer” cheaters and the “equal probability” patrons (suckers!) were using these different probability assignments side-by-side at the same roulette wheel while watching the same little ball roll around.

Now, I have no problem with this at all. Both groups were using probability distributions which accurately reflected their state of knowledge. Here their “state of knowledge” consists of information about the space of allowed initial conditions, and one group had knowledge that constrained the space to a smaller subset. This is “objective” in every sense in which you would want it to be objective and, moreover, is perfectly meaningful and useful even if the roulette wheel was only used one time and then destroyed.

But Dr. Mayo doesn’t view “P(i)” for some “i” on the roulette wheel that way at all. She views “P(i)” as an objective fact arising from a “random process” whose truth is well calibrated and verified by some repeated experiment. “P(i)” has to be equal to the limiting relative frequency of “i” in many trials and is unique. All of her claims to the objectivity of Error Statistics rest on this.

So how is it possible to have two very different, but completely legitimate, probability assignments for the exact same roll of the wheel?
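The scenario in this comment can be made concrete with a toy simulation. This is only a sketch under hypothetical assumptions that appear nowhere in the thread: a 37-pocket wheel, a device whose prediction is off by at most `SPREAD = 4` pockets with every offset equally likely, and the names `simulate` and `summarize`. It tallies two relative frequencies over the same spins: how often each pocket comes up overall, and how often the outcome equals the device’s predicted pocket.

```python
import random
from collections import Counter

POCKETS = 37  # single-zero wheel (assumption)
SPREAD = 4    # hypothetical: prediction misses by at most 4 pockets


def simulate(n_spins=370_000, seed=1):
    """Spin a fair wheel and record a noisy prediction for each spin."""
    rng = random.Random(seed)
    marginal = Counter()  # outcome frequencies over all spins
    offset = Counter()    # (outcome - prediction) mod POCKETS
    for _ in range(n_spins):
        outcome = rng.randrange(POCKETS)       # uniform over pockets
        noise = rng.randint(-SPREAD, SPREAD)   # device error, inclusive bounds
        prediction = (outcome + noise) % POCKETS
        marginal[outcome] += 1
        offset[(outcome - prediction) % POCKETS] += 1
    return marginal, offset


def summarize(marginal, offset, n_spins):
    """Return (min pocket freq, max pocket freq, exact-hit rate of the device)."""
    freqs = [marginal[i] / n_spins for i in range(POCKETS)]
    hit_rate = offset[0] / n_spins
    return min(freqs), max(freqs), hit_rate


n = 370_000
marginal, offset = simulate(n)
lo_freq, hi_freq, hit_rate = summarize(marginal, offset, n)
print(f"pocket frequencies range from {lo_freq:.4f} to {hi_freq:.4f} (1/37 = {1/37:.4f})")
print(f"outcome equals predicted pocket at rate {hit_rate:.4f} (1/9 = {1/9:.4f})")
```

In this toy model each pocket still turns up at a rate near 1/37 over all spins, while among those same spins the predicted pocket wins at a rate near 1/9. The two assignments correspond to relative frequencies taken over different reference sets: all spins, versus spins sharing a given prediction.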

Anon,

I cannot speak for Dr. Mayo, who no doubt has a better response, but your example seems to me to have the same import as the biased coin story we have heard before. Nothing in frequency or error statistics endorses favoring the less accurate model of the random process. This is not a statistical issue, but one relating more to good scientific practice. Prior information comes into play when framing the problem. This seems simple to me; perhaps I misunderstand you.

“Nothing in frequency or error statistics endorses favoring the less accurate model of the random process”

But we do need to at least sometimes favor, or at least permit, the use of the less accurate model.

Pat has an early model shoe computer that only gains an advantage of 5% at roulette, while Lee has a new model shoe computer that gains 10%. Should Pat’s shoe computer be rejected?

If we have a choice, then why choose the 5% gain over 10%? I might be lost in the thicket of the hypothetical, but this seems a simple issue. We typically try to utilize the most appropriate models to help make inferences. To do otherwise opens the door for criticism and perhaps refutation. Earlier in the blog (last Sep) we discussed Kadane’s biased coin story, which was a bit of a straw man that had no bearing on actual practice. Also, there has been a lot of discussion of model checking, which seems to bear on this topic of whether we should expend effort to find an appropriate model. I do not see any real dilemma for error stats, though there seem to be foundational differences amongst the flavors of Bayesian and between subjective Bayesians and error statisticians as to how the choice of model will be made and justified.

john byrd,

I don’t think the point was that we’re forced to use an inferior model. The point seems to be that the superior model we all would use in this instance assigns probabilities that differ considerably from the approximately uniform frequencies the wheel was designed to deliver.