A famous chestnut given by Cox (1958) recently came up in conversation. The example “is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992, p. 582). When I describe it, you’ll find it hard to believe many regard it as causing an earthquake in statistical foundations, unless you’re already steeped in these matters. If half the time I reported my weight from a scale that’s always right, and half the time use a scale that gets it right with probability .5, would you say I’m right with probability ¾? Well, maybe. But suppose you knew that this measurement was made with the scale that’s right with probability .5? The overall error probability is scarcely relevant for giving the warrant of the particular measurement,knowing which scale was used. Continue reading

# strong likelihood principle

## Cox’s (1958) weighing machine example

## Midnight With Birnbaum (Happy New Year 2016)

**Just as in the past 5 years since I’ve been blogging, I revisit that spot in the road at 11p.m., just outside the Elbar Room, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. (The pic on the left is the only blurry image I have of the club I’m taken to.) I wonder if the car will come for me this year, given that my Birnbaum article has been out since 2014… The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics (and cognate methods). Yet as Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. Maybe in 2017? Anyway, it’s 6 hrs later here, so I’m about to leave for that spot in the road… If I’m picked up, I’ll add an update at the end.**

You know how in that (not-so) recent Woody Allen movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Wolf? He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve ~~2011~~ ~~2012~~, ~~2013~~, ~~2014~~, ~~2015~~, 2016) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i] There are a couple of brief (12/31/14 & 15) updates at the end.

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics. I happen to be writing on your famous argument about the likelihood principle (LP). (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept. Continue reading

## Allan Birnbaum: Foundations of Probability and Statistics (27 May 1923 – 1 July 1976)

*Today is Allan Birnbaum’s birthday. In honor of his birthday this year, I’m posting the articles in the *Synthese* volume that was dedicated to his memory in 1977. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. I paste a few snippets from the articles by Giere and Birnbaum. If you’re interested in statistical foundations, and are unfamiliar with Birnbaum, here’s a chance to catch up.(Even if you are,you may be unaware of some of these key papers.)*

**HAPPY BIRTHDAY ALLAN!**

*Synthese* Volume 36, No. 1 Sept 1977: *Foundations of Probability and Statistics*, Part I

**Editorial Introduction:**

This special issue of

Syntheseon the foundations of probability and statistics is dedicated to the memory of Professor Allan Birnbaum. Professor Birnbaum’s essay ‘The Neyman-Pearson Theory as Decision Theory; and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory’ was received by the editors ofSynthesein October, 1975, and a decision was made to publish a special symposium consisting of this paper together with several invited comments and related papers. The sad news about Professor Birnbaum’s death reached us in the summer of 1976, but the editorial project could nevertheless be completed according to the original plan. By publishing this special issue we wish to pay homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics. We are grateful to Professor Ronald Giere who wrote an introductory essay on Professor Birnbaum’s concept of statistical evidence and who compiled a list of Professor Birnbaum’s publications.THE EDITORS

## Midnight With Birnbaum (Happy New Year)

**Just as in the past 4 years since I’ve been blogging, I revisit that spot in the road at 11p.m., just outside the Elbar Room, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. (The pic on the left is the only blurry image I have of the club I’m taken to.) I wonder if the car will come for me this year, given that my Birnbaum article has been out since 2014… The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics (and cognate methods). Yet as Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. Maybe in 2016? Anyway, it’s 6 hrs later here, so I’m about to leave for that spot in the road…**

You know how in that (not-so) recent Woody Allen movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Wolf? He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve ~~2011~~ ~~2012~~, ~~2013~~, ~~2014~~, 2015) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i] There are a couple of brief (12/31/14 & 15) updates at the end.

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics. I happen to be writing on your famous argument about the likelihood principle (LP). (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WLP.[ii] Sorry,…I know it’s famous…

BIRNBAUM: Well, I shall happily invite you to take any case that violates the LP and allow me to demonstrate that the frequentist is led to inconsistency, provided she also wishes to adhere to the WLP and sufficiency (although less than S is needed).

ERROR STATISTICIAN: Well I happen to be a frequentist (error statistical) philosopher; I have recently (2006) found a hole in your proof,..er…well I hope we can discuss it.

BIRNBAUM: Well, well, well: I’ll bet you a bottle of Elba Grease champagne that I can demonstrate it! Continue reading

## Statistical “reforms” without philosophy are blind (v update)

Is it possible, today, to have a fair-minded engagement with debates over statistical foundations? I’m not sure, but I know it is becoming of pressing importance to try. Increasingly, people are getting serious about methodological reforms—some are quite welcome, others are quite radical. Too rarely do the reformers bring out the philosophical presuppositions of the criticisms and proposed improvements. Today’s (radical?) reform movements are typically launched from criticisms of statistical significance tests and P-values, so I focus on them. Regular readers know how often the P-value (that most unpopular girl in the class) has made her appearance on this blog. Here, I tried to quickly jot down some queries. (Look for later installments and links.) *What are some key questions we need to ask to tell what’s true about today’s criticisms of P-values? *

*I. To get at philosophical underpinnings, the single most import question is this:*

**(1) Do the debaters distinguish different views of the nature of statistical inference and the roles of probability in learning from data? ** Continue reading

## Joan Clarke, Turing, I.J. Good, and “that after-dinner comedy hour…”

I finally saw *The Imitation Game* about Alan Turing and code-breaking at Bletchley Park during WWII. This short clip of Joan Clarke, who was engaged to Turing, includes my late colleague I.J. Good at the end (he’s not second as the clip lists him). Good used to talk a great deal about Bletchley Park and his code-breaking feats while asleep there (see note[a]), but I never imagined Turing’s code-breaking machine (which, by the way, was called the Bombe and not Christopher as in the movie) was so clunky. The movie itself has two tiny scenes including Good. Below I reblog: “Who is Allowed to Cheat?”—one of the topics he and I debated over the years. Links to the full “Savage Forum” (1962) may be found at the end (creaky, but better than nothing.)

[a]”Some sensitive or important Enigma messages were enciphered twice, once in a special variation cipher and again in the normal cipher. …Good dreamed one night that the process had been reversed: normal cipher first, special cipher second. When he woke up he tried his theory on an unbroken message – and promptly broke it.” This, and further examples may be found in this obituary

[b] Pictures comparing the movie cast and the real people may be found here. Continue reading

## Midnight With Birnbaum (Happy New Year)

**Just as in the past 3 years since I’ve been blogging, I revisit that spot in the road at 11p.m.*,just outside the Elbar Room, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. I wonder if they’ll come for me this year, given that my Birnbaum article is out… This is what the place I am taken to looks like. [It’s 6 hrs later here, so I’m about to leave…]**

You know how in that (not-so) recent movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Wolf? He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve ~~2011~~ ~~2012~~, ~~2013~~, 2014) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i] There are a couple of brief (12/31/14) updates at the end.

ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics. I happen to be writing on your famous argument about the likelihood principle (LP). (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WLP.[ii] Sorry,…I know it’s famous…

BIRNBAUM: Well, I shall happily invite you to take any case that violates the LP and allow me to demonstrate that the frequentist is led to inconsistency, provided she also wishes to adhere to the WLP and sufficiency (although less than S is needed).

ERROR STATISTICIAN: Well I happen to be a frequentist (error statistical) philosopher; I have recently (2006) found a hole in your proof,..er…well I hope we can discuss it.

BIRNBAUM: Well, well, well: I’ll bet you a bottle of Elba Grease champagne that I can demonstrate it! Continue reading

## Has Philosophical Superficiality Harmed Science?

I have been asked what I thought of some criticisms of the scientific relevance of philosophy of science, as discussed in the following snippet from a recent *Scientific American* blog. My title elicits the appropriate degree of ambiguity, I think.

Quantum Gravity Expert Says “Philosophical Superficiality” Has Harmed PhysicsBy John Horgan | August 21, 2014 | 14

“I interviewed Rovelli by phone in the early 1990s when I was writing a story for

Scientific Americanabout loop quantum gravity, a quantum-mechanical version of gravity proposed by Rovelli, Lee Smolin and Abhay Ashtekar[i]

Horgan: What’s your opinion of the recent philosophy-bashing by Stephen Hawking, Lawrence Krauss and Neil deGrasse Tyson?

Rovelli: Seriously: I think they are stupid in this. I have admiration for them in other things, but here they have gone really wrong. Look: Einstein, Heisenberg, Newton, Bohr…. and many many others of the greatest scientists of all times, much greater than the names you mention, of course, read philosophy, learned from philosophy, and could have never done the great science they did without the input they got from philosophy, as they claimed repeatedly. You see: the scientists that talk philosophy down are simply superficial: they have a philosophy (usually some ill-digested mixture of Popper and Kuhn) and think that this is the “true” philosophy, and do not realize that this has limitations.Here is an example: theoretical physics has not done great in the last decades. Why? Well, one of the reasons, I think, is that it got trapped in a wrong philosophy: the idea that you can make progress by guessing new theory and disregarding the qualitative content of previous theories. This is the physics of the “why not?” Why not studying this theory, or the other? Why not another dimension, another field, another universe? Science has never advanced in this manner in the past. Science does not advance by guessing. It advances by new data or by a deep investigation of the content and the apparent contradictions of previous empirically successful theories. Quite remarkably, the best piece of physics done by the three people you mention is Hawking’s black-hole radiation, which is exactly this. But most of current theoretical physics is not of this sort. Why? Largely because of the philosophical superficiality of the current bunch of scientists.”

I find it intriguing that Rovelli suggests that “Science does not advance by guessing. It advances by new data or by a deep investigation of the content and the apparent contradictions of previous empirically successful theories.” I think this is an interesting and subtle claim with which I agree. Continue reading

## Putting the brakes on the breakthrough: An informal look at the argument for the Likelihood Principle

Friday, May 2, 2014, I will attempt to present my critical analysis of the Birnbaum argument for the (strong) Likelihood Principle, so as to be accessible to a general philosophy audience (flyer below). Can it be done? I don’t know yet, this is a first. It will consist of:

**Example 1**: Trying and Trying Again: Optional stopping**Example 2:**Two instruments with different precisions

[you shouldn’t get credit (or blame) for something you didn’t do]**The Breakthough:**Birnbaumization**Imaginary dialogue**with Allan Birnbaum

The full paper is here. My discussion takes several pieces a reader can explore further by searching this blog (e.g., under SLP, brakes e.g., here, Birnbaum, optional stopping). I will post slides afterwards.

## Midnight With Birnbaum (Happy New Year)

**Just as in the past 2 years since I’ve been blogging, I revisit that spot in the road, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. There are a couple of brief (12/31/13) updates at the end. **

You know how in that (not-so) recent movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Wolf? He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve ~~2011~~ ~~2012~~, 2013) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i]

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WLP.[ii] Sorry,…I know it’s famous…

BIRNBAUM: Well, I shall happily invite you to take any case that violates the LP and allow me to demonstrate that the frequentist is led to inconsistency, provided she also wishes to adhere to the WLP and sufficiency (although less than S is needed).

ERROR STATISTICIAN: Well I happen to be a frequentist (error statistical) philosopher; I have recently (2006) found a hole in your proof,..er…well I hope we can discuss it.

BIRNBAUM: Well, well, well: I’ll bet you a bottle of Elba Grease champagne that I can demonstrate it!

ERROR STATISTICAL PHILOSOPHER: It is a great drink, I must admit that: I love lemons.

BIRNBAUM: OK. (A waiter brings a bottle, they each pour a glass and resume talking). Whoever wins this little argument pays for this whole bottle of vintage Ebar or Elbow or whatever it is Grease.

ERROR STATISTICAL PHILOSOPHER: I really don’t mind paying for the bottle.

BIRNBAUM: Good, you will have to. Take any LP violation. Let x’ be 2-standard deviation difference from the null (asserting m = 0) in testing a normal mean from the fixed sample size experiment E’, say n = 100; and let x” be a 2-standard deviation difference from an optional stopping experiment E”, which happens to stop at 100. Do you agree that:

(0) For a frequentist, outcome x’ from E’ (fixed sample size) is NOT evidentially equivalent to x” from E” (optional stopping that stops at n)

ERROR STATISTICAL PHILOSOPHER: Yes, that’s a clear case where we reject the strong LP, and it makes perfect sense to distinguish their corresponding p-values (which we can write as p’ and p”, respectively). The searching in the optional stopping experiment makes the p-value quite a bit higher than with the fixed sample size. For n = 100, data x’ yields p’= ~.05; while p” is ~.3. Clearly, p’ is not equal to p”, I don’t see how you can make them equal. Continue reading

## Saturday night comedy from a Bayesian diary (rejected post*)

*See “rejected posts”.

## Lucien Le Cam: “The Bayesians hold the Magic”

Today is Lucien Le Cam’s birthday. He was an error statistician whose remarks in an article, “A Note on Metastatisics,” in a collection on foundations of statistics (Le Cam 1977)* had some influence on me. A statistician at Berkeley, Le Cam was a co-editor with Neyman of the Berkeley Symposia volumes. I hadn’t mentioned him on this blog before, so here are some snippets from EGEK (Mayo, 1996, 337-8; 350-1) that begin with a snippet from a passage from Le Cam (1977) (Here I have fleshed it out):

“One of the claims [of the Bayesian approach] is that the experiment matters little, what matters is the likelihood function after experimentation. Whether this is true, false, unacceptable or inspiring, it tends to undo what classical statisticians have been preaching for many years: think about your experiment, design it as best you can to answer specific questions, take all sorts of precautions against selection bias and your subconscious prejudices. It is only at the design stage that the statistician can help you.

Another claim is the very curious one that if one follows the neo-Bayesian theory strictly one would not randomize experiments….However, in this particular case the injunction against randomization is a typical product of a theory which ignores differences between experiments and experiences and refuses to admit that there is a difference between events which are made equiprobable by appropriate mechanisms and events which are equiprobable by virtue of ignorance. …

In spite of this the neo-Bayesian theory places randomization on some kind of limbo, and thus attempts to distract from the classical preaching that double blind randomized experiments are the only ones really convincing.

There are many other curious statements concerning confidence intervals, levels of significance, power, and so forth. These statements are only confusing to an otherwise abused public”. (Le Cam 1977, 158)

Back to EGEK:

Why does embracing the Bayesian position tend to undo what classical statisticians have been preaching? Because Bayesian and classical statisticians view the task of statistical inference very differently,

In [chapter 3, Mayo 1996] I contrasted these two conceptions of statistical inference by distinguishing evidential-relationship or E-R approaches from testing approaches, … .

The E-R view is modeled on deductive logic, only with probabilities. In the E-R view, the task of a theory of statistics is to say, for given evidence and hypotheses, how well the evidence confirms or supports hypotheses (whether absolutely or comparatively). There is, I suppose, a certain confidence and cleanness to this conception that is absent from the error-statistician’s view of things. Error statisticians eschew grand and unified schemes for relating their beliefs, preferring a hodgepodge of methods that are truly ampliative. Error statisticians appeal to statistical tools as protection from the many ways they know they can be misled by data as well as by their own beliefs and desires. The value of statistical tools for them is to develop strategies that capitalize on their knowledge of mistakes: strategies for collecting data, for efficiently checking an assortment of errors, and for communicating results in a form that promotes their extension by others.

Given the difference in aims, it is not surprising that information relevant to the Bayesian task is very different from that relevant to the task of the error statistician. In this section I want to sharpen and make more rigorous what I have already said about this distinction.

…. the secret to solving a number of problems about evidence, I hold, lies in utilizing—formally or informally—the error probabilities of the procedures generating the evidence. It was the appeal to severity (an error probability), for example, that allowed distinguishing among the well-testedness of hypotheses that fit the data equally well… .

A few pages later in a section titled “*Bayesian Freedom, Bayesian Magic” (350-1):*

A big selling point for adopting the LP (strong likelihood principle), and with it the irrelevance of stopping rules, is that it frees us to do things that are sinful and forbidden to an error statistician.“This irrelevance of stopping rules to statistical inference restores a simplicity and freedom to experimental design that had been lost by classical emphasis on significance levels (in the sense of Neyman and Pearson). . . . Many experimenters would like to feel free to collect data until they have either conclusively proved their point, conclusively disproved it, or run out of time, money or patience … Classical statisticians … have frowned on [this]”. (Edwards, Lindman, and Savage 1963, 239)

^{1}Breaking loose from the grip imposed by error probabilistic requirements returns to us an appealing freedom.

Le Cam, … hits the nail on the head:

“It is characteristic of [Bayesian approaches] [2] . . . that they … tend to treat experiments and fortuitous observations alike. In fact, the main reason for their periodic return to fashion seems to be that they claim to hold the magic which permits [us] to draw conclusions from whatever data and whatever features one happens to notice”. (Le Cam 1977, 145)

In contrast, the error probability assurances go out the window if you are allowed to change the experiment as you go along. Repeated tests of significance (or sequential trials) are permitted, are even desirable for the error statistician; but a penalty must be paid for perseverance—for optional stopping. Before-trial planning stipulates how to select a small enough significance level to be on the lookout for at each trial so that the overall significance level is still low. …. Wearing our error probability glasses—glasses that compel us to see how certain procedures alter error probability characteristics of tests—we are forced to say, with Armitage, that “Thou shalt be misled if thou dost not know that” the data resulted from the try and try again stopping rule. To avoid having a high probability of following false leads, the error statistician must scrupulously follow a specified experimental plan. But that is because we hold that error probabilities of the procedure alter what the data are saying—whereas Bayesians do not. The Bayesian is permitted the luxury of optional stopping and has nothing to worry about. The Bayesians hold the magic.

Or is it voodoo statistics?

When I sent him a note, saying his work had inspired me, he modestly responded that he doubted he could have had all that much of an impact.

_____________

*I had forgotten that this *Synthese* (1977) volume on foundations of probability and statistics is the one dedicated to the memory of Allan Birnbaum after his suicide: “By publishing this special issue we wish to pay homage to professor Birnbaum’s penetrating and stimulating work on the foundations of statistics” (Editorial Introduction). In fact, I somehow had misremembered it as being in a Harper and Hooker volume from 1976. The *Synthese* volume contains papers by Giere, Birnbaum, Lindley, Pratt, Smith, Kyburg, Neyman, Le Cam, and Kiefer.

REFERENCES:

*Journal of the Royal Statistical Society (B)*23:1-37.

_______(1962). Contribution to discussion in *The foundations of statistical inference*, edited by L. Savage. London: Methuen.

_______(1975). *Sequential Medical Trials*. 2nd ed. New York: John Wiley & Sons.

Edwards, W., H. Lindman & L. Savage (1963) Bayesian statistical inference for psychological research. *Psychological Review* 70: 193-242.

Le Cam, L. (1974). J. Neyman: on the occasion of his 80th birthday. *Annals of Statistics*, Vol. 2, No. 3 , pp. vii-xiii, (with E.L. Lehmann).

Le Cam, L. (1977). A note on metastatistics or “An essay toward stating a problem in the doctrine of chances.” *Synthese* 36: 133-60.

Le Cam, L. (1982). A remark on empirical measures in *Festschrift in the honor of E. Lehmann. *P. Bickel, K. Doksum & J. L. Hodges, Jr. eds., Wadsworth pp. 305-327.

Le Cam, L. (1986). The central limit theorem around 1935. *Statistical Science*, Vol. 1, No. 1, pp. 78-96.

Le Cam, L. (1988) Discussion of “The Likelihood Principle,” by J. O. Berger and R. L. Wolpert. IMS Lecture Notes Monogr. Ser. 6 182–185. IMS, Hayward, CA

Le Cam, L. (1996) Comparison of experiments: A short review. In *Statistics, Probability and Game Theory. Papers in Honor of David Blackwell* 127–138. IMS, Hayward, CA.

Le Cam, L., J. Neyman and E. L. Scott (Eds). (1973). *Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability*, Vol. l: *Theory of Statistics*, Vol. 2: *Probability Theory*, Vol. 3: *Probability Theory*. Univ. of Calif. Press, Berkeley Los Angeles.

Mayo, D. (1996). [EGEK] *Error Statistics and the Growth of Experimental Knowledge. *Chicago: University of Chicago Press. (Chapter 10; Chapter 3)

Neyman, J. and L. Le Cam (Eds). (1967). *Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability*, Vol. I: *Statistics*, Vol. II: *Probability* Part I & Part II. Univ. of Calif. Press, Berkeley and Los Angeles.

[1] For some links on optional stopping on this blog: Highly probably vs highly probed: Bayesian/error statistical differences.; Who is allowed to cheat? I.J. Good and that after dinner comedy hour….; New Summary; Mayo: (section 7) “StatSci and PhilSci: part 2″; After dinner Bayesian comedy hour….; Search for more, if interested.

[2] Le Cam is alluding mostly to Savage, and (what he called) the “neo-Bayesian” accounts.

## Forthcoming paper on the strong likelihood principle

My paper, “On the Birnbaum Argument for the Strong Likelihood Principle” has been accepted by *Statistical Science*. The latest version is here. (It differs from all versions posted anywhere). If you spot any typos, please let me know (error@vt.edu). If you can’t open this link, please write to me and I’ll send it directly. As always, comments and queries are welcome.

I appreciate considerable feedback on SLP on this blog. Interested readers may search this blog for quite a lot of discussion of the SLP (e.g., here and here) including links to the central papers, “U-Phils” (commentaries) by others (e.g., here, here, and here), and amusing notes (e.g., Don’t Birnbaumize that experiment my friend, and Midnight with Birnbaum), and more…..

Abstract: An essential component of inference based on familiar frequentist notions, such as p-values, significance and confidence levels, is the relevant sampling distribution. This feature results in violations of a principle known as the strong likelihood principle (SLP), the focus of this paper. In particular, if outcomes

x^{∗}andy^{∗}from experimentsE_{1}andE_{2}(both with unknown parameterθ), have different probability modelsf_{1}( . ),f_{2}( . ), then even thoughf_{1}(x^{∗};θ) = cf_{2}(y^{∗};θ) for allθ, outcomesx^{∗}andy^{∗}may have different implications for an inference aboutθ. Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox (1958) proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known whichEproduced the measurement, the assessment should be in terms of the properties of_{i}E. The surprising upshot of Allan Birnbaum’s (1962) argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP), entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases also refute [WCP entails SLP]._{i}

Key words:Birnbaumization, likelihood principle (weak and strong), sampling theory, sufficiency, weak conditionality

** **

## Highly probable vs highly probed: Bayesian/ error statistical differences

A reader asks: “Can you tell me about disagreements on numbers between a severity assessment within error statistics, and a Bayesian assessment of posterior probabilities?” Sure.

There are differences between Bayesian posterior probabilities and formal error statistical measures, as well as between the latter and a severity (SEV) assessment, which differs from the standard type 1 and 2 error probabilities, p-values, and confidence levels—despite the numerical relationships. Here are some random thoughts that will hopefully be relevant for both types of differences. (Please search this blog for specifics.)

1. The most noteworthy difference is that error statistical inference makes use of outcomes other than the one observed, even after the data are available: there’s no other way to ask things like, how often would you find 1 nominally statistically significant difference in a hunting expedition over *k* or more factors? Or to distinguish optional stopping with sequential trials from fixed sample size experiments. Here’s a quote I came across just yesterday:

“[S]topping ‘when the data looks good’ can be a serious error when combined with frequentist measures of evidence. For instance, if one used the stopping rule [above]…but analyzed the data as if a

fixedsample had been taken, one couldguaranteearbitrarily strong frequentist ‘significance’ againstH_{0}.” (Berger and Wolpert, 1988, 77).

The worry about being guaranteed to erroneously exclude the true parameter value here is an error statistical affliction that the Bayesian is spared (even though I don’t think they can be too happy about it, especially when HPD intervals are assured of excluding the true parameter value.) See this post for an amusing note; Mayo and Kruse (2001) below; and, if interested, search the (strong) likelihood principle, and Birnbaum.

2. *Highly probable vs. highly probed*. SEV doesn’t obey the probability calculus: for any test T and outcome ** x**, the severity for both

*H*and ~

*H*might be horribly low. Moreover, an error statistical analysis is not in the business of probabilifying hypotheses but evaluating and controlling the capabilities of methods to discern inferential flaws (problems with linking statistical and scientific claims, problems of interpreting statistical tests and estimates, and problems of underlying model assumptions). This is the basis for applying what may be called the Severity principle. Continue reading

## Blogging (flogging?) the SLP: Response to Reply- Xi’an Robert

Christian Robert’s *reply* grows out of my last blogpost. On Xi’an’s Og :

A quick reply from my own Elba, in the Dolomiti: your arguments (about the sad consequences of the SLP) are not convincing wrt the derivation of SLP=WCP+SP. If I built a procedure that reports (E

_{1},x*) whenever I observe (E_{1},x*) or (E_{2},y*), this obeys the sufficiency principle; doesn’t it? (Sorry to miss your talk!)

Mayo’s response to Xi’an on the “sad consequences of the SLP.”[i]

This is a useful reply (so to me it’s actually not ‘flogging’ the SLP[ii]), and, in fact, I think Xi’an will now see why my arguments are convincing! Let’s use Xi’an’s procedure to make a parametric inference about q. Getting the report x* from Xi’an’s procedure, we know it could have come from E_{1} or E_{2}. In that case, the WCP forbids us from using either individual experiment to compute the inference implication. We use the sampling distribution of T_{B}.

Birnbaum’s statistic T_{B} is a technically sufficient statistic for Birnbaum’s experiment E_{B } (the conditional distribution of Z given T_{B} is independent of q). The question of whether this is the relevant or legitimate way to compute the inference when it is given that y* came from E_{2 }is the big question. The WCP says it is not. Now you are free to use Xi’an’s procedure (free to Birnbaumize) but that does not yield the SLP. Nor did Birnbaum think it did. That’s why he goes on to say: “Never mind. Don’t use Xi’an’s procedure. Compute the inference using E_{2 } just as the WCP tells you to. You know it came from E_{2 }. Isn’t that what David Cox taught us in 1958?”

Fine. But still no SLP! Note it’s not that SP and WCP conflict, it’s WCP and Birnbaumization that conflict. The application of a principle will always be relative to the associated model used to frame the question.[iii]

These points are all spelled out clearly in my paper: [I can’t get double subscripts here. E_{B }is the same as E-B][iv]

Given

y*, the WCP says do not Birnbaumize. One is free to do so, but not to simultaneously claim to hold the WCP in relation to the giveny*, on pain of logical contradiction. If one does choose to Birnbaumize, and to construct T_{B}, admittedly, the known outcomey* yields the same value of T_{B}as wouldx*. Using the sample space of E_{B}yields: (B): Infr_{E-B}[x*] = Infr_{E-B}[y*]. This is based on the convex combination of the two experiments, and differs from both Infr_{E1}[x*] and Infr_{E2}[y*]. So again, any SLP violation remains. Granted, if only the value of T_{B}is given, using Infr_{E-B}may be appropriate. For then we are given only the disjunction: Either (E_{1},x*) or (E_{2},y*). In that case one is barred from using the implication from either individual E_{i}. A holder of WCP might put it this way: once (E,z) is given, whether E arose from a q-irrelevant mixture, or was fixed all along, should not matter to the inference; but whether a result was Birnbaumized or not should, and does, matter.There is no logical contradiction in holding that if data are analyzed one way (using the convex combination in E

_{B}), a given answer results, and if analyzed another way (via WCP) one gets quite a different result. One may consistently apply both the E_{B }and the WCP directives to the same result, in the same experimental model, only in cases where WCP makes no difference. To claim the WCP never makes a difference, however, would entail that there can be no SLP violations, which would make the argument circular. Another possibility, would be to hold, as Birnbaum ultimately did, that the SLP is “clearly plausible” (Birnbaum 1968, 301) only in “the severely restricted case of a parameter space of just two points” where these are predesignated (Birnbaum 1969, 128). But SLP violations remain.

Note: The final draft of my paper uses equations that do not transfer directly to this blog. Hence, these sections are from a draft of my paper.

[i] Although I didn’t call them “sad,” I think it would be too bad to accept the SLP’s consequences. Listen to Birnbaum:

The likelihood principle is incompatible with the main body of modern statistical theory and practice, notably the Neyman-Pearson theory of hypothesis testing and of confidence intervals, and incompatible in general even with such well-known concepts as standard error of an estimate and significance level. (Birnbaum 1968, 300)

That is why Savage called it “a breakthrough” result. In the end, however, Birnbaum could not give up on control of error probabilities. He held the SLP only for the trivial case of predesignated simple hypotheses. (Or, perhaps he spied the gap in his argument? I suspect, from his writings, that he realized his argument went through only for such cases that do not violate the SLP.)

[ii] Readers may feel differently.

[iii] Excerpt from a draft of my paper:

*Model checking. *An essential part of the statements of the principles SP, WCP, and SLP is that the validity of the model is granted as adequately representing the experimental conditions at hand (Birnbaum 1962, 491). Thus, accounts that adhere to the SLP are not thereby prevented from analyzing features of the data such as residuals, which are relevant to questions of checking the statistical model itself. There is some ambiguity on this point in Casella and R. Berger (2002):

Most model checking is, necessarily, based on statistics other than a sufficient statistic. For example, it is common practice to examine residuals from a model. . . Such a practice immediately violates the Sufficiency Principle, since the residuals are not based on sufficient statistics. (Of course such a practice directly violates the [strong] LP also.) (Casella and R. Berger 2002, 295-6)

They warn that before considering the SLP and WCP, “we must be comfortable with the model” (296). It seems to us more accurate to regard the principles as inapplicable, rather than violated, when the adequacy of the relevant model is lacking.

Birnbaum, A.1968. “Likelihood.” In *International Encyclopedia of the Social Sciences*, 9:299–301. New York: Macmillan and the Free Press.

———. 1969. “Concepts of Statistical Evidence.” In *Philosophy, Science, and Method: Essays in Honor of Ernest Nagel*, edited by S. Morgenbesser, P. Suppes, and M. G. White, 112–143. New York: St. Martin’s Press.

Casella, G., and R. L. Berger. 2002. *Statistical Inference*. 2nd ed. Belmont, CA: Duxbury Press.

Mayo 2013, (http://arxiv-web3.library.cornell.edu/pdf/1302.7021v2.pdf)

## New Version: On the Birnbaum argument for the SLP: Slides for my JSM talk

In my latest formulation of the controversial Birnbaum argument for the strong likelihood principle (SLP), I introduce a new symbol to represent a function from a given experiment-outcome pair, (E,**z**) to a generic inference implication. This should clarify my argument (see my new paper).

(E,**z**) Infr_{E}(**z**) is to be read “the inference implication from outcome **z** in experiment E” (according to whatever inference type/school is being discussed).

*A draft of my slides for the Joint Statistical Meetings JSM in Montreal next week are right after the abstract. Comments are very welcome.*

Interested readers may search this blog for quite a lot of discussion of the SLP (e.g., here and here) including links to the central papers, “U-Phils” by others (e.g., here, here, and here), and amusing notes (e.g., Don’t Birnbaumize that experiment my friend, and Midnight with Birnbaum).

**On the Birnbaum Argument for the Strong Likelihood Principle**

**Abstract**

An essential component of inference based on familiar frequentist notions p-values, significance and confidence levels, is the relevant sampling distribution (hence the term *sampling theory*). This feature results in violations of a principle known as the *strong likelihood principle* (SLP), the focus of this paper. In particular, if outcomes **x*** and **y*** from experiments E_{1} and E_{2} (both with unknown parameter θ), have different probability models f_{1}, f_{2}, then even though f_{1}(**x***; θ) = cf_{2}(**y***; θ) for all θ, outcomes **x*** and** y*** may have different implications for an inference about θ**. **Although such violations stem from considering outcomes other than the one observed, we argue, this does not require us to consider experiments other than the one performed to produce the data. David Cox (1958) proposes the Weak Conditionality Principle (WCP) to justify restricting the space of relevant repetitions. The WCP says that once it is known which E_{i }produced the measurement, the assessment should be in terms of the properties of the particular E_{i}.

The surprising upshot of Allan Birnbaum’s (1962) argument is that the SLP appears to follow from applying the WCP in the case of mixtures, and so uncontroversial a principle as sufficiency (SP). But this would preclude the use of sampling distributions. The goal of this article is to provide a new clarification and critique of Birnbaum’s argument. Although his argument purports that [(WCP and SP) entails SLP], we show how data may violate the SLP while holding both the WCP and SP. Such cases directly refute [WCP entails SLP].

Comments, questions, errors are welcome.

Full paper can be found here: http://arxiv-web3.library.cornell.edu/abs/1302.7021

## Mark Chang (now) gets it right about circularity

Mark Chang wrote a comment this evening, but it is buried back on my Nov. 31 post in relation to the current U-Phil. Given all he has written on my attempt to “break through the breakthrough”, I thought to bring it up to the top. Chang ends off his comment with the sagacious, and entirely correct claim that so many people have missed:

**“What Birnbaum actually did was use the SLP to prove the SLP – as simple as that!” (Mark Chang)**

It is just too bad that readers of his (2013) book will not have been told this*! *Mark: Can you issue a correction? I definitely think you should! If only you’d written to me, I could have pointed this out pre-pub.*

That Birnbaum’s argument assumes what it claims to prove is just what I have been arguing all along. It is called a begging-the-question fallacy: An argument that boils down to:

A/therefore A

Such an argument is logically valid, and that is why formal validity does not mean much for getting conclusions accepted. Why? Well, even though such circular arguments are usually dressed up so that the premises do not so obviously repeat the conclusion, they are similarly fallacious: the truth of the premises already assumes the truth of the conclusion. If we are allowed to argue that way, you can argue anything you like! To not-A as well. That is not what the Great “Breakthrough” was supposed to be doing.

Chang’s comment (which is the same one he posted on Xi’an’s og here) also includes his other points, but fortunately, Jean Miller has recently gone through those in depth. In neither of my (generous) construals of Birnbaum do I claim his premises are inconsistent, by the way.

*But instead his readers are led to believe my criticism is flawed because of something about sufficiency having to do with a FAMILY of distributions (his caps on “family”, p. 138). This all came up as well in Xi”an’s og.

Chang, M. (2013) *Paradoxes in Scientific Inference.*

## U-Phil: Ton o’ Bricks

Birnbaum’s argument for the SLP involves some equivocations that are at once subtle and blatant. The subtlety makes it hard to translate into symbolic logic (I only partially translated it). Philosophers should have a field day with this, and I should be hearing more reports that it has suddenly hit them between the eyes like a ton of bricks, to use a mixture metaphor. Here are the key bricks. References can be found in here, background to the U-Phil here..

**Famous (mixture) weighing machine example and the WLP**** **

The main principle of evidence on which Birnbaum’s argument rests is the *weak conditionality principle *(WCP). This principle, Birnbaum notes, follows not from mathematics alone but from intuitively plausible views of “evidential meaning.” To understand the interpretation of the WCP that gives it its plausible ring, we consider its development in “what is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992).

*The basis for the WCP *

**Example 3. ***Two measuring instruments of different precisions. *We flip a fair coin to decide which of two instruments, E’ or E”, to use in observing a normally distributed random sample **X** to make inferences about mean q. E*’ *has a known variance of 10^{−4}, while that of E” is known to be 10^{4}. The experiment is a mixture: E-mix. The fair coin or other randomizer may be characterized as observing an indicator statistic J, taking values 1 or 2 with probabilities .5, independent of the process under investigation. The full data indicates first the result of the coin toss, and then the measurement: (E^{j}, **x**^{j}).[i]

The sample space of E-mix with components E^{j}, j = 1, 2, consists of the union of

{(j,** x’**): j = 0, possible values of** X’**} and {(j, **x**”): j = 1, possible values of **X**”}.

In testing a null hypothesis such as q = 0, the same **x** measurement would correspond to a much smaller p-value were it to have come from E′ than if it had come from E”: denote them as p′(**x**) and p′′(**x**), respectively. However, the overall significance level of the mixture, the convex combination of the p-value: [p′(**x**) + p′′(**x**)]/2, would give a misleading report of the precision or severity of the actual experimental measurement (See Cox and Mayo 2010, 296).

Suppose that we know we have observed a measurement from E” with its much larger variance:

The unconditional test says that we can assign this a higher level of significance than we ordinarily do, because if we were to repeat the experiment, we might sample some quite different distribution. But this fact seems irrelevant to the interpretation of an observation which we know came from a distribution [with the larger variance] (Cox 1958, 361).

In effect, an individual unlucky enough to use the imprecise tool gains a more informative assessment because he might have been lucky enough to use the more precise tool! (Birnbaum 1962, 491; Cox and Mayo 2010, 296). Once it is known whether E′ or E′′ has produced **x**, the p-value or other inferential assessment should be made conditional on the experiment actually run.

*Weak Conditionality Principle (WCP):*** **If a mixture experiment is performed, with components E’, E” determined by a randomizer (independent of the parameter of interest), then once (E’,** x’**) is known, inference should be based on E’ and its sampling distribution, not on the sampling distribution of the convex combination of E’ and E”.

*Understanding the WCP*

The WCP includes a prescription and a proscription for the proper evidential interpretation of** x’**, once it is known to have come from E’:

The evidential meaning of any outcome (E’,** x’**) of any experiment E having a mixture structure is the same as: the evidential meaning of the corresponding outcome** x’** of the corresponding component experiment E’*, ignoring otherwise the over-all structure of the original experiment *E (Birnbaum 1962, 489 E_{h} and x_{h} replaced with E’ and x’ for consistency).

While the WCP seems obvious enough, it is actually rife with equivocal potential. To avoid this, we spell out its three assertions.

*First*, it applies once we know which component of the mixture has been observed, and what the outcome was (E^{j} **x**^{j}). (Birnbaum considers mixtures with just two components).

*Second*, there is the prescription about evidential equivalence. Once it is known that E^{j} has generated the data, given that our inference is about a parameter of E^{j}, inferences are appropriately drawn in terms of the distribution in E^{j }—the experiment known to have been performed.

*Third*, there is the proscription. In the case of informative inferences about the parameter of E^{j} our inference should not be influenced by whether the decision to perform E^{j} was determined by a coin flip or fixed all along. Misleading informative inferences might result from averaging over the convex combination of E^{j} and an experiment known not to have given rise to the data. The latter may be called the unconditional (sampling) distribution. ….

*______________________________________________*

*One crucial equivocation: *

* *Casella and R. Berger (2002) write:

The [weak] Conditionality principle simply says that if one of two experiments is randomly chosen and the chosen experiment is done, yielding data **x**, the information about *q* depends only on the experiment performed. . . . *The fact that this experiment was performed, rather than some other, has not increased, decreased, or changed knowledge of **q**. *(p. 293, emphasis added)

I have emphasized the last line in order to underscore a possible equivocation. Casella and Berger’s intended meaning is the correct claim:

(i) Given that it is known that measurement **x**’ is observed as a result of using tool E’, then it does not matter (and it need not be reported) whether or not E’ was chosen by a random toss (that might have resulted in using tool E”) or had been fixed all along.

Of course we do not know what measurement would have resulted had the unperformed measuring tool been used.

Compare (i) to a false and unintended reading:

(ii) If some measurement **x** is observed, then it does not matter (and it need not be reported) whether it came from a precise tool E’ or imprecise tool E”.

The idea of detaching **x**, and reporting that “**x** came from somewhere I know not where,” will not do. For one thing, we need to know the experiment in order to compute the sampling inference. For another, E’ and E” may be like our weighing procedures with very different precisions. It is analogous to being given the likelihood of the result in Example 1,(here) withholding whether it came from a negative binomial or a binomial.

Claim (i), by contrast, may well be warranted, not on purely mathematical grounds, but as the most appropriate way to report the precision of the result attained, as when the WCP applies. The essential difference in claim (i) is that it is known that (E, **x**’), enabling its inferential import to be determined.

The linguistic similarity of (i) and (ii) may explain the equivocation that vitiates the Birnbaum argument.

Now go back and skim 3 short pages of notes here, pp 11-14, and it should hit you like a ton of bricks! If so, reward yourself with a double Elba Grease, else try again. Report your results in the comments.

## U-Phil: J. A. Miller: Blogging the SLP

*Jean A. Miller, PhD*

*Department of Philosophy
Virginia Tech*

*MIX & MATCH MESS: A NOTE ON A MISLEADING DISCUSSION OF MAYO’S BIRNBAUM PAPER*

Mayo in her “rejected” post (12/27/12) briefly points out how Mark Chang, in his book *Paradoxes of Scientific Inference* (2012, pp. 137-139), took pieces from the two distinct variations she gives of Birnbaum’s arguments, either of which shows the unsoundness of Birnbaum’s purported proof, and illegitimately combines them. He then mistakenly maintains that it is Mayo’s conclusions that are “faulty” rather than Birnbaum’s argument. In this note, I just want to fill in some of the missing pieces of what is going on here, so that others will not be misled. I put together some screen shots so you can read exactly what he wrote pp. 137-139. (See also Mayo’s note to Chang on Xi’an’s blog here.) Continue reading

## U-Phil: S. Fletcher & N.Jinn

*“Model Verification and the Likelihood Principle” by Samuel C. Fletcher*

Department of Logic & Philosophy of Science (PhD Student)

*University of California, Irvine*

I’d like to sketch an idea concerning the applicability of the Likelihood Principle (LP) to non-trivial statistical problems. What I mean by “non-trivial statistical problems” are those involving substantive modeling assumptions, where there could be any doubt that the probability model faithfully represents the mechanism generating the data. (Understanding exactly how scientific models represent phenomena is subtle and important, but it will not be my focus here. For more, see http://plato.stanford.edu/entries/models-science/.) In such cases, it is crucial for the modeler to verify, inasmuch as it is possible, the sufficient faithfulness of those assumptions.

But the techniques used to verify these statistical assumptions are themselves statistical. One can then ask: do techniques of model verification fall under the purview of the LP? That is: are such techniques a part of the inferential procedure constrained by the LP? I will argue the following:

(1) If they are—what I’ll call the *inferential view* of model verification—then there will be in general no inferential procedures that satisfy the LP.

(2) If they are not—what I’ll call the *non-inferential view*—then there are aspects of any evidential evaluation that inferential techniques bound by the LP do not capture. Continue reading