This is a first draft of part II of the presentation begun in the December 6 blog post. This completes the proposed presentation. I expect errors, and I will be grateful for feedback! (NOTE: I did not need to actually rip a cover of EGEK to obtain this effect!)

**SEVEN: NOW FOR THE BREAKTHROUGH**

You have observed **y”**, the .05 significant result from E”, the optional stopping rule, ending at n = 100.

Birnbaum claims he can show that you, as a frequentist error statistician, must grant that it is equivalent to having fixed n = 100 at the start (i.e., experiment E’).

*Reminder:*

The (strong) Likelihood Principle (LP) is a universal conditional claim:

*If two data sets **y’** and **y”** from experiments E’ and E” respectively, have likelihood functions which are functions of the same parameter(s) µ and are proportional to each other, then **y’** and **y”** should lead to identical inferential conclusions about µ.*

As with conditional proofs, we assume the antecedent and try to derive the consequent, or equivalently, show a contradiction results whenever the antecedent holds and the consequent does not (reductio proof).

*Let’s do the reductio proof:*

**LP Violation Pairs**

Start with any violation of the LP, that is, a case where the antecedent of the LP holds and the consequent does not, and show that you get a contradiction.

Assume then that the pair of outcomes **y’** and **y”**, from E’ and E” respectively, represent a violation of the LP. We may call them *LP pairs.*

*Step 1:*

Birnbaum will describe a funny kind of ‘mixture’ experiment based on an LP pair. Say you observed **y”** from experiment E”.

Having observed **y”** from the optional stopping experiment (stopped, say, at n = 100), I am to imagine that it resulted from getting heads on the toss of a fair coin, where tails would have meant performing the fixed sample size experiment with n = 100 from the start.

Next, erase the fact that **y”** came from E” and report (**y’**, E’).

Call this test statistic T_{BB}.

**The Birnbaum test statistic T_{BB}:**

*Case 1: If you observe y” (from E”) and y” has an LP pair in E’, just report (y’, E’)*

Case 2: If your observed outcome does not have an LP pair, just report it as usual.

(Any outcome from optional stopping E” has an LP pair in the corresponding fixed sample size experiment E’.)

*Only case 1 results matter for the points of the proof we need to consider.*

I said it was a funny kind of mixture. There are two things that make it funny:

- First, it didn’t happen: you only observed **y”** from E”.
- Second, you are to report an outcome as **y’** from E’ even though you actually observed **y”** from E” (and further, you are to report the mixture).

We may call it *Birnbaumizing* the result you got; whenever you have observed a potential LP violation, “Birnbaumize” it as above.

If you observe **y”** (from E”) and **y”** has an LP pair in E’, just report **y’** (i.e., report (**y’**, E’)).

*So you’d report this whether you actually observed y’ or y”.*

We said our inference would be in the form of p-values.

Now to obtain the p-value we must use the defined sampling distribution of *T_{BB}*, the convex combination:

- In reporting a p-value associated with **y”** we are to report the average of p’ and p”: (p’ + p”)/2.

(the ½ comes from the imagined fair coin)
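As a minimal sketch of the arithmetic, the Birnbaumized p-value is just a weighted average of the two p-values, with the weight supplied by the imagined coin. The function name and the `weight_fixed` parameter are hypothetical labels introduced here for illustration, and the inputs .05 and .3 are the rough values for p’ and p” quoted later in this post.

```python
def birnbaumized_p_value(p_fixed: float, p_optional: float,
                         weight_fixed: float = 0.5) -> float:
    """Convex combination of the two p-values in an LP pair.

    weight_fixed is the chance that the imagined coin selects the fixed
    sample size experiment E'; a fair coin gives 0.5 (an assumption that
    matches the post's "fair coin").
    """
    return weight_fixed * p_fixed + (1.0 - weight_fixed) * p_optional


# Illustrative inputs taken from the post: p' = .05 and p'' roughly .3.
p_prime = 0.05
p_double_prime = 0.30

p_bb = birnbaumized_p_value(p_prime, p_double_prime)
print(f"(p' + p'')/2 = {p_bb:.3f}")  # prints (p' + p'')/2 = 0.175
```

Note that the averaged value agrees with neither p’ nor p”, which matters for the second step of the argument.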

Having thus “Birnbaumized” the particular LP pair that you actually observed, it appears that you must treat **y’** as evidentially equivalent to its LP pair, **y”**.

The test statistic T_{BB} is a *sufficient statistic*, technically, but the rest of the argument overlooks that an error statistician still must take into account the sampling distributions at each step.

At this step, the inference refers to the sampling distribution of T_{BB}.

But it changes in the second step, and that’s what dooms the ‘proof’, as we will now see.

0. Let **y’** and **y”** (from E’ and E”) be any LP violation pair, and say **y”** from E” has been observed.

σ = 1, **y’** = .196

1. *Premise 1:* Inferences from **y’** and **y”**, using the sampling distribution of the convex combination, are equivalent (Birnbaumization):

**Infr_{E’}(y’) is equal to Infr_{E”}(y”)** [both are equal to (p’ + p”)/2]

2. *Premise 2 (a):* An inference from **y’** using (i.e., conditioning on) the sampling distribution of E’ (the experiment that produced it), is p’:

**Infr_{E’}(y’) equals p’**

*Premise 2 (b):* An inference from **y”** using (i.e., conditioning on) the sampling distribution of E” (the experiment that produced it), is p”:

**Infr_{E”}(y”) equals p”**

From (1), (2a and b): **Infr_{E’}(y’) equals Infr_{E”}(y”)**

**Which is, or looks like, the LP!**

It would follow of course that p’ equals p”!

But from (0), **y’** and **y”** form an LP violation, so p’ *is not equal to* p”.

p’ was .05, p” ~ .3

Thus it would appear the frequentist is led into a contradiction.
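For readers who want to see why p’ and p” can differ so much, here is a rough Monte Carlo sketch. It rests on assumptions that are mine rather than the post's: observations are N(µ, 1) with µ = 0 under the null, the nominal two-sided .05 test is checked after every observation starting from n = 1, and sampling stops at the first n ≤ 100 where |mean| ≥ 1.96/√n. The function names are hypothetical. The estimate it prints depends on these choices and on the random seed, so it need not reproduce the rough figure of .3 quoted above.

```python
import math
import random


def optional_stopping_rejects(max_n: int, z_crit: float, rng: random.Random) -> bool:
    """Return True if the nominal test is significant at some n <= max_n."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)  # one N(0, 1) observation under the null
        # |mean| >= z_crit / sqrt(n) is equivalent to |sum| >= z_crit * sqrt(n)
        if abs(total) >= z_crit * math.sqrt(n):
            return True
    return False


def estimate_actual_error_rate(max_n: int = 100, z_crit: float = 1.96,
                               trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(reject at some n <= max_n) when the null is true."""
    rng = random.Random(seed)
    hits = sum(optional_stopping_rejects(max_n, z_crit, rng) for _ in range(trials))
    return hits / trials


rate = estimate_actual_error_rate()
print(f"estimated actual error rate with optional stopping up to n = 100: {rate:.3f}")
```

Setting `max_n=1` turns the rule into a single fixed look, and the estimate then falls back to roughly the nominal .05. That is the contrast between E’ and E” that the error statistician refuses to erase.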

**The problem?** There are different ways to show it, as always; here I allowed the premises to be true.

In that case this is an invalid argument: we have all true premises and a false conclusion.

I can consistently hold all the premises and the denial of the conclusion:

1. the two outcomes get the same convex combo p-value if I play the Birnbaumization game

2. if I condition, the inferences from **y”** and **y’** are p” and p’, respectively

Denial of conclusion: p’ is not equal to p” (.05 is not equal to .3)

No contradiction.

We can put it in a valid form, but then the premises can never both be true at the same time. (It’s not even so easy to put it in valid form; see my paper for several attempts.)

*Premise 1:* Inferences from **y’** and **y”** are evidentially equivalent:

**Infr_{E’}(y’) is equal to Infr_{E”}(y”)**

*Premise 2 (a):* An inference from **y’** should use (i.e., condition on) the sampling distribution of E’ (the experiment that produced it):

**Infr_{E’}(y’) equals p’**

*Premise 2 (b):* An inference from **y”** should use (i.e., condition on) the sampling distribution of E” (the experiment that produced it):

**Infr_{E”}(y”) equals p”**

Usually the proofs just give the bold parts.

From (1), (2a and b): **Infr_{E’}(y’) equals Infr_{E”}(y”)**

**Which is the LP!**

Contradicting the assumption that **y’** and **y”** form an LP violation!

The problem now is this: in order to infer the conclusion the premises of the argument must be true, and it is impossible to have premises (1) and (2) true at the same time:

Premise (1) is true only if we use the sampling distribution given by the convex combinations (averaging over the LP pairs).

- This is the sampling distribution of T_{BB}.
- Yet to draw inferences using this sampling distribution renders both (2a) and (2b) false.
- The truth of (2a) and (2b) requires ‘conditioning’ on the experiment actually performed, or rather, they require we not ‘Birnbaumize’ the experiment from which the observed LP pair is known to have actually come!
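The point that premises (1) and (2) cannot be true together can be checked mechanically in a toy sketch. Everything here is illustrative: the two inference rules are hypothetical stand-ins for "use the sampling distribution of T_{BB}" and "condition on the experiment actually performed", and the p-values are the rough figures .05 and .3 used in this post.

```python
P_PRIME = 0.05         # p-value of y' computed within the fixed-n experiment E'
P_DOUBLE_PRIME = 0.30  # rough p-value of y'' within optional stopping E''


def infer_birnbaumized(outcome: str) -> float:
    """Use the sampling distribution of T_BB: both outcomes get the average."""
    return (P_PRIME + P_DOUBLE_PRIME) / 2


def infer_conditional(outcome: str) -> float:
    """Condition on the experiment that actually produced the outcome."""
    return {"y'": P_PRIME, "y''": P_DOUBLE_PRIME}[outcome]


def premises_hold(infer) -> tuple:
    """Truth values of (premise 1, premise 2a, premise 2b) under one fixed rule."""
    premise_1 = infer("y'") == infer("y''")
    premise_2a = infer("y'") == P_PRIME
    premise_2b = infer("y''") == P_DOUBLE_PRIME
    return premise_1, premise_2a, premise_2b


for name, rule in [("Birnbaumized", infer_birnbaumized),
                   ("conditional", infer_conditional)]:
    print(name, premises_hold(rule))
```

With a single meaning of Infr held fixed throughout, the Birnbaumized rule makes premise (1) true and both parts of (2) false, while the conditional rule does the reverse. The argument only appears to go through when Infr shifts meaning between the premises.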

I plan to give the audience a handout of chapter 7III, Error and Inference (Mayo and Cox 2010). Then I can point them to pages.

Although I have allowed premise (1) for the sake of argument, the very idea is extremely far-fetched and unmotivated.[iii]

Pre-data, the frequentist would really need to consider all possible pairs that could be LP violations and average over them…

It is worth noting that Birnbaum himself rejected the LP (Birnbaum 1969, 128): “Thus it seems that the likelihood concept cannot be construed so as to allow useful appraisal, and thereby possible control, of erroneous interpretations.”

References, to be added.

*I gratefully acknowledge Sir David Cox’s advice and encouragement on this and numerous earlier drafts.

[1] I will always mean the “strong” likelihood principle.

[2] In the context of error statistical inference, this is based on the particular statistic and sampling distribution specified by E.

[3] See EGEK, p.355 for discussion.

[ii] We think this captures the generally agreed upon meaning of the LP although statements may be found that seem stronger. For example, in Pratt, Raiffa, and Schlaifer, 1995:

If, in a given situation, two random variables are observable, and if the value *x* of the first and the value *y* of the second give rise to the same likelihood function, then observing the value *x* of the first and observing the value *y* of the second are equivalent in the sense that *they should give the same inference, analysis, conclusion, decision, action, or anything else*. (Pratt, Raiffa, Schlaifer 1995, 542; emphasis added)

[iii] Cox thinks I should say more about the very idea of premise (1). He is right; but this is to be a very short talk, and this is not a short topic. References will be added shortly.

**REFERENCES (incomplete)**

Armitage, P. (1975). *Sequential Medical Trials*, 2^{nd} ed. New York: John Wiley & Sons.

Birnbaum, A. (1962). On the Foundations of Statistical Inference (with discussion), *Journal of the American Statistical Association*, **57**: 269–326.

Birnbaum, A. (1969). Concepts of Statistical Evidence. In *Philosophy, Science, and Method: Essays in Honor of Ernest Nagel*, edited by S. Morgenbesser, P. Suppes, and M. White, New York: St. Martin’s Press: 112-143.

Berger, J. O., and Wolpert, R. L. (1988). *The Likelihood Principle*. Hayward, CA: Institute of Mathematical Statistics.

Cox, D.R. (1977). “The Role of Significance Tests (with Discussion),” *Scandinavian Journal of Statistics*, **4**: 49–70.

Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference,” in *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science*, edited by D. Mayo and A. Spanos, Cambridge: Cambridge University Press: 276-304.

Edwards, W., Lindman, H, and Savage, L. (1963). Bayesian Statistical Inference for Psychological Research, *Psychological Review*, **70**: 193-242.

Jaynes, E. T. (1976). Common Sense as an Interface. In *Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science*, Volume 2, edited by W. L. Harper and C. A. Hooker, Dordrecht, The Netherlands: D. Reidel: 218-257.

Joshi, V. M. (1976). “A Note on Birnbaum’s Theory of the Likelihood Principle.” *Journal of the American Statistical Association* **71**, 345-346.

Joshi, V. M. (1990). “Fallacy in the Proof of Birnbaum’s Theorem.” *Journal of Statistical Planning and Inference* **26**, 111-112.

Lindley, D. V. (1976). Bayesian Statistics. In *Foundations of Probability Theory, Statistical Inference and Statistical Theories of Science*, Volume 2, edited by W. L. Harper and C. A. Hooker, Dordrecht, The Netherlands: D. Reidel: 353-362.

Mayo, D. (1996). *Error and the Growth of Experimental Knowledge*. The University of Chicago Press (Series in Conceptual Foundations of Science).

Mayo, D. (2010). “An Error in the Argument from Conditionality and Sufficiency to the Likelihood Principle.” In *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science*, edited by D. Mayo and A. Spanos, Cambridge: Cambridge University Press: 305-314.

Mayo, D. and D. R. Cox. (2011). “Statistical Scientist Meets a Philosopher of Science: A Conversation with Sir David Cox.” *Rationality, Markets and Morals (RMM): Studies at the Intersection of Philosophy and Economics*. Edited by M. Albert, H. Kliemt and B. Lahno. An open access journal published by the Frankfurt School: Verlag. Volume 2, (2011), 103-114.

Mayo, D. and A. Spanos, eds. (2010). *Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science*, Cambridge: Cambridge University Press.

Pratt, John W., H. Raiffa and R. Schlaifer. (1995). *Introduction to Statistical Decision Theory*. Cambridge, MA: The MIT Press.

Savage, L., ed. (1962a), *The Foundations of Statistical Inference: A Discussion*. London: Methuen & Co.

Savage, L. (1962b), “Discussion on Birnbaum (1962),” *Journal of the American Statistical Association*, **57**: 307–8.

Maybe I am naive, but what justifies anyone in Birnbaumizing a result, i.e., acting as if it could have been the result of a mixture when the researcher knows it wasn’t, and then treating it as if it came from a mixture, averaging over the experiment run and some make-believe experiment that was never done?

Good question. (As I mentioned, David Cox thinks I should make more of this.) It is a little logical game Birnbaum defines, and could not even be applied before seeing the data (unless you average over all LP violation pairs for “case 1” outcomes). It is to try to go in the opposite direction from weak conditioning in the case of genuine mixture tests. The reason is easy: By turning a pair (or more) of experiments into “one” experiment, the weak LP (or sufficiency) applies for a frequentist. Sorry to be dashing…