**U-Phil:** I would like to open up this post, together with Gandenberger’s (Oct. 30, 2012), to reader **U-Phils, from December 6-19** (< 1000 words) for posting on this blog (please see # at bottom of post). Where Gandenberger claims, “Birnbaum’s proof is valid and his premises are intuitively compelling,” I have shown that if Birnbaum’s premises are interpreted so as to be true, the argument is invalid. If construed as formally valid, I argue, the premises contradict each other. Who is right? Gandenberger doesn’t wrestle with my critique of Birnbaum, *but I invite you (and Greg!) to do so.* I’m pasting a new summary of my argument below.

The main premises may be found on pp. 11-14. While these points are fairly straightforward (and do not require technical statistics), they offer an intriguing logical, statistical and linguistic puzzle. The following is an overview of my latest take on the Birnbaum argument. See also “Breaking Through the Breakthrough” posts: Dec. 6 and Dec 7, 2011.

Gandenberger also introduces something called the methodological likelihood principle. A related idea for a U-Phil is to ask: can one mount a sound, non-circular argument for that variant? And while one is at it, do his methodological variants of sufficiency and conditionality yield plausible principles?

*Graduate students and others invited!*

______________________________________________________

**New Summary of Mayo Critique of Birnbaum’s Argument for the SLP**

*Deborah Mayo*

See also a draft of the full PAPER corresponding to this summary; a later and more satisfactory draft is here. Further links on the Strong Likelihood Principle (SLP): Mayo 2010; Cox & Mayo 2011 (appendix).

Please alert me to corrections; not all the symbols transferred well.

1. (SLP): For any two experiments E’ and E” with different probability models f’, f” but with the same unknown parameter θ, if the likelihoods of outcomes **x**’* and **x**”* (from E’ and E” respectively) are proportional to each other, then **x**’* and **x**”* should have the identical evidential import for any inference concerning parameter θ.

*SLP pairs*. When the antecedent holds, **x**’* and **x**”* are said to have “the same likelihood function”, i.e., f’(**x**’; θ) = c f”(**x**”; θ) for all θ, c a positive constant. In such cases, we abbreviate by saying **x**’* and **x**”* are *SLP pairs*, and the asterisk * will be used to indicate this.

So we can abbreviate the SLP as follows:

SLP: for any two experiments, E’ and E”, if **x**’* and **x**”* are *SLP pairs* (from E’ and E” respectively), then

Infr_{E’}(**x**’*) equiv Infr_{E”}(**x**”*).

-1-

_________________________________________________________________

*2.1 SLP Violation with Binomial, Negative Binomial*

**Example 1.** *Binomial vs. Negative Binomial*. Consider independent Bernoulli trials, with the probability of success at each trial an unknown constant θ, but produced by different procedures, E’, E”. E’ is Binomial with a pre-assigned number n of Bernoulli trials, say 20, and R, the number of successes observed. In E” trials continue until a pre-assigned number r, say 6, of successes has occurred, with the number N of trials recorded. The sampling distribution of R is Binomial:

f(R; θ) = (_{n}C_{r}) θ^{r}(1 – θ)^{n-r}

while the sampling distribution of N is Negative Binomial:

f(N; θ) = (_{n-1}C_{r-1}) θ^{r}(1 – θ)^{n-r}

If two outcomes from E’ and E” respectively have the same number of successes r and the same number of trials n, then they have the “same” likelihood, in the sense that both are proportional to θ^{r}(1 – θ)^{n-r}.

The two outcomes, **x**’* and **x**”* are SLP pairs. But the difference in the sampling distributions of the respective statistics, R and N, of E’ and E” respectively, entails a difference in p-values or confidence level assessments. Accordingly, their evidential appraisals differ for sampling distribution inference. Thus **x**’* and **x**”* are SLP pairs leading to an SLP violation.
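To make the violation concrete, here is a minimal Python sketch using the numbers of Example 1 (n = 20, r = 6). The text does not fix a null hypothesis, so the null value θ = 0.5 and the one-sided direction (few successes count against the null) are assumptions chosen purely for illustration, as are the function names.

```python
from math import comb

def binom_pmf(r, n, theta):
    """P(R = r) for the Binomial experiment E' with n fixed."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def negbinom_pmf(n, r, theta):
    """P(N = n) for the Negative Binomial experiment E'' with r fixed."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

n, r = 20, 6
theta0 = 0.5  # hypothetical null value, not taken from the text

# The likelihood ratio is the same constant for every theta: an SLP pair.
ratios = [binom_pmf(r, n, t) / negbinom_pmf(n, r, t) for t in (0.1, 0.3, 0.5, 0.9)]

# E': p-value = P(R <= 6; n = 20, theta0).
p_binomial = sum(binom_pmf(k, n, theta0) for k in range(r + 1))

# E'': p-value = P(N >= 20; r = 6, theta0), i.e. at most r - 1 successes
# in the first n - 1 trials.
p_negbinomial = sum(binom_pmf(k, n - 1, theta0) for k in range(r))

print(ratios)
print(round(p_binomial, 4), round(p_negbinomial, 4))
```

Under these assumed inputs the likelihood ratio is constant in θ (it equals n/r), yet the two p-values fall on opposite sides of 0.05 (roughly 0.058 for E’ and 0.032 for E”), which is the kind of difference in sampling-distribution appraisal the example describes.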

-2-

__________________________________________________________________

*An SLP violation with Binomial (E’) and Negative Binomial (E”):*

(E’, r=6) and (E”, n=20) have proportional likelihoods

but Infr_{E’}(**x**’* = 6) is *not* equiv to Infr_{E”}(**x**”* = 20).

*Loss of relevant information if the index is erased*

In making inferences about θ on the basis of data **x** in sampling theory, relevant information would be lost if the report removed the index from E and reported:

Data **x** consisted of r successes in n Bernoulli trials, generated from *either* a Binomial experiment with n fixed at 20, or a negative binomial experiment with r fixed at 6, erasing the index indicating the actual source of data.

-3-

__________________________________________________________________

**2.2 SLP violation with fixed normal testing and optional stopping: E’, E”**

**Example 2**. *Fixed vs. sequential sampling*. Suppose **X**’ and **X**” are sets of independent observations from N(μ,σ^{2}), with σ known, and p-values are to be calculated for the null hypothesis μ = 0. In E’ the sample size is fixed, whereas in E” the sampling rule is to continue sampling until the sample mean attains or exceeds 1.96σ/√n. Suppose E” is first able to stop with *n* = 169 trials. Then **x**” has a likelihood proportional to that of a result that could have occurred from E’, where n was fixed in advance to be 169, and result **x**’ is 1.96σ/√n from 0. Although the corresponding p-values would be different, the two results would be inferentially equivalent according to the SLP. This application of the SLP to the case of optional stopping is often called the Stopping Rule Principle (SRP) (Berger and Wolpert 1988).[i]
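Why the p-values differ can be seen in a small simulation. The sketch below is illustrative only: it assumes a two-sided stopping boundary, caps sampling at 169 observations, and uses 2000 simulated experiments with a fixed seed. None of those choices comes from the text, and the helper name `stops_by` is hypothetical.

```python
import random
from math import sqrt

def stops_by(max_n, rng, sigma=1.0, z=1.96):
    """Sample from N(0, sigma^2), i.e. with the null true. Return the first n
    at which |sample mean| >= z * sigma / sqrt(n), or None if the boundary
    is not reached within max_n observations."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, sigma)
        if abs(total / n) >= z * sigma / sqrt(n):
            return n
    return None

rng = random.Random(0)  # fixed seed so the run is repeatable
max_n = 169             # cap on the sample size (assumption)
trials = 2000           # number of simulated experiments (assumption)

stopped = sum(stops_by(max_n, rng) is not None for _ in range(trials))
rate = stopped / trials
print(f"nominal level 0.05; proportion reaching the boundary by n={max_n}: {rate:.3f}")
```

With the null true, a fixed-sample test at n = 169 reaches the 1.96σ/√n boundary about 5% of the time, whereas the try-and-try-again rule reaches it far more often. That gap is what the sampling theorist's p-value for E” registers and what the SLP says to ignore.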

*SLP violation with Fixed Normal Testing and Optional Stopping: E’, E”*

(E’, 1.96σ/13) and (E”, n = 169) have proportional likelihoods

but Infr_{E’}(1.96σ/13) is *not* equiv to Infr_{E”}(n = 169).

-4-

__________________________________________________________________

(a) Sufficient Statistic: Let data **x** = (x_{1}, x_{2}, …, x_{n}) be a realization of random variable **X**, following a distribution f. A *statistic* T(**x**) is a *sufficient* statistic if the following relation holds:

f(**x**; θ) = f_{T}(t; θ) f_{x|T}(**x** | t)

where f_{x|T} does not depend on the unknown parameter θ.
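As a quick check of the factorization in (a), the following sketch uses independent Bernoulli trials with T the number of successes. The data vector and the θ values are hypothetical, and the function names are introduced only for this illustration.

```python
from math import comb

def joint_pmf(x, theta):
    """f(x; theta) for a sequence x of independent Bernoulli(theta) outcomes."""
    t = sum(x)
    return theta**t * (1 - theta)**(len(x) - t)

def pmf_T(t, n, theta):
    """f_T(t; theta): Binomial(n, theta) pmf of T = number of successes."""
    return comb(n, t) * theta**t * (1 - theta)**(n - t)

def conditional_given_T(x, theta):
    """f_{x|T}(x | t) = f(x; theta) / f_T(t; theta)."""
    return joint_pmf(x, theta) / pmf_T(sum(x), len(x), theta)

x = (1, 0, 0, 1, 0)  # hypothetical data: 2 successes in 5 trials
conditionals = [conditional_given_T(x, th) for th in (0.2, 0.5, 0.8)]
print(conditionals)  # the same value, 1 / C(5, 2), up to floating-point rounding
```

The conditional probability of the particular sequence given t is 1/C(n, t) whatever θ is, so all dependence on θ is carried by the sampling distribution of T.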

(b) Sufficiency Principle (general): If random sample **X**, in experiment E, has probability density f(**x**; θ), and the assumptions of the model are valid, and T is minimal sufficient for θ, then if t(**x**’) = t(**x**”), then Infr_{E’}(**x**’) = Infr_{E”}(**x**”).

Since the sufficiency principle holds for different inference schools, any application must take into account the relevant method for inference under discussion (Cox and Mayo 2010).

(c) Sufficiency Principle applied in sampling theory: If a random variable **X**, in experiment E, arises from f(**x**; θ), and the assumptions of the model are valid, then all the information about θ contained in the data may be obtained from considering its minimal sufficient statistic t and the *sampling distribution* f_{T}(t; θ) of experiment E.

-5-

__________________________________________________________________

*Weak Conditionality Principle (WCP):* If a mixture experiment is performed, with components E’, E” determined by a randomizer (independent of the parameter of interest), then once (E’, **x**’) is known, inference should be based on E’ and its sampling distribution, not on the sampling distribution of the convex combination of E’ and E”.

*4.1 Understanding the WCP*

The WCP includes a prescription and a proscription for the proper evidential interpretation of **x**’, once it is known to have come from E’:

“The evidential meaning of any outcome (E’, **x**’) of any experiment E having a mixture structure is the same as the evidential meaning of the corresponding outcome **x**’ of the corresponding component experiment E’, ignoring otherwise the over-all structure of the original experiment.” (Birnbaum 1962, 279)

-6-

__________________________________________________________________

While the WCP seems obvious enough, it is actually rife with equivocal potential. To avoid this, we belabor here its three assertions.

*First*, it applies once we know which component of the mixture has been observed, and what the outcome was (E^{j},**x**^{j}). (Birnbaum considers mixtures with just two components).

*Second*, there is the prescription about evidential equivalence. Once it is known E^{j} has generated the data, given that our inference is about a parameter of E^{j}, inferences are appropriately drawn in terms of the sampling distribution in E^{j}, the experiment known to have been performed.

*Third*, there is the proscription: In the case of informative inferences about a parameter of E^{j}, our inference should not be influenced by whether the decision to perform E^{j} was determined by a coin flip or fixed all along. Misleading informative inferences result from averaging over the convex combination of E^{j} and an experiment known not to have given rise to the data. The latter may be called the *unconditional* sampling distribution.
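The proscription can be illustrated with a two-instrument mixture: a precise and an imprecise measuring tool, one of which is chosen by a fair coin. The sketch below is only an illustration. The standard deviations, the observed value, and the null value μ = 0 are all hypothetical numbers, not taken from the text.

```python
from math import erfc, sqrt

def upper_tail(x, sigma):
    """P(X >= x) for X ~ N(0, sigma^2)."""
    return 0.5 * erfc(x / (sigma * sqrt(2)))

sigma_precise = 1.0     # E': precise instrument (hypothetical value)
sigma_imprecise = 10.0  # E'': imprecise instrument (hypothetical value)
x_obs = 2.0             # observed measurement, known to come from E'

# Conditional assessment: sampling distribution of E' alone.
p_conditional = upper_tail(x_obs, sigma_precise)

# Unconditional assessment: average over the fair-coin mixture of E' and E''.
p_unconditional = (0.5 * upper_tail(x_obs, sigma_precise)
                   + 0.5 * upper_tail(x_obs, sigma_imprecise))

print(round(p_conditional, 4), round(p_unconditional, 4))
```

With these assumed numbers the conditional p-value is about 0.023 while the unconditional one is about 0.22. The second figure is inflated by an instrument known not to have produced the measurement, which is the averaging the WCP rules out.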

-7-

__________________________________________________________________

*A second ambiguity.* Casella and Berger (2002) write:

The [weak] Conditionality principle simply says that if one of two experiments is randomly chosen and the chosen experiment is done, yielding data **x**, the information about θ depends only on the experiment performed…. The fact that this experiment was performed, rather than some other, has not increased, decreased, or changed knowledge of θ. (emphasis added, 293)

Casella and Berger’s intended meaning is the correct claim:

(i) Given it is known that measurement **x**’ is observed as a result of using tool E’, then it does not matter (and it need not be reported) whether or not E’ was chosen by a random toss (that might have resulted in using tool E”) or fixed all along.

Compare this to a false and unintended reading:

(ii) If some measurement **x** is observed, then it does not matter (and it need not be reported) if it came from a precise tool E’ or imprecise tool E”.

Claim (i) by contrast, may well be warranted, not on purely mathematical grounds, but as the most appropriate way to report the precision of the result attained, as when WCP applies.

The linguistic similarity of (i) and (ii) may explain the equivocation that vitiates the Birnbaum argument.

-8-

*__________________________________________________________________ *

*4.3 Is WCP an Equivalence?* (You may wish to compare this to my earlier treatments, e.g., Mayo 2010.)

A central question is whether WCP is a proper equivalence, holding in both directions (Evans et al. 1986; Durbin 1970). Weighing against viewing it as an equivalence is this: it makes no sense to say one should use the unconditional rather than the conditional assessment (once it is known which component of a mixture was performed), and at the same time maintain the unconditional and conditional assessments are evidentially equivalent. WCP prescribes conditioning on the experiment known to have produced the data, *and not the other way around*. It is only because these do not yield equivalent appraisals that the WCP may serve to avoid counterintuitive assessments (e.g., that would otherwise be permitted from those famous weighing machines). It is their inequivalence, in short, that gives Cox’s WCP its normative proscriptive force:

WCP proscription: Once (E’, **x**’) is known, Infr_{E’}(**x**’) should be computed using, not the unconditional sampling distribution over E’ and E”, but rather, the sampling distribution of E’.

-9-

__________________________________________________________________

Yet there is an equivalence within the WCP which, so long as it is consistently interpreted, raises no problems.[ii] This turns out to be the linchpin of disentangling the Birnbaum argument.

To hold WCP for a given context is to judge that the information that E’ was determined by a flip is a redundancy, equivalent to conjoining a tautology to the outcome (E’, **x**’):

Knowing that (E’, **x**’) occurred,

Infr_{E’}(**x**’) equiv [Infr_{E’}(**x**’) and (Either E’ was chosen by flipping, or E’ was fixed)]

where it is given that the flipping conjunct in no way alters the construal of (E’, **x**’). [iii]

Viewing the WCP as endorsing a genuine “two-way” equivalence requires viewing any known experimental result as equivalent, evidentially, to its being a component of a corresponding mixture, even though it is known that in fact E was not chosen by a mixture. While this may seem unsettling, no untoward evidential interpretations result so long as the proscriptive part of the WCP remains, and is not contradicted (say by allowing the imaginary mixture to influence the interpretation of the known “component”).

-10-

__________________________________________________________________

**5. Birnbaum’s Argument**

SLP: for any two experiments, E’ and E”, if **x**’* and **x**”* are *SLP pairs* (from E’ and E” respectively) then Infr_{E’}(**x**’*) equiv Infr_{E”}(**x**”*).

Begin with any case where the antecedent of the SLP holds. The task is to show the two ought to be deemed evidentially equivalent.

*Premise 1:*

Suppose we have observed (E’, **x**’*) with an SLP pair (E”, **x**”*). Then view (E’, **x**’*) as having resulted from getting heads on the toss of a fair coin, where tails would have meant performing E” (any other irrelevant randomizer would do). This is sometimes called the “enlarged experiment”. Now construct the Birnbaum test statistic T-B defined in terms of the enlarged experiment:

T-B(E^{j}, **x**^{j}*) = (E’, **x**’*), if j = 1 and **x**’ = **x**’*, or j = 2 and **x**” = **x**”*.

Else, report the outcome (E^{j}, **x**^{j}).

In words: in the case of a member of an SLP pair, statistic T-B has the effect of erasing the index j. Inference based on T-B is to be computed averaging over the performed and unperformed experiments E’ and E”. This is the *unconditional formulation* of the enlarged experiment. This gives premise one:

-11-

__________________________________________________________________

(1) For any (E’, **x**’*), the result of construing its evidential import in terms of the unconditional formulation is that:

Infr_{E-B}(**x**’*) equiv Infr_{E-B}(**x**”*)

The likelihood functions of (E’, **x**’*) and (E”, **x**”*) are proportional for all θ, being 0.5f’(**x**’*; θ) and 0.5f”(**x**”*; θ).
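A rough sketch of the construction, using the Binomial and Negative Binomial pair of Example 1 as the SLP pair, may help fix ideas. The tuple encoding of outcomes, the function names, and the θ grid are assumptions made for this illustration and are not part of Birnbaum's formulation.

```python
from math import comb

def f_binomial(r, n, theta):
    """E': P(R = r) with n fixed."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def f_negbinomial(n, r, theta):
    """E'': P(N = n) with r fixed."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

# SLP pair from Example 1: E' has n fixed at 20 and observes r = 6;
# E'' has r fixed at 6 and observes n = 20.
X1_STAR = ("E'", 6)
X2_STAR = ("E''", 20)

def t_b(experiment, outcome):
    """Birnbaum statistic: both members of the SLP pair are reported as
    (E', x'*); any other outcome keeps its index."""
    if (experiment, outcome) in (X1_STAR, X2_STAR):
        return X1_STAR
    return (experiment, outcome)

def mixture_likelihood(experiment, outcome, theta):
    """Likelihood of (E^j, x^j) in the fair-coin enlarged experiment."""
    if experiment == "E'":
        return 0.5 * f_binomial(outcome, 20, theta)
    return 0.5 * f_negbinomial(outcome, 6, theta)

thetas = (0.2, 0.4, 0.6)
ratio = [mixture_likelihood(*X1_STAR, t) / mixture_likelihood(*X2_STAR, t)
         for t in thetas]
print(t_b(*X1_STAR) == t_b(*X2_STAR), ratio)
```

Both members of the pair are mapped to the same report, and their likelihoods within the mixture stay in a constant ratio across θ. The sketch shows only the bookkeeping of erasing the index and does not settle which sampling distribution the inference should use.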

However E’ and E” are different models of the experiment producing the two likelihoods, and the enlarged model associated with T-B is yet a third model of the experiment. The second premise now concerns the WCP:

(2) Once it is known that E’ produced the outcome **x**’*, compute the inference just as if it were known all along that E’ was going to be performed, i.e., one should use the conditional formulation, ignoring any mixture structure:

Infr_{E-B}(**x**’*) equiv Infr_{E’}(**x**’*)

More generally, once (**x**^{j}*) is known to have come from E^{j}, j = 1 or 2, premise (2) is

Infr_{E-B}(**x**^{j}*) equiv Infr_{E^{j}}(**x**^{j}*)

From premises (1) and (2) it is concluded, for any arbitrary SLP pair **x**’*, **x**”*,

Infr_{E’}(**x**’*) equiv Infr_{E”}(**x**”*)

-12-

__________________________________________________________________

The SLP is said to follow. This is an unsound argument.

*A sound argument must be both deductively valid and have all true premises.*

Consider the truth of the two premises of Birnbaum’s argument. Premise one, (Infr_{E-B}(**x**’*) equiv Infr_{E-B}(**x**”*)), is true provided that Infr_{E-B}(**x**’*) is the inference from (E’, **x**’) averaging over the unconditional sampling distribution of statistic T-B. In effect it reports just the likelihood of **x**\*, which enters inference in terms of the convex combination of E’ and E”.

For premise two to be true (i.e., Infr_{E-B}(**x**^{j}*) equiv Infr_{E^{j}}(**x**^{j}*) for j = 1, 2), Infr_{E-B}(**x**^{j}*) must refer to the inference from (E^{j}, **x**^{j}*) modeled in terms of the sampling distribution of E^{j} alone. The experiment E-B on which inference is to be based has different meanings in each premise. The argument is invalid.

-13-

__________________________________________________________________

*5.2 Second formulation: allowing true “if then” premises*

We can formulate the argument so that both premises are true “if then” statements[iv] incorporating the stipulated sampling distributions:

As before, suppose an arbitrary member of an SLP pair (E’, E”) is observed, e.g., (E’, **x**’*) is observed. The question is as to its evidential import.

(1) If Infr_{E-B}(**x**’*) is computed unconditionally, averaging over the sampling distributions of T-B, then

Infr_{E-B}(**x**’*) equiv Infr_{E-B}(**x**”*)

(2) If Infr_{E-B}(E^{j}, **x**^{j}*) is computed conditionally, using the sampling distribution of E^{j}, then

Infr_{E-B}(**x**^{j}*) equiv Infr_{E^{j}}(**x**^{j}*) for j = 1, 2.

Construed as “if then” claims, the premises can both be true, but then we cannot validly infer the SLP:

Infr_{E’}(**x**’*) equiv Infr_{E”}(**x**”*)

We would need contradictory antecedents to hold.

-14-

__________________________________________________________________

The formal invalidity is proved by any SLP violation, since in that case the premises are true and the conclusion is false. SLP violation pairs are readily available (e.g., Examples 1 and 2), and no contradiction results. In fact, we have demonstrated something stronger: whenever we deal with an SLP violation pair, the two “if then” premises, when true, yield a false conclusion.

REFERENCES: See Paper (or my latest version upcoming in Statistical Science).

[i] Applying the stopping rule principle requires stipulating that the stopping rule was uninformative for the inference, as in the above example.

[ii] Birnbaum himself is conflicted here. In his later, 1969 paper, Note 11, Birnbaum asserts, “The formulation of the conditionality concept as one of equivalence”, as in [WCP] was proposed by him in (1962) as the natural explication of the concept, notwithstanding the one-sided form to which applications of the concept had been restricted (substitution of simpler for less simple models of evidence). This proposal seems to have found general acceptance among those interested in the concept.

[iii] For that matter, as Birnbaum suggests (1969, 119), a “trivial but harmless” augmentation to any experiment might be to toss a fair coin and report heads or tails (where this was irrelevant to the original model). Given (E’, **x**’),

Infr_{E’}(**x**’) equiv [Infr_{E’}(**x**’) and either a coin was tossed or it was not].

He intends the move in applying the WCP to be just as innocuous as the report of an irrelevant coin toss.

[iv] I am deliberately avoiding the term “conditional” statement, since it is used with a very different sense throughout.

#: This will give graduate students at my 28 Nov., 2012 presentation of this paper, as part of the (PH500) seminar, London School of Economics, a chance to submit something. Inquiries: error@vt.edu

For some older examples of U-Phils, see an earlier post, and search this blog.

-15-

This is a very interesting post, but I have to say that I cannot follow it to the end. I do, however, have some observations about the examples. (Forgive me if they are consequences of naivety!)

To me examples 1 and 2 are equivalent. The conflicts that they illustrate between ‘frequentism’ and the likelihood principle come from p-values being calculated in a manner that ‘corrects’ for the sequential nature of sampling. That is obvious in the second example, but the first example purports to be about binomial vs. negative binomial experiments. However the negative binomial experiment is really a sequential sampling scheme. Thus the two examples are equivalent.

Now a question. Is it possible that there are frequentist p-values that are not in conflict with the likelihood principle? I suggest that the conflict is really between likelihood principle and p-values that come from the Neyman-Pearsonian error-decision paradigm. The p-values ‘corrected’ for sequential sampling are calculated in a way that restores their linear relationship with type I errors. Such p-values relate to global error rates rather than the evidential worth of the data and so conflict with the likelihood principle is not only inevitable, but of no consequence. In contrast, p-values that are calculated assuming a fixed sample size for both of the examples seem to respect the conditionality principle and so, I assume, also respect the likelihood principle. Such p-values are indices of the evidential worth of the data.

Michael: since we’ve taken up the SLP a few times already on this blog, I was just going to give links, but decided to post a summary I sketched recently of a variant on my earlier discussion. If you read the linked paper, I guarantee you’ll make it to the end (this overview may be too sketchy). Sure the two examples are analogous, but people have different intuitions about them, and of course one is discrete, the other continuous. They are just illustrations. I’ll have to study the rest of what you wrote later on, thanks.

You’re brave to offer such a guarantee! I’m downloading the paper now, but I look forward to your comments on my last paragraph.

I am brave, but let me know if I’m wrong.

OK, I’ve read your paper and, I think, understood much of it.

I wonder if the problem might vanish if one were to insist that p-values be calculated only assuming a fixed sample size. The other calculations, as my previous comment suggests, are specified for the purposes of fixing ‘pre-data’ error rates (what I consider to be ‘global’ error rates) rather than for the purposes of inference. Neyman and Pearson (1933) were quite explicit in denying the notion that experiments could support inductive inference, so I think that attempting to apply inferential principles to p-values that have been adjusted to conform to their system is probably unhelpful.

If the results in example 4 are determined as for a fixed sample size in both cases then the process that you call ‘Birnbaumization’ would not lead to any contradiction. (It would be silly to apply Birnbaumization to two results that were identical, but that is a different issue.)

Claims:

1. Error rates within the Neyman-Pearson error-decision framework exist in a different logical space from the likelihood principle et al., which apply to considerations of experimental evidence, and so any claims of compliance or non-compliance are valueless.

2. This critique of Birnbaum’s argument relies, at least in part, on the non-compliance with the conditionality principle of results calculated according to the Neyman-Pearson framework.

I cannot say that I understand your paper fully and I have only read it once, so my claims may be wrong. In particular, I’m assuming that your phrase ‘the sampling theorist’ and ‘sampling theory statistics’ denotes a frequentist system that includes Neyman-Pearson error-decision framework. However, even if I have made a mistake, be it minor or a howler, I’m fairly sure that there is something worth discussing in my claims.

Michael: I read your comment. No, we would not consider that p-values that ignore the sampling distribution succeed as “indices of the evidential worth of the data”. To us, the evidential worth of data (from a given test) for a claim H requires considering the probativeness or severity of the test, relative to the contemplated inference (or indication). So, for instance, if the test (in example 2) kept going and going until it finally obtained a 1.96 standard deviation difference from a hypothesized null, it would not be a good indication that a genuine discrepancy (from the null) had been found. The procedure makes it too easy to erroneously infer effects. Of course how easy depends on where it stops. Thus, to ignore the stopping rule, as you suggest, would not be to correctly indicate evidential worth, at least not to a frequentist error statistician. That’s the basis for common criticisms of ESP tests that “try and try again”: https://errorstatistics.com/2012/09/22/statistics-and-esp-research-diaconis/

You can search for many discussions on optional stopping on this blog. Here are a few fairly recent ones:

https://errorstatistics.com/2012/09/19/barnard-background-infointentions/

https://errorstatistics.com/2012/04/06/3184/

https://errorstatistics.com/2012/09/03/after-dinner-bayesian-comedy-hour/ (comedy).

And if you find the Birnbaum business still unclear after looking at the paper, there’s my New Year’s Eve skit I just noticed:

https://errorstatistics.com/2011/12/31/midnight-with-birnbaum/

There is no doubt that it is possible to inflate the rate of false positive errors by sampling to a foregone conclusion, undeclared interim analyses, data peeking, optional stopping etc. However, while such procedures might amount to cheating within the error-decision framework of Neyman and Pearson, within the likelihood paradigm they result in only a modest increase in the risk of obtaining strong misleading evidence and they don’t alter the evidential meaning of the final data (Royall 2000; Blume 2008).

Royall, R. (2000). On the Probability of Observing Misleading Statistical Evidence. Journal of the American Statistical Association, 95(451), 760–768.

Blume, J. (2008). How Often Likelihood Ratios are Misleading in Sequential Trials. Communications in Statistics. Theory and Methods, 37(8), 1193–1206.

I am always surprised to read descriptions of the consequences of undeclared sequential testing that fail to mention that such a procedure reduces the risk of type II errors at the same time as it increases the risk of type I errors. The implication of cheating seems to be based, at least in part, on a privileging of type I errors over type II errors. In many experiments with very small samples, an undeclared extension will decrease the overall risk of a wrong decision. I am not saying that undeclared interim analyses are a good idea within the Neyman-Pearson approach, just that they may not be uniformly deleterious to good inference.

I assume that you devised the test severity approach as a way of grafting evidential considerations onto the error-decision framework. However, I didn’t notice any severity considerations in the paper that critiques Birnbaum’s paper.

You do not seem yet to have addressed the claims that I made in the previous comment, so I’ll add a third that you may agree with:

3: We cannot assess evidence by way of the conventional error rate calculations associated with the Neyman-Pearson approach because that approach is not concerned with evidence.

Michael: Yes the Birnbaum paper quite deliberately did not assume any philosophy of statistics. One needs to assume no such philosophy in order to show his “proof” is illicit. On everything else you wrote, you obviously take the old standard positions against N-P statistics that we’ve patiently been dissecting on this blog, and so merely beg the question against its having an inferential or evidential construal. If you’re interested, use the search on this blog or find my articles under “publications”. Thanks for your interest.

Michael: The reason the optional stopping test doesn’t reap credit from the low (perhaps 0) type 2 error here is that the output is “reject the null, infer evidence of an effect” (or the like). The fact that a Geller-type ESP test might have no (or low) chance of ever inferring he has no paranormal ability when he does (i.e., no or low type 2 error), would scarcely add to the scientific merit of his inferring ability has been shown, based on an “impressive” set of hits (after trying and trying again).

I deny your claim 3. Birnbaum himself developed an evidential construal of N-P tests, but N and P, and surely P, had one based on using error probabilities to appraise the sensitivity, precision, and severity of tests in relation to a given inference. You might find the “Neyman’s Nursery” blogposts eye-opening!

This is the first time I understand what the “breakthrough” fuss and hoopla is about! I was taught the Birnbaum result as something proved, never questioned, godsend, and blessed by all. Now I learn there’s a big mistake, and that Allan retracted his own presentation. But Savage was also right, that the result would lead many to Bayes or Likelihood.

E.A. I think Birnbaum “essentially” retracted his presentation but slyly, e.g., abandoning the WCP all of a sudden, but replacing it with “truncation” as a way to avoid problems with the “proof”. He stops published work on the problem for like 7 years, and then declares he has changed his mind. I wish I had met him (I do have a lot of his papers, drafts, and short letters here).*

Certainly agree about Savage.

*Of course I’m not counting my “midnight with Birnbaum meeting” on New Year’s Eve.

Mayo: As I wrote some time ago on Christian Robert’s blog, I believe (from a mathematician’s point of view) that the problem comes from the fact that Birnbaum didn’t define Infr_{E-B} and Infr in general properly as a mathematical object. You write:

“The experiment E-B on which inference is to be based has different meanings in each premise.”

I think that it *is* possible to define Infr_{E-B} in a way that it indeed has the same meaning in the two setups, which would save the proof (actually Christian Robert had a very simplistic version of doing this on his blog; the optimally stupid way of doing it is to define Infr=0 regardless of whatever fuss is made about it). However, the problem then is that the theorem has implications only for the specific and very restricted kind of inference as which Infr is defined (which in the “optimally stupid version” is of no use at all), and no implications for how other rational human beings would want to or should carry out inference legitimately.

If Infr is meant to be something more general, capturing all kinds of interesting inferences like the ones in your examples, I agree with your arguments, although I’d still diagnose the major flaw (regarding why the proof falls flat) to be that Birnbaum didn’t define or explain what kind of object Infr was meant to be in a way precise enough that can be used in a proof.

The consequence of this is that there is a third way out (apart from having the premises contradicting each other, or the conclusion being wrong), namely that premises and conclusion are alright but don’t say anything of interest, in case Infr is defined in a sufficiently uninteresting way.

By chance, I came across a paper that Gregory has written that takes up my criticism of Birnbaum (though he does not do so in his post) but in it he incorrectly claims that I restrict the Sufficiency Principle in some way. I do not. This is the usual definition and is the one Birnbaum uses. Most importantly, I do not prevent its application here at all. I apply sufficiency to premise 1, just as Birnbaum wants, and do not restrict it in any way. (A different variant on Birnbaum’s argument doesn’t even use sufficiency, but he always needs some way to “turn two experiments into one”, because the likelihood principle refers to two experiments.) Further, I am happy to allow him to do this (that’s what Birnbaumization is about). I hope that my new paper will be clearer to him. The main thing is to address the flaw I have trotted out in detail, and not assume I’m slipping into the ways others have often tried to deal with this.

Thank you for taking up my response to your argument, which does need to be more carefully formulated. In particular, I think I should say not that you do not allow Sufficiency to be applied to mixture experiments, but that you do not allow both Conditionality and Sufficiency to be applied to mixture experiments at one and the same time. (Thus, you do not allow Sufficiency to be applied to mixture experiments *when the Conditionality Principle is being used*.) Some such restriction is necessary when Sufficiency and Conditionality are regarded as telling you what sampling distribution to use for inference. But when the truth of the Likelihood Principle is at issue, it’s question-begging to assume that sampling distributions are being used for inference. The point of the proof is that the two principles together imply that sampling distributions should not be used for inference, because no method of inference that is based on sampling distributions can satisfy both principles simultaneously. The two principles (as originally formulated) cannot be contradictory because there are methods that conform to both of them–namely, likelihoodist methods and Bayesian conditioning.

I appreciate your patience in continuing to discuss this proof with me. I hope that we will reach a mutual understanding soon.

Greg: I’ll study this again later, but you’re still missing the point. It is not I who deny any combination of principles, only self-contradictions, if one expects a sound inference.

“But when the truth of the Likelihood Principle is at issue, it’s question-begging to assume that sampling distributions are being used for inference”

That’s the impression I got from my reading, too. I may try again.

Thanks for putting this up, I’m just getting to grips with these ideas so I’m learning a lot from it.

Is the key point here that premise two is the WCP (or a fairly direct consequence), and premise one is a violation of the WCP? I.e. in premise one Birnbaum claims (or, your account of Birnbaum claims) that in the “imaginary” enlarged experiment, the unconditional inference averaged across the two branches is taken, but this is exactly what the WCP says you shouldn’t do, no? I.e. in (1) Infr_{E-B}(x’*) is computed unconditionally, but you aren’t allowed to do that if you hold the WCP, because you know that you started with the experiment E’ (not the mixture experiment)? That would mean that surely you can’t hold premise one and two at the same time.

Or, perhaps, in the if-then formulation, the antecedents of the two premises are mutually exclusive, and thus though the two premises are consistent with each other (they can both be true), the if-part of both of them can’t be true at the same time, no matter whether you happen to agree with the WCP or not, or reserve judgement! Perhaps I’m just restating what you said above…

James: Thanks. yes, I think you’ve got it. If you doubt whether I am true to Birnbaum, please study him and verify for yourself.

In Mayo’s paper (Mayo 2010, p.305-314), the symbol * is used to denote the result when the likelihoods of the two experiments are proportional.

My understanding is that Mayo’s disproof versions 1 and 2 are essentially based on the same contradiction: (antecedent of) premise (1) is based on the unconditional formulation and premise (2) or antecedent of premise (2)’ is based on the conditional formulation.

In my view, premise (1) does not contradict premise (2) or (2)’ (Mayo 2010, p.309) for the following reasons:

Premise (1) says in the case of * results, the conditional and unconditional results should be the same. This does not contradict the statement “inference should be conditional on the experiment actually performed.” In other words, premises (1) and (2) can be both based on conditional formulations, but in the case of * results, premise (1) asserts that the conditional and unconditional results should be the same.

On the other hand, did Birnbaum prove anything meaningful? My answer is “No”. With the adoption of the conditional principle (CP) and the strong likelihood principle (SLP), there is still plenty of room for one to choose a different inferential procedure. What Birnbaum did was to report the same result (TBB) when the two likelihoods were proportional (neither SP nor CP requires one to report the result this way!). This is what SLP wants. Therefore, what Birnbaum actually did was use the SLP to prove the SLP, as simple as that!