**by Deborah Mayo**

Birnbaum’s argument for the SLP involves some equivocations that are at once subtle and blatant. The subtlety makes it hard to translate into symbolic logic (I only partially translated it). Philosophers should have a field day with this, and I should be hearing more reports that it has suddenly hit them between the eyes like a ton of bricks, to use a mixture metaphor. Here are the key bricks. References can be found here, and background to the U-Phil here.

**Famous (mixture) weighing machine example and the WCP**

The main principle of evidence on which Birnbaum’s argument rests is the *weak conditionality principle* (WCP). This principle, Birnbaum notes, follows not from mathematics alone but from intuitively plausible views of “evidential meaning.” To understand the interpretation of the WCP that gives it its plausible ring, we consider its development in “what is now usually called the ‘weighing machine example,’ which draws attention to the need for conditioning, at least in certain types of problems” (Reid 1992).

*The basis for the WCP*

**Example 3.** *Two measuring instruments of different precisions.* We flip a fair coin to decide which of two instruments, E’ or E”, to use in observing a normally distributed random sample **X** to make inferences about mean q. E’ has a known variance of 10^{−4}, while that of E” is known to be 10^{4}. The experiment is a mixture: E-mix. The fair coin or other randomizer may be characterized as observing an indicator statistic J, taking values 1 or 2 with probability .5 each, independent of the process under investigation. The full data indicate first the result of the coin toss, and then the measurement: (E^{j}, **x**^{j}).[i]

The sample space of E-mix with components E^{j}, j = 1, 2, consists of the union of

{(j, **x’**): j = 1, possible values of **X’**} and {(j, **x”**): j = 2, possible values of **X”**}.
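
The structure of E-mix can be made concrete with a small simulation. What follows is only a sketch under stated assumptions: each run yields a single measurement (a sample of size one), the true mean is set to 0, and the names `SIGMA` and `run_e_mix` are hypothetical, introduced here for illustration.

```python
import random

# Standard deviations of the two instruments (variances 10^-4 and 10^4).
# Key 1 stands for E', key 2 for E''. These names are illustrative only.
SIGMA = {1: 1e-2, 2: 1e2}

def run_e_mix(q, rng):
    """Run E-mix once: flip a fair coin for J, then measure with E^J."""
    j = 1 if rng.random() < 0.5 else 2   # indicator J, independent of q
    x = rng.gauss(q, SIGMA[j])           # one measurement from instrument j
    return j, x                          # the full data (j, x)

rng = random.Random(0)                   # fixed seed so the run is repeatable
outcomes = [run_e_mix(0.0, rng) for _ in range(10_000)]
share_precise = sum(1 for j, _ in outcomes if j == 1) / len(outcomes)
print(f"share of runs that used E': {share_precise:.3f}")
```

Each outcome records which instrument was used along with the measurement, which is exactly the information the WCP presupposes is known.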

In testing a null hypothesis such as q = 0, the same **x** measurement would correspond to a much smaller p-value were it to have come from E′ than if it had come from E”: denote them as p′(**x**) and p′′(**x**), respectively. However, the overall significance level of the mixture, the convex combination of the p-values, [p′(**x**) + p′′(**x**)]/2, would give a misleading report of the precision or severity of the actual experimental measurement (see Cox and Mayo 2010, 296).
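
To see how far apart the three numbers can be, here is a rough numerical sketch. It assumes a single measurement, a one-sided test of q = 0 against q > 0, and a hypothetical observed value x = 0.03; none of these specifics come from the example itself, and the helper `p_value` is my own illustrative name.

```python
import math

def p_value(x, sigma):
    """One-sided p-value for H0: q = 0 vs q > 0, one N(q, sigma^2) measurement."""
    z = x / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail standard normal area

x = 0.03                     # hypothetical observed measurement
p_prime = p_value(x, 1e-2)   # as if x came from E'  (variance 10^-4)
p_double = p_value(x, 1e2)   # as if x came from E'' (variance 10^4)
p_mix = (p_prime + p_double) / 2   # convex combination over the coin flip

print(f"p'(x)  = {p_prime:.5f}")
print(f"p''(x) = {p_double:.5f}")
print(f"mix    = {p_mix:.5f}")
```

Under these assumptions the averaged value sits far from both conditional p-values, so it describes neither the precise nor the imprecise instrument.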

Suppose that we know we have observed a measurement from E” with its much larger variance:

The unconditional test says that we can assign this a higher level of significance than we ordinarily do, because if we were to repeat the experiment, we might sample some quite different distribution. But this fact seems irrelevant to the interpretation of an observation which we know came from a distribution [with the larger variance] (Cox 1958, 361).

In effect, an individual unlucky enough to use the imprecise tool gains a more informative assessment because he might have been lucky enough to use the more precise tool! (Birnbaum 1962, 491; Cox and Mayo 2010, 296). Once it is known whether E′ or E′′ has produced **x**, the p-value or other inferential assessment should be made conditional on the experiment actually run.

*Weak Conditionality Principle (WCP):* If a mixture experiment is performed, with components E’, E” determined by a randomizer (independent of the parameter of interest), then once (E’, **x’**) is known, inference should be based on E’ and its sampling distribution, not on the sampling distribution of the convex combination of E’ and E”.

*Understanding the WCP*

The WCP includes a prescription and a proscription for the proper evidential interpretation of **x’**, once it is known to have come from E’:

The evidential meaning of any outcome (E’, **x’**) of any experiment E having a mixture structure is the same as: the evidential meaning of the corresponding outcome **x’** of the corresponding component experiment E’, *ignoring otherwise the over-all structure of the original experiment* E (Birnbaum 1962, 489; E_{h} and x_{h} replaced with E’ and x’ for consistency).

While the WCP seems obvious enough, it is actually rife with equivocal potential. To avoid this, we spell out its three assertions.

*First*, it applies once we know which component of the mixture has been observed, and what the outcome was: (E^{j}, **x**^{j}). (Birnbaum considers mixtures with just two components.)

*Second*, there is the prescription about evidential equivalence. Once it is known that E^{j} has generated the data, given that our inference is about a parameter of E^{j}, inferences are appropriately drawn in terms of the distribution in E^{j}, the experiment known to have been performed.

*Third*, there is the proscription. In the case of informative inferences about the parameter of E^{j}, our inference should not be influenced by whether the decision to perform E^{j} was determined by a coin flip or fixed all along. Misleading informative inferences might result from averaging over the convex combination of E^{j} and an experiment known not to have given rise to the data. The latter may be called the unconditional (sampling) distribution. ….

*______________________________________________*

*One crucial equivocation:*

Casella and R. Berger (2002) write:

The [weak] Conditionality principle simply says that if one of two experiments is randomly chosen and the chosen experiment is done, yielding data **x**, the information about *q* depends only on the experiment performed. . . . *The fact that this experiment was performed, rather than some other, has not increased, decreased, or changed knowledge of q.* (p. 293, emphasis added)

I have emphasized the last line in order to underscore a possible equivocation. Casella and Berger’s intended meaning is the correct claim:

(i) Given that it is known that measurement **x**’ is observed as a result of using tool E’, then it does not matter (and it need not be reported) whether or not E’ was chosen by a random toss (that might have resulted in using tool E”) or had been fixed all along.

Of course we do not know what measurement would have resulted had the unperformed measuring tool been used.

Compare (i) to a false and unintended reading:

(ii) If some measurement **x** is observed, then it does not matter (and it need not be reported) whether it came from a precise tool E’ or imprecise tool E”.

The idea of detaching **x**, and reporting that “**x** came from somewhere I know not where,” will not do. For one thing, we need to know the experiment in order to compute the sampling inference. For another, E’ and E” may be like our weighing procedures with very different precisions. It is analogous to being given the likelihood of the result in Example 1 (here) while withholding whether it came from a negative binomial or a binomial experiment.
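
The binomial versus negative binomial analogy can be checked with a few lines of arithmetic. The numbers below are a standard textbook-style illustration that I am assuming for the purpose, not the data of Example 1: 3 successes in 12 Bernoulli trials, testing a success probability of 0.5 against smaller values. The function names and the parameter name `theta` are hypothetical.

```python
from math import comb

SUCCESSES, TRIALS = 3, 12   # hypothetical data, chosen only for illustration

def binomial_likelihood(theta):
    """Number of trials fixed at 12 in advance; 3 successes observed."""
    return comb(TRIALS, SUCCESSES) * theta**SUCCESSES * (1 - theta)**(TRIALS - SUCCESSES)

def negative_binomial_likelihood(theta):
    """Sample until the 3rd success, which arrives on trial 12."""
    return comb(TRIALS - 1, SUCCESSES - 1) * theta**SUCCESSES * (1 - theta)**(TRIALS - SUCCESSES)

def binomial_p_value(theta0=0.5):
    """P(at most 3 successes in 12 trials) under theta0."""
    return sum(comb(TRIALS, k) * theta0**k * (1 - theta0)**(TRIALS - k)
               for k in range(SUCCESSES + 1))

def negative_binomial_p_value(theta0=0.5):
    """P(12 or more trials needed) = P(at most 2 successes in the first 11)."""
    m = TRIALS - 1
    return sum(comb(m, k) * theta0**k * (1 - theta0)**(m - k)
               for k in range(SUCCESSES))

# The likelihoods differ only by a constant factor, whatever theta is.
ratios = [binomial_likelihood(t) / negative_binomial_likelihood(t)
          for t in (0.1, 0.3, 0.5, 0.7)]
p_bin = binomial_p_value()
p_nb = negative_binomial_p_value()
print("likelihood ratios:", [round(r, 6) for r in ratios])
print(f"binomial p-value: {p_bin:.4f}, negative binomial p-value: {p_nb:.4f}")
```

In this sketch the two likelihood functions are proportional, yet the p-values differ, which is why a sampling inference cannot be computed from a result whose experiment has been withheld.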

Claim (i), by contrast, may well be warranted, not on purely mathematical grounds, but as the most appropriate way to report the precision of the result attained, as when the WCP applies. The essential difference in claim (i) is that (E’, **x**’) is known, enabling its inferential import to be determined.

The linguistic similarity of (i) and (ii) may explain the equivocation that vitiates the Birnbaum argument.

Now go back and skim 3 short pages of notes here, pp. 11-14, and it should hit you like a ton of bricks! If so, reward yourself with a double Elba Grease; else try again. Report your results in the comments.