Reblogging from a year ago. The Appendix of the “Cox/Mayo Conversation” (linked below [i]) is an attempt to quickly sketch Birnbaum’s argument for the strong likelihood principle (SLP), and its sins. A couple of notes: First, I am a philosopher (of science and statistics), not a statistician. That means my treatment will show all of the typical (and perhaps annoying) signs of being a trained philosopher-logician. I’ve no doubt statisticians would want to use different language, which is welcome. Second, this is just a blog (although perhaps my published version is still too informal for some).

But Birnbaum’s idea for comparing evidence across different methodologies is also an informal notion! He abbreviates by Ev(E, x): the inference, conclusion or evidence report about the parameter μ arising from experiment E and result x, according to the methodology being applied.

So, for sampling theory (I prefer “error statistics”, but no matter), the report might be a p-value (it could also be a confidence interval with its confidence coefficient, etc.).

The strong LP is a general conditional claim:

(SLP): For any two experiments E’ and E” with different probability models but with the same unknown parameter μ, and x’ and x” data from E’ and E” respectively, where the likelihoods of x’ and x” are proportional to each other, then x’ and x” ought to have the identical evidential import for any inference concerning parameter μ.

For instance, E’ and E” might be Binomial sampling with n fixed, and Negative Binomial sampling, respectively. There are pairs of outcomes from E’ and E” that could serve in SLP violations. For a more extreme example, E’ might be sampling from a Normal distribution with a fixed sample size n, and E” might be the corresponding experiment that uses an optional stopping rule: keep sampling until you obtain a result 2 standard deviations away from a null hypothesis.

Suppose we are testing the null hypothesis that μ = 0 (and for simplicity, a known standard deviation).

The SLP tells us (in relation to the optional stopping rule) that once you have observed a 2-standard deviation result, there ought to be no evidential difference between its having arisen from experiment E’, where n was fixed at 100, and experiment E”, where the stopping rule happens to stop at n = 100 (i.e., it just happens that a 2-standard deviation result was observed after n = 100 trials).

The key point is that there is a difference in the corresponding p-values from E’ and E”, which we may write as p’ and p”, respectively. While p’ would be ~.05, p” would be much larger, perhaps ~.3 (the numbers do not matter). The error probability accumulates because of the optional stopping.
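To make the accumulation concrete, here is a rough simulation sketch. It rests on assumptions that are mine, not part of the original discussion: σ = 1, a two-sided 2-standard-deviation cutoff, and p” read as the probability, under μ = 0, that the stopping rule halts at some n ≤ 100. The function names are hypothetical.

```python
import math
import random

def fixed_n_pvalue(z=2.0):
    """Two-sided tail area beyond z standard deviations when n is fixed (E')."""
    return math.erfc(z / math.sqrt(2.0))

def optional_stopping_error_rate(n_max=100, z=2.0, reps=20000, seed=0):
    """Monte Carlo estimate of the chance, under mu = 0 and sigma = 1 (assumed),
    that |sum of observations| reaches z * sqrt(n) at some n <= n_max (E'')."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            # Stop as soon as the sample mean is z standard errors from 0.
            if abs(total) >= z * math.sqrt(n):
                hits += 1
                break
    return hits / reps

if __name__ == "__main__":
    print("fixed n (p'):          ", round(fixed_n_pvalue(), 4))
    print("optional stopping (p''):", round(optional_stopping_error_rate(), 4))
```

Under these assumptions the fixed-n figure sits near .05 while the optional-stopping figure comes out several times larger, which is the qualitative point; the exact value depends on the cutoff and on how p” is defined.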

Clearly p’ is not equal to p”, so the two outcomes are not evidentially equivalent for a frequentist. This constitutes a violation of the strong LP (which of course is just what is proper for a frequentist).

Unless a violation of the SLP is understood, it will be impossible to understand the issue about the Birnbaum argument. Some people are forgetting that for a “sampling theory” person, evidential import must always consider the sampling distribution. This sounds awfully redundant, and it is, but given what I’m reading on some blogs, it bears repeating.

One excellent feature of Kadane’s book is that he is very clear in remarking how frequentists violate the SLP.

I should note that Birnbaum himself rejected the SLP.

The SLP is a conditional (if-then claim) that makes a general assertion about any x’, x” that satisfy the conditions in the antecedent. Therefore, it is false so long as there is any case where the antecedent holds and the consequent does not. Any SLP violation takes this form.

(SLP Violation): Any case of two experiments E’ and E” with different probability models but with the same unknown parameter μ, where

- x’ and x” are results from E’ and E” respectively,
- likelihoods of x’ and x” are proportional to each other
- AND YET x’ and x” have different evidential import (i.e., Ev(E’,x’) is not equal to Ev(E”, x”))
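For a concrete instance of this pattern, take the Binomial/Negative Binomial pair mentioned earlier. The sketch below uses illustrative numbers that are my assumption, not from the post: 9 successes and 3 failures, a success probability θ standing in for the parameter, and a one-sided test of θ = 0.5 against larger values. The function names are hypothetical.

```python
from math import comb

def binom_lik(theta, n=12, k=9):
    """Likelihood of k successes in a fixed n trials (E')."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def negbinom_lik(theta, r=3, k=9):
    """Likelihood of k successes before the r-th failure (E'');
    the last trial is necessarily a failure."""
    return comb(k + r - 1, r - 1) * theta**k * (1 - theta)**r

def binom_pvalue(n=12, k=9, theta0=0.5):
    """P(at least k successes in n trials) under theta0."""
    return sum(comb(n, j) * theta0**j * (1 - theta0)**(n - j)
               for j in range(k, n + 1))

def negbinom_pvalue(r=3, k=9, theta0=0.5):
    """P(at least k successes before the r-th failure) under theta0,
    i.e. at most r-1 failures among the first k+r-1 trials."""
    m = k + r - 1
    return sum(comb(m, f) * (1 - theta0)**f * theta0**(m - f)
               for f in range(r))

if __name__ == "__main__":
    # The likelihood ratio is the same constant for every theta ...
    for theta in (0.3, 0.5, 0.8):
        print(theta, binom_lik(theta) / negbinom_lik(theta))
    # ... yet the two p-values differ.
    print(binom_pvalue(), negbinom_pvalue())
```

The likelihood ratio is constant in θ (the antecedent holds), while the p-values come out at 299/4096 and 67/2048 respectively, so Ev(E’,x’) is not equal to Ev(E”,x”) when the report is a p-value.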

I’ll wait a bit to continue with this. I am traveling around different countries, so blog posts may be erratic (with possible errors you’ll point out).

(Made it to Zurich and rented car to Konstanz)

[i]

With respect to your example of the normal experiment with the optional stopping rule, the SLP does not imply what you claim it implies. In experiment E”, the probability distribution of the outcome is not the product of distributions of independent and identically distributed normal random variables. Rather, it’s a random walk Markov chain with an absorbing boundary that grows with the square root of N. Ignoring the actual data for the moment, the probability distribution for the number of trials in the experiment depends on mu, which means that when considered as a likelihood, the number of trials alone is informative about mu. In fact, for some values of mu, there’s a non-zero probability that the experiment never terminates…

See Wikipedia’s article on the law of the iterated logarithm for more information about the behavior of this random walk (without the boundary).

Here’s a case where the SLP applies. Suppose you’re measuring the weight of an item. In experiment E’, the scale’s display has an upper bound of 99.9. If the number would exceed that bound, it displays “ERR” instead. In experiment E”, the upper bound is so large relative to the weight of the items that it is effectively infinite. In both experiments, the measured weight is subject to random error of known distribution (the same distribution in E’ and E”) and the number of trials is fixed. The SLP asserts that when all measured weights are below 100.0, the data provide the same evidence no matter which scale was used. In contrast, the error statistical approach requires taking the truncation of the sample space into account when computing p-values, creating confidence procedures or constructing rejection regions.

I think the binomial/negative binomial sampling problem is a special case that has perhaps misled some Bayesian philosophers or even statisticians into making false claims about optional stopping. Chapter 6 of Gelman’s Bayesian text provides the actual theory giving the conditions in which the data collection mechanism can safely be ignored.

Sorry, are you saying the SLP does not deny the relevance of the stopping rule in this example? It does (see for example Savage forum 1962)–and famously so*. I don’t deny there are many, many OTHER violations, which is why I recommend studying them before approaching the Birnbaum argument. Any will do to make the argument more vivid, even though the issue doesn’t depend on any particular violation of the SLP. Excuse me if I’m missing your point…

*“The likelihood principle emphasized in Bayesian statistics implies, among other things, that the rules governing when data collection stops are irrelevant to data interpretation. It is entirely appropriate to collect data until a point has been proved or disproven …” (Edwards, Lindman and Savage 1963, p. 193).

“In general, suppose that you collect data of any kind whatsoever — not necessarily Bernoullian, nor identically distributed, nor independent of each other …— stopping only when the data thus far collected satisfy some criterion of a sort that is sure to be satisfied sooner or later, then the import of the sequence of n data actually observed will be exactly the same as it would be had you planned to take exactly n observations in the first place (ibid., 238-239)”.

This irrelevance of the stopping rule is sometimes called the Stopping Rule Principle (SRP); it is an implication of the (strong) LP.

Edwards, W., H. Lindman and L. J. Savage. 1963. “Bayesian Statistical Inference for Psychological Research.” Psychological Review 70, 193-242.

Sad to say, but the famous statisticians you quote are simply wrong on the math.

This blog format isn’t great for trying to write equations, but the reason is essentially that in E’ there are some sample paths that cross the stopping boundary before the final sample size is reached, whereas those sample paths have zero probability in E”, and the probability mass for the set of those sample paths is a function of mu. If you like, I can email you a pdf file showing the sampling densities of the data in the two experiments, from which it will be obvious that the likelihoods are not proportional.

Actually, the above explanation is incorrect. The reason is that the fact that the experiment terminated conveys information about mu.

Erm, they’re not wrong if they were talking about two-sided tests… I was thinking about one-sided tests because that’s what the formaldehyde paper is about.

The example of the SLP violation here refers to two-sided tests. It is an extreme example, but it’s the one often used, as in the “Savage forum.” Of course there are tons of others, less extreme. In any event, the Birnbaum argument begins with an SLP violation. That is why I say understanding such violations is needed to understand the issue.

I think you’re implicitly referring to the difference in sampling distributions rather than in likelihoods.

Corey: While the number of trials N alone is indeed informative about the model parameter, the joint density of N and data(N) is truly proportional to the density of data(N). This may sound surprising, but it is nonetheless the case. Check with the normal example, or see Berger and Wolpert (1988).
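A sketch of why the proportionality holds in the normal example, under assumptions added here for illustration: known σ = 1, null value μ = 0, φ the standard normal density, and a stopping rule that is a deterministic function of the data observed so far.

```latex
% Assumptions (not from the thread): sigma = 1, \varphi = standard normal density,
% stop at the first n with \sqrt{n}\,|\bar{x}_n| \ge 2.
\begin{aligned}
f_{E''}(x_1,\dots,x_n,\,N=n;\,\mu)
  &= \Bigl[\prod_{i=1}^{n}\varphi(x_i-\mu)\Bigr]\cdot
     \mathbf{1}\bigl\{\sqrt{k}\,|\bar{x}_k| < 2 \ \text{for } k<n,\ \
                      \sqrt{n}\,|\bar{x}_n| \ge 2\bigr\} \\
  &\propto \prod_{i=1}^{n}\varphi(x_i-\mu)
   \;=\; f_{E'}(x_1,\dots,x_n;\,\mu)
\end{aligned}
```

The indicator depends only on the observed data, not on μ, so for any data on which the rule actually stops at n it is a constant factor in the likelihood. The marginal distribution of N does depend on μ, but that dependence is already carried by the product term.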