Reblogging from a year ago. The Appendix of the “Cox/Mayo Conversation” (linked below [i]) is an attempt to quickly sketch Birnbaum’s argument for the strong likelihood principle (SLP), and its sins. A couple of notes: First, I am a philosopher (of science and statistics), not a statistician. That means my treatment will show all the typical (and perhaps annoying) signs of a trained philosopher-logician. I’ve no doubt statisticians would want to use different language, which is welcome. Second, this is just a blog (although perhaps my published version is still too informal for some).
But Birnbaum’s idea for comparing evidence across different methodologies is also an informal notion! He abbreviates by Ev(E, x): the inference, conclusion or evidence report about the parameter μ arising from experiment E and result x, according to the methodology being applied.
So, for sampling theory (I prefer “error statistics”, but no matter), the report might be a p-value (it could also be a confidence interval with its confidence coefficient, etc.).
The strong LP is a general conditional claim:
(SLP): For any two experiments E’ and E” with different probability models but with the same unknown parameter μ, if x’ and x” (data from E’ and E” respectively) have proportional likelihoods, then x’ and x” ought to have identical evidential import for any inference concerning parameter μ.
For instance, E’ and E” might be Binomial sampling with n fixed, and Negative Binomial sampling, respectively. There are pairs of outcomes from E’ and E” that could serve in SLP violations. For a more extreme example, E’ might be sampling from a Normal distribution with a fixed sample size n, and E” might be the corresponding experiment that uses an optional stopping rule: keep sampling until you obtain a result 2 standard deviations away from a null hypothesis.
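To make the proportionality concrete, here is a minimal sketch (the particular numbers, 3 successes in 12 trials, are my illustrative choice, not from the post): the binomial and negative binomial likelihoods share the factor θ^x(1−θ)^(n−x) and differ only by a constant, so their ratio does not depend on θ.

```python
from math import comb

# Binomial: n fixed at 12, observe x = 3 successes.
def binom_lik(theta, n=12, x=3):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Negative Binomial: sample until the 3rd success, which happens
# to arrive on trial 12 (the last trial must be the r-th success).
def negbinom_lik(theta, r=3, n=12):
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

for theta in (0.1, 0.3, 0.5, 0.7):
    print(f"theta={theta}: ratio={binom_lik(theta) / negbinom_lik(theta):.3f}")
# the ratio is constant in theta (comb(12,3)/comb(11,2) = 4.0),
# i.e., the two likelihood functions are proportional
```

Since the two likelihood functions differ only by that constant, the SLP’s antecedent is satisfied for this pair of outcomes.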
Suppose we are testing the null hypothesis that μ = 0 (and for simplicity, a known standard deviation).
The SLP tells us (in relation to the optional stopping rule) that once you have observed a 2-standard deviation result, there ought to be no evidential difference between its having arisen from experiment E’, where n was fixed at 100, and experiment E” where the stopping rule happens to stop at n = 100 (i.e., it just happens that a 2-standard deviation result was observed after n = 100 trials).
The key point is that there is a difference in the corresponding p-values from E’ and E”, which we may write as p’ and p”, respectively. While p’ would be ~.05, p” would be much larger, perhaps ~.3 (the exact numbers do not matter). The error probability accumulates because of the optional stopping.
Clearly p’ is not equal to p”, so the two outcomes are not evidentially equivalent for a frequentist. This constitutes a violation of the strong LP (which of course is just what is proper for a frequentist).
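The inflation of the error probability under optional stopping can be checked with a quick simulation; a minimal sketch under the null μ = 0 with known σ = 1 (the 2-standard-error boundary and n up to 100 follow the example above; the number of replications is arbitrary):

```python
import math
import random

random.seed(1)

def stops_early(n_max=100):
    # Sample standard normals under the null mu = 0 (sigma = 1 known);
    # report whether the running mean ever reaches 2 standard errors
    # from 0 within n_max trials (the optional stopping rule of E'').
    total = 0.0
    for n in range(1, n_max + 1):
        total += random.gauss(0, 1)
        if abs(total / math.sqrt(n)) >= 2:
            return True
    return False

reps = 5000
rate = sum(stops_early() for _ in range(reps)) / reps
print("fixed-n (E') two-sided error rate at |z| >= 2: ~0.046")
print(f"optional-stopping (E'') rate of reaching |z| >= 2 by n = 100: {rate:.2f}")
```

The simulated E” rate comes out far above the fixed-n .046, in the ballpark of the ~.3 mentioned above, which is exactly the frequentist’s reason for treating the two results differently.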
Unless a violation of the SLP is understood, it will be impossible to understand the issue about the Birnbaum argument. Some people are forgetting that for a “sampling theory” person, evidential import must always consider the sampling distribution. This sounds awfully redundant, and it is, but given what I’m reading on some blogs, it bears repeating.
One excellent feature of Kadane’s book is that he is very clear in remarking how frequentists violate the SLP.
I should note that Birnbaum himself rejected the SLP.
The SLP is a conditional (if-then) claim that makes a general assertion about any x’, x” satisfying the conditions in the antecedent. Therefore, it is false so long as there is any case where the antecedent holds and the consequent does not. Any SLP violation takes this form.
(SLP Violation): Any case of two experiments E’ and E” with different probability models but with the same unknown parameter μ, where
- x’ and x” are results from E’ and E” respectively,
- the likelihoods of x’ and x” are proportional to each other,
- AND YET x’ and x” have different evidential import (i.e., Ev(E’,x’) is not equal to Ev(E”, x”))
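The schema can be instantiated with the binomial/negative binomial pair mentioned earlier; a hedged sketch (the specific outcome, 9 successes and 3 failures in 12 trials testing θ = 0.5 against θ > 0.5, is the standard textbook illustration, not taken from the post):

```python
from math import comb

theta = 0.5  # null hypothesis

# E': Binomial, n fixed at 12; observe 9 successes.
# p' = P(X >= 9 | n = 12, theta = 0.5)
p1 = sum(comb(12, k) * theta**k * (1 - theta)**(12 - k)
         for k in range(9, 13))

# E'': Negative Binomial, sample until the 3rd failure; it arrives
# on trial 12.  p'' = P(N >= 12) = P(at most 2 failures in the
# first 11 trials)
p2 = sum(comb(11, k) * (1 - theta)**k * theta**(11 - k)
         for k in range(0, 3))

print(f"p'  (binomial)          = {p1:.4f}")  # ~0.073
print(f"p'' (negative binomial) = {p2:.4f}")  # ~0.033
```

The likelihoods of the two outcomes are proportional, yet p’ ≈ .073 and p” ≈ .033 straddle the conventional .05 cutoff: Ev(E’, x’) ≠ Ev(E”, x”), an SLP violation of exactly the form displayed above.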
I’ll wait a bit to continue with this. I am traveling around different countries, so blog posts may be erratic (with possible errors you’ll point out).
(Made it to Zurich and rented car to Konstanz)
“A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo”
With respect to your example of the normal experiment with the optional stopping rule, the SLP does not imply what you claim it implies. In experiment E”, the probability distribution of the outcome is not the product of distributions of independent and identically distributed normal random variables. Rather, it’s a random walk Markov chain with an absorbing boundary that grows with the square root of N. Ignoring the actual data for the moment, the probability distribution for the number of trials in the experiment depends on mu, which means that when considered as a likelihood, the number of trials alone is informative about mu. In fact, for some values of mu, there’s a non-zero probability that the experiment never terminates…
See Wikipedia’s article on the law of the iterated logarithm for more information about the behavior of this random walk (without the boundary).

Here’s a case where the SLP applies. Suppose you’re measuring the weight of an item. In experiment E’, the scale’s display has an upper bound of 99.9. If the number would exceed that bound, it displays “ERR” instead. In experiment E”, the upper bound is so large relative to the weight of the items that it is effectively infinite. In both experiments, the measured weight is subject to random error of known distribution (the same distribution in E’ and E”) and the number of trials is fixed. The SLP asserts that when all measured weights are below 100.0, the data provide the same evidence no matter which scale was used. In contrast, the error statistical approach requires taking the truncation of the sample space into account when computing p-values, creating confidence procedures or constructing rejection regions.
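To see why the SLP’s antecedent holds in this weighing example: for any reading below the display bound, the probability density of the observation is the same under either scale, so the likelihood functions coincide. A minimal sketch (the reading 95.0 and normal error with σ = 1 are my illustrative assumptions; the 99.9 bound is from the example):

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * math.sqrt(2 * math.pi))

# E'': effectively unbounded scale -- likelihood is the normal density.
def lik_unbounded(mu, x=95.0):
    return normal_pdf(x, mu)

# E': display shows "ERR" at or above 99.9 -- but for a reading below
# the bound, the event "display = x" has the same density, so the
# likelihood function in mu is identical.
def lik_bounded(mu, x=95.0, bound=99.9):
    if x >= bound:
        raise ValueError("display would read ERR")
    return normal_pdf(x, mu)

for mu in (94.0, 95.0, 96.0):
    assert lik_unbounded(mu) == lik_bounded(mu)
```

The two likelihood functions agree point-for-point on in-range data, so the SLP demands identical evidential reports; the sampling distributions of the two experiments nonetheless differ, which is what the error statistician’s truncation adjustment tracks.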
I think the binomial/negative binomial sampling problem is a special case that has perhaps misled some Bayesian philosophers or even statisticians into making false claims about optional stopping. Chapter 6 of Gelman’s Bayesian text provides the actual theory giving the conditions in which the data collection mechanism can safely be ignored.