Today is George Barnard’s birthday. I met him in the 1980s and we corresponded off and on until 1999. Here’s a snippet of his discussion with Savage (1962) (link below [i]) that connects to issues often taken up on this blog: stopping rules and the likelihood principle. (It’s a slightly revised reblog of an earlier post.) I’ll post some other items related to Barnard this week, in honor of his birthday.
Happy Birthday George!
Barnard: I have been made to think further about this issue of the stopping rule since I first suggested that the stopping rule was irrelevant (Barnard 1947a,b). This conclusion does not follow only from the subjective theory of probability; it seems to me that the stopping rule is irrelevant in certain circumstances. Since 1947 I have had the great benefit of a long correspondence—not many letters because they were not very frequent, but it went on over a long time—with Professor Bartlett, as a result of which I am considerably clearer than I was before. My feeling is that, as I indicated [on p. 42], we meet with two sorts of situation in applying statistics to data. One is where we want to have a single hypothesis with which to confront the data. Do they agree with this hypothesis or do they not? Now in that situation you cannot apply Bayes’s theorem because you have not got any alternatives to think about and specify—not yet. I do not say they are not specifiable—they are not specified yet. And in that situation it seems to me the stopping rule is relevant.
In particular, suppose somebody sets out to demonstrate the existence of extrasensory perception and says ‘I am going to go on until I get a one in ten thousand significance level’. Knowing that this is what he is setting out to do would lead you to adopt a different test criterion. What you would look at would not be the ratio of successes obtained, but how long it took him to obtain it. And you would have a very simple test of significance which said if it took you so long to achieve this increase in the score above the chance fraction, this is not at all strong evidence for E.S.P., it is very weak evidence. And the reversing of the choice of test criteria would I think overcome the difficulty.
This is the answer to the point Professor Savage makes; he says why use one method when you have vague knowledge, when you would use a quite different method when you have precise knowledge. It seems to me the answer is that you would use one method when you have precisely determined alternatives, with which you want to compare a given hypothesis, and you use another method when you do not have these alternatives.
Savage: May I digress to say publicly that I learned the stopping-rule principle from Professor Barnard, in conversation in the summer of 1952. Frankly I then thought it a scandal that anyone in the profession could advance an idea so patently wrong, even as today I can scarcely believe that some people resist an idea so patently right. I am particularly surprised to hear Professor Barnard say today that the stopping rule is irrelevant in certain circumstances only, for the argument he first gave in favour of the principle seems quite unaffected by the distinctions just discussed. The argument then was this: The design of a sequential experiment is, in the last analysis, what the experimenter actually intended to do. His intention is locked up inside his head and cannot be known to those who have to judge the experiment. Never having been comfortable with that argument, I am not advancing it myself. But if Professor Barnard still accepts it, how can he conclude that the stopping-rule principle is only sometimes valid? (emphasis added)
Barnard: If I may reply briefly to Professor Savage’s question as to whether I still accept the argument I put to Professor Savage in 1952 (Barnard 1947a), I would say that I do so in relation to the question then discussed, where it is a matter of choosing from among a number of simple statistical hypotheses. When it is a question of deciding whether an observed result is reasonably consistent or not with a single hypothesis, no simple statistical alternatives being specified, then the argument cannot be applied. I would not claim it as foresight so much as good fortune that on page 664 of the reference given I did imply that the likelihood-ratio argument would apply ‘to all questions where the choice lies between a finite number of exclusive alternatives’; it is implicit that the alternatives here must be statistically specified. (75-77)
Barnard, G.A. (1947a), “A Review of Sequential Analysis by Abraham Wald,” J. Amer. Statist. Assoc., 42, 658-669.
Barnard, G.A. (1947b), “The Meaning of a Significance Level,” Biometrika, 34, 179-182.
Royall’s three questions of inference provide some clarity on the optional stopping problem. The questions are:
1. What do the data say?
2. What should I believe now that I have these data?
3. What should I do or decide now that I have these data?
The answer to question 1 is unaffected by whatever is going on in the head of the experimenter because the data are the data. The answers to questions 2 and 3 may be affected by the experimenter’s intentions and by the stopping rules, multiplicity, things lumped under the derogatory title P-hacking, and so on.
When we are not clear about what we mean by ‘inference’ and ‘evidence’ we risk being confused by stopping rules.
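To make the contrast concrete, here is a minimal sketch using the familiar textbook coin-tossing example. The numbers (9 heads and 3 tails, comparing θ = 0.75 against θ = 0.5) and all function names are illustrative assumptions, not anything taken from the Barnard and Savage exchange. Under a fixed sample of 12 tosses the model is binomial; under "toss until the 3rd tail" it is negative binomial. The likelihood ratio between any two values of θ comes out the same under both designs, while the one-sided tail areas under θ = 0.5 do not.

```python
from math import comb

def binomial_lik(theta, heads=9, tails=3):
    """Likelihood when the number of tosses (heads + tails) was fixed in advance."""
    n = heads + tails
    return comb(n, heads) * theta**heads * (1 - theta)**tails

def negbinomial_lik(theta, heads=9, tails=3):
    """Likelihood when tossing continued until the `tails`-th tail appeared."""
    # The final toss must be a tail, so only the first n-1 tosses can be arranged.
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta)**tails

def likelihood_ratio(lik, theta_a, theta_b):
    """Ratio of likelihoods for two simple hypotheses under one design."""
    return lik(theta_a) / lik(theta_b)

def binomial_p_value(heads=9, tails=3):
    """P(at least `heads` heads in heads + tails tosses) when theta = 0.5."""
    n = heads + tails
    return sum(comb(n, k) for k in range(heads, n + 1)) / 2**n

def negbinomial_p_value(heads=9, tails=3):
    """P(at least `heads` heads before the `tails`-th tail) when theta = 0.5."""
    # Equivalent event: at most tails-1 tails among the first heads+tails-1 tosses.
    m = heads + tails - 1
    return sum(comb(m, k) for k in range(tails)) / 2**m

if __name__ == "__main__":
    print("LR, fixed n:        ", likelihood_ratio(binomial_lik, 0.75, 0.5))
    print("LR, stop at 3 tails:", likelihood_ratio(negbinomial_lik, 0.75, 0.5))
    print("tail area, fixed n:        ", binomial_p_value())
    print("tail area, stop at 3 tails:", negbinomial_p_value())
```

The design-dependent constants cancel in the ratio, which is why a likelihood-based answer to question 1 ignores the stopping rule, whereas anything computed from the sampling distribution does not.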
Michael: For a severe tester, x is evidence for H only if (& to the extent that) H was subjected to & passes a severe test. If a method is guaranteed to pass H, even if H is false, then we deny it has been subjected to a severe test. H must be corroborated, to use a Popperian term: there must be at least a reasonable chance to have found flaws in H, if it’s flawed. The stopping rule alters the sample space in such a way as to alter the probative capacity of the test. Making use of a preregistered report, presumably, also reflects our altered evidential appraisal of a result, upon learning what the experimenter would have done if this too had been non-significant. In the extreme case (with the test Barnard is referring to), it’s guaranteed the method will stop (in finitely many steps) and reject the null hypothesis, even if it’s true. The severity assessment will depend on how long it took & on the stopping rule. The appraisal doesn’t fall under Royall’s belief or action categories, but under a distinct category: well-testedness. I don’t deny there are different aims and contexts of inquiry. He excludes the one I most care about. (I suppose you can place almost anything under “acts”, but that wouldn’t be true to Royall’s intention.)
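A rough simulation can illustrate the "try and try again" point. This is only a sketch under stated assumptions: observations are standard normal with the null (mean 0) true, the z statistic is checked after every observation against a nominal two-sided 0.05 cutoff of 1.96, and sampling is capped at a hypothetical `max_n`, since a simulation cannot run forever. The function names and the cap are mine, not part of the discussion above.

```python
import random
from math import sqrt

def stops_and_rejects(max_n, z_crit=1.96, rng=random):
    """Sample N(0, 1) data one at a time, testing after each observation.

    Returns True if |z| reaches z_crit at some n <= max_n, i.e. the
    experimenter gets to stop and report a nominally significant result
    even though the null hypothesis (mean 0) is true.
    """
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        # z statistic for the mean of n unit-variance observations
        if abs(total) / sqrt(n) >= z_crit:
            return True
    return False

def rejection_rate(max_n, trials=1000, seed=0):
    """Proportion of simulated experiments that reject a true null."""
    rng = random.Random(seed)
    hits = sum(stops_and_rejects(max_n, rng=rng) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for max_n in (1, 10, 100, 500):
        print(max_n, rejection_rate(max_n))
```

With a single look the rejection rate should sit near the nominal 0.05, and it should climb steadily as the cap on sample size is raised. The guarantee of eventual rejection with no cap at all follows from the law of the iterated logarithm; the simulation only shows the trend toward it.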