# Posts Tagged With: Stopping rules

## Who is allowed to cheat? I.J. Good and that after dinner comedy hour….

It was from my Virginia Tech colleague I.J. Good (in statistics), who died four years ago (April 5, 2009), at 93, that I learned most of what I call “howlers” on this blog. His favorites were based on the “paradoxes” of stopping rules.

“In conversation I have emphasized to other statisticians, starting in 1950, that, in virtue of the ‘law of the iterated logarithm,’ by optional stopping an arbitrarily high sigmage, and therefore an arbitrarily small tail-area probability, can be attained even when the null hypothesis is true. In other words if a Fisherian is prepared to use optional stopping (which usually he is not) he can be sure of rejecting a true null hypothesis provided that he is prepared to go on sampling for a long time. The way I usually express this ‘paradox’ is that a Fisherian [but not a Bayesian] can cheat by pretending he has a plane to catch like a gambler who leaves the table when he is ahead” (Good 1983, 135) [*]

This paper came from a conference where we both presented, and he was extremely critical of my error statistical defense on this point. (I was a year out of grad school, and he a University Distinguished Professor.)

One time, years later, after hearing Jack give this howler for the nth time, “a Fisherian [but not a Bayesian] can cheat, etc.,” I was driving him to his office, and suddenly blurted out what I really thought:

“You know Jack, as many times as I have heard you tell this, I’ve always been baffled as to its lesson about who is allowed to cheat. Error statisticians require the overall and not the ‘computed’ significance level be reported. To us, what would be cheating would be reporting the significance level you got after trying and trying again in just the same way as if the test had a fixed sample size. True, we are forced to fret about how stopping rules alter the error probabilities of tests, while the Bayesian is free to ignore them, but why isn’t the real lesson that the Bayesian is allowed to cheat?” (A published version of my remark may be found in EGEK p. 351: “As often as my distinguished colleague presents this point…”)

To my surprise, or actually shock, after pondering this a bit, Jack said something like, “Hmm, I never thought of it this way.”

By the way, the story of the “after dinner Bayesian comedy hour” on this blog, did not allude to Jack but to someone who gave a much more embellished version. Since it’s Saturday night, let’s once again listen into the comedy hour that unfolded at my dinner table at an academic conference:

Did you hear the one about the researcher who gets a phone call from the guy analyzing his data? First the guy congratulates him and says, “The results show a Read more »

Categories: Bayesian/frequentist, Comedy, Statistics | | 68 Comments

## Stephen Senn: Also Smith and Jones

Also Smith and Jones[1]
by Stephen Senn

Head of Competence Center for Methodology and Statistics (CCMS)

This story is based on a paradox proposed to me by Don Berry. I have my own opinion on this but I find that opinion boring and predictable. The opinion of others is much more interesting and so I am putting this up for others to interpret.

Two scientists working for a pharmaceutical company collaborate in designing and running a clinical trial known as CONFUSE (Clinical Outcomes in Neuropathic Fibromyalgia in US Elderly). One of them, Smith is going to start another programme of drug development in a little while. The other one, Jones, will just be working on the current project. The planned sample size is 6000 patients.

Smith says that he would like to look at the experiment after 3000 patients in order to make an important decision as regards his other project. As far as he is concerned that’s good enough.

Jones is horrified. She considers that for other reasons CONFUSE should continue to recruit all 6000 and that on no account should the trial be stopped early.

Smith say that he is simply going to look at the data to decide whether to initiate a trial in a similar product being studied in the other project he will be working on. The fact that he looks should not affect Jones’s analysis.

Jones is still very unhappy and points out that the integrity of her trial is being compromised.

Smith suggests that all that she needs to do is to state quite clearly in the protocol that the trial will proceed whatever the result of the interim administrative look and she should just write that this is so in the protocol. The fact that she states publicly that on no account will she claim significance based on the first 3000 alone will reassure everybody including the FDA. (In drug development circles, FDA stands for Finally Decisive Argument.)

However, Jones insists. She wants to know what Smith will do if the result after 3000 patients is not significant.

Smith replies that in that case he will not initiate the trial in the parallel project. It will suggest to him that it is not worth going ahead.

Jones wants to know suppose that the results for the first 3000 are not significant what will Smith do once the results of all 6000 are in.

Smith replies that, of course, in that case he will have a look. If (but it seems to him an unlikely situation) the results based on all 6000 will be significant, even though the results based on the first 3000 were not, he may well decide that the treatment works after all and initiate his alternative program, regretting, of course, the time that has been lost.

Jones points out that Smith will not be controlling his type I error rate by this procedure.

‘OK’, Says Smith, ‘to satisfy you I will use adjusted type I error rates. You, of course, don’t have to.’

The trial is run. Smith looks after 3000 patients and concludes the difference is not significant. The trial continues on its planned course. Jones looks after 6000 and concludes it is significant P=0.049. Smith looks after 6000 and concludes it is not significant, P=0.052. (A very similar thing happened in the famous TORCH study(1))

Shortly after the conclusion of the trial, Smith and Jones are head-hunted and leave the company.  The brief is taken over by new recruit Evans.

What does Evans have on her hands: a significant study or not?

Reference

1.  Calverley PM, Anderson JA, Celli B, Ferguson GT, Jenkins C, Jones PW, et al. Salmeterol and fluticasone propionate and survival in chronic obstructive pulmonary disease. The New England journal of medicine. 2007;356(8):775-89.

[1] Not to be confused with either Alias Smith and Jones nor even Alas Smith and Jones

Categories: Philosophy of Statistics, Statistics | | 14 Comments

## Stephen Senn: On the (ir)relevance of stopping rules in meta-analysis

Senn in China

Stephen Senn

Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

George Barnard has had an important influence on the way I think about statistics. It was hearing him lecture in Aberdeen (I think) in the early 1980s (I think) on certain problems associated with Neyman confidence intervals that woke me to the problem of conditioning. Later as a result of a lecture he gave to the International Society of Clinical Biostatistics meeting in Innsbruck in 1988 we began a correspondence that carried on at irregular intervals until 2000. I continue to have reasons to be grateful for the patience an important and senior theoretical statistician showed to a junior and obscure applied one.

One of the things Barnard was adamant about was that you had to look at statistical problems with various spectacles. This is what I propose to do here, taking as an example meta-analysis. Suppose that it is the case that a meta-analyst is faced with a number of trials in a given field and that these trials have been carried out sequentially. In fact, to make the problem both simpler and more acute, suppose that no stopping rule adjustments have been made. Suppose, unrealistically, that each trial has identical planned maximum size but that a single interim analysis is carried out after a fraction f of information has been collected. For simplicity we suppose this fraction f to be the same for every trial. The questions is ‘should the meta-analyst ignore the stopping rule employed’? The answer is ‘yes’ or ‘no’ depending on how (s)he combines the information and, interestingly, this is not a question of whether the meta-analyst is Bayesian or not. Read more »

Categories: Philosophy of Statistics, Statistics | | 2 Comments

## After dinner Bayesian comedy hour….

Given it’s the first anniversary of this blog, which opened with the howlers in “Overheard at the comedy hour …” let’s listen in as a Bayesian holds forth on one of the most famous howlers of the lot: the mysterious role that psychological intentions are said to play in frequentist methods such as statistical significance tests. Here it is, essentially as I remember it (though shortened), in the comedy hour that unfolded at my dinner table at an academic conference:

Did you hear the one about the researcher who gets a phone call from the guy analyzing his data? First the guy congratulates him and says, “The results show a statistically significant difference at the .05 level—p-value .048.” But then, an hour later, the phone rings again. It’s the same guy, but now he’s apologizing. It turns out that the experimenter intended to keep sampling until the result was 1.96 standard deviations away from the 0 null—in either direction—so they had to reanalyze the data (n=169), and the results were no longer statistically significant at the .05 level.

Much laughter.

So the researcher is tearing his hair out when the same guy calls back again. “Congratulations!” the guy says. “I just found out that the experimenter actually had planned to take n=169 all along, so the results are statistically significant.”

Howls of laughter.

But then the guy calls back with the bad news . . .

It turns out that failing to score a sufficiently impressive effect after n’ trials, the experimenter went on to n” trials, and so on and so forth until finally, say, on trial number 169, he obtained a result 1.96 standard deviations from the null.

It continues this way, and every time the guy calls in and reports a shift in the p-value, the table erupts in howls of laughter! From everyone except me, sitting in stunned silence, staring straight ahead. The hilarity ensues from the idea that the experimenter’s reported psychological intentions about when to stop sampling is altering the statistical results. Read more »