# Posts Tagged With: Stephen Senn

## Excerpts from S. Senn’s Letter on “Replication, p-values and Evidence”

.

I first blogged this letter here. Below the references are some more recent blog links of relevance to this issue.

Dear Reader:  I am typing in some excerpts from a letter Stephen Senn shared with me in relation to my April 28, 2012 blogpost.  It is a letter to the editor of Statistics in Medicine  in response to S. Goodman. It contains several important points that get to the issues we’ve been discussing. You can read the full letter here. Sincerely, D. G. Mayo

STATISTICS IN MEDICINE, LETTER TO THE EDITOR

From: Stephen Senn*

Some years ago, in the pages of this journal, Goodman gave an interesting analysis of ‘replication probabilities’ of p-values. Specifically, he considered the possibility that a given experiment had produced a p-value that indicated ‘significance’ or near significance (he considered the range p=0.10 to 0.001) and then calculated the probability that a study with equal power would produce a significant result at the conventional level of significance of 0.05. He showed, for example, that given an uninformative prior, and (subsequently) a resulting p-value that was exactly 0.05 from the first experiment, the probability of significance in the second experiment was 50 per cent. A more general form of this result is as follows. If the first trial yields p=α then the probability that a second trial will be significant at significance level α (and in the same direction as the first trial) is 0.5. Continue reading

Categories: 4 years ago!, reproducibility, S. Senn, Statistics |

## Stephen Senn: Fisher’s Alternative to the Alternative

.

As part of the week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962), I reblog Senn from 3 years ago.

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

.

The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading

Categories: Fisher, Statistics, Stephen Senn |

## STEPHEN SENN: Fisher’s alternative to the alternative

Reblogging 2 years ago:

By: Stephen Senn

This year [2012] marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading

Categories: Fisher, Statistics, Stephen Senn |

## Stephen Senn: Also Smith and Jones

Also Smith and Jones[1]
by Stephen Senn

Head of Competence Center for Methodology and Statistics (CCMS)

This story is based on a paradox proposed to me by Don Berry. I have my own opinion on this but I find that opinion boring and predictable. The opinion of others is much more interesting and so I am putting this up for others to interpret.

Two scientists working for a pharmaceutical company collaborate in designing and running a clinical trial known as CONFUSE (Clinical Outcomes in Neuropathic Fibromyalgia in US Elderly). One of them, Smith is going to start another programme of drug development in a little while. The other one, Jones, will just be working on the current project. The planned sample size is 6000 patients.

Smith says that he would like to look at the experiment after 3000 patients in order to make an important decision as regards his other project. As far as he is concerned that’s good enough.

Jones is horrified. She considers that for other reasons CONFUSE should continue to recruit all 6000 and that on no account should the trial be stopped early.

Smith say that he is simply going to look at the data to decide whether to initiate a trial in a similar product being studied in the other project he will be working on. The fact that he looks should not affect Jones’s analysis.

Jones is still very unhappy and points out that the integrity of her trial is being compromised.

Smith suggests that all that she needs to do is to state quite clearly in the protocol that the trial will proceed whatever the result of the interim administrative look and she should just write that this is so in the protocol. The fact that she states publicly that on no account will she claim significance based on the first 3000 alone will reassure everybody including the FDA. (In drug development circles, FDA stands for Finally Decisive Argument.)

However, Jones insists. She wants to know what Smith will do if the result after 3000 patients is not significant.

Smith replies that in that case he will not initiate the trial in the parallel project. It will suggest to him that it is not worth going ahead.

Jones wants to know suppose that the results for the first 3000 are not significant what will Smith do once the results of all 6000 are in.

Smith replies that, of course, in that case he will have a look. If (but it seems to him an unlikely situation) the results based on all 6000 will be significant, even though the results based on the first 3000 were not, he may well decide that the treatment works after all and initiate his alternative program, regretting, of course, the time that has been lost.

Jones points out that Smith will not be controlling his type I error rate by this procedure.

‘OK’, Says Smith, ‘to satisfy you I will use adjusted type I error rates. You, of course, don’t have to.’

The trial is run. Smith looks after 3000 patients and concludes the difference is not significant. The trial continues on its planned course. Jones looks after 6000 and concludes it is significant P=0.049. Smith looks after 6000 and concludes it is not significant, P=0.052. (A very similar thing happened in the famous TORCH study(1))

Shortly after the conclusion of the trial, Smith and Jones are head-hunted and leave the company.  The brief is taken over by new recruit Evans.

What does Evans have on her hands: a significant study or not?

Reference

1.  Calverley PM, Anderson JA, Celli B, Ferguson GT, Jenkins C, Jones PW, et al. Salmeterol and fluticasone propionate and survival in chronic obstructive pulmonary disease. The New England journal of medicine. 2007;356(8):775-89.

[1] Not to be confused with either Alias Smith and Jones nor even Alas Smith and Jones

Categories: Philosophy of Statistics, Statistics |

## Guest post: Bad Pharma? (S. Senn)

Professor Stephen Senn*
Full Paper: Bad JAMA?
Short version–Opinion Article: Misunderstanding publication bias
Video below

# Data filters

The student undertaking a course in statistical inference may be left with the impression that what is important is the fundamental business of the statistical framework employed: should one be Bayesian or frequentist, for example? Where does one stand as regards the likelihood principle and so forth? Or it may be that these philosophical issues are not covered but that a great deal of time is spent on the technical details, for example, depending on framework, various properties of estimators, how to apply the method of maximum likelihood, or, how to implement Markov chain Monte Carlo methods and check for chain convergence. However much of this work will take place in a (mainly) theoretical kingdom one might name simple-random-sample-dom. Continue reading

Categories: Statistics, Stephen Senn | Tags: ,

## Stephen Senn: On the (ir)relevance of stopping rules in meta-analysis

Senn in China

Stephen Senn

Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

George Barnard has had an important influence on the way I think about statistics. It was hearing him lecture in Aberdeen (I think) in the early 1980s (I think) on certain problems associated with Neyman confidence intervals that woke me to the problem of conditioning. Later as a result of a lecture he gave to the International Society of Clinical Biostatistics meeting in Innsbruck in 1988 we began a correspondence that carried on at irregular intervals until 2000. I continue to have reasons to be grateful for the patience an important and senior theoretical statistician showed to a junior and obscure applied one.

One of the things Barnard was adamant about was that you had to look at statistical problems with various spectacles. This is what I propose to do here, taking as an example meta-analysis. Suppose that it is the case that a meta-analyst is faced with a number of trials in a given field and that these trials have been carried out sequentially. In fact, to make the problem both simpler and more acute, suppose that no stopping rule adjustments have been made. Suppose, unrealistically, that each trial has identical planned maximum size but that a single interim analysis is carried out after a fraction f of information has been collected. For simplicity we suppose this fraction f to be the same for every trial. The questions is ‘should the meta-analyst ignore the stopping rule employed’? The answer is ‘yes’ or ‘no’ depending on how (s)he combines the information and, interestingly, this is not a question of whether the meta-analyst is Bayesian or not. Continue reading

Categories: Philosophy of Statistics, Statistics |

## Stephen Senn: The nuisance parameter nuisance

Senn in China

Stephen Senn

Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

“The nuisance parameter nuisance”

A great deal of statistical debate concerns ‘univariate’ error, or disturbance, terms in models. I put ‘univariate’ in inverted commas because as soon as one writes a model of the form (say) Yi =Xiβ + Єi, i = 1 … n and starts to raise questions about the distribution of the disturbance terms, Єi one is frequently led into multivariate speculations, such as, ‘is the variance identical for every disturbance term?’ and, ‘are the disturbance terms independent?’ and not just speculations such as, ‘is the distribution of the disturbance terms Normal?’. Aris Spanos might also want me to put inverted commas around ‘disturbance’ (or ‘error’) since what I ought to be thinking about is the joint distribution of the outcomes, Yi conditional on the predictors.

However, in my statistical world of planning and analysing clinical trials, the differences made to inferences according to whether one uses parametric versus non-parametric methods is often minor. Of course, using non-parametric methods does nothing to answer the problem of non-independent observations but for experiments, as opposed to observational studies, you can frequently design-in independence. That is a major potential pitfall avoided but then there is still the issue of Normality. However, in my experience, this is rarely where the action is. Inferences rarely change dramatically on using ‘robust’ approaches (although one can always find examples with gross-outliers where they do). However, there are other sorts of problem that can affect data which can make a very big difference. Continue reading

Categories: Philosophy of Statistics, Statistics |

## Stephen Senn: Fooling the Patient: an Unethical Use of Placebo? (Phil/Stat/Med)

Senn in China

Stephen Senn
Competence Centre for Methodology and Statistics
CRP Santé
Strassen, Luxembourg

I think the placebo gets a bad press with ethicists. Many do not seem to understand that the only purpose of a placebo as a control in a randomised clinical trial is to permit the trial to be run as double-blind. A common error is to assume that the giving of a placebo implies the withholding of a known effective treatment. In fact many placebo controlled trials are ‘add-on’ trials in which all patients get proven (partially) effective treatment. We can refer to such treatment as standard common background therapy.  In addition, one group gets an unproven experimental treatment and the other a placebo. Used in this way in a randomised clinical trial, the placebo can be a very useful way to increase the precision of our inferences.

A control group helps eliminate many biases: trend effects affecting the patients, local variations in illness, trend effects in assays and regression to the mean. But such biases could be eliminated by having a group given nothing (apart from the standard common background therapy). Only a placebo, however, can allow patients and physicians to be uncertain whether the experimental treatment is being given or not. And ‘blinding’ or ‘masking’ can play a valuable role in eliminating that bias which is due to either expectation of efficacy or fear of side-effects.

However, there is one use of placebo I consider unethical. In many clinical trials a so-called ‘placebo run-in’ is used. That is to say, there is a period after patients are enrolled in the trial and before they are randomised to one of the treatment groups when all of the patients are given a placebo.  The reasons can be to stabilise the patients or to screen out those who are poor compliers before the trial proper begins. Indeed, the FDA encourages this use of placebo and, for example, in a 2008 guideline on developing drugs for Diabetes advises:  ‘In addition, placebo run-in periods in phase 3 studies can help screen out noncompliant subjects’. Continue reading

Categories: Statistics |

## Stephen Senn: Randomization, ratios and rationality: rescuing the randomized clinical trial from its critics

Stephen Senn
Head of the Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS), Luxembourg

An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example is what John Worrall has to say:

Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence is evidence based medicine. Philosophy of Science 2002; 69: S316-S330: see page S324 )

It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.

The second point is that in the absence of a treatment effect, where randomization has taken place, the statistical theory predicts probabilistically how the variation in outcome between groups relates to the variation within. Continue reading

Categories: Statistics |

## Excerpts from S. Senn’s Letter on “Replication, p-values and Evidence,”

Dear Reader:  I am typing in some excerpts from a letter Stephen Senn shared with me in relation to my April 28, 2012 blogpost.  It is a letter to the editor of Statistics in Medicine  in response to S. Goodman. It contains several important points that get to the issues we’ve been discussing, and you may wish to track down the rest of it. Sincerely, D. G. Mayo

Statist. Med. 2002; 21:2437–2444  https://errorstatistics.files.wordpress.com/2013/12/goodman.pdf

STATISTICS IN MEDICINE, LETTER TO THE EDITOR

A comment on replication, p-values and evidence: S.N. Goodman, Statistics in Medicine 1992; 11:875–879

From: Stephen Senn*

Some years ago, in the pages of this journal, Goodman gave an interesting analysis of ‘replication probabilities’ of p-values. Specifically, he considered the possibility that a given experiment had produced a p-value that indicated ‘significance’ or near significance (he considered the range p=0.10 to 0.001) and then calculated the probability that a study with equal power would produce a significant result at the conventional level of significance of 0.05. He showed, for example, that given an uninformative prior, and (subsequently) a resulting p-value that was exactly 0.05 from the first experiment, the probability of significance in the second experiment was 50 per cent. A more general form of this result is as follows. If the first trial yields p=α then the probability that a second trial will be significant at significance level α (and in the same direction as the first trial) is 0.5. Continue reading

Categories: Statistics |

## RMM-7: Commentary and Response on Senn published: Special Volume on Stat Scie Meets Phil Sci

Dear Reader: My commentary, “How Can We Cultivate Senn’s Ability, Comment on Stephen Senn, ‘You May Believe You are a Bayesian But You’re Probably Wrong’” and Senn’s, “Names and Games, A Reply to Deborah G. Mayo” have been published under the Discussion Section of Rationality, Markets, and Morals.(Special Topic: Statistical Science and Philosophy of Science: Where Do/Should They Meet?”)

I encourage you to submit your comments/exchanges on any of the papers in this special volume [this is the first].  (Information may be found on their webpage [no longer active 3/21/2021]. Questions/Ideas: please write to me at error@vt.edu.)

Categories: Philosophy of Statistics, Statistics | Tags:

## Mayo, Senn, and Wasserman on Gelman’s RMM** Contribution

Picking up the pieces…

Continuing with our discussion of contributions to the special topic,  Statistical Science and Philosophy of Science in Rationality, Markets and Morals (RMM),* I am pleased to post some comments on Andrew **Gelman’s paper “Induction and Deduction in Bayesian Data Analysis”.  (More comments to follow—as always, feel free to comment.)

Note: March 9, 2012: Gelman has commented to some of our comments on his blog today: http://andrewgelman.com/2012/03/coming-to-agreement-on-philosophy-of-statistics/

## D. Mayo

For now, I will limit my own comments to two: First, a fairly uncontroversial point, while Gelman writes that “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive,” a main point of my series (Part 123) of “No-Pain” philosophy was that “deductive” falsification involves inductively inferring a “falsifying hypothesis”.

More importantly, and more challengingly, Gelman claims the view he recommends “corresponds closely to the error-statistics idea of Mayo (1996)”.  Now the idea that non-Bayesian ideas might afford a foundation for strands of Bayesianism is not as implausible as it may seem. On the face of it, any inference to a claim, whether to the adequacy of a model (for a given purpose), or even to a posterior probability, can be said to be warranted just to the extent that the claim has withstood a severe test (i.e, a test that would, at least with reasonable probability, have discerned a flaw with the claim, were it false).  The question is: How well do Gelman’s methods for inferring statistical models satisfy severity criteria?  (I’m not sufficiently familiar with his intended applications to say.)

## Continue reading →

Categories: Philosophy of Statistics, Statistics, U-Phil |

## Guest Blogger. STEPHEN SENN: Fisher’s alternative to the alternative

By: Stephen Senn

This year marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests. Continue reading

Categories: Statistics |

## Senn Again (Gelman)

Senn will be glad to see that we haven’t forgotten him!  (see this blog Jan. 14, Jan. 15,  Jan. 23, and 24, 2012).  He’s back on Gelman’s blog today .

http://andrewgelman.com/2012/02/philosophy-of-bayesian-statistics-my-reactions-to-senn/

I hope to hear some reflections this time around on the issue often noted but not discussed: updating and down dating (see this blog, Jan. 26, 2012).

Categories: Philosophy of Statistics, Statistics |

## Updating & Downdating: One of the Pieces to Pick up on

pieces to pick up on (later)

Before moving on to a couple of rather different areas, there’s an issue that, while mentioned by both Senn and Gelman, did not come up for discussion; so let me just note it here as one of the pieces to pick up on later.

“It is hard to see what exactly a Bayesian statistician is doing when interacting with a client. There is an initial period in which the subjective beliefs of the client are established. These prior probabilities are taken to be valuable enough to be incorporated in subsequent calculation. However, in subsequent steps the client is not trusted to reason. The reasoning is carried out by the statistician. As an exercise in mathematics it is not superior to showing the client the data, eliciting a posterior distribution and then calculating the prior distribution; as an exercise in inference Bayesian updating does not appear to have greater claims than ‘downdating’ and indeed sometimes this point is made by Bayesians when discussing what their theory implies. (59)…..” Stephen Senn

“As I wrote in 2008, if you could really construct a subjective prior you believe in, why not just look at the data and write down your subjective posterior.” Andrew Gelman commenting on Senn

I’ve even heard subjective Bayesians concur on essentially this identical point, but I would think that many would take issue with it…no?

Categories: Statistics |

## U-PHIL (3): Stephen Senn on Stephen Senn!

I am grateful to Deborah Mayo for having highlighted my recent piece. I am not sure that it deserves the attention it is receiving.Deborah has spotted a flaw in my discussion of pragmatic Bayesianism. In praising the use of background knowledge I can neither be talking about automatic Bayesianism nor about subjective Bayesianism. It is clear that background knowledge ought not generally to lead to uninformative priors (whatever they might be) and so is not really what objective Bayesianism is about. On the other hand all subjective Bayesians care about is coherence and it is easy to produce examples where Bayesians quite logically will react differently to evidence, so what exactly is ‘background knowledge’?. Continue reading

Categories: Philosophy of Statistics, Statistics, U-Phil |

## Mayo Philosophizes on Stephen Senn: "How Can We Cultivate Senn’s-Ability?"

Where’s Mayo?

Although, in one sense, Senn’s remarks echo the passage of Jim Berger’s that we deconstructed a few weeks ago, Senn at the same time seems to reach an opposite conclusion. He points out how, in practice, people who claim to have carried out a (subjective) Bayesian analysis have actually done something very different—but that then they heap credit on the Bayesian ideal. (See also the blog post “Who Is Doing the Work?”) Continue reading

## “You May Believe You Are a Bayesian But You Are Probably Wrong”

The following is an extract (58-63) from the contribution by