Reblogging 2 years ago:
By: Stephen Senn
This year marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).
The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.
The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.
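Bliss's point (b) can be illustrated numerically. The sketch below assumes that the "inverse sine" transformation is the usual angular transformation arcsin(√p̂) of an observed proportion; the letters as summarised here do not spell out the formula, so that is an assumption, and the function name and the sample sizes are purely illustrative. It computes, exactly from the binomial distribution, the variance of the raw proportion and of the transformed value.

```python
import math


def transformed_and_raw_variance(n, p):
    """Exact variances of arcsin(sqrt(X/n)) and of X/n for X ~ Binomial(n, p).

    Assumes the angular (arcsine square-root) form of the transformation.
    """
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    raw = [k / n for k in range(n + 1)]
    angular = [math.asin(math.sqrt(r)) for r in raw]

    def var(values):
        # Variance of a discrete distribution with the binomial weights above.
        m = sum(w * v for w, v in zip(probs, values))
        return sum(w * (v - m) ** 2 for w, v in zip(probs, values))

    return var(angular), var(raw)


# The raw variance p(1 - p)/n changes with p; the transformed variance
# stays close to 1/(4n) whatever p is.
for p in (0.2, 0.5):
    v_angular, v_raw = transformed_and_raw_variance(100, p)
    print(p, round(v_raw, 5), round(v_angular, 5), 1 / (4 * 100))
```

On the transformed scale the precision of each sample depends, to a good approximation, only on its size, which is one way to read the claim that the transformation reflects "the weight of information in each sample".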
Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading
Professor Stephen Senn*
Full Paper: Bad JAMA?
Short version–Opinion Article: Misunderstanding publication bias
The student undertaking a course in statistical inference may be left with the impression that what is important is the fundamental business of the statistical framework employed: should one be Bayesian or frequentist, for example? Where does one stand as regards the likelihood principle and so forth? Or it may be that these philosophical issues are not covered but that a great deal of time is spent on the technical details, for example, depending on framework, various properties of estimators, how to apply the method of maximum likelihood, or, how to implement Markov chain Monte Carlo methods and check for chain convergence. However, much of this work will take place in a (mainly) theoretical kingdom one might name simple-random-sample-dom.
Senn in China
Competence Centre for Methodology and Statistics
“The nuisance parameter nuisance”
A great deal of statistical debate concerns ‘univariate’ error, or disturbance, terms in models. I put ‘univariate’ in inverted commas because as soon as one writes a model of the form (say) $Y_i = X_i\beta + \epsilon_i$, $i = 1, \ldots, n$, and starts to raise questions about the distribution of the disturbance terms, $\epsilon_i$, one is frequently led into multivariate speculations, such as, ‘is the variance identical for every disturbance term?’ and, ‘are the disturbance terms independent?’ and not just speculations such as, ‘is the distribution of the disturbance terms Normal?’. Aris Spanos might also want me to put inverted commas around ‘disturbance’ (or ‘error’) since what I ought to be thinking about is the joint distribution of the outcomes, $Y_i$, conditional on the predictors.
However, in my statistical world of planning and analysing clinical trials, the difference made to inferences according to whether one uses parametric or non-parametric methods is often minor. Of course, using non-parametric methods does nothing to answer the problem of non-independent observations but for experiments, as opposed to observational studies, you can frequently design-in independence. That is a major potential pitfall avoided but then there is still the issue of Normality. However, in my experience, this is rarely where the action is. Inferences rarely change dramatically on using ‘robust’ approaches (although one can always find examples with gross outliers where they do). However, there are other sorts of problem that can affect data which can make a very big difference.
Head of the Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS), Luxembourg
An issue sometimes raised about randomized clinical trials is the problem of indefinitely many confounders. This, for example, is what John Worrall has to say:
Even if there is only a small probability that an individual factor is unbalanced, given that there are indefinitely many possible confounding factors, then it would seem to follow that the probability that there is some factor on which the two groups are unbalanced (when remember randomly constructed) might for all anyone knows be high. (Worrall J. What evidence is evidence based medicine. Philosophy of Science 2002; 69: S316-S330: see page S324 )
It seems to me, however, that this overlooks four matters. The first is that it is not indefinitely many variables we are interested in but only one, albeit one we can’t measure perfectly. This variable can be called ‘outcome’. We wish to see to what extent the difference observed in outcome between groups is compatible with the idea that chance alone explains it. The indefinitely many covariates can help us predict outcome but they are only of interest to the extent that they do so. However, although we can’t measure the difference we would have seen in outcome between groups in the absence of treatment, we can measure how much it varies within groups (where the variation cannot be due to differences between treatments). Thus we can say a great deal about random variation to the extent that group membership is indeed random.
The second point is that in the absence of a treatment effect, where randomization has taken place, the statistical theory predicts probabilistically how the variation in outcome between groups relates to the variation within.
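A small simulation may make the first two points concrete. It is only a sketch under assumptions that are not in the post: each patient's outcome is taken to be the sum of many unmeasured covariates, there is no treatment effect, and the function names, the sample sizes and the critical value 2.024 (approximately the two-sided 5% point of Student's t on 38 degrees of freedom) are all illustrative choices.

```python
import random
from statistics import mean, variance


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def false_positive_rate(n_patients=40, n_covariates=50,
                        n_allocations=2000, critical=2.024, seed=1):
    """Share of random allocations declared 'significant' with no treatment effect."""
    rng = random.Random(seed)
    # Each outcome is driven by many covariates that nobody measures.
    outcomes = [sum(rng.gauss(0, 1) for _ in range(n_covariates))
                for _ in range(n_patients)]
    half = n_patients // 2
    indices = list(range(n_patients))
    rejections = 0
    for _ in range(n_allocations):
        rng.shuffle(indices)  # a fresh random allocation to two equal groups
        group_a = [outcomes[i] for i in indices[:half]]
        group_b = [outcomes[i] for i in indices[half:]]
        if abs(two_sample_t(group_a, group_b)) > critical:
            rejections += 1
    return rejections / n_allocations


print(false_positive_rate())
```

Under these assumptions the proportion of "significant" allocations stays near the nominal 5% however many covariates feed into the outcome, because the within-group variation already reflects their combined influence.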
Dear Reader: I am typing in some excerpts from a letter Stephen Senn shared with me in relation to my April 28, 2012 blogpost. It is a letter to the editor of Statistics in Medicine in response to S. Goodman. It contains several important points that get to the issues we’ve been discussing, and you may wish to track down the rest of it. Sincerely, D. G. Mayo
Statist. Med. 2002; 21:2437–2444 http://errorstatistics.files.wordpress.com/2013/12/goodman.pdf
STATISTICS IN MEDICINE, LETTER TO THE EDITOR
A comment on replication, p-values and evidence: S.N. Goodman, Statistics in Medicine 1992; 11:875–879
From: Stephen Senn*
Some years ago, in the pages of this journal, Goodman gave an interesting analysis of ‘replication probabilities’ of p-values. Specifically, he considered the possibility that a given experiment had produced a p-value that indicated ‘significance’ or near significance (he considered the range p=0.10 to 0.001) and then calculated the probability that a study with equal power would produce a significant result at the conventional level of significance of 0.05. He showed, for example, that given an uninformative prior, and (subsequently) a resulting p-value that was exactly 0.05 from the first experiment, the probability of significance in the second experiment was 50 per cent. A more general form of this result is as follows. If the first trial yields p=α then the probability that a second trial will be significant at significance level α (and in the same direction as the first trial) is 0.5.
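The general result can be checked with a short calculation. The sketch below is not taken from the letter; it assumes a Normal model with known and equal standard errors in the two trials, a flat prior on the true effect, and two-sided p-values, and the function name is illustrative. Under those assumptions the predictive distribution of the second trial's z statistic, given the first, is Normal with mean equal to the first z and variance 2.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def replication_probability(p_first, alpha=0.05):
    """Predictive probability that a second, equally precise trial is
    significant at two-sided level alpha in the same direction as the first.

    Assumes a Normal model, known equal standard errors and a flat prior.
    """
    z_first = STD_NORMAL.inv_cdf(1 - p_first / 2)  # z statistic of trial 1
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)     # threshold for trial 2
    # Posterior for the true standardised effect is N(z_first, 1); adding the
    # sampling variance of trial 2 gives a predictive N(z_first, 2).
    return STD_NORMAL.cdf((z_first - z_crit) / 2 ** 0.5)


print(replication_probability(0.05))        # first trial exactly at p = 0.05
print(replication_probability(0.01, 0.01))  # p = alpha again, at another level
```

Whenever the first p-value equals α the two z values coincide, the argument of the Normal distribution function is zero, and the probability is one half whatever the value of α.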
Dear Reader: My commentary, “How Can We Cultivate Senn’s Ability, Comment on Stephen Senn, ‘You May Believe You are a Bayesian But You’re Probably Wrong’” and Senn’s, “Names and Games, A Reply to Deborah G. Mayo” have been published under the Discussion Section of Rationality, Markets, and Morals (Special Topic: “Statistical Science and Philosophy of Science: Where Do/Should They Meet?”). http://www.rmm-journal.de/downloads/Comment_on_Senn.pdf
I encourage you to submit your comments/exchanges on any of the papers in this special volume [this is the first]. (Information may be found on their webpage. Questions/Ideas: please write to me at email@example.com.)
Picking up the pieces...
Continuing with our discussion of contributions to the special topic, Statistical Science and Philosophy of Science in Rationality, Markets and Morals (RMM),* I am pleased to post some comments on Andrew **Gelman’s paper “Induction and Deduction in Bayesian Data Analysis”. (More comments to follow—as always, feel free to comment.)
Note: March 9, 2012: Gelman has commented to some of our comments on his blog today: http://andrewgelman.com/2012/03/coming-to-agreement-on-philosophy-of-statistics/
For now, I will limit my own comments to two: First, a fairly uncontroversial point, while Gelman writes that “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive,” a main point of my series (Part 1, 2, 3) of “No-Pain” philosophy was that “deductive” falsification involves inductively inferring a “falsifying hypothesis”.
More importantly, and more challengingly, Gelman claims the view he recommends “corresponds closely to the error-statistics idea of Mayo (1996)”. Now the idea that non-Bayesian ideas might afford a foundation for strands of Bayesianism is not as implausible as it may seem. On the face of it, any inference to a claim, whether to the adequacy of a model (for a given purpose), or even to a posterior probability, can be said to be warranted just to the extent that the claim has withstood a severe test (i.e., a test that would, at least with reasonable probability, have discerned a flaw with the claim, were it false). The question is: How well do Gelman’s methods for inferring statistical models satisfy severity criteria? (I’m not sufficiently familiar with his intended applications to say.)
Senn will be glad to see that we haven’t forgotten him! (See this blog, Jan. 14, Jan. 15, Jan. 23, and Jan. 24, 2012.) He’s back on Gelman’s blog today.
I hope to hear some reflections this time around on the issue often noted but not discussed: updating and down dating (see this blog, Jan. 26, 2012).
Although, in one sense, Senn’s remarks echo the passage of Jim Berger’s that we deconstructed a few weeks ago, Senn at the same time seems to reach an opposite conclusion. He points out how, in practice, people who claim to have carried out a (subjective) Bayesian analysis have actually done something very different—but that then they heap credit on the Bayesian ideal. (See also the blog post “Who Is Doing the Work?”)
The following is an extract (58-63) from the contribution by
Stephen Senn (Full article)
Head of the Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS), Luxembourg
I am not arguing that the subjective Bayesian approach is not a good one to use. I am claiming instead that the argument is false that because some ideal form of this approach to reasoning seems excellent in theory it therefore follows that in practice using this and only this approach to reasoning is the right thing to do. A very standard form of argument I do object to is the one frequently encountered in many applied Bayesian papers where the first paragraph lauds the Bayesian approach on various grounds, in particular its ability to synthesize all sources of information, and in the rest of the paper the authors assume that because they have used the Bayesian machinery of prior distributions and Bayes’ theorem they have therefore done a good analysis. It is this sort of author who believes that he or she is Bayesian but in practice is wrong. (58)