# Stephen Senn

## S. Senn: “A Vaccine Trial from A to Z” with a Postscript (guest post)

.

Stephen Senn
Consultant Statistician
Edinburgh, Scotland

# Alpha and Omega (or maybe just Beta)

Well actually, not from A to Z but from AZ. That is to say, the trial I shall consider is the placebo- controlled trial of the Oxford University vaccine for COVID-19 currently being run by AstraZeneca (AZ) under protocol AZD1222 – D8110C00001 and which I considered in a previous blog, Heard Immunity. A summary of the design  features is given in Table 1. The purpose of this blog is to look a little deeper at features of the trial and the way I am going to do so is with the help of geometric representations of the sample space, that is to say the possible results the trial could produce. However, the reader is warned that I am only an amateur in all this. The true professionals are the statisticians at AZ who, together with their life science colleagues in AZ and Oxford, designed the trial. Continue reading

Categories: covid-19, RCTs, Stephen Senn

## S. Senn: “Error point: The importance of knowing how much you don’t know” (guest post)

.

Stephen Senn
Consultant Statistician
Edinburgh

‘The term “point estimation” made Fisher nervous, because he associated it with estimation without regard to accuracy, which he regarded as ridiculous.’ Jimmy Savage [1, p. 453]

## First things second

The classic text by David Cox and David Hinkley, Theoretical Statistics (1974), has two extremely interesting features as regards estimation. The first is in the form of an indirect, implicit, message and the second explicit and both teach that point estimation is far from being an obvious goal of statistical inference. The indirect message is that the chapter on point estimation (chapter 8) comes after that on interval estimation (chapter 7). This may puzzle the reader, who may anticipate that the complications of interval estimation would be handled after the apparently simpler point estimation rather than before. However, with the start of chapter 8, the reasoning is made clear. Cox and Hinkley state: Continue reading

Categories: Fisher, randomization, Stephen Senn | Tags:

## S. Senn: Red herrings and the art of cause fishing: Lord’s Paradox revisited (Guest post)

Stephen Senn
Consultant Statistician
Edinburgh

## Background

Previous posts[a],[b],[c] of mine have considered Lord’s Paradox. To recap, this was considered in the form described by Wainer and Brown[1], in turn based on Lord’s original formulation:

A large university is interested in investigating the effects on the students of the diet provided in the university dining halls : : : . Various types of data are gathered. In particular, the weight of each student at the time of his arrival in September and his weight the following June are recorded. [2](p. 304)

The issue is whether the appropriate analysis should be based on change-scores (weight in June minus weight in September), as proposed by a first statistician (whom I called John) or analysis of covariance (ANCOVA), using the September weight as a covariate, as proposed by a second statistician (whom I called Jane). There was a difference in mean weight between halls at the time of arrival in September (baseline) and this difference turned out to be identical to the difference in June (outcome). It thus follows that, since the analysis of change score is algebraically equivalent to correcting the difference between halls at outcome by the difference between halls at baseline, the analysis of change scores returns an estimate of zero. The conclusion is thus, there being no difference between diets, diet has no effect. Continue reading

Categories: Stephen Senn

## Stephen Senn: On the level. Why block structure matters and its relevance to Lord’s paradox (Guest Post)

.

Stephen Senn
Consultant Statistician
Edinburgh

Introduction

In a previous post I considered Lord’s paradox from the perspective of the ‘Rothamsted School’ and its approach to the analysis of experiments. I now illustrate this in some detail giving an example.

What I shall do

I have simulated data from an experiment in which two diets have been compared in 20 student halls of residence, each diet having been applied to 10 halls. I shall assume that the halls have been randomly allocated the diet and that in each hall 10 students have been randomly chosen to have their weights recorded at the beginning of the academic year and again at the end. Continue reading

## S. Senn: Fishing for fakes with Fisher (Guest Post)

.

Stephen Senn
for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

# Fishing for fakes with Fisher

### Stephen Senn

The essential fact governing our analysis is that the errors due to soil heterogeneity will be divided by a good experiment into two portions. The first, which is to be made as large as possible, will be completely eliminated, by the arrangement of the experiment, from the experimental comparisons, and will be as carefully eliminated in the statistical laboratory from the estimate of error. As to the remainder, which cannot be treated in this way, no attempt will be made to eliminate it in the field, but, on the contrary, it will be carefully randomised so as to provide a valid estimate of the errors to which the experiment is in fact liable. R. A. Fisher, The Design of Experiments, (Fisher 1990) section 28.

## Fraudian analysis?

John Carlisle must be a man endowed with exceptional energy and determination. A recent paper of his is entitled, ‘Data fabrication and other reasons for non-random sampling in 5087 randomised, controlled trials in anaesthetic and general medical journals,’ (Carlisle 2017) and has created quite a stir. The journals examined include the Journal of the American Medical Association and the New England Journal of Medicine. What Carlisle did was examine 29,789 variables using 72,261 means to see if they were ‘consistent with random sampling’ (by which, I suppose, he means ‘randomisation’). The papers chosen had to report either standard deviations or standard errors of the mean. P-values as measures of balance or lack of it were then calculated using each of three methods and the method that gave the value closest to 0.5 was chosen. For a given trial the P-values chosen were then back-converted to z-scores combined by summing them and then re-converted back to P-values using a method that assumes the summed Z-scores to be independent. As Carlisle writes, ‘All p values were one-sided and inverted, such that dissimilar means generated p values near 1’. Continue reading

Categories: Fisher, RCTs, Stephen Senn

## The ASA Document on P-Values: One Year On

I’m surprised it’s a year already since posting my published comments on the ASA Document on P-Values. Since then, there have been a slew of papers rehearsing the well-worn fallacies of tests (a tad bit more than the usual rate). Doubtless, the P-value Pow Wow raised people’s consciousnesses. I’m interested in hearing reader reactions/experiences in connection with the P-Value project (positive and negative) over the past year. (Use the comments, share links to papers; and/or send me something slightly longer for a possible guest post.)
Some people sent me a diagram from a talk by Stephen Senn (on “P-values and the art of herding cats”). He presents an array of different cat commentators, and for some reason Mayo cat is in the middle but way over on the left side,near the wall. I never got the key to interpretation.  My contribution is below:

Chart by S.Senn

“Don’t Throw Out The Error Control Baby With the Bad Statistics Bathwater”

D. Mayo*[1]

The American Statistical Association is to be credited with opening up a discussion into p-values; now an examination of the foundations of other key statistical concepts is needed. Continue reading

## S. Senn: “Placebos: it’s not only the patients that are fooled” (Guest Post)

Stephen Senn

Placebos: it’s not only the patients that are fooled

Stephen Senn
Head of  Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

In my opinion a great deal of ink is wasted to little purpose in discussing placebos in clinical trials. Many commentators simply do not understand the nature and purpose of placebos. To start with the latter, their only purpose is to permit blinding of treatments and, to continue to the former, this implies that their nature is that they are specific to the treatment studied.

Consider an example. Suppose that Pannostrum Pharmaceuticals wishes to prove that its new treatment for migraine, Paineaze® (which is in the form of a small red circular pill) is superior to the market-leader offered by Allexir Laboratories, Kalmer® (which is a large purple lozenge). Pannostrum decides to do a head-to head comparison and of course, therefore will require placebos. Every patient will have to take a red pill and a purple lozenge. In the Paineaze arm what is red will be Paineaze and what is purple ‘placebo to Kalmer’. In the Kalmer arm what is red will be ‘placebo to Paineaze’ and what is purple will be Kalmer.

Categories: PhilPharma, PhilStat/Med, Statistics, Stephen Senn

## S. Senn: “Painful dichotomies” (Guest Post)

.

Stephen Senn
Head of  Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

Painful dichotomies

The tweet read “Featured review: Only 10% people with tension-type headaches get a benefit from paracetamol” and immediately I thought, ‘how would they know?’ and almost as quickly decided, ‘of course they don’t know, they just think they know’. Sure enough, on following up the link to the Cochrane Review in the tweet it turned out that, yet again, the deadly mix of dichotomies and numbers needed to treat had infected the brains of researchers to the extent that they imagined that they had identified personal response. (See Responder Despondency for a previous post on this subject.)

The bare facts they established are the following:

The International Headache Society recommends the outcome of being pain free two hours after taking a medicine. The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo.

and the false conclusion they immediately asserted is the following

This means that only 10 in 100 or 10% of people benefited because of paracetamol 1000 mg.

To understand the fallacy, look at the accompanying graph. Continue reading

Categories: junk science, PhilStat/Med, Statistics, Stephen Senn

## Stephen Senn: Double Jeopardy?: Judge Jeffreys Upholds the Law (sequel to the pathetic P-value)[4]

S. Senn

Stephen Senn
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

Double Jeopardy?: Judge Jeffreys Upholds the Law*[4]

“But this could be dealt with in a rough empirical way by taking twice the standard error as a criterion for possible genuineness and three times the standard error for definite acceptance”. Harold Jeffreys(1) (p386)

This is the second of two posts on P-values. In the first, The Pathetic P-Value, I considered the relation of P-values to Laplace’s Bayesian formulation of induction, pointing out that that P-values, whilst they had a very different interpretation, were numerically very similar to a type of Bayesian posterior probability. In this one, I consider their relation or lack of it, to Harold Jeffreys’s radically different approach to significance testing. (An excellent account of the development of Jeffreys’s thought is given by Howie(2), which I recommend highly.)

The story starts with Cambridge philosopher CD Broad (1887-1971), who in 1918 pointed to a difficulty with Laplace’s Law of Succession. Broad considers the problem of drawing counters from an urn containing n counters and supposes that all m drawn had been observed to be white. He now considers two very different questions, which have two very different probabilities and writes: Continue reading

Categories: Jeffreys, P-values, reforming the reformers, Stephen Senn | Tags:

## Stephen Senn: Double Jeopardy?: Judge Jeffreys Upholds the Law (sequel to the pathetic P-value)

S. Senn

Stephen Senn
Head of Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health

Double Jeopardy?: Judge Jeffreys Upholds the Law

“But this could be dealt with in a rough empirical way by taking twice the standard error as a criterion for possible genuineness and three times the standard error for definite acceptance”. Harold Jeffreys(1) (p386)

This is the second of two posts on P-values. In the first, The Pathetic P-Value, I considered the relation of P-values to Laplace’s Bayesian formulation of induction, pointing out that that P-values, whilst they had a very different interpretation, were numerically very similar to a type of Bayesian posterior probability. In this one, I consider their relation or lack of it, to Harold Jeffreys’s radically different approach to significance testing. (An excellent account of the development of Jeffreys’s thought is given by Howie(2), which I recommend highly.)

The story starts with Cambridge philosopher CD Broad (1887-1971), who in 1918 pointed to a difficulty with Laplace’s Law of Succession. Broad considers the problem of drawing counters from an urn containing n counters and supposes that all m drawn had been observed to be white. He now considers two very different questions, which have two very different probabilities and writes:

Note that in the case that only one counter remains we have n = m + 1 and the two probabilities are the same. However, if n > m+1 they are not the same and in particular if m is large but n is much larger, the first probability can approach 1 whilst the second remains small.

The practical implication of this just because Bayesian induction implies that a large sequence of successes (and no failures) supports belief that the next trial will be a success, it does not follow that one should believe that all future trials will be so. This distinction is often misunderstood. This is The Economist getting it wrong in September 2000

The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child’s degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.

See Dicing with Death(3) (pp76-78).

The practical relevance of this is that scientific laws cannot be established by Laplacian induction. Jeffreys (1891-1989) puts it thus

Thus I may have seen 1 in 1000 of the ‘animals with feathers’ in England; on Laplace’s theory the probability of the proposition, ‘all animals with feathers have beaks’, would be about 1/1000. This does not correspond to my state of belief or anybody else’s. (P128)

## Sir Harold Jeffreys’ (tail area) one-liner: Saturday night comedy (b)

.

This headliner appeared before, but to a sparse audience, so Management’s giving him another chance… His joke relates to both Senn’s post (about alternatives), and to my recent post about using (1 – β)/α as a likelihood ratio--but for very different reasons. (I’ve explained at the bottom of this “(b) draft”.)

….If you look closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike, (especially as he’s no longer doing the Tonight Show) ….

.

It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler joke* in criticizing the use of p-values.

“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?

What’s unusual is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

## Stephen Senn: Fisher’s Alternative to the Alternative

.

As part of the week of recognizing R.A.Fisher (February 17, 1890 – July 29, 1962), I reblog Senn from 3 years ago.

‘Fisher’s alternative to the alternative’

By: Stephen Senn

[2012 marked] the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in 1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

.

The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading

Categories: Fisher, Statistics, Stephen Senn |

## What’s wrong with taking (1 – β)/α, as a likelihood ratio comparing H0 and H1?

.

Here’s a quick note on something that I often find in discussions on tests, even though it treats “power”, which is a capacity-of-test notion, as if it were a fit-with-data notion…..

1. Take a one-sided Normal test T+: with n iid samples:

H0: µ ≤  0 against H1: µ >  0

σ = 10,  n = 100,  σ/√n =σx= 1,  α = .025.

So the test would reject H0 iff Z > c.025 =1.96. (1.96. is the “cut-off”.)

~~~~~~~~~~~~~~

1. Simple rules for alternatives against which T+ has high power:
• If we add σx (here 1) to the cut-off (here, 1.96) we are at an alternative value for µ that test T+ has .84 power to detect.
• If we add 3σto the cut-off we are at an alternative value for µ that test T+ has ~ .999 power to detect. This value, which we can write as µ.999 = 4.96

Let the observed outcome just reach the cut-off to reject the null,z= 1.96.

If we were to form a “likelihood ratio” of μ = 4.96 compared to μ0 = 0 using

[Power(T+, 4.96)]/α,

it would be 40.  (.999/.025).

It is absurd to say the alternative 4.96 is supported 40 times as much as the null, understanding support as likelihood or comparative likelihood. (The data 1.96 are even closer to 0 than to 4.96). The same point can be made with less extreme cases.) What is commonly done next is to assign priors of .5 to the two hypotheses, yielding

Pr(H0 |z0) = 1/ (1 + 40) = .024, so Pr(H1 |z0) = .976.

Such an inference is highly unwarranted and would almost always be wrong. Continue reading

## 3 YEARS AGO: (JANUARY 2012) MEMORY LANE

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: January 2012. I mark in red three posts that seem most apt for general background on key issues in this blog.

January 2012

• (1/3) Model Validation and the LLP-(Long Playing Vinyl Record)
• (1/8) Don’t Birnbaumize that Experiment my Friend*
• (1/10) Bad-Faith Assertions of Conflicts of Interest?*
• (1/13) U-PHIL: “So you want to do a philosophical analysis?”
• (1/14)  But You Are Probably Wrong” (Extract from Senn RMM article)
• (1/15) U-Phil  “How Can We Culivate Senn’s-Ability?”
• (1/17) “Philosophy of Statistics”: Nelder on Lindley
• (1/19) RMM-6 Special Volume on Stat Sci Meets Phil Sci (Sprenger)
• (1/22) U-Phil: Stephen Senn (1): C. Robert, A. Jaffe, and Mayo (brief remarks)
• (1/23): Andrew Gelman
• (1/24) U-Phil (3): Stephen Senn on Stephen Senn!
• (1/26) Updating & Downdating: One of the Pieces to Pick up
• (1/29)  Skepticism, Rationality, Popper, and All That: First of 3 Parts

This new, once-a-month, feature began at the blog’s 3-year anniversary in Sept, 2014. I will count U-Phil’s on a single paper as one of the three I highlight (else I’d have to choose between them). I will comment on  3-year old posts from time to time.

This Memory Lane needs a bit of explanation. This blog began largely as a forum to discuss a set of contributions from a conference I organized (with A. Spanos and J. Miller*) “Statistical Science and Philosophy of Science: Where Do (Should) They meet?”at the London School of Economics, Center for the Philosophy of Natural and Social Science, CPNSS, in June 2010 (where I am a visitor). Additional papers grew out of conversations initiated soon after (with Andrew Gelman and Larry Wasserman). The conference site is here.  My reflections in this general arena (Sept. 26, 2012) are here.

As articles appeared in a special topic of the on-line journal, Rationality, Markets and Morals (RMM), edited by Max Albert[i]—also a conference participant —I would announce an open invitation to readers to take a couple of weeks to write an extended comment.  Each “U-Phil”–which stands for “U philosophize”- was a contribution to this activity. I plan to go back to that exercise at some point.  Generally I would give a “deconstruction” of the paper first, followed by U-Phils, and then the author gave responses to U-Phils and me as they wished. You can readily search this blog for all the U-Phils and deconstructions**.

I was also keeping a list of issues that we either haven’t taken up, or need to return to. One example here is: Bayesian updating and down dating. Further notes about the origins of this blog are here. I recommend everyone reread Senn’s paper.**

For newcomers, here’s your chance to catch-up; for old timers,this is philosophy: rereading is essential!

[i] Along with Hartmut Kliemt and Bernd Lahno.

*For a full list of collaborators, sponsors, logisticians, and related collaborations, see the conference page. The full list of speakers is found there as well.

**The U-Phil exchange between Mayo and Senn was published in the same special topic of RIMM. But I still wish to know how we can cultivate “Senn’s-ability.” We could continue that activity as well, perhaps.

Previous 3 YEAR MEMORY LANES:

Categories: 3-year memory lane, blog contents, Statistics, Stephen Senn, U-Phil

## S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)

.

Stephen Senn
Competence Center for Methodology and Statistics (CCMS)
Luxembourg

Responder despondency: myths of personalized medicine

The road to drug development destruction is paved with good intentions. The 2013 FDA report, Paving the Way for Personalized Medicine  has an encouraging and enthusiastic foreword from Commissioner Hamburg and plenty of extremely interesting examples stretching back decades. Given what the report shows can be achieved on occasion, given the enthusiasm of the FDA and its commissioner, given the amazing progress in genetics emerging from the labs, a golden future of personalized medicine surely awaits us. It would be churlish to spoil the party by sounding a note of caution but I have never shirked being churlish and that is exactly what I am going to do. Continue reading

Categories: evidence-based policy, Statistics, Stephen Senn

## Stephen Senn: Blood Simple? The complicated and controversial world of bioequivalence (guest post)

Blood Simple?
The complicated and controversial world of bioequivalence

by Stephen Senn*

Those not familiar with drug development might suppose that showing that a new pharmaceutical formulation (say a generic drug) is equivalent to a formulation that has a licence (say a brand name drug) ought to be simple. However, it can often turn out to be bafflingly difficult[1]. Continue reading

## Power taboos: Statue of Liberty, Senn, Neyman, Carnap, Severity

Is it taboo to use a test’s power to assess what may be learned from the data in front of us? (Is it limited to pre-data planning?) If not entirely taboo, some regard power as irrelevant post-data[i], and the reason I’ve heard is along the lines of an analogy Stephen Senn gave today (in a comment discussing his last post here)[ii].

Senn comment: So let me give you another analogy to your (very interesting) fire alarm analogy (My analogy is imperfect but so is the fire alarm.) If you want to cross the Atlantic from Glasgow you should do some serious calculations to decide what boat you need. However, if several days later you arrive at the Statue of Liberty the fact that you see it is more important than the size of the boat for deciding that you did, indeed, cross the Atlantic.

My fire alarm analogy is here. My analogy presumes you are assessing the situation (about the fire) long distance. Continue reading

## Stephen Senn: “Delta Force: To what extent is clinical relevance relevant?” (Guest Post)

Senn

Stephen Senn
Competence Center for Methodology and Statistics (CCMS),
Luxembourg

Delta Force
To what extent is clinical relevance relevant?

Inspiration
This note has been inspired by a Twitter exchange with respected scientist and famous blogger  David Colquhoun. He queried whether a treatment that had 2/3 of an effect that would be described as clinically relevant could be useful. I was surprised at the question, since I would regard it as being pretty obvious that it could but, on reflection, I realise that things that may seem obvious to some who have worked in drug development may not be obvious to others, and if they are not obvious to others are either in need of a defence or wrong. I don’t think I am wrong and this note is to explain my thinking on the subject. Continue reading

Categories: power, Statistics, Stephen Senn

## Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

This headliner appeared last month, but to a sparse audience (likely because it was during winter break), so Management’s giving him another chance…

You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike ….

It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since it’s Saturday night, let’s listen in on Sir Harold’s howler* in criticizing the use of p-values.

“Did you hear the one about significance testers rejecting H0 because of outcomes H0 didn’t predict?

What’s unusual, is that they do it when these unpredicted outcomes haven’t even occurred!”

Much laughter.

[The actual quote from Jeffreys: Using p-values implies that “An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure.” (Jeffreys 1939, 316)]

I say it’s funny, so to see why I’ll strive to give it a generous interpretation. Continue reading

Categories: Comedy, Fisher, Jeffreys, P-values, Stephen Senn

## STEPHEN SENN: Fisher’s alternative to the alternative

Reblogging 2 years ago:

By: Stephen Senn

This year [2012] marks the 50th anniversary of RA Fisher’s death. It is a good excuse, I think, to draw attention to an aspect of his philosophy of significance testing. In his extremely interesting essay on Fisher, Jimmie Savage drew attention to a problem in Fisher’s approach to testing. In describing Fisher’s aversion to power functions Savage writes, ‘Fisher says that some tests are more sensitive than others, and I cannot help suspecting that that comes to very much the same thing as thinking about the power function.’ (Savage 1976) (P473).

The modern statistician, however, has an advantage here denied to Savage. Savage’s essay was published posthumously in 1976 and the lecture on which it was based was given in Detroit on 29 December 1971 (P441). At that time Fisher’s scientific correspondence did not form part of his available oeuvre but in1990 Henry Bennett’s magnificent edition of Fisher’s statistical correspondence (Bennett 1990) was published and this throws light on many aspects of Fisher’s thought including on significance tests.

The key letter here is Fisher’s reply of 6 October 1938 to Chester Bliss’s letter of 13 September. Bliss himself had reported an issue that had been raised with him by Snedecor on 6 September. Snedecor had pointed out that an analysis using inverse sine transformations of some data that Bliss had worked on gave a different result to an analysis of the original values. Bliss had defended his (transformed) analysis on the grounds that a) if a transformation always gave the same result as an analysis of the original data there would be no point and b) an analysis on inverse sines was a sort of weighted analysis of percentages with the transformation more appropriately reflecting the weight of information in each sample. Bliss wanted to know what Fisher thought of his reply.

Fisher replies with a ‘shorter catechism’ on transformations which ends as follows: Continue reading

Categories: Fisher, Statistics, Stephen Senn |