August 6: JSM 2020 Panel on P-values & “Statistical Significance”

JSM 2020 Panel Flyer (PDF)
JSM online program (w/panel abstract & information)

Categories: ASA Guide to P-values, Error Statistics, evidence-based policy, JSM 2020, P-values, Philosophy of Statistics, science communication, significance tests | Leave a comment

JSM 2020 Panel on P-values & “Statistical Significance”

All: On July 30 (10am EST) I will give a virtual version of my JSM presentation, delivered remotely just like the one I will actually give on Aug 6 at the JSM. Co-panelist Stan Young may do so as well. One of our surprise guests tomorrow (not at the JSM) will be Yoav Benjamini! If you’re interested in attending our July 30 practice session* please follow the directions here. Background items for this session are in the “readings” and “memos” of session 5.

*unless you’re already on our LSE Phil500 list

JSM 2020 Panel Flyer (PDF)
JSM online program (w/panel abstract & information)

Categories: Announcement, JSM 2020, significance tests, stat wars and their casualties | Leave a comment

Stephen Senn: Losing Control (guest post)


Stephen Senn
Consultant Statistician

Losing Control

Match points

The idea of local control is fundamental to the design and analysis of experiments and contributes greatly to a design’s efficiency. In clinical trials such control is often accompanied by randomisation, and the way that the randomisation is carried out has a close relationship to how the analysis should proceed. For example, if a parallel group trial is carried out in different centres, but randomisation is ‘blocked’ by centre then, logically, centre should be in the model (Senn, S. J. & Lewis, R. J., 2019). On the other hand, if all the patients in a given centre are allocated the same treatment at random, as in a so-called cluster randomised trial, then the fundamental unit of inference becomes the centre and patients are regarded as repeated measures on it. In other words, the way in which the allocation has been carried out affects the degree of matching that has been achieved and this, in turn, is related to the analysis that should be employed. A previous blog of mine, To Infinity and Beyond, discusses the point.

Balancing acts

In all of this, balance, or rather the degree of it, plays a fundamental part, if not the one that many commentators assume. Balance of prognostic factors is often taken as being necessary to avoid bias. In fact, it is not necessary. For example, suppose we wished to eliminate the effect of differences between centres in a clinical trial but had not, in fact, blocked by centre. We would then, just by chance, have some centres in which numbers of patients on treatment and control differed. The simple difference of the two means for the trial as a whole would then have some influence from the centres, which might be regarded as biasing. However, these effects can be eliminated by the simple stratagem of analysing the data in two stages. In the first stage we compare the means under treatment and control within each centre. In the second stage we combine these differences across the centres, weighting them according to the amount of information provided. In fact, including centre as a factor in a linear model to analyse the effect of treatment achieves the same result as this two-stage approach.
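The equivalence claimed at the end of the paragraph above can be checked numerically. What follows is only a sketch under stated assumptions: the data are simulated (four hypothetical centres, simple randomisation, a common within-centre variance), the weights n1·n0/(n1+n0) are the information weights appropriate to that common-variance case, and the function names are mine rather than anything from the cited papers.

```python
import random

def two_stage_estimate(y, treat, centre):
    """Stage 1: treatment-minus-control mean difference within each centre.
    Stage 2: combine the differences with weights n1*n0/(n1+n0), which are
    proportional to the information when the variance is common."""
    num = den = 0.0
    for c in sorted(set(centre)):
        y1 = [yi for yi, t, ci in zip(y, treat, centre) if ci == c and t == 1]
        y0 = [yi for yi, t, ci in zip(y, treat, centre) if ci == c and t == 0]
        if not y1 or not y0:
            continue  # a centre with only one arm gives no within-centre contrast
        w = len(y1) * len(y0) / (len(y1) + len(y0))
        num += w * (sum(y1) / len(y1) - sum(y0) / len(y0))
        den += w
    return num / den

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def linear_model_estimate(y, treat, centre):
    """Least-squares fit of y on centre indicators plus a treatment indicator;
    returns the treatment coefficient."""
    centres = sorted(set(centre))
    rows = [[1.0 if ci == c else 0.0 for c in centres] + [float(t)]
            for t, ci in zip(treat, centre)]
    p = len(centres) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)[-1]

# Hypothetical unblocked trial: 4 centres, simple randomisation, true effect 0.5.
rng = random.Random(1)
centre = [rng.randrange(4) for _ in range(120)]
treat = [rng.randrange(2) for _ in range(120)]
y = [2.0 * c + 0.5 * t + rng.gauss(0, 1) for c, t in zip(centre, treat)]

est_two_stage = two_stage_estimate(y, treat, centre)
est_model = linear_model_estimate(y, treat, centre)
print(est_two_stage, est_model)  # the two estimates agree up to rounding error
```

The centre effects in the simulated data are deliberately large, so the crude difference of overall means would be disturbed by chance imbalance across centres, while both estimates above are free of it.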

This raises the issue, ‘what is the value of balance?’. The answer is that, other things being equal, balanced allocations are more efficient in that they lead to lower variances. This follows from the fact that the variance of a contrast based on two means is

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \qquad (1)$$

where $\sigma_1^2$, $\sigma_2^2$ are the variances in the two groups being compared and $n_1$, $n_2$ the two sample sizes. In an experimental context, it is often reasonable to proceed as if $\sigma_1^2 = \sigma_2^2$ so that, writing $\sigma^2$ for each variance, we have an expression for the variance of the contrast of

$$\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right). \qquad (2)$$

Now consider the successive ratios 1, 1/2, 1/3, …, 1/n. Each term is smaller than the preceding term. However, the amount by which a term is smaller is less than the amount by which the preceding term was smaller than the term that preceded it. For example, 1/3 − 1/4 = 1/12 but 1/2 − 1/3 = 1/6. In general we have 1/n − 1/(n+1) = 1/[n(n+1)], which clearly reduces with increasing n. It thus follows that if an extra observation can be added to construct such a contrast, it will have the greater effect in reducing the variance of that contrast if it is added to the group that has the fewer observations. This in turn implies, other things being equal, that balanced contrasts are more efficient.
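A small numerical illustration of this argument may help. It assumes a common variance, so that the variance of the contrast is σ²(1/n1 + 1/n2); the group sizes of 30 and 70 are hypothetical.

```python
def contrast_variance(n1, n2, sigma2=1.0):
    """Variance of the difference of two means with common variance sigma2."""
    return sigma2 * (1.0 / n1 + 1.0 / n2)

# Hypothetical unbalanced comparison: 30 patients in one group, 70 in the other.
base = contrast_variance(30, 70)
gain_smaller = base - contrast_variance(31, 70)  # extra patient to smaller group: 1/(30*31)
gain_larger = base - contrast_variance(30, 71)   # extra patient to larger group: 1/(70*71)
print(gain_smaller, gain_larger)

# With a fixed total of 100 patients, search for the split with the lowest variance.
best_split = min(range(1, 100), key=lambda n1: contrast_variance(n1, 100 - n1))
print(best_split, contrast_variance(best_split, 100 - best_split))
```

The extra observation buys more when it goes to the smaller group, and with a fixed total the search settles on the equal split.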

Exploiting the ex-external

However, it is often the case in a randomised clinical trial of a new treatment that a potential control treatment has been much studied in the past. Thus, many more observations, albeit of a historical nature, are available for the control treatment than the experimental one. This in turn suggests that if the argument that balanced datasets are better is used, we should now allocate more patients, and perhaps even all that are available, to the experimental arm. In fact, things are not so simple.

First, it should be noted, that if blinding of patients and treating physicians to the treatment being given is considered important, this cannot be convincingly implemented unless randomisation is employed (Senn, S. J., 1994). I have discussed the way that this may have to proceed in a previous blog, Placebos: it’s not only the patients that are fooled but in fact, in what follows, I am going to assume that blinding is unimportant and consider other problems with using historical controls.

When historical controls are used there are two common strategies. The first is to regard the historical controls as providing an external standard which may be regarded as having negligible error and to use it, therefore, as an unquestionably valid reference. If significance tests are used, a one-sample test is applied to compare the experimental mean to the historical standard. The second is to treat historical controls as if they were concurrent controls and to carry out the statistical analysis that would be relevant were this the case. Both of these are inadequate. Once I have considered them, I shall turn to a third approach that might be acceptable.

A standard error

If an experimental group is compared to a historical standard, as if that standard were currently appropriate and established without error, an implicit analogy is being made to a parallel group trial with a control group arm of infinite size. This can be seen by looking at formula (2). Suppose that we let the first group be the control group and the second one the experimental group. As $n_1 \to \infty$, formula (2) will approach $\sigma^2/n_2$, which is, in fact, the formula we intend to use.

Figure 1 shows the variance that this approach uses as a horizontal red line and, as a blue line, the variance that would apply to a parallel group trial. The experimental group size has been set at 100 and the control group sample size varies from 100 to 2000. The within-group variance has been set to $\sigma^2 = 1$. It can be seen that this approach of the historical standard underestimates considerably the variance that will apply. In fact, even the formula given by the blue line will underestimate the variance, as we shall explain below.

Figure 1. The variance of the contrast for a two-group parallel clinical trial for which the number of patients on the experimental arm is 100 as a function of the number on the control group arm.

It thus follows that assessing the effect from a single arm given an experimental treatment by comparison to a value from historical controls, but using a standard error of $\sigma/\sqrt{n_2}$, where $\sigma$ is the within-treated-group standard deviation and $n_2$ is the number of patients, will underestimate the uncertainty in this comparison.
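The two quantities compared in Figure 1 are easy to tabulate. This sketch uses the settings stated for the figure (100 patients on the experimental arm, within-group variance 1); the particular control-arm sizes listed and the function names are illustrative choices.

```python
def parallel_variance(n_control, n_experimental, sigma2=1.0):
    """Variance of the contrast in a two-arm parallel group trial."""
    return sigma2 * (1.0 / n_control + 1.0 / n_experimental)

def historical_standard_variance(n_experimental, sigma2=1.0):
    """Variance used when the control value is treated as known without error."""
    return sigma2 / n_experimental

n_exp = 100
v_std = historical_standard_variance(n_exp)
table = []
for n_control in (100, 200, 500, 1000, 2000):
    v_par = parallel_variance(n_control, n_exp)
    table.append((n_control, v_par, v_par / v_std))
    print(n_control, round(v_par, 5), round(v_std, 5), round(v_par / v_std, 3))
```

The ratio in the last column is 1 + n_exp/n_control, so the historical-standard variance is only reached in the limit of an infinitely large control arm.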

Parallel lies

A common alternative is to treat the historical data as if they came concurrently from a parallel group trial. This overlooks many matters, not least of which is that in many cases the data will have come from completely different centres and, whether or not they came from different centres, they came from different studies. That being so, the nearest analogue of a randomised trial is not a parallel group trial but a cluster randomised trial with study as a unit of clustering. The general set up is illustrated in Figure 2. This shows a comparison of data taken from seven historical studies of a control treatment (C) and one new study of an experimental treatment (E).

Figure 2. A data set consisting of information on historical controls (C) in seven studies and information on an experimental treatment in a new study.

This means that there is a between-study variance that has to be added to the within-study variances.

Cluster muster

The consequence is that the control variance is not just a function of the number of patients but also of the number of studies. Suppose there are k such studies; then even if each of these studies has a huge number of patients, the variance of the control mean cannot be less than $\Upsilon^2/k$, where $\Upsilon^2$ is the between-study variance. However, there is worse to come. The study of the new experimental treatment also has a between-study contribution, but since there is only one such study its variance is $\Upsilon^2/1 = \Upsilon^2$. The result is that a lower bound for the variance of the contrast using historical data is

$$\Upsilon^2\left(1 + \frac{1}{k}\right).$$

It turns out that the variance of the treatment contrast decreases disappointingly according to the number of clusters you can muster. Of course, in practice, things are worse, since all of this is making the optimistic assumption that historical studies are exchangeable with the current one (Collignon, O. et al., 2019; Schmidli, H. et al., 2014).
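To see how slowly the bound falls, here is a sketch. The lower bound adds the between-study variance divided by k (control mean) to the between-study variance itself (the single new study). The second function goes a step further and is an assumption of mine, not a formula from the post: it supposes k equally sized historical studies, a control mean formed as the simple average of the study means, and independent study effects.

```python
def contrast_variance_lower_bound(k, between_var=1.0):
    """Between-study contribution only: between_var/k + between_var."""
    return between_var * (1.0 + 1.0 / k)

def contrast_variance_clustered(k, n_per_study, n_experimental,
                                between_var, within_var):
    """Hypothetical random-effects version with within-study terms added.
    Assumes k equally sized historical studies and an unweighted mean of
    the study means for the control arm."""
    control = (between_var + within_var / n_per_study) / k
    experimental = between_var + within_var / n_experimental
    return control + experimental

for k in (1, 2, 7, 19, 100):
    print(k, round(contrast_variance_lower_bound(k), 4))

# With enormous studies the within-study terms vanish and the bound is approached.
print(contrast_variance_clustered(7, 10**9, 10**9, between_var=1.0, within_var=1.0))
```

However many historical studies are mustered, the bound never drops below the between-study variance contributed by the one new study.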

Optimists may ask, however, whether this is not all a fuss about nothing. The theory indicates that this might be a problem, but is there anything in practice to indicate that it is? Unfortunately, yes. The TARGET study provides a good example of the sort of difficulties encountered in practice (Senn, S., 2008). This was a study comparing lumiracoxib, ibuprofen and naproxen in osteoarthritis. For practical reasons, centres were either enrolled in a sub-study comparing lumiracoxib to ibuprofen or one comparing lumiracoxib to naproxen. There were considerable differences between sub-studies in terms of baseline characteristics but not within sub-studies, and there were even differences at outcome for lumiracoxib depending on which sub-study patients were enrolled in. This was not a problem for the way the trial was analysed, since it was foreseen from the outset, but it provides a warning that differences between studies may be important.

Another example is provided by Collignon, O. et al. (2019). Looking at historical data on acute myeloid leukaemia (AML), they identified 19 studies of a proposed control treatment Azacitidine. However, the variation from study to study was such that the 1279 subjects treated in these studies would only provide, in the best of cases, as much information as 50 patients studied concurrently.

COVID Control

How have we done in the age of COVID? Not always very well. To give an example, a trial that received much coverage was one of hydroxychloroquine in the treatment of patients suffering from coronavirus infection (Gautret, P. et al., 2020). The trial was in 20 patients and “Untreated patients from another center and cases refusing the protocol were included as negative controls.” The senior author, Didier Raoult, later complained of the ‘invasion of methodologists’, blamed them and the pharmaceutical industry for a ‘moral dictatorship’ that physicians should resist, and compared modellers to astrologers (Nau, J.-Y., 2020).

However, the statistical analysis section of the paper has the following to say:

Statistical differences were evaluated by Pearson’s chi-square or Fisher’s exact tests as categorical variables, as appropriate. Means of quantitative data were compared using Student’s t-test.

Now, Karl Pearson, RA Fisher and Student were all methodologists. So, Gautret, P. et al. (2020) do not appear to be eschewing the work of methodologists, far from it. They are merely choosing to use this work inappropriately. But nature is a hard task-mistress and if outcome varies considerably amongst those infected with COVID-19, and we know it does, and if patients vary from centre to centre, and we know they do, then variation from centre to centre cannot be ignored and trials in which patients have not been randomised concurrently cannot be analysed as if they were. Fisher’s exact test, Pearson’s chi-square and Student’s t will underestimate the variation.

The moral dictatorship of methodology

Methodologists are, indeed, moral dictators. If you do not design your investigations carefully you are on the horns of a dilemma. Either you carry out simplistic analyses that are simply wrong or you are condemned to using complex and often unconvincing modelling. Far from banishing the methodologists, you are holding the door wide open to let them in.


This is based on work that was funded by grant 602552 for the IDEAL project under the European Union FP7 programme and support from the programme is gratefully acknowledged.


Collignon, O., Schritz, A., Senn, S. J., & Spezia, R. (2019). Clustered allocation as a way of understanding historical controls: Components of variation and regulatory considerations. Statistical Methods in Medical Research, 962280219880213

Gautret, P., Lagier, J. C., Parola, P., Hoang, V. T., Meddeb, L., Mailhe, M., . . . Raoult, D. (2020). Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial. Int J Antimicrob Agents, 105949

Nau, J.-Y. (2020). Hydroxychloroquine : le Pr Didier Raoult dénonce la «dictature morale» des méthodologistes.  Retrieved from

Schmidli, H., Gsteiger, S., Roychoudhury, S., O’Hagan, A., Spiegelhalter, D., & Neuenschwander, B. (2014). Robust meta‐analytic‐predictive priors in clinical trials with historical control information. Biometrics, 70(4), 1023-1032

Senn, S. J. (2008). Lessons from TGN1412 and TARGET: implications for observational studies and meta-analysis. Pharmaceutical Statistics, 7, 294-301

Senn, S. J. (1994). Fisher’s game with the devil. Statistics in Medicine, 13(3), 217-230

Senn, S. J., & Lewis, R. J. (2019). Treatment Effects in Multicenter Randomized Clinical Trials. JAMA

Categories: covid-19, randomization, RCTs, S. Senn | 7 Comments

JSM 2020: P-values & “Statistical Significance”, August 6


To register for JSM:

Categories: JSM 2020, P-values | Leave a comment

Colleges & Covid-19: Time to Start Pool Testing


I. “Colleges Face Rising Revolt by Professors,” proclaims an article in today’s New York Times, in relation to returning to in-person teaching:

Thousands of instructors at American colleges and universities have told administrators in recent days that they are unwilling to resume in-person classes because of the pandemic. More than three-quarters of colleges and universities have decided students can return to campus this fall. But they face a growing faculty revolt.

…This comes as major outbreaks have hit college towns this summer, spread by partying students and practicing athletes.

In an indication of how fluid the situation is, the University of Southern California said late Wednesday that “an alarming spike in coronavirus cases” had prompted it to reverse an earlier decision to encourage attending classes in person.

…. Faculty members at institutions including Penn State, the University of Illinois, Notre Dame and the State University of New York have signed petitions complaining that they are not being consulted and are being pushed back into classrooms too fast.

… “I shudder at the prospect of teaching in a room filled with asymptomatic superspreaders,” wrote Paul M. Kellermann, 62, an English professor at Penn State, in an essay for Esquire magazine, proclaiming that “1,000 of my colleagues agree.” Those colleagues have demanded that the university give them a choice of doing their jobs online or in person.

II. There is currently a circulating petition of Virginia faculty making similar requests, and if you’re a Virginia faculty member and wish to sign, you still have one day (7/4/20).

A preference to teach remotely isn’t only about mitigating the risk of infection by asymptomatic students; it may also reflect the need to take care of children who might not be in school full-time this fall. Yet a return to in-person teaching has been made the default option in many universities such as Virginia Tech (which has decided 1/3 of classes will be in person).

Other universities have been more open to letting professors decide for themselves what to do. “Due to these extraordinary circumstances, the university is temporarily suspending the normal requirement that teaching be done in person,” the University of Chicago said in a message to instructors on June 26.

Yale said on Wednesday that it would bring only a portion of its students back to campus for each semester: freshmen, juniors and seniors in the fall, and sophomores, juniors and seniors in the spring. “Nearly all” college courses will be taught remotely, the university said, so that all students can enroll in them. New York Times

It would be one thing if all students were regularly tested for covid-19, but in the long-awaited plan released yesterday by Virginia Tech, students are at most being “asked” to obtain a negative result within 5 days of returning to campus–with the exception of students living in a campus residence, who will be offered tests when they arrive. Getting tested is also being “strongly advised”.

If they test positive, they are asked to self-isolate (with the number of days not indicated). A student would need to begin the process of seeking a test several weeks prior to the start of class to ensure at least a 14-day isolation (even though asymptomatics are known to be infectious for longer). But my main concern is that even vigilant students would face obstacles to qualifying for testing, given the current criteria. A student who does not currently have symptoms would not meet the criteria for testing in Virginia, or in the vast majority of other states, unless they had been in close contact with infected persons. (There are exceptions, such as NYC.) This could be rectified if Virginia Tech could get the Virginia Department of Health to include “returning to campus” under their provision to test those “entering congregate settings”–currently limited to long-term care facilities, prisons, and the like.

It is now known that a large percentage of people with Covid-19 are asymptomatic. “Among more than 3,000 prison inmates in four states who tested positive for the coronavirus, the figure was astronomical: 96 percent asymptomatic.”(Link).

An extensive review in the Annals of Internal Medicine suggests that asymptomatic infections may account for 45 percent of all COVID-19 cases:

“The likelihood that approximately 40% to 45% of those infected with SARS-CoV-2 will remain asymptomatic suggests that the virus might have greater potential than previously estimated to spread silently and deeply through human populations. Asymptomatic persons can transmit SARS-CoV-2 to others for an extended period, perhaps longer than 14 days.

The focus of testing programs for SARS-CoV-2 should be substantially broadened to include persons who do not have symptoms of COVID-19.”  

III. An easy solution would seem to be to turn to “pooled testing”. It’s an old statistical idea, but it’s only now gaining traction.[1] In the July 1 NYT:

The method, called pooled testing, signals a paradigm shift. Instead of carefully rationing tests to only those with symptoms, pooled testing would enable frequent surveillance of asymptomatic people. Mass identification of coronavirus infections could hasten the reopening of schools, offices and factories.

“We’re in intensive discussions about how we’re going to do it,” Dr. Anthony S. Fauci, the country’s leading infectious disease expert, said in an interview. “We hope to get this off the ground as soon as possible.”

…Here’s how the technique works: A university, for example, takes samples from every one of its thousands of students by nasal swab, or perhaps saliva. Setting aside part of each individual’s sample, the lab combines the rest into a batch holding five to 10 samples each. The pooled sample is tested for coronavirus infection. Barring an unexpected outbreak, just 1 percent or 2 percent of the students are likely to be infected, so the overwhelming majority of pools are likely to test negative.

But if a pool yields a positive result, the lab would retest the reserved parts of each individual sample that went into the pool, pinpointing the infected student. The strategy could be employed for as little as $3 per person per day, according to an estimate from economists at the University of California, Berkeley.

The FDA has set out guidelines for adopting pooled testing, which employs the same PCR technology as individual diagnostic tests (link).
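The arithmetic behind the two-stage scheme described in the NYT excerpt can be sketched as follows. This is the classic two-stage pooling calculation under idealised assumptions that are mine, not the article’s: a perfectly accurate test, no loss of sensitivity from dilution, and infections independent across the people in a pool. Each person costs 1/s of a pooled test, plus one individual retest whenever their pool of size s is positive.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected number of tests per person under two-stage pooling.
    Idealised: perfect test, independent infections, no dilution effect."""
    if pool_size < 2:
        raise ValueError("pooling needs at least two samples per pool")
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + p_pool_positive

# Prevalences and pool sizes in the range mentioned in the excerpt.
for prevalence in (0.01, 0.02):
    for pool_size in (5, 10):
        rate = expected_tests_per_person(prevalence, pool_size)
        print(prevalence, pool_size, round(rate, 3))
```

Values below 1 mean fewer tests than testing everyone individually. As prevalence rises the saving shrinks, which is why the choice of pool size depends on an estimate of prevalence.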

Universities should consider what they will do once a certain number of positive covid cases emerge. The Virginia Tech plan proposes to house infected students in a single dorm, but what about the majority of students who live off campus?  At what point would they switch to remote teaching? As much as everyone wants to return to normalcy, a class of masked students, 6 feet apart, doesn’t obviously create a better learning environment than zoom. By regularly conducting pooled tests, the university would become aware of increased spread as soon as a higher proportion of the pools return positive results– before we see an increase in serious cases and hospitalizations.

Chris Bilder, a statistician at the University of Nebraska–Lincoln, has been advising the Nebraska Public Health Laboratory on its use of group testing since April. He and his colleagues have developed a newly released app to determine precisely how best to conduct the pooling for a chosen reduction in testing and a given estimate of prevalence. (Link)

I will add to this over the next few days, as new reports become available. Please share your thoughts and related articles, in the comments.

[1]I first heard it discussed weeks ago by someone on Andrew Gelman’s blog, but I don’t know if it was the same idea.

Categories: covid-19 | 8 Comments

David Hand: Trustworthiness of Statistical Analysis (LSE PH 500 presentation)

This was David Hand’s guest presentation (25 June) at our zoomed graduate research seminar (LSE PH500) on Current Controversies in Phil Stat (~30 min.)  I’ll make some remarks in the comments, and invite yours.


Trustworthiness of Statistical Analysis

David Hand

Abstract: Trust in statistical conclusions derives from the trustworthiness of the data and analysis methods. Trustworthiness of the analysis methods can be compromised by misunderstanding and incorrect application. However, that should stimulate a call for education and regulation, to ensure that methods are used correctly. The alternative of banning potentially useful methods, on the grounds that they are often misunderstood and misused is short-sighted, unscientific, and Procrustean. It damages the capability of science to advance, and feeds into public mistrust of the discipline.

Below are Prof. Hand’s slides w/o audio, followed by a video w/audio. You can also view them on the Meeting #6 post on the PhilStatWars blog.



VIDEO: (Viewing in full screen mode helps with buffering issues.)

Categories: LSE PH 500 | 7 Comments

Bonus meeting: Graduate Research Seminar: Current Controversies in Phil Stat: LSE PH 500: 25 June 2020

Ship StatInfasST

We’re holding a bonus, 6th, meeting of the graduate research seminar PH500 for the Philosophy, Logic & Scientific Method Department at the LSE:

(Remote 10am-12 EST, 15:00 – 17:00 London time; Thursday, June 25)

VI. (June 25) BONUS: Power, shpower, severity, positive predictive value (diagnostic model) & a Continuation of The Statistics Wars and Their Casualties

There will also be a guest speaker: Professor David Hand (Imperial College, London). Here is Professor Hand’s presentation (click on “present” to hear sound)

The main readings are on the blog page for the seminar.


Categories: Graduate Seminar PH500 LSE, power | Leave a comment

“On the Importance of testing a random sample (for Covid)”, an article from Significance magazine


Nearly 3 months ago I tweeted “Stat people: shouldn’t they be testing a largish random sample of people [w/o symptoms] to assess rates, alert those infected, rather than only high risk, symptomatic people, in the U.S.?” I was surprised that nearly all the stat and medical people I know expressed the view that it wouldn’t be feasible or even very informative. Really? Granted, testing was and is limited, but had it been made a priority, it could have been done. In the new issue of Significance (June 2020) that I just received, James J. Cochran writes “on the importance of testing a random sample.” [1] 

In the United States (as of 9 April 2020), President Donald Trump has said that testing for novel coronavirus infection will be limited to people who believe they may be infected. But if we only test people who believe they may be infected, we cannot understand how deep the virus has reached into the population. The only way this could work is if those who believe they may be infected are representative of the population with respect to novel coronavirus infection. Does anyone believe this is so? The common characteristic of those who believe they may be infected is that they all show some outward symptoms of infection by the virus. In other words, people who are being tested for the novel coronavirus are disproportionately showing severe symptoms. This would not be a problem if someone who is infected by the novel coronavirus immediately shows symptoms, but this is not the case. We have strong evidence that some people develop mild cases, show no symptoms, and carry the virus without knowing it because they are asymptomatic. Thus, efforts to understand the virus’s penetration into the population must include observation of the asymptomatic.

Indeed, a recent assessment (in the Annals of Internal Medicine) is that at least 40% of people with Covid-19 are (and remain) asymptomatic. (An overview is in Time.) Oddly, while remaining asymptomatic, some still show damage to the lungs or other organs.

The estimate of the proportion of the population who are infected can be calculated as:


So, we need data from a random sample of the entire population in order to gather data from infected people who are showing symptoms, infected people who are asymptomatic, and people who are not infected. All have some probability of being included in a true random sample of the population.
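A toy simulation makes the point concrete. Every number here is hypothetical and chosen only for illustration: a true infection rate of 5%, 40% of the infected remaining asymptomatic, 5% of the uninfected showing similar symptoms from other causes, and a perfectly accurate test.

```python
import random

def simulate_population(size, prevalence, p_asymptomatic, p_other_symptoms, rng):
    """Return a list of (infected, symptomatic) pairs for a toy population."""
    people = []
    for _ in range(size):
        infected = rng.random() < prevalence
        if infected:
            symptomatic = rng.random() >= p_asymptomatic
        else:
            symptomatic = rng.random() < p_other_symptoms
        people.append((infected, symptomatic))
    return people

def positive_fraction(people):
    """Share of the tested group that is infected (perfect test assumed)."""
    return sum(infected for infected, _ in people) / len(people)

rng = random.Random(2020)
population = simulate_population(200_000, prevalence=0.05, p_asymptomatic=0.40,
                                 p_other_symptoms=0.05, rng=rng)

true_rate = positive_fraction(population)
symptomatic_only = [person for person in population if person[1]]
estimate_symptomatic = positive_fraction(symptomatic_only)
estimate_random = positive_fraction(rng.sample(population, 5000))

print(round(true_rate, 3), round(estimate_symptomatic, 3), round(estimate_random, 3))
```

Testing only those with symptoms gives a positivity rate far above the population rate and, by construction, never sees the asymptomatic infected; the random sample of 5,000 lands close to the truth.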

As of 23 April, leaders in Germany and New York State had moved to implement random testing to assess how widespread the virus is, but there has been resistance from leaders elsewhere. This could be due to ignorance, disregard, or lack of appreciation of statistical principles – a consequence of the lack of statistical literacy that pervades the general population. (If the general population insisted on the use of random sampling to assess how widespread the virus is, leaders would not likely resist.) Or it could reflect concern over the limited availability of tests and a desire to devote all of these limited tests to those who show symptoms of novel coronavirus infection.

Unfortunately, this might be inadvertently helping the novel coronavirus spread. If a society does not understand the extent of infection in the general population or the virus’s infectivity, how can it prepare and optimally devote its resources to slow the spread of the virus? How does it decide what preventive measures are appropriate or necessary? How does it minimise the likelihood that the virus spreads to the point that the capacity of the hospital system is overwhelmed? Most crucially, how does it know if it is making progress or if conditions are deteriorating?

Without the evidence that a random sample of the general population would provide, we are operating in the dark. While we operate in the dark, preventable deaths will accumulate, and we will continue to take measures that are not only ineffective, but also unnecessarily costly.

Most of the world still lacks the ability to test a large number of people, and this understandably makes even those leaders who appreciate sampling hesitant to test a random sample of the general population. But the bottom line is, we need more coronavirus tests than we think we need.

We should add to this the need for a random sample of tests of antibodies. Perhaps we’ll have some better numbers now that states are opening up and having to test employees.

[1] The journal comes out every other month; this is the first with a large section devoted to coronavirus.

Categories: random sample | 13 Comments

Birthday of Allan Birnbaum: Foundations of Probability and Statistics (27 May 1923 – 1 July 1976)

27 May 1923-1 July 1976

Today is Allan Birnbaum’s birthday. In honor of his birthday, I’m posting the articles in the Synthese volume that was dedicated to his memory in 1977. The editors describe it as their way of “paying homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics”. I had posted the volume before, but there are several articles that are well worth rereading. I paste a few snippets from the articles by Giere and Birnbaum. If you’re interested in statistical foundations, and are unfamiliar with Birnbaum, here’s a chance to catch up. (Even if you are, you may be unaware of some of these key papers.)


Synthese Volume 36, No. 1 Sept 1977: Foundations of Probability and Statistics, Part I

Editorial Introduction:

This special issue of Synthese on the foundations of probability and statistics is dedicated to the memory of Professor Allan Birnbaum. Professor Birnbaum’s essay ‘The Neyman-Pearson Theory as Decision Theory; and as Inference Theory; with a Criticism of the Lindley-Savage Argument for Bayesian Theory’ was received by the editors of Synthese in October, 1975, and a decision was made to publish a special symposium consisting of this paper together with several invited comments and related papers. The sad news about Professor Birnbaum’s death reached us in the summer of 1976, but the editorial project could nevertheless be completed according to the original plan. By publishing this special issue we wish to pay homage to Professor Birnbaum’s penetrating and stimulating work on the foundations of statistics. We are grateful to Professor Ronald Giere who wrote an introductory essay on Professor Birnbaum’s concept of statistical evidence and who compiled a list of Professor Birnbaum’s publications.


Table of Contents

SUFFICIENCY, CONDITIONALITY AND LIKELIHOOD

“In December of 1961 Birnbaum presented the paper ‘On the Foundations of Statistical Inference’ (Birnbaum [19]) at a special discussion meeting of the American Statistical Association. Among the discussants was L. J. Savage who pronounced it “a landmark in statistics”. Explicitly denying any “intent to speak with exaggeration or rhetorically”, Savage described the occasion as “momentous in the history of statistics”. “It would be hard”, he said, “to point to even a handful of comparable events” (Birnbaum [19], pp. 307-8). The reasons for Savage’s enthusiasm are obvious. Birnbaum claimed to have shown that two principles widely held by non-Bayesian statisticians (sufficiency and conditionality) jointly imply an important consequence of Bayesian statistics (likelihood).”[1]

INTRODUCTION AND SUMMARY

….Two contrasting interpretations of the decision concept are formulated: behavioral, applicable to ‘decisions’ in a concrete literal sense as in acceptance sampling; and evidential, applicable to ‘decisions’ such as ‘reject H’ in a research context, where the pattern and strength of statistical evidence concerning statistical hypotheses is of central interest. Typical standard practice is characterized as based on the confidence concept of statistical evidence, which is defined in terms of evidential interpretations of the ‘decisions’ of decision theory. These concepts are illustrated by simple formal examples with interpretations in genetic research, and are traced in the writings of Neyman, Pearson, and other writers. The Lindley-Savage argument for Bayesian theory is shown to have no direct cogency as a criticism of typical standard practice, since it is based on a behavioral, not an evidential, interpretation of decisions.

[1] By “likelihood” here, Giere means the (strong) Likelihood Principle (SLP). Dotted through the first 3 years of this blog are a number of (formal and informal) posts on his SLP result, and my argument as to why it is unsound. I wrote a paper on this that appeared in Statistical Science 2014. You can find it along with a number of comments and my rejoinder in this post: Statistical Science: The Likelihood Principle Issue is Out. Having found his proof unsound gives a new lease on life to statistical foundations, or so I argue in my rejoinder.

Categories: Birnbaum, Likelihood Principle, Statistics, strong likelihood principle | Tags: | 3 Comments

Graduate Research Seminar: Current Controversies in Phil Stat: LSE PH 500: 21 May – 18 June 2020


Ship StatInfasST will embark on a new journey from 21 May – 18 June, a graduate research seminar for the Philosophy, Logic & Scientific Method Department at the LSE, but given the pandemic has shut down cruise ships, it will remain at dock in the U.S. and use Zoom. If you care to follow any of the 5 sessions, nearly all of the materials will be linked here, collected from excerpts already on this blog. If you are interested in observing on Zoom beginning 28 May, please follow the directions here.

For the updated schedule, see the seminar web page.

Topic: Current Controversies in Phil Stat
(LSE, Remote 10am-12 EST, 15:00 – 17:00 London time; Thursdays 21 May-18 June)

Main Text SIST: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018):

I. (May 21)  Introduction: Controversies in Phil Stat:  

SIST: Preface, Excursion 1
Excursion 1 Tour I
Excursion 1 Tour II

Notes/Outline of Excursion 1
Postcard: Souvenir A

II. (May 28) N-P and Fisherian Tests, Severe Testing:

SIST: Excursion 3 Tour I (focus on pages up to p. 152)

Recommended: Excursion 2 Tour II pp. 92-100

Optional: I will (try to) answer questions on demarcation of science, induction, falsification, Popper from Excursion 2 Tour II

Handout: Areas Under the Standard Normal Curve

III. (June 4) Deeper Concepts: Confidence Intervals and Tests: Higgs’ Discovery:

SIST: Excursion 3 Tour III

Optional: I will answer questions on Excursion 3 Tour II: Howlers and Chestnuts of Tests 

IV. (June 11) Rejection Fallacies: Do P-values exaggerate evidence?
      Jeffreys-Lindley paradox or Bayes/Fisher disagreement:

SIST: Excursion 4 Tour II

Recommended (if time): Excursion 4 Tour I: The Myth of “The Myth of Objectivity”

V. (June 18) The Statistics Wars and Their Casualties:

SIST: Excursion 4 Tour III: pp. 267-286; Farewell Keepsake pp. 436-444
-Amrhein, V., Greenland, S., & McShane, B., (2019). Comment: Retire Statistical Significance, Nature, 567: 305-308.
-Ioannidis J. (2019). “The Importance of Predefined Rules and Prespecified Statistical Analyses: Do Not Abandon Significance.” JAMA. 321(21): 2067–2068. doi:10.1001/jama.2019.4582
-Ioannidis, J. (2019). Correspondence: Retiring statistical significance would give bias a free pass. Nature, 567, 461.
-Mayo, DG. (2019), P‐value thresholds: Forfeit at your peril. Eur J Clin Invest, 49: e13170. doi: 10.1111/eci.13170


Information Items for SIST

-References: Captain’s Bibliography
-Summaries of 16 Tours (abstracts & keywords)
Excerpts & Mementos on Error Statistics Philosophy Blog (I will link to items from excerpted proofs for interested blog followers as we proceed)
Schaum’s Appendix 2: Areas Under the Standard Normal Curve from 0-Z

DELAYED: JUNE 19-20 Workshop: The Statistics Wars and Their Casualties

Categories: Announcement, SIST | Leave a comment

Final part of B. Haig’s ‘What can psych stat reformers learn from the error-stat perspective?’ (Bayesian stats)


Here’s the final part of Brian Haig’s recent paper ‘What can psychology’s statistics reformers learn from the error-statistical perspective?’ in Methods in Psychology 2 (Nov. 2020). The full article, which is open access, is here. I will make some remarks in the comments.

5. The error-statistical perspective and the nature of science


As noted at the outset, the error-statistical perspective has made significant contributions to our philosophical understanding of the nature of science. These are achieved, in good part, by employing insights about the nature and place of statistical inference in experimental science. The achievements include deliberations on important philosophical topics, such as the demarcation of science from non-science, the underdetermination of theories by evidence, the nature of scientific progress, and the perplexities of inductive inference. In this article, I restrict my attention to two such topics: The process of falsification and the structure of modeling.

5.1. Falsificationism

The best known account of scientific method is the so-called hypothetico-deductive method. According to its most popular description, the scientist takes an existing hypothesis or theory and tests it indirectly by deriving one or more observational predictions that are subjected to direct empirical test. Successful predictions are taken to provide inductive confirmation of the theory; failed predictions are said to provide disconfirming evidence for the theory. In psychology, NHST is often embedded within such a hypothetico-deductive structure and contributes to weak tests of theories.

Also well known is Karl Popper’s falsificationist construal of the hypothetico-deductive method, which is understood as a general strategy of conjecture and refutation. Although it has been roundly criticised by philosophers of science, it is frequently cited with approval by scientists, including psychologists, even though they do not, indeed could not, employ it in testing their theories. The major reason for this is that Popper does not provide them with sufficient methodological resources to do so.

One of the most important features of the error-statistical philosophy is its presentation of a falsificationist view of scientific inquiry, with error statistics serving an indispensable role in testing. From a sympathetic, but critical, reading of Popper, Mayo endorses his strategy of developing scientific knowledge by identifying and correcting errors through strong tests of scientific claims. Making good on Popper’s lack of knowledge of statistics, Mayo shows how one can properly employ a range of, often familiar, error-statistical methods to implement her all-important severity requirement. Stated minimally, and informally, this requirement says, “A claim is severely tested to the extent that it has been subjected to and passes a test that probably would have found flaws, were they present.” (Mayo, 2018, p. xii) Further, in marked contrast with Popper, who deemed deductive inference to be the only legitimate form of inference, Mayo’s conception of falsification stresses the importance of inductive, or content-increasing, inference in science. We have here, then, a viable account of falsification, which goes well beyond Popper’s account with its lack of operational detail about how to construct strong tests. It is worth noting that the error-statistical stance offers a constructive interpretation of Fisher’s oft-cited remark that the null hypothesis is never proved, only possibly disproved.
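To make the informal severity requirement concrete, here is a minimal sketch, assuming a one-sided test of a Normal mean with known standard deviation (H0: μ ≤ μ0 against H1: μ > μ0). In that setting the severity with which the claim μ > μ1 passes, given an observed sample mean, can be computed as the probability of a sample mean no larger than the one observed, evaluated at μ = μ1. The function names and the numbers used below are illustrative assumptions, not taken from Haig's paper.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def severity_mu_greater(xbar: float, mu1: float, sigma: float, n: int) -> float:
    """Severity for the claim mu > mu1 in a one-sided Normal test, sigma known.

    Computed as P(sample mean <= observed xbar; mu = mu1): the probability
    that the test would have produced a less impressive result if mu were
    only mu1.
    """
    standard_error = sigma / math.sqrt(n)
    return normal_cdf((xbar - mu1) / standard_error)


if __name__ == "__main__":
    # Illustrative (hypothetical) numbers: sigma = 10, n = 100, so the
    # standard error is 1; the observed mean is 152 and H0 is mu <= 150.
    for mu1 in (150.0, 151.0, 152.0, 153.0):
        sev = severity_mu_greater(xbar=152.0, mu1=mu1, sigma=10.0, n=100)
        print(f"SEV(mu > {mu1:.0f}) = {sev:.3f}")
```

With these numbers the claim μ > 150 passes with high severity, while μ > 153 passes with low severity: the same data warrant some discrepancies from the null well and others poorly.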

5.2. A hierarchy of models

In the past, philosophers of science tended to characterize scientific inquiry by focusing on the general relationship between evidence and theory. Similarly, scientists, even today, commonly speak in general terms of the relationship between data and theory. However, due in good part to the labors of experimentally-oriented philosophers of science, we now know that this coarse-grained depiction is a poor portrayal of science. The error-statistical perspective is one such philosophy that offers a more fine-grained parsing of the scientific process.

Building on Patrick Suppes’ (1962) important insight that science employs a hierarchy of models that ranges from experimental experience to theory, Mayo’s (1996) error-statistical philosophy initially adopted a framework in which three different types of models are interconnected and serve to structure error-statistical inquiry: Primary models, experimental models, and data models. Primary models, which are at the top of the hierarchy, break down a research problem, or question, into a set of local hypotheses that can be investigated using reliable methods. Experimental models take the mid-position on the hierarchy and structure the particular models at hand. They serve to link primary models to data models. And, data models, which are at the bottom of the hierarchy, generate and model raw data, put them in canonical form, and check whether the data satisfy the assumptions of the experimental models. It should be mentioned that the error-statistical approach has been extended to primary models and theories of a more global nature (Mayo and Spanos, 2010) and, now, also includes a consideration of experimental design and the analysis and generation of data (Mayo, 2018).

This hierarchy of models facilitates the achievement of a number of goals that are important to the error-statistician. These include piecemeal strong testing of local hypotheses rather than broad theories, and employing the model hierarchy as a structuring device to knowingly move back and forth between statistical and scientific hypotheses. The error-statistical perspective insists on maintaining a clear distinction between statistical and scientific hypotheses, pointing out that psychologists often mistakenly take tests of significance to have direct implications for substantive hypotheses and theories.

6. The philosophy of statistics

A heartening attitude that comes through in the error-statistical corpus is the firm belief that the philosophy of statistics is an important part of statistical thinking. This emphasis on the conceptual foundations of the subject contrasts markedly with much of statistical theory, and most of statistical practice. It is encouraging, therefore, that Mayo’s philosophical work has influenced a number of prominent statisticians, who have contributed to the foundations of their discipline. Gelman’s error-statistical philosophy canvassed earlier is a prominent case in point. Through both precept and practice, Mayo’s work makes clear that philosophy can have a direct impact on statistical practice. Given that statisticians operate with an implicit philosophy, whether they know it or not, it is better that they avail themselves of an explicitly thought-out philosophy that serves their thinking and practice in useful ways. More particularly, statistical reformers recommend methods and strategies that have underlying philosophical commitments. It is important that they are identified, described, and evaluated.

The tools used by the philosopher of statistics in order to improve our understanding and use of statistical methods are considerable (Mayo, 2011). They include clarifying disputed concepts, evaluating arguments employed in statistical debates, including the core commitments of rival schools of thought, and probing the deep structure of statistical methods themselves. In doing this work, the philosopher of statistics, as philosopher, ascends to a meta-level to get purchase on their objects of study. This second-order inquiry is a proper part of scientific methodology.

It is important to appreciate that the error-statistical outlook is a scientific methodology in the proper sense of the term. Briefly stated, methodology is the interdisciplinary field that draws from disciplines that include statistics, philosophy of science, history of science, as well as indigenous contributions from the various substantive disciplines. As such, it is the key to a proper understanding of statistical and scientific methods. Mayo’s focus on the role of error statistics in science is deeply informed about the philosophy, history, and theory of statistics, as well as statistical practice. It is for this reason that the error-statistical perspective is strategically positioned to help the reader to go beyond the statistics wars.

7. Conclusion

The error-statistical outlook provides researchers, methodologists, and statisticians with a distinctive and illuminating perspective on statistical inference. Its Popper-inspired emphasis on strong tests is a welcome antidote to the widespread practice of weak statistical hypothesis testing that still pervades psychological research. More generally, the error-statistical standpoint affords psychologists an informative perspective on the nature of good statistical practice in science that will help them understand and transcend the statistics wars into which they have been drawn. Importantly, psychologists should know about the error-statistical perspective as a genuine alternative to the new statistics and Bayesian statistics. The new statisticians, Bayesian statisticians, and those with other preferences should address the challenges to their outlooks on statistics that the error-statistical viewpoint provides. Taking these challenges seriously would enrich psychology’s methodological landscape.

*This article is based on an invited commentary on Deborah Mayo’s book, Statistical inference as severe testing: How to get beyond the statistics wars (Cambridge University Press, 2018), which appeared at It is adapted with permission. I thank Mayo for helpful feedback on an earlier draft.

Refer to the paper for the references. I invite your comments and questions.


Categories: Brian Haig, SIST | 3 Comments

Part 2 of B. Haig’s ‘What can psych stat reformers learn from the error-stat perspective?’ (Bayesian stats)


Here’s a picture of ripping open the first box of (rush) copies of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*, and here’s a continuation of Brian Haig’s recent paper ‘What can psychology’s statistics reformers learn from the error-statistical perspective?’ in Methods in Psychology 2 (Nov. 2020). Haig contrasts error statistics, the “new statistics”, and Bayesian statistics from the perspective of the statistics wars in psychology. The full article, which is open access, is here. I will make several points in the comments.


4. Bayesian statistics

Despite its early presence, and prominence, in the history of statistics, the Bayesian outlook has taken an age to assert itself in psychology. However, a cadre of methodologists has recently advocated the use of Bayesian statistical methods as a superior alternative to the messy frequentist practice that dominates psychology’s research landscape (e.g., Dienes, 2011; Kruschke and Liddell, 2018; Wagenmakers, 2007). These Bayesians criticize NHST, often advocate the use of Bayes factors for hypothesis testing, and rehearse a number of other well-known Bayesian objections to frequentist statistical practice.

Of course, there are challenges for Bayesians from the error-statistical perspective, just as there are for the new statisticians. For example, the frequently made claim that p values exaggerate the evidence against the null hypothesis, but Bayes factors do not, is shown by Mayo not to be the case. She also makes the important point that Bayes factors, as they are currently used, do not have the ability to probe errors and, thus, violate the requirement for severe tests. Bayesians, therefore, need to rethink whether Bayes factors can be deployed in some way to provide strong tests of hypotheses through error control. As with the new statisticians, Bayesians also need to reckon with the coherent hybrid NHST afforded by the error-statistical perspective, and argue against it, rather than the common inchoate hybrids, if they want to justify abandoning NHST. Finally, I note in passing that Bayesians should consider, among other challenges, Mayo’s critique of the controversial Likelihood Principle, a principle which ignores the post-data consideration of sampling plans.

4.1. Contrasts between the Bayesian and error-statistical perspectives

One of the major achievements of the philosophy of error-statistics is that it provides a comprehensive critical evaluation of the major variants of Bayesian statistical thinking, including the classical subjectivist, “default”, pragmatist, and eclectic options within the Bayesian corpus. Whether the adoption of Bayesian methods in psychology will overcome the disorders of current frequentist practice remains to be seen. What is clear from reading the error-statistical literature, however, is that the foundational options for Bayesians are numerous, convoluted, and potentially bewildering. It would be a worthwhile exercise to chart how these foundational options are distributed across the prominent Bayesian statisticians in psychology. For example, the increasing use of Bayes factors for hypothesis testing purposes is accompanied by disorderliness at the foundational level, just as it is in the Bayesian literature more generally. Alongside the fact that some Bayesians are sceptical of the worth of Bayes factors, we find disagreement about the comparative merits of the subjectivist and default Bayesian outlooks on Bayes factors in psychology (Wagenmakers et al., 2018).

The philosophy of error-statistics contains many challenges for Bayesians to consider. Here, I want to draw attention to three basic features of Bayesian thinking, which are rejected by the error-statistical approach. First, the error-statistical approach rejects the Bayesian insistence on characterizing the evidential relation between hypothesis and evidence in a universal and logical manner in terms of Bayes’ theorem. Instead, it formulates the relation in terms of the substantive and specific nature of the hypothesis and the evidence with regard to their origin, modeling, and analysis. This is a consequence of a strong commitment to a piecemeal, contextual approach to testing, using the most appropriate frequentist methods available for the task at hand. This contextual attitude to testing is taken up in Section 5.2, where one finds a discussion of the role different models play in structuring and decomposing inquiry.

Second, the error-statistical philosophy also rejects the classical Bayesian commitment to the subjective nature of prior probabilities, which the agent is free to choose, in favour of the more objective process of establishing error probabilities understood in frequentist terms. It also finds unsatisfactory the turn to the more popular objective, or “default”, Bayesian option, in which the agent’s appropriate degrees of belief are constrained by relevant empirical evidence. The error-statistician rejects this default option because it fails in its attempts to unify Bayesian and frequentist ways of determining probabilities.

And, third, the error-statistical outlook employs probabilities to measure how effectively methods facilitate the detection of error, and how those methods enable us to choose between alternative hypotheses. By contrast, orthodox Bayesians use probabilities to measure belief in hypotheses or degrees of confirmation. As noted earlier, most Bayesians are not concerned with error probabilities at all. It is for this reason that error-statisticians will say about Bayesian methods that, without supplementation with error probabilities, they are not capable of providing stringent tests of hypotheses.

4.2. The Bayesian remove from scientific practice

Two additional features of the Bayesian focus on beliefs, which have been noted by philosophers of science and statistics, draw attention to their outlook on science. First, Kevin Kelly and Clark Glymour worry that “Bayesian methods assign numbers to answers instead of producing answers outright.” (2004, p. 112) Their concern is that the focus on the scientist’s beliefs “screens off” the scientist’s direct engagement with the empirical and theoretical activities that are involved in the phenomenology of science. Mayo agrees that we should focus on the scientific phenomena of interest, not the associated epiphenomena of degrees of belief. This preference stems directly from the error-statistician’s conviction that probabilities properly quantify the performance of methods, not the scientist’s degrees of belief.

Second, Henry Kyburg is puzzled by the Bayesian’s desire to “replace the fabric of science… with a vastly more complicated representation in which each statement of science is accompanied by its probability, for each of us.” (1992, p.149) Kyburg’s puzzlement prompts the question, ‘Why should we be interested in each other’s probabilities?’ This is a question raised by David Cox about prior probabilities, and noted by Mayo (2018).

This Bayesian remove from science contrasts with the willingness of the error-statistical perspective to engage more directly with science. Mayo is a philosopher of science as well as statistics, and has a keen eye for scientific practice. Given that contemporary philosophers of science tend to take scientific practice seriously, it comes as no surprise that she brings it to the fore when dealing with statistical concepts and issues. Indeed, her error-statistical philosophy should be seen as a significant contribution to the so-called new experimentalism, with its strong focus, not just on experimental practice in science, but also on the role of statistics in such practice. Her discussion of the place of frequentist statistics in the discovery of the Higgs boson in particle physics is an instructive case in point.

Taken together, these just-mentioned points of difference between the Bayesian and error-statistical philosophies constitute a major challenge to Bayesian thinking that methodologists, statisticians, and researchers in psychology need to confront.

4.3. Bayesian statistics with error-statistical foundations

One important modern variant of Bayesian thinking, which now receives attention within the error-statistical framework, is the falsificationist Bayesianism of Andrew Gelman, which received its major formulation in Gelman and Shalizi (2013). Interestingly, Gelman regards his Bayesian philosophy as essentially error-statistical in nature – an intriguing claim, given the anti-Bayesian preferences of both Mayo and Gelman’s co-author, Cosma Shalizi. Gelman’s philosophy of Bayesian statistics is also significantly influenced by Popper’s view that scientific propositions are to be submitted to repeated criticism in the form of strong empirical tests. For Gelman, best Bayesian statistical practice involves formulating models using Bayesian statistical methods, and then checking them through hypothetico-deductive attempts to falsify and modify those models.

Both the error-statistical and neo-Popperian Bayesian philosophies of statistics extend and modify Popper’s conception of the hypothetico-deductive method, while at the same time offering alternatives to received views of statistical inference. The error-statistical philosophy injects into the hypothetico-deductive method an account of statistical induction that employs a panoply of frequentist statistical methods to detect and control for errors. For its part, Gelman’s Bayesian alternative involves formulating models using Bayesian statistical methods, and then checking them through attempts to falsify and modify those models. This clearly differs from the received philosophy of Bayesian statistical modeling, which is regarded as a formal inductive process.

From the wide-ranging error-statistical evaluation of the major varieties of Bayesian statistical thought on offer, Mayo concludes that Bayesian statistics needs new foundations: In short, those provided by her error-statistical perspective. Gelman acknowledges that his falsificationist Bayesian philosophy is underdeveloped, so it will be interesting to learn how its further development relates to Mayo’s error-statistical perspective. It will also be interesting to see if Bayesian thinkers in psychology engage with Gelman’s brand of Bayesian thinking. Despite the appearance of his work in a prominent psychology journal, they have yet to do so. However, Borsboom and Haig (2013) and Haig (2018) provide sympathetic critical evaluations of Gelman’s philosophy of statistics.

It is notable that in her treatment of Gelman’s philosophy, Mayo emphasizes that she is willing to allow a decoupling of statistical outlooks and their traditional philosophical foundations in favour of different foundations, which are judged more appropriate. It is an important achievement of Mayo’s work that she has been able to consider the current statistics wars without taking a particular side in the debates. She achieves this by examining methods, both Bayesian and frequentist, in terms of whether they violate her minimal severity requirement of “bad evidence, no test”.

I invite your comments and questions.

*This picture was taken by Diana Gillooly, Senior Editor for Mathematical Sciences, Cambridge University Press, at the book display for the Sept. 2018 meeting of the Royal Statistical Society in Cardiff. She also had the honor of doing the ripping. A blogpost on the session I was in is here.

Categories: Brian Haig, SIST | 6 Comments

‘What can psychology’s statistics reformers learn from the error-statistical perspective?’


This is the title of Brian Haig’s recent paper in Methods in Psychology 2 (Nov. 2020). Haig is a professor emeritus of psychology at the University of Canterbury. Here he provides a thorough and insightful review of my book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018), as well as an excellent overview of the high points of today’s statistics wars and the replication crisis, especially from the perspective of psychology. I’ll excerpt from his article in a couple of posts. The full article, which is open access, is here.

Abstract: In this article, I critically evaluate two major contemporary proposals for reforming statistical thinking in psychology: The recommendation that psychology should employ the “new statistics” in its research practice, and the alternative proposal that it should embrace Bayesian statistics. I do this from the vantage point of the modern error-statistical perspective, which emphasizes the importance of the severe testing of knowledge claims. I also show how this error-statistical perspective improves our understanding of the nature of science by adopting a workable process of falsification and by structuring inquiry in terms of a hierarchy of models. Before concluding, I briefly discuss the importance of the philosophy of statistics for improving our understanding of statistical thinking.

Brian Haig

Keywords: The error-statistical perspective, The new statistics, Bayesian statistics, Falsificationism, Hierarchy of models, Philosophy of statistics Continue reading

Categories: Brian Haig, Statistical Inference as Severe Testing–Review | 12 Comments

S. Senn: Randomisation is not about balance, nor about homogeneity but about randomness (Guest Post)


Stephen Senn
Consultant Statistician

The intellectual illness of clinical drug evaluation that I have discussed here can be cured, and it will be cured when we restore intellectual primacy to the questions we ask, not the methods by which we answer them. Lewis Sheiner1

Cause for concern

In their recent essay Causal Evidence and Dispositions in Medicine and Public Health2, Elena Rocca and Rani Lill Anjum challenge, ‘the epistemic primacy of randomised controlled trials (RCTs) for establishing causality in medicine and public health’. That an otherwise stimulating essay by two philosophers, experts on causality, which makes many excellent points on the nature of evidence, repeats a common misunderstanding about randomised clinical trials, is grounds enough for me to address this topic again.  Before, however, explaining why I disagree with Rocca and Anjum on RCTs, I want to make clear that I agree with much of what they say. I loathe these pyramids of evidence, beloved by some members of the evidence-based movement, which have RCTs at the apex or possibly occupying a second place just underneath meta-analyses of RCTs. In fact, although I am a great fan of RCTs and (usually) of intention to treat analysis, I am convinced that RCTs alone are not enough. My thinking on this was profoundly affected by Lewis Sheiner’s essay of nearly thirty years ago (from which the quote at the beginning of this blog is taken). Lewis was interested in many aspects of investigating the effects of drugs and would, I am sure, have approved of Rocca and Anjum’s insistence that there are many layers of understanding how and why things work, and that means of investigating them may have to range from basic laboratory experiments to patient narratives via RCTs. Rocca and Anjum’s essay provides a good discussion of the various ‘causal tasks’ that need to be addressed and backs this up with some excellent examples. Continue reading

Categories: RCTs, S. Senn | 28 Comments

Paradigm Shift in Pandemic (Vent) Protocols?

Lung Scans[0]


As much as doctors and hospitals are raising alarms about a shortage of ventilators for Covid-19 patients, some doctors have begun to call for entirely reassessing the standard paradigm for their use–according to a cluster of articles that appeared in the last week. “What’s driving this reassessment is a baffling observation about Covid-19: Many patients have blood oxygen levels so low they should be dead. But they’re not gasping for air, their hearts aren’t racing, and their brains show no signs of blinking off from lack of oxygen.”[1] Within that group of patients, some doctors wonder if the standard use of mechanical ventilators does more harm than good.[2] The issue is controversial; I’ll just report what I find in the articles over the past week. Please share ongoing updates in the comments. Continue reading

Categories: covid-19 | 61 Comments

A. Spanos:  Isaac Newton and his two years in quarantine:  how science could germinate in bewildering ways (Guest post)


Aris Spanos
Wilson Schmidt Professor of Economics
Department of Economics
Virginia Tech

Beyond the plenitude of misery and suffering that pandemics bring down on humanity, occasionally they contribute to the betterment of humankind by (inadvertently) boosting creative activity that leads to knowledge, and not just in epidemiology. A case in point is that of Isaac Newton and the pandemic of 1665-6.  Continue reading

Categories: quarantine, Spanos | 14 Comments

April 1, 2020: Memory Lane of April 1’s past



My “April 1” posts for the past 8 years have been so close to the truth or possible truth that they weren’t always spotted as April Fool’s pranks, which is what made them genuine April Fool’s pranks. (After a few days I either labeled them as such, e.g., “check date!”, or revealed it in a comment). Given the level of current chaos and stress, I decided against putting up a planned post for today, so I’m just doing a memory lane of past posts. (You can tell from reading the comments which had most people fooled.) Continue reading

Categories: Comedy, Statistics | Leave a comment

The Corona Princess: Learning from a petri dish cruise (i)


Q. Was it a mistake to quarantine the passengers aboard the Diamond Princess in Japan?

A. The original statement, which is not unreasonable, was that the best thing to do with these people was to keep them safely quarantined in an infection-control manner on the ship. As it turned out, that was very ineffective in preventing spread on the ship. So the quarantine process failed. I mean, I’d like to sugarcoat it and try to be diplomatic about it, but it failed. I mean, there were people getting infected on that ship. So something went awry in the process of quarantining on that ship. I don’t know what it was, but a lot of people got infected on that ship. (Dr. A Fauci, Feb 17, 2020)

This is part of an interview with Dr. Anthony Fauci, the coronavirus point person we’ve been seeing so much of lately. Fauci has been the director of the National Institute of Allergy and Infectious Diseases since all the way back in 1984! You might find his surprise surprising. Even before getting our recent cram course on coronavirus transmission, tales of cruises being hit with viral outbreaks are familiar enough. The horror stories from passengers on the floating petri dish were well known by this Feb 17 interview. Even if everything had gone as planned, the quarantine was really only for the (approximately 3700) passengers because the 1000 or so crew members still had to run the ship, as well as cook and deliver food to the passengers’ cabins. Moreover, the ventilation systems on cruise ships can’t filter out particles smaller than 5000 or 1000 nanometers.[1] Continue reading

Categories: covid-19 | 52 Comments

Stephen Senn: Being Just about Adjustment (Guest Post)



Stephen Senn
Consultant Statistician

Correcting errors about corrected estimates

Randomised clinical trials are a powerful tool for investigating the effects of treatments. Given appropriate design, conduct and analysis they can deliver good estimates of effects. The key feature is concurrent control. Without concurrent control, randomisation is impossible. Randomisation is necessary, although not sufficient, for effective blinding. It also is an appropriate way to deal with unmeasured predictors, that is to say suspected but unobserved factors that might also affect outcome. It does this by ensuring that, in the absence of any treatment effect, the expected value of variation between and within groups is the same. Furthermore, probabilities regarding the relative variation can be delivered and this is what is necessary for valid inference. Continue reading

Categories: randomization, S. Senn | 6 Comments

My Phil Stat Events at LSE



I will run a graduate Research Seminar at the LSE on Thursdays from May 21-June 18:


(See my new blog for specifics.)
I am co-running a workshop from 19-20 June, 2020 at LSE (Center for the Philosophy of Natural and Social Sciences, CPNSS), with Roman Frigg. Participants include:
Alexander Bird (King’s College London), Mark Burgman (Imperial College London), Daniele Fanelli (LSE), David Hand (Imperial College London), Christian Hennig (University of Bologna), Katrin Hohl (City University London), Daniël Lakens (Eindhoven University of Technology), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland).
If you have a particular Phil Stat event you’d like me to advertise, please send it to me.
Categories: Announcement, Philosophy of Statistics | Leave a comment
