*The eighth meeting of our Phil Stat Forum*:

**The Statistics Wars
and Their Casualties**

**22 April 2021**

**TIME: 15:00-16:45 (London); 10:00-11:45 (New York, EST)**

**For information about the Phil Stat Wars forum and how to join, click on this link.**

**“How an information metric could bring truce to the statistics wars”**

**Daniele Fanelli**

**Abstract: **Both sides of debates on P-values, reproducibility, and other meta-scientific issues are entrenched in traditional methodological assumptions. For example, they often implicitly endorse rigid dichotomies (e.g. published findings are either “true” or “false”, replications either “succeed” or “fail”, research practices are either “good” or “bad”), or make simplifying and monistic assumptions about the nature of research (e.g. publication bias is generally a problem, all results should replicate, data should always be shared).

Thinking about knowledge in terms of information may clear a common ground on which all sides can meet, leaving behind partisan methodological assumptions. In particular, I will argue that a metric of knowledge that I call “K” helps examine research problems in a more genuinely “meta-“ scientific way, giving rise to a methodology that is distinct, more general, and yet compatible with multiple statistical philosophies and methodological traditions.

This talk will present statistical, philosophical and scientific arguments in favour of K, and will give a few examples of its practical applications.

**Daniele Fanelli **is a London School of Economics Fellow in Quantitative Methodology, Department of Methodology, London School of Economics and Political Science. He graduated in Natural Sciences, earned a PhD in Behavioural Ecology and trained as a science communicator, before devoting his postdoctoral career to studying the nature of science itself – a field increasingly known as meta-science or meta-research. He has been primarily interested in assessing and explaining the prevalence and causes of, and remedies for, problems that may affect research and publication practices across the natural and social sciences. Fanelli helps answer these and other questions by analysing patterns in the scientific literature using meta-analysis, regression and any other suitable methodology. He is a member of the Research Ethics and Bioethics Advisory Committee of Italy’s National Research Council, for which he developed the first research integrity guidelines, and of the Research Integrity Committee of the Luxembourg Agency for Research Integrity (LARI).

**Fanelli D** (2019) *A theory and methodology to quantify knowledge.* Royal Society Open Science – doi.org/10.1098/rsos.181055. (PDF)

4-page background: **Fanelli D** (2018) *Is science really facing a reproducibility crisis, and do we need it to?* PNAS – doi.org/10.1073/pnas.1708272114. (PDF)

**See Phil-Stat-Wars.com**

*Meeting 16 of the general Phil Stat series, which began with the LSE Seminar PH500 on May 21.

*A Statistical Model as a Chance Mechanism*

**Jerzy Neyman** **(April 16, 1894 – August 5, 1981)**, was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)

One of Neyman’s most remarkable, but least recognized, achievements was his adaptation of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Fisher’s original parametric statistical model M_{θ}(**x**) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data **x**_{0}:=(x_{1},x_{2},…,x_{n}) can be viewed as a ‘truly representative sample’ from that ‘population’:

“The postulate of randomness thus resolves itself into the question, ‘Of what population is this a random sample?’” (ibid., p. 313), underscoring that “the adequacy of our choice may be tested a posteriori.” (p. 314)

In cases where data **x**_{0} come from sample surveys, or can be viewed as a typical realization of a random sample **X**:=(X_{1},X_{2},…,X_{n}), i.e. Independent and Identically Distributed (IID) random variables, the ‘population’ metaphor can be helpful in adding some intuitive appeal to the inductive dimension of statistical inference, because one can imagine using a subset of a population (the sample) to draw inferences pertaining to the whole population.

This ‘infinite population’ metaphor, however, is of limited value in most applied disciplines relying on observational data. To see how inept this metaphor is, consider the question: what is the hypothetical ‘population’ when modeling the gyrations of stock market prices? More generally, what is observed in such cases is a certain on-going process, not a fixed population from which we can select a representative sample. For that very reason, most economists in the 1930s considered Fisher’s statistical modeling irrelevant for economic data!

Due primarily to Neyman’s experience with empirical modeling in a number of applied fields, including genetics, agriculture, epidemiology, biology, astronomy and economics, his notion of a statistical model evolved in the 1930s beyond Fisher’s ‘infinite populations’ into his frequentist ‘chance mechanisms’ (see Neyman, 1950, 1952):

Guessing and then verifying the ‘chance mechanism’, the repeated operation of which produces the observed frequencies. This is a problem of ‘frequentist probability theory’. Occasionally, this step is labeled ‘model building’. Naturally, the guessed chance mechanism is hypothetical. (Neyman, 1977, p. 99)

From my perspective, this was a major step forward for several reasons, including the following.

*First*, the notion of a statistical model as a ‘chance mechanism’ extended the intended scope of statistical modeling to include dynamic phenomena that give rise to data from non-IID samples, i.e. data that exhibit both dependence and heterogeneity, like stock prices.

*Second*, the notion of a statistical model as a ‘chance mechanism’ is not only of metaphorical value, but it can be operationalized in the context of a statistical model, formalized by:

M_{θ}(**x**) = {f(**x**;θ), θ∈Θ}, **x**∈R^{n}, Θ⊂R^{m}; m << n,

where the distribution of the sample f(**x**;θ) describes the probabilistic assumptions of the statistical model. This takes the form of a statistical Generating Mechanism (GM), stemming from f(**x**;θ), that can be used to generate simulated data on a computer. An example of such a Statistical GM is:

X_{t} = α_{0} + α_{1}X_{t-1} + σε_{t}, *t=1,2,…,n*

This indicates how one can use *pseudo-random* numbers for the error term ε_{t} ~ NIID(0,1) to simulate data for the Normal, AutoRegressive [AR(1)] Model. One can generate numerous sample realizations, say N=100,000, of sample size *n* in a fraction of a second on a PC.
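As a sketch of how such a statistical GM can be operationalized on a computer (a minimal illustration of mine; the parameter values α₀=1, α₁=0.5, σ=1 are made up for the example, not taken from the text):

```python
import numpy as np

def simulate_ar1(N, n, alpha0=1.0, alpha1=0.5, sigma=1.0, seed=None):
    """Generate N realizations of sample size n from the Normal AR(1) model
    X_t = alpha0 + alpha1 * X_{t-1} + sigma * eps_t,  eps_t ~ NIID(0, 1),
    each started at the stationary mean alpha0 / (1 - alpha1)."""
    rng = np.random.default_rng(seed)
    x = np.empty((N, n))
    prev = np.full(N, alpha0 / (1.0 - alpha1))
    for t in range(n):
        # one step of the generating mechanism, applied to all N replications at once
        prev = alpha0 + alpha1 * prev + sigma * rng.standard_normal(N)
        x[:, t] = prev
    return x

# N = 100,000 replications of sample size n = 50, as in the text's order of magnitude
samples = simulate_ar1(N=100_000, n=50, seed=0)
```

With |α₁| < 1 the process is stationary, so each replication fluctuates around α₀/(1−α₁) = 2; the simulated data can then stand in for ‘repeated operation of the chance mechanism’.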

*Third*, the notion of a statistical model as a ‘chance mechanism’ puts a totally different spin on another metaphor widely used by uninformed critics of frequentist inference: the ‘long-run’ metaphor associated with the relevant error probabilities used to calibrate frequentist inferences. The operationalization of the statistical GM reveals that the temporal aspect of this metaphor is totally irrelevant for frequentist inference; remember Keynes’s catchphrase “In the long run we are all dead”? Instead, what matters in practice is *repeatability in principle*, not repetition over time! For instance, one can use the above statistical GM to generate the empirical sampling distribution of any test statistic, and thus render operational not only the pre-data error probabilities, such as the type I and type II error probabilities and the power of a test, but also the post-data probabilities associated with the severity evaluation; see Mayo (1996).
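To make that last point concrete, the empirical sampling distribution of a statistic can be obtained by brute-force simulation from the GM. The sketch below is my own illustration (the least-squares estimator of α₁ and the parameter values are choices of convenience, not from the text): it simulates many AR(1) samples and collects the estimate from each one, yielding an empirical sampling distribution from which error probabilities can be read off.

```python
import numpy as np

rng = np.random.default_rng(42)

def ar1_realization(n, alpha0, alpha1, sigma):
    """One simulated sample of size n from the AR(1) generating mechanism."""
    eps = rng.standard_normal(n)            # pseudo-random NIID(0,1) errors
    x = np.empty(n)
    prev = alpha0 / (1.0 - alpha1)          # start at the stationary mean
    for t in range(n):
        prev = alpha0 + alpha1 * prev + sigma * eps[t]
        x[t] = prev
    return x

def alpha1_hat(x):
    """Least-squares estimate of alpha1 from regressing x_t on x_{t-1}."""
    y, z = x[1:], x[:-1]
    zc = z - z.mean()
    return np.dot(zc, y) / np.dot(zc, zc)

# Empirical sampling distribution of the estimator under alpha1 = 0.5
N, n = 5_000, 100
estimates = np.array([alpha1_hat(ar1_realization(n, 1.0, 0.5, 1.0))
                      for _ in range(N)])
# The histogram of `estimates` approximates the sampling distribution;
# tail areas of such simulated distributions give empirical error probabilities.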

**I have restored all available links to the following references.**

For further discussion on the above issues see:

Spanos, A. (2013), “A Frequentist Interpretation of Probability for Model-Based Inductive Inference,” in *Synthese.*

Fisher, R. A. (1922), “On the mathematical foundations of theoretical statistics,” *Philosophical Transactions of the Royal Society* A, 222: 309-368.

Mayo, D. G. (1996), *Error and the Growth of Experimental Knowledge*, The University of Chicago Press, Chicago.

Neyman, J. (1950), *First Course in Probability and Statistics*, Henry Holt, NY.

Neyman, J. (1952), *Lectures and Conferences on Mathematical Statistics and Probability*, 2nd ed. U.S. Department of Agriculture, Washington.

Neyman, J. (1977), “Frequentist Probability and Frequentist Statistics,” *Synthese*, 36, 97-131.

[i]He was born in an area that was part of Russia.

**Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981).** I’m posting a link to a quirky paper of his that explains one of the most misunderstood of his positions–what he was opposed to in opposing the “inferential theory”. The paper is Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making’ [i]. It’s chock-full of ideas and arguments. “In the present paper,” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute *a priori* distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. It arises on p. 391 of Excursion 5 Tour III of *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars* (2018, CUP). Here’s a link to the proofs of that entire tour.

If you hear Neyman rejecting “inferential accounts”, you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance, as typically thought. Neyman always distinguished his error statistical performance conception from Bayesian and fiducial probabilisms [ii]. The surprising twist here is semantical, and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). You can find quite a lot on this blog by searching Birnbaum.

Note: In this article, “attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.

**HAPPY BIRTHDAY NEYMAN!**

What doesn’t Neyman like about Birnbaum’s advocacy of a Principle of Sufficiency S (p. 25)? He doesn’t like that it is advanced as a normative principle (e.g., about when evidence is or ought to be deemed equivalent) rather than as a criterion that does something for you, such as control errors. (Presumably it is relevant to a type of context, say parametric inference within a model.) S is put forward as a kind of principle of rationality, rather than one with a rationale in solving some statistical problem.

“The principle of sufficiency (S): If E is specified experiment, with outcomes x; if t = t (x) is any sufficient statistic; and if E’ is the experiment, derived from E, in which any outcome x of E is represented only by the corresponding value t = t (x) of the sufficient statistic; then for each x, Ev (E, x) = Ev (E’, t) where t = t (x)… (S) may be described informally as asserting the ‘irrelevance of observations independent of a sufficient statistic’.”

Ev(E, x) is a metalogical symbol referring to the evidence from experiment E with result x. The very idea that there is such a thing as an evidence function is never explained, but to Birnbaum “inferential theory” required such things. (At least that’s how he started out.) The view is very philosophical and it inherits much from logical positivism and logics of induction. The principle S, and also other principles of Birnbaum, have a normative character: Birnbaum considers them “compellingly appropriate”.
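To make the content of S concrete (my illustration, not one from Neyman or Birnbaum): for n Bernoulli trials, the number of successes t(x) is a sufficient statistic, so any two outcomes with the same t yield likelihood functions that agree for every θ, which is the sense in which S deems them evidentially equivalent.

```python
import numpy as np

def bernoulli_likelihood(x, thetas):
    """Likelihood of theta given a sequence x of 0/1 Bernoulli trials."""
    t, n = sum(x), len(x)          # t = t(x), the sufficient statistic
    return thetas**t * (1.0 - thetas)**(n - t)

thetas = np.linspace(0.05, 0.95, 19)
x1 = [1, 0, 1, 0, 0]               # two different outcomes...
x2 = [0, 0, 1, 1, 0]               # ...with the same sufficient statistic t(x) = 2
L1 = bernoulli_likelihood(x1, thetas)
L2 = bernoulli_likelihood(x2, thetas)
# Because t(x1) == t(x2), the likelihood functions coincide at every theta,
# so the two outcomes carry identical information about theta.
```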

“The principles of Birnbaum appear as a kind of substitutes for known theorems,” Neyman says. For example, various authors proved theorems to the general effect that the use of sufficient statistics will minimize the frequency of errors. But if you just start with the rationale (minimizing the frequency of errors, say), you wouldn’t need these “principles” from on high, as it were. That’s what Neyman seems to be saying in his criticism of them in this paper. Do you agree? He has the same gripe concerning Cornfield’s conception of a default-type Bayesian account akin to Jeffreys. Why?

[i] I am grateful to @omaclaran for reminding me of this paper on twitter in 2018.

[ii] Or so I argue in my *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*, 2018, CUP.

[iii] Do you think Neyman is using “breakthrough” here in reference to Savage’s description of Birnbaum’s “proof” of the (strong) Likelihood Principle? Or is it the other way round? Or neither? Please weigh in.

REFERENCES

**Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘, Revue De l’Institut International De Statistique / Review of the International Statistical Institute, 30(1), 11-27.**

Where do journal editors look to find someone to referee your manuscript (in the typical “double blind” review system of academic journals)? One obvious place is the reference list of your paper. After all, if you’ve cited them, they must know about the topic of your paper, putting them in a good position to write a useful review. The problem is that if your paper is on a topic of ardent disagreement, and you argue in favor of one side of the debate, then your reference list is likely to include those with actual or perceived conflicts of interest. After all, if someone has a strong standpoint on an issue of some controversy, and a strong interest in persuading others to accept their side, it creates an *intellectual conflict of interest* if that person has power to uphold that view. Since your referee is in a position of significant power to do just that, it follows that they have a conflict of interest (COI). A lot of attention is paid to authors’ conflicts of interest, but little to the intellectual or ideological conflicts of interest of reviewers. At most, the concern is with the reviewer having special reasons to favor the author, usually thought to be indicated by their having been a previous co-author. We’ve been talking about journal editors’ conflicts of interest of late (e.g., with Mark Burgman’s presentation at the last Phil Stat Forum), and this brings to mind another one.

But is it true that just because a reviewer is put in a position of competing interests (staunchly believing in a position opposed to yours, while under an obligation to provide a fair and unbiased review) that their fairness in executing the latter is compromised? I surmise that your answer to this question will depend on which of two scenarios you imagine yourself in: In the first, you imagine yourself reviewing a paper that argues in favor of a position that you oppose. In the second, you imagine that your paper, which argues in favor of a view, has been sent to a reviewer with a vested interest in opposing that view.

In other words, if the paper argues in favor of a position, call it position X, and you oppose X, I’m guessing you imagine you’d have no trouble giving fair and constructive assessments of arguments in favor of X. You would not dismiss arguments in favor of X, just because you sincerely oppose X. You’d give solid *reasons*. You’d be much more likely to question if a reviewer, staunchly opposed to position X, will be an unbiased reviewer of *your* paper in favor of X. I’m not biased, but they are.

I think the truth is that reviewers with a strong standpoint on a controversial issue are likely to have an intellectual conflict of interest in reviewing a paper in favor of a position they oppose. Recall that it suffices, according to standard definitions of an individual’s having a COI, that reasonable grounds exist to question whether their judgments and decisions can be unbiased. (For example, investment advisors avoid recommending stocks they themselves own, to avoid a conflict of interest.) If this is correct, does it follow that opponents of a contentious position should not serve as reviewers of papers that take the opposite stance? I say no, because an author can learn a lot from a biased review about how to present their argument in the strongest possible terms, and how to zero in on the misunderstandings and confusions underlying objections to the view. Authors will almost surely not persuade such a reviewer by means of a revised paper, but they will be in possession of an argument that may enable them to persuade others.

A reviewer who deeply opposes position X will indeed, almost certainly, raise criticisms of a paper that favors X, but it does not follow that they are not objective or valid criticisms. Nevertheless, if *all* the reviewers come from this group, the result is still an unbalanced and unfair assessment, especially in that–objective or not–the critical assessment is more likely to accentuate the negative. If the position X happens to be currently unpopular, and opposing X the “received” position extolled by leaders of associations, journals, and institutions, then restricting reviewers to those opposed to X would obstruct intellectual progress. Progress comes from challenging the status quo and the tendency of people to groupthink and to jump on the bandwagon endorsed by many influential thought leaders of the day. Thus it would make sense for authors to have an opportunity to point out ahead of time to journal editors–who might not be aware of the particular controversy–the subset of references with a vested intellectual interest against the view for which they are arguing. If the paper is nevertheless sent to those reviewers, a judicious journal editor should weigh very heavily the author’s retorts and rejoinders. [1]

Here’s an example from outside of academia–the origins of the Coronavirus. The president of an organization that is directly involved with and heavily supported by funds for experimenting on coronaviruses, Peter Daszak, has a vested interest in blocking hypotheses of lab leaks or lab errors. Such hypotheses, if accepted, would have huge and adverse effects on that research and its regulation. When he is appointed to investigate Coronavirus origins, he has a conflict of interest. See this post.

Molecular biologist Richard Ebright, one of the scientists who signed the *Call for a Full and Unrestricted International Forensic Investigation into the Origins of COVID-19*, claims that “the fact that the WHO named Daszak as a member of its mission, and the fact that the WHO retained Daszak as a member of its mission after being informed of his conflicts of interest, make it clear that the WHO study cannot be considered a credible, independent investigation.” (LINK) If all the reviewers of a paper in support of a lab association come from team Daszak, the paper is scarcely being given a fair shake.

*Do you agree? Share your thoughts in the comments.*

[1] The problem is compounded by the fact that today there are more journal submissions than ever, and with the difficulty in getting volunteers, there’s pressure on the journal editor not to dismiss the views of referees. My guess is that anonymity doesn’t play a big role most of the time.


Many of you have written of instances in which authors and journal editors—and even some ASA members—have mistakenly assumed this [Wasserstein et al. (2019)] editorial represented ASA policy. The mistake is understandable: The editorial was co-authored by an official of the ASA.

… To address these issues, I hope to establish a working group that will prepare a thoughtful and concise piece … without leaving the impression that p-values and hypothesis tests…have no role in ‘good statistical practice’. (K. Kafadar, President’s Corner, 2019, p. 4)

Thus the Task Force on Statistical Significance and Replicability was born. Meanwhile, its recommendations remain under wraps. The one principle mentioned in Kafadar’s JSM presentation is that there be a disclaimer on all publications, articles, and editorials authored by ASA staff, making it clear that the views presented are theirs and not the association’s. It is good that we can now count on seeing the original recommendations. Had they appeared only in a separate publication, perhaps in a non-statistics journal, we would never actually know whether we were seeing the original recommendations or some modified version of them.

For a blogpost that provides the background to this episode, see “Why hasn’t the ASA board revealed the recommendations of its new task force on statistical significance and replicability?”

[1] Members of the ASA Task Force on Statistical Significance and Replicability

**Linda Young,** National Agricultural Statistics Service and University of Florida (Co-Chair)

**Xuming He,** University of Michigan (Co-Chair)

**Yoav Benjamini,** Tel Aviv University

**Dick De Veaux,** Williams College (ASA Vice President)

**Bradley Efron,** Stanford University

**Scott Evans,** The George Washington University (ASA Publications Representative)

**Mark Glickman**, Harvard University (ASA Section Representative)

**Barry Graubard,** National Cancer Institute

**Xiao-Li Meng,** Harvard University

**Vijay Nair,** Wells Fargo and University of Michigan

**Nancy Reid,** University of Toronto

**Stephen Stigler,** The University of Chicago

**Stephen Vardeman,** Iowa State University

**Chris Wikle,** University of Missouri


**REFERENCES:**

Kafadar, K., President’s Corner, “The Year in Review … And More to Come”, AMSTATNEWS, 1 December 2019.

“Highlights of the November 2019 ASA Board of Directors Meeting”, AMSTATNEWS 1 January 2020.

Kafadar, K. “Task Force on Statistical Significance and Replicability Created”, AMSTATNEWS 1 February 2020.

Like most wars, the Statistics Wars continue to have casualties. Some of the reforms thought to improve reliability and replication may actually create obstacles to methods known to improve on reliability and replication. At each meeting of our Phil Stat Forum, “The Statistics Wars and Their Casualties,” I take 5-10 minutes to draw out a proper subset of the casualties associated with the topic of the presenter for the day. (The associated workshop that I have been organizing with Roman Frigg at the London School of Economics (CPNSS) now has a date for a hoped-for in-person meeting in London: 24-25 September 2021.) Of course we’re interested not just in casualties but in positive contributions, though what counts as a casualty and what counts as a contribution is itself a focus of philosophy-of-statistics battles.

At our last meeting, Thursday, 25 March, **Mark Burgman**, Director of the Centre for Environmental Policy at Imperial College London and Editor-in-Chief of the journal Conservation Biology, spoke on “*How should applied science journal editors deal with statistical controversies?*“. His slides are here: (pdf). The casualty I focussed on is how the statistics wars may put journal editors in positions of conflicts of interest that can get in the way of transparency and avoidance of bias. I presented it in terms of 4 questions (nothing to do with the fact that it’s currently Passover):

D. Mayo’s Casualties: **Intellectual Conflicts of Interest: Questions for Burgman**

- In an applied field such as conservation science, where statistical inferences often are the basis for controversial policy decisions, should editors and editorial policies avoid endorsing one side of the long-standing debate revolving around statistical significance tests? Or should they adopt and promote a favored methodology?
- If editors should avoid taking a side in setting authors’ guidelines and reviewing papers, what policies should be adopted to avoid deferring to the calls of those wanting them to change their authors’ guidelines? Have you ever been encouraged to do so?
- If one has a strong philosophical statistical standpoint and a strong interest in persuading others to accept it, does it create a conflict of interest, if that person has power to enforce that philosophy (especially in a group already driven by perverse incentives)? If so, what is your journal doing to take account of and prevent conflicts of interest?
- What do you think of the March 2019 editorial of *The American Statistician* (Wasserstein et al., 2019): don’t say “statistical significance” and don’t use predesignated p-value thresholds in interpreting data (e.g., .05, .01, .005)?

(While not an ASA policy document, Wasserstein’s status as ASA executive director gave it a lot of clout. Should he have issued a disclaimer that the article only represents the authors’ views?) [1]

This is the first of some posts on intellectual conflicts of interest that I’ll be writing shortly. [2]

Mark Burgman’s presentation (Link)

D. Mayo’s Casualties (Link)

[1] For those who don’t know the story: Because no disclaimer was issued, the ASA Board appointed a new task force on Statistical Significance and Reproducibility in 2019 to provide recommendations. These have thus far not been made public. For the background, see this post.

Burgman said that he had received a request to follow the “don’t say significance, don’t use P-value thresholds” guideline but, upon considering it with colleagues, they decided against it. Why not include, as part of the journal information shared with authors, a statement that the editors consider it important to retain a variety of statistical methodologies–correctly used–and have explicitly rejected the call to ban any of them (even if the call comes with official association letterhead)?

[2] WordPress has just sprung a radical change on bloggers, and as I haven’t figured it out yet, and my blog assistant is unavailable, I’ve cut this post short.

*The seventh meeting of our Phil Stat Forum*:

**The Statistics Wars
and Their Casualties**

**25 March, 2021**

**TIME: 15:00-16:45 (London); 11:00-12:45 (New York; note time change to match UK time)**

**For information about the Phil Stat Wars forum and how to join, click on this link.**

**“How should applied science journal editors deal with statistical controversies?”**

**Mark Burgman**

**Mark Burgman **is the Director of the Centre for Environmental Policy at Imperial College London, Editor-in-Chief of the journal Conservation Biology, and Chair in Risk Analysis & Environmental Policy. Previously, he was Adrienne Clarke Chair of Botany at the University of Melbourne, Australia. He works on expert judgement, ecological modelling, conservation biology and risk assessment. He has written models for biosecurity, medicine regulation, marine fisheries, forestry, irrigation, electrical power utilities, mining, and national park planning. He received a BSc from the University of New South Wales (1974), an MSc from Macquarie University, Sydney (1981), and a PhD from the State University of New York at Stony Brook (1987). He worked as a consultant ecologist and research scientist in Australia, the United States and Switzerland during the 1980s before joining the University of Melbourne in 1990. He joined CEP in February 2017. He has published over two hundred and fifty refereed papers and book chapters and seven authored books. He was elected to the Australian Academy of Science in 2006.

**Abstract: **Applied sciences come with different focuses. In environmental science, as in epidemiology, the framing and context of problems is often in crises. Decisions are imminent, data and understanding are incomplete, and ramifications of decisions are substantial. This context makes the implications of inferences from data especially poignant. It also makes the claims made by fervent and dedicated authors especially challenging. The full gamut of potential statistical foibles and psychological frailties are on display. In this presentation, I will outline and summarise the kinds of errors of reasoning that are especially prevalent in ecology and conservation biology. I will outline how these things appear to be changing, providing some recent examples. Finally, I will describe some implications of alternative editorial policies.

Some questions:

*Would it be a good thing to dispense with p-values, either through encouragement or through strict editorial policy?

*Would it be a good thing to insist on confidence intervals?

*Should editors of journals in a broad discipline band together and post common editorial policies for statistical inference?

*Should all papers be reviewed by a professional statistician? If so, which kind?

Professor Burgman is developing this topic anew, so we don’t have the usual background reading. However, we do have his slides:

*Mark Burgman’s Draft Slides: “How should applied science journal editors deal with statistical controversies?” (pdf)

*D. Mayo’s Slides: “The Statistics Wars and Their Casualties for Journal Editors: Intellectual Conflicts of Interest: Questions for Burgman” (pdf)

*A paper of mine from the Joint Statistical Meetings, “Rejecting Statistical Significance Tests: Defanging the Arguments”, discusses an episode that is relevant for the general topic of how journal editors should deal with statistical controversies.

Mark Burgman’s presentation:

- Link to paste into browser: https://philstatwars.files.wordpress.com/2021/03/burgman-main-presentation_v2.mp4

D. Mayo’s Casualties:

- Link to paste into browser: https://philstatwars.files.wordpress.com/2021/03/burgman_mayo-casualities-and-reply.mp4

**Please feel free to continue the discussion by posting questions or thoughts in the comments section on this PhilStatWars post.**

*Meeting 15 of the general Phil Stat series, which began with the LSE Seminar PH500 on May 21.

(UK doesn’t change its clock until March 28.)

Last week, giving a long-postponed talk for the NY/NY Metro Area Philosophers of Science Group (MAPS), I mentioned how my book *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars* (2018, CUP) invites the reader to see themselves on a special interest cruise as we revisit old and new controversies in the philosophy of statistics–noting that I had no idea, in writing the book, that cruise ships would themselves become controversial in just a few years. The first thing I wrote during the early pandemic days last March was this post on the Diamond Princess. The statistics gleaned from the ship remain an important resource, and they haven’t been far off in many ways. I reblog it here.

Q. Was it a mistake to quarantine the passengers aboard the Diamond Princess in Japan?

A. The original statement, which is not unreasonable, was that the best thing to do with these people was to keep them safely quarantined in an infection-control manner on the ship. As it turned out, that was very ineffective in preventing spread on the ship. So the quarantine process failed. I mean, I’d like to sugarcoat it and try to be diplomatic about it, but it failed. I mean, there were people getting infected on that ship. So something went awry in the process of quarantining on that ship. I don’t know what it was, but a lot of people got infected on that ship. (Dr. A. Fauci, Feb. 17, 2020)

This is part of an interview of Dr. Anthony Fauci, the coronavirus point person we’ve been seeing so much of lately. Fauci has been the director of the National Institute of Allergy and Infectious Diseases since 1984! You might find his surprise surprising. Even before getting our recent cram course on coronavirus transmission, tales of cruises being hit with viral outbreaks are familiar enough. The horror stories from passengers on the floating petri dish were well known by this Feb 17 interview. Even if everything had gone as planned, the quarantine was really only for the (approximately 3,700) passengers, because the 1,000 or so crew members still had to run the ship, as well as cook and deliver food to the passengers’ cabins. Moreover, the ventilation systems on cruise ships can’t filter out particles smaller than 5000 or 1000 nanometers.[1]

“If the coronavirus is about the same size as SARS [severe acute respiratory syndrome], which is 120 nanometers in diameter, then the air conditioning system would be carrying the virus to every cabin,” according to Purdue researcher Qingyan Chen, who specializes in how air particles spread in different passenger crafts. (His estimate was correct: the coronavirus is 120 nanometers.) Halfway through the quarantine, after passenger complaints, they began circulating only fresh air–which would have been preferable from the start. By then, however, it was too late: the ventilation system was already likely filled with the virus, says Chen.[2] Arthur Caplan, the bioethicist who is famous for issuing rulings on such matters, declares that

“Boats are notorious places for being incubators for viruses. It’s only morally justified to keep people on the boat if there are no other options.”

Admittedly, it is hard to see an alternative option for accommodating so many passengers for a 2-week quarantine on land, and there was the possible danger of any infections spreading to the local population in Japan. So, by his assessment, it may be considered morally justified.

*The upshot*: As of 19 March 2020, at least 712 out of the 3,711 passengers and crew had tested positive for covid-19; 9 of those who were on board have died from the disease (all over the age of 70). As I was writing this, I noted a new CDC report on the Diamond Princess as well as other cruise ships; they state 9 deaths.[3] A table on the distribution of ages of passengers on the Diamond Princess is in Note [4].

*So how did the Diamond Princess cruise ship become a floating petri dish for the coronavirus from Feb 4-Feb 20?*

**The Quarantine**

It was their last night of a 2-week luxury cruise aboard the Diamond Princess in Japan (Feb 3) when the captain came on the intercom. He announced: a passenger on this ship who disembarked in Hong Kong 9 days ago (Jan 25) has tested positive for the coronavirus. (He was on board for 5 days.) Everyone will have to stay on board an extra day to be examined by the Japanese health authorities. A new slate of activities was arranged to occupy passengers during the day of health screening–later mostly dropped. But on the evening of February 3, things continued on the ship more or less as before the intercom message.

“The response aboard the Diamond Princess reflected concern, but not a major one. The buffets remained open as usual. Onboard celebrations, opera performances and goodbye parties continued”. (NYT, March 8)

The next day, as health officials went door to door to screen passengers, guests still circulated on board, lined up for buffets, and used communal spaces. But then, the following morning (Feb 5), as guests were heading to breakfast, the captain came over the intercom again. He announced that 10 people had tested positive for the coronavirus and would be taken off the ship. Everyone else would now have to be quarantined in their cabins for 14 days. On the second day of the quarantine (Feb 6) it was announced that 20 more people had tested positive, then on day three, 41 more, then 64 more, and on and on. By the end of the quarantine on February 19, at least 621 on the ship had tested positive for the virus.

Adding to the stress, “we quickly learned that our tests were part of an initial batch of 273 samples and that the first 10 cases reported on day one were only from the first 31 samples that had been processed” from the passengers with highest risk. (U.S. passenger, Spencer Fehrenbacher, interviewed on the ship)

As the number of infected ballooned, passengers were not always informed right away; some took to counting ambulances lined up outside to find out how many new cases would be announced at some point. I wonder if the passengers were told that the very first person to test positive was a crew member responsible for preparing food. In fact, by February 9, around 20 of the crew members had tested positive, *15 of whom were workers preparing food*. Crew members lived in close quarters, shared rooms and continued to eat their meals together buffet-style. They had no choice but to keep running the ship as best as they could.

“Feverish passengers were left in their rooms for days without being tested for the virus. Health officials and even some medical professionals worked on board without full protective gear. [Several got infected.] Sick crew members slept in cabins with roommates who continued their duties across the ship, undercutting the quarantine”. (NYT Feb 22)

Passengers in cabins without windows (and later, others) were allowed to walk on deck, six feet apart, for a short time daily. Unfortunately, presumed infection-free “green zones” were not rigidly separated from potentially contaminated “red zones”, and people walked back and forth between them. Gay Courter, a writer from the U.S. who, as it happens, situated one of her murder mysteries on a cruise ship, told *Time* “It feels like I’m in a bad movie. I tell myself, ‘Wake up, wake up, this isn’t really happening.’” (Time, Feb 11). This is the same bad movie we are all in now, except our horror tale has gotten much worse than on Feb 10.

At some point, I think Feb 10, the ship became the largest concentration of Covid-19 cases outside China, which is why you’ll notice the Diamond Princess has its own category in the data compiled by the World Health Organization (Worldometer).

In a Science Today article, a Japanese infectious disease specialist regretted the patchwork way in which passenger testing was done:

Japan has missed a chance to answer important epidemiological questions about the new virus and the illness it causes. For instance, a rigorous investigation that tested all passengers at the start of the quarantine and followed them through to the end could have provided information on when infections occurred and answered questions about transmission, the course of the illness, and the behavior of the virus.

(They were only able to test people in stages.) A similar paucity of testing in the U.S. robs us of crucial information for understanding and controlling the coronavirus. However, there is a fair amount being gleaned from the Diamond Princess, as you can see in the references below. (Please share additional references in the comments.) More is bound to follow.

**Estimates from the Diamond Princess**

“Data from the *Diamond Princess* cruise ship outbreak provides a unique snapshot of the true mortality and symptomatology of the disease, given that everyone on board was tested, regardless of symptoms”–or at least virtually all. [link] The estimates (from the Diamond Princess) I’ve seen are based on those from the London School of Hygiene and Tropical Medicine, in a paper still in preprint form, “Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship”.

Adjusting for delay from confirmation-to-death, we estimated case and infection fatality ratios (CFR, IFR) for COVID-19 on the Diamond Princess ship as 2.3% (0.75%-5.3%) [among symptomatic] and 1.2% (0.38-2.7%) [all cases]. Comparing deaths onboard with expected deaths based on naive CFR estimates using China data, we estimate IFR and CFR in China to be 0.5% (95% CI: 0.2-1.2%) and 1.1% (95% CI: 0.3-2.4%) respectively. (PDF)

(For definitions and computations, see the article.) These are lower than the numbers we are often hearing. They used their lower fatality estimates to adjust (down) the estimates from China data. The paper lists a number of caveats.[5] I hope readers will have a look at it (it’s just a few pages) and share their thoughts in the comments. (Their estimates are in sync with an article by Fauci et al., to come out this week in *NEJM*; but whatever the numbers turn out to be, we know our healthcare system, in many places, is being overloaded. [6])
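As a rough sanity check on these figures, one can compute the naive (unadjusted) fatality ratios directly from the final Diamond Princess counts in the CDC report quoted in note [3]: 712 positives, 381 symptomatic, 9 deaths. This is only a sketch of the arithmetic behind the definitions: the preprint’s estimates additionally adjust for the delay from confirmation to death, which these raw ratios ignore.

```python
# Naive case and infection fatality ratios from the final Diamond
# Princess counts (CDC March 23 report; see note [3]). These raw
# ratios ignore the confirmation-to-death delay adjustment made in
# the London School of Hygiene preprint.

positives = 712    # passengers and crew testing positive for SARS-CoV-2
symptomatic = 381  # symptomatic patients among those positives
deaths = 9         # deaths among those on board

naive_ifr = deaths / positives     # deaths among all infections
naive_cfr = deaths / symptomatic   # deaths among symptomatic cases

print(f"naive IFR: {naive_ifr:.1%}")  # 1.3%
print(f"naive CFR: {naive_cfr:.1%}")  # 2.4%
```

Both raw ratios land close to the preprint’s delay-adjusted point estimates for the ship (1.2% and 2.3%), though the confidence intervals reported in the paper are wide.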

Another study takes the daily reports of infections on the Diamond Princess to attempt to evaluate the impact of the quarantine, as imperfect as it was, in comparison to a counterfactual situation where nothing was done, including not removing infected people from the ship. They estimate nearly 80%, rather than 17%, would have been infected. [link]

We found that the reproductive number [R_{0}] of COVID-19 in the cruise ship situation of 3,700 persons confined to a limited space was around 4 times higher than in the epicenter in Wuhan, where it was estimated to have a mean of 3.7.[7] The interventions that included the removal of all persons with confirmed COVID-19 disease combined with the quarantine of all passengers substantially reduced the anticipated number of new COVID-19 cases compared to a scenario without any interventions (17% attack rate with intervention versus 79% without intervention) … However, the main conclusion from our modelling is that evacuating all passengers and crew early on in the outbreak would have prevented many more passengers and crew members from getting infected. [link]

Only 76, rather than 621, would have been infected, they estimate. [8]

Conclusions: The cruise ship conditions clearly amplified an already highly transmissible disease. The public health measures prevented more than 2000 additional cases compared to no interventions. However, evacuating all passengers and crew early on in the outbreak would have prevented many more passengers and crew from infection.
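For readers wanting a feel for where figures like a 79% attack rate come from, the textbook SIR “final size” relation links R_{0} to the fraction of a closed population ultimately infected when nothing is done: the attack rate z solves z = 1 − exp(−R_{0}·z). The sketch below is that textbook relation only, not the simulation model used in the study (which accounts for the ship’s contact structure and the quarantine timeline):

```python
import math

def final_attack_rate(r0: float, iters: int = 200) -> float:
    """Solve z = 1 - exp(-r0 * z) by fixed-point iteration.

    z is the fraction of a closed, fully susceptible population
    ultimately infected in a simple SIR epidemic with no
    interventions (textbook final-size relation, not the ship
    model used in the study).
    """
    z = 0.5  # any starting guess in (0, 1) converges for r0 > 1
    for _ in range(iters):
        z = 1.0 - math.exp(-r0 * z)
    return z

# Wuhan-like R0 of 3.7 (the mean cited in note [7]):
print(f"{final_attack_rate(3.7):.0%}")       # ~97%
# A ship-like R0 roughly 4x higher drives this essentially to 100%:
print(f"{final_attack_rate(4 * 3.7):.0%}")
```

This idealized limit overshoots the study’s 79% no-intervention figure, since their simulation covers only the outbreak window aboard the ship rather than an epidemic run to completion; the point of the sketch is just how steeply attack rates climb with R_{0} in a confined population.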

These studies and models are of interest, although I’m in no position to evaluate them. Please share your thoughts and information, and point out any errors you find. I will indicate updates in the title of this post.

**Optimism**

I leave off with the remark of one of the U.S. passengers interviewed while still on the Diamond Princess:

“Being knee deep in the middle of a crisis leaves a person with two options — optimism or pessimism. The former gives a person strength, and the latter gives rise to fear.” (link)

He, like the others who were evacuated, faced an additional 2 weeks of quarantine.[9] He has since returned home and remains infection free.

*****

[1] As a noteworthy aside, Fauci was able to assure the interviewer that the “danger of getting coronavirus now is just minusculely low” (in the U.S. on Feb. 17). What a difference 2 weeks can make.

[2] In a 2015 paper, Chen and colleagues found that a cruise ship’s ventilation spread particles from cabin to cabin. They found that 1 infected person typically led to more than 40 cases a week later on a 2,000-passenger cruise. By contrast, the coronavirus, with a reproductive rate of 2 cases per infected person, would only lead to 3 new cases during that time. Planes rely on high-strength air filters and are designed to circulate air within cabin sections.

[3] In a March 23 CDC report: Among 3,711 Diamond Princess passengers and crew, 712 (19.2%) had positive test results for SARS-CoV-2. Of these, 331 (46.5%) were asymptomatic at the time of testing. Among 381 symptomatic patients, 37 (9.7%) required intensive care, and nine (1.3%) died (*8*).

They found coronavirus in Diamond Princess cabins 17 days after passengers disembarked (prior to cleaning).

[4] A table from the Japanese National Institute of Infectious Diseases (NIID) (Source LINK):

[5]

“There were some limitations to our analysis. Cruise ship passengers may have a different health status to the general population of their home countries, due to health requirements to embark on a multi-week holiday, or differences related to socio-economic status or comorbidities. Deaths only occurred in individuals 70 years or older, so we were not able to generate age-specific cCFRs; the fatality risk may also be influenced by differences in healthcare between countries”.

[6] In a March 26 article by Fauci and others, Covid-19 — Navigating the Uncharted, we read:

“If one assumes that the number of asymptomatic or minimally symptomatic cases is several times as high as the number of reported cases, the case fatality rate may be considerably less than 1%.”

[7] R_{0} may be viewed as the expected number of cases generated directly by 1 case in a susceptible population.

[8] The number in the most recent report is 712, but that would be after the quarantine ended on Feb 19.

[9] I read today that one of the evacuated U.S. passengers just entered a clinical trial on remdesivir. This would be over a month since the end of the first quarantine.

If you want to see the comments from the March, 2020 post, it’s here.

**REFERENCES:**

- Fauci interview: ‘Danger of getting coronavirus now is just minusculely low‘

- Giwa, A., Desai, A., & Duca, A. (2020). “Novel 2019 Coronavirus SARS-CoV-2 (COVID-19): An Updated Overview for Emergency Clinicians – 03-23-20” (translation by Sabrina Paula Rodera Zorita).
*EBMedicine.net*; PubMed ID: 32207910. (LINK)

- Japanese National Institute of Infectious Diseases (NIID). “Field Briefing: Diamond Princess COVID-19 Cases, 20 Feb Update” (LINK)

- Russell, T., Hellewell, J., Jarvis, C., van-Zandvoort, K., Abbott, S., Ratnayake, R., Flasche, S., Eggo, R. & Kucharski, A. (2020). “Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship.”
*MedRXIV: The preprint server for the Health Sciences*. (March 9, 2020). (PDF)

- Zheng, L., Chen, Q., Xu, J., & Wu, F. (2016). Evaluation of intervention measures for respiratory disease transmission on cruise ships.
*Indoor and Built Environment*, 25(8), 1267–1278. (First Published online August 28, 2015 ). (PDF)

*The seventh meeting of our Phil Stat Forum*:

**The Statistics Wars
and Their Casualties**

**25 March, 2021**

**TIME: 15:00-16:45 (London); 11:00-12:45 (New York, NOTE TIME CHANGE)**

**For information about the Phil Stat Wars forum and how to join, click on this link.**

**“How should applied science journal editors deal with statistical controversies?”**

**Mark Burgman**

**Mark Burgman **is the Director of the Centre for Environmental Policy at Imperial College London, Editor-in-Chief of the journal Conservation Biology, and Chair in Risk Analysis & Environmental Policy. Previously, he was Adrienne Clarke Chair of Botany at the University of Melbourne, Australia. He works on expert judgement, ecological modelling, conservation biology and risk assessment. He has written models for biosecurity, medicine regulation, marine fisheries, forestry, irrigation, electrical power utilities, mining, and national park planning. He received a BSc from the University of New South Wales (1974), an MSc from Macquarie University, Sydney (1981), and a PhD from the State University of New York at Stony Brook (1987). He worked as a consultant ecologist and research scientist in Australia, the United States and Switzerland during the 1980s before joining the University of Melbourne in 1990. He joined CEP in February, 2017. He has published over two hundred and fifty refereed papers and book chapters and seven authored books. He was elected to the Australian Academy of Science in 2006.

**Abstract: **Applied sciences come with different focuses. In environmental science, as in epidemiology, the framing and context of problems is often in crises. Decisions are imminent, data and understanding are incomplete, and ramifications of decisions are substantial. This context makes the implications of inferences from data especially poignant. It also makes the claims made by fervent and dedicated authors especially challenging. The full gamut of potential statistical foibles and psychological frailties are on display. In this presentation, I will outline and summarise the kinds of errors of reasoning that are especially prevalent in ecology and conservation biology. I will outline how these things appear to be changing, providing some recent examples. Finally, I will describe some implications of alternative editorial policies.

Some questions:

*Would it be a good thing to dispense with p-values, either through encouragement or through strict editorial policy?

*Would it be a good thing to insist on confidence intervals?

*Should editors of journals in a broad discipline band together and post common editorial policies for statistical inference?

*Should all papers be reviewed by a professional statistician? If so, which kind?

Professor Burgman is developing this topic anew, so we don’t have the usual background reading. However, the following paper of mine from the Joint Statistical Meetings, “Rejecting Statistical Significance Tests: Defanging the Arguments“, discusses an episode that is relevant to the general topic of how journal editors should deal with statistical controversies.

Mark Burgman’s Draft Slides (pdf)

**Mayo’s Memos: **Any info or events that arise that seem relevant to share with y’all before the meeting. Please check back closer to the meeting day.

*****Meeting 15 of the general Phil Stat series which began with the LSE Seminar PH500 on May 21

Have you ever wondered if people read Master’s (or even Ph.D) theses a decade out? Whether or not you have, I think you will be intrigued to learn the story of why an obscure Master’s thesis from 2012, translated from Chinese in 2020, is now a key to unravelling the puzzle of the global controversy about the mechanism and origins of Covid-19. The Master’s thesis by a doctor, Li Xu [**1**], “*The Analysis of 6 Patients with Severe Pneumonia Caused by Unknown Viruses*”, describes 6 patients he helped to treat after they entered a hospital in 2012, one after the other, suffering from an atypical pneumonia contracted while cleaning up after bats in an abandoned copper mine in China. Given the keen interest in finding the origin of the 2002–2003 severe acute respiratory syndrome (SARS) outbreak, Li wrote: “This makes the research of the bats in the mine where the six miners worked and later suffered from severe pneumonia caused by unknown virus a significant research topic”. He and the other doctors treating the mine cleaners hypothesized that their diseases were caused by a SARS-like coronavirus from having been in close proximity to the bats in the mine.

Jonathan Latham and Allison Wilson, scientists at the Bioscience Resource Project in Ithaca, decided Li Xu’s master’s thesis was important enough to translate from Chinese.

The evidence it contains has led us to reconsider everything we thought we knew about the origins of the COVID-19 pandemic. It has also led us to theorise a plausible route by which an apparently isolated disease outbreak in a mine in 2012 led to a global pandemic in 2019. (Latham & Wilson 2020)

They dubbed it the Mojiang Miner’s theory, because the mineshaft is located in Mojiang, in Yunnan province, China, 1000 miles from Wuhan. One of the mine cleaners from 2012, they speculate, might even have been patient zero of the current pandemic! But except for a brief sketch in note **5**, I put that aside for this post and turn to the article that first sparked my interest in the Mojiang mine from the *Times of London* July 4, 2020. Its subtitle is: ‘The world’s closest known relative to the Covid-19 virus was found in 2013 by Chinese scientists in an abandoned mine where it was linked to deaths caused by a coronavirus-type respiratory illness’. For a long time, it was one of the only articles on the mysteries that came to light with this Master’s thesis: now the mine mysteries are mentioned in every critical discussion of Covid-19 origins.

I will likely write updates to this post (following with (i), (ii), etc in the title), and possibly follow-up posts. I started it weeks ago, and as I learned more, I decided it was too much for one post. Please share corrections in the comments.

**1. The Mojiang Mine**

The Times authors set the scene in their picturesque opening:

In the monsoon season of August 2012 a small team of scientists travelled to southwest China to investigate a new and mysteriously lethal illness. After driving through terraced tea plantations, they reached their destination: an abandoned copper mine where — in white hazmat suits and respirator masks — they ventured into the darkness. Instantly, they were struck by the stench. Overhead, bats roosted. Underfoot, rats and shrews scurried through thick layers of their droppings. It was a breeding ground for mutated micro-organisms and pathogens deadly to human beings. There was a reason to take extra care. Weeks earlier, six men who had entered the mine had been struck down by an illness that caused an uncontrollable pneumonia. Three of them died.

Today [back in July 2020], as deaths from the Covid-19 pandemic exceed half a million and economies totter, the bats’ repellent lair has taken on global significance.

Evidence seen by The Sunday Times suggests that a virus found in its depths — part of a faecal sample that was frozen and sent to a Chinese laboratory for analysis and storage — is the closest known match to the virus that causes Covid-19. (London Times)

The lab to which the sample was sent was the Wuhan Institute of Virology (WIV), a world-renowned site for bat coronavirus research, led by Shi Zhengli, often called “batwoman” in recognition of her years of bat coronavirus research.

The pneumonia the miners were suffering from was deemed sufficiently serious and unusual to immediately call in an acclaimed virologist, Professor Zhong Nanshan, who had led China’s efforts against the first SARS, referred to now as SARS-CoV-1 to distinguish it from SARS-CoV-2, the virus that causes Covid-19.

The Wuhan Institute of Virology (WIV) …was called in to test the four survivors. These produced a remarkable finding: while none had tested positive for Sars, all four had antibodies against another, unknown Sars-like coronavirus. (London Times)

The detailed description of their symptoms and disease progression in the Master’s thesis exactly echoes what we now see in those with Covid-19: high fevers, coughs, difficulty in breathing, and many of the treatments tried are also in sync with those used today, including one found to be one of the most successful: steroids.

Shi Zhengli was in the midst of researching bat caves around 200 miles from the Mojiang mine when her team was alerted to the miners. Given that their main research focus is SARS-related coronaviruses, especially from bats, this was clearly of great interest to them. So they immediately turned to investigate the Mojiang Mine.

Over the next year, the scientists took faecal samples from 276 bats. The samples were stored at minus 80C in a special solution and dispatched to the Wuhan institute, where molecular studies and analysis were conducted. (London Times)

One, from a horseshoe bat, was of special interest because it was considered a brand new strain of a SARS-related virus. In a February 2016 article that Shi co-authored, the bat sample was named **RaBtCoV/4991**. Oddly, the paper, titled “Coexistence of multiple coronaviruses in several bat colonies in an abandoned mineshaft,” makes no mention of the reason the whole study took place: no mention of the miners or the fact that three died from pneumonia contracted from bats in the mine where the sample was found (Mystery #1). But what really raised an alarm for me is the fact that Shi, when asked about the miners (in an interview in the March–April 2020 issue of *Scientific American*, hereafter *SA 2020*), averred that the miners were killed by a fungus and not a virus (Mystery #2). [See Note **7**, added March 11, 2021.]

Shi describes the mine as “a flying factory for new viruses” due to finding that often “multiple viral strains had infected a single animal.” While claiming it was a fungus that killed the miners, “she says it would have been only a matter of time before they caught the coronaviruses if the mine had not been promptly shut” (*SA 2020*). [I was struck to hear she thought they’d be directly infected, since from day 1 there’s often been an assumption that an intermediate species was needed.]

**2. December 30, 2019 and the current pandemic**

All that was pre-SARS-CoV-2. Away at a conference, Shi receives a call on Dec 30, 2019 that there’s a new coronavirus running rampant in Wuhan. Shi recalls the WIV director saying: “Drop whatever you are doing and deal with it now.” Her first thought as she makes her way back to Wuhan: “If coronaviruses were the culprit,” she remembers thinking, “could they have come from our lab?” (*SA 2020*)

Her musing that the new virus might have come from her lab is, in one sense, unsurprising, given Wuhan contains three labs specializing in the study of bat coronaviruses, hers being the only one at biosafety level 4.

Shi breathed a sigh of relief when the results came back: none of the sequences matched those of the viruses her team had sampled from bat caves. ‘That really took a load off my mind,’ she says. ‘I had not slept a wink for days.’ …The genomic sequence of the virus, eventually named SARS-CoV-2, was 96 percent identical to that of a coronavirus the researchers had identified in horseshoe bats in Yunnan. Their results appeared in a paper published online on February 3 2020 in *Nature*. (SA 2020)

They dubbed it bat coronavirus **RaTG13**. In this 2020 article, co-authored by Shi, they write:

RaTG13 is the closest relative of [SARS-CoV-2] … The close phylogenetic relationship to RaTG13 provides evidence that [SARS-CoV-2] may have originated in bats.…On the basis of these findings, we propose that the disease could be transmitted by airborne transmission, although we cannot rule out other possible routes of transmission. (Zhou, Yang,…Shi, *Nature* 2020 article)

But wait, let’s go back. Why a sigh of relief that SARS-CoV-2 is only 96% identical to one of the bat samples? What about the numerous specimens taken from the Mojiang miners? How close are *they* to SARS-CoV-2? Frustratingly, to this day we’re never told. (Mystery #3) Moreover, while RaTG13 is described as being found in a cave in Yunnan, there is no mention of BtCoV/4991. Nor is there a citation of the initial 2016 article describing BtCoV/4991, even though it was co-authored by Shi (Mystery #4).

*It turns out that RaBtCoV/4991 is identical to RaTG13!* However, it required independent groups to sleuth this out:

In fact, researchers in India and Austria have compared the partial genome of the mine sample that was published in the 2016 paper and found it is a 100% match with the same sequence for RaTG13. The same partial sequence for the mine sample is a 98.7% match with the Covid-19 virus. (London Times)

Why would the 2020 paper describing the closest relative to SARS-CoV-2 fail to mention that it is one and the same as the virus unearthed from the mine where 3 people died, and had already been cited in the 2016 paper, both with Shi as co-author? It’s one thing to rename it, but to fail to note this goes against typical publishing norms.

My initial attitude to the whole business of the origins of Covid-19 was that we’d probably never find out, and that anyway the most important things were finding treatments, prophylactics and vaccines, understanding the mechanism of Covid-19 and, especially, preventing future pandemics. But it became clear that those goals hinge on information that was mysteriously being hidden by the research groups funded (by the U.S.) precisely to provide surveillance and monitoring for pandemics. Without being able to pinpoint all the individuals involved, I will just allude to the *WIV research group* from the time of the Mojiang miners. (See also note **3**.)

So what did the WIV research group do with RaBtCoV/4991 in the ensuing years between finding it in 2013 (2016 article) and the revelation in the early pandemic (2020 article)? According to them, not much: it was said to have been stowed away in a freezer and only taken out *after* cases of Covid-19 appeared in Wuhan at the end of December 2019.

Other scientists find the initial indifference about a new strain of the coronavirus hard to understand. Nikolai Petrovsky, professor of medicine at Flinders University in Adelaide, South Australia, said it was “simply not credible” that the WIV would have failed to carry out any further analysis on RaBtCoV/4991, especially as it had been linked to the deaths of three miners.

‘If you really thought you had a novel virus that had caused an outbreak that killed humans then there is nothing you wouldn’t do — given that was their whole reason for being [there] — to get to the bottom of that, even if that meant exhausting the sample and then going back to get more,’ he said. (London Times)

So it seems the WIV research group failed at “their whole reason for being” there, since the sample simply sat in a freezer for 6 years. Maybe if they had investigated RaBtCoV/4991 in relation to the virus the miners died of they might have prevented the pandemic the world is now struggling under.

Perhaps it was to downplay the fact that they fell down on the job that they opted for a name switch (from **RaBtCoV/4991** in 2016 to **RaTG13** in 2020), and for the lack of citation of the 2016 paper. Nothing more sinister is suggested or needed for my argument to go through. There is apparently no way to study the sample of RaTG13 further, since it is said to have disintegrated upon being sequenced. (I will just call it RaTG13 in what follows.) Eight other SARS-related bat coronaviruses from the mine remain unpublished, to my knowledge.

**3. More Mysteries**

Not only is it incredible that no work had been done on RaTG13 in the ensuing years between its discovery and the SARS-CoV-2 outbreak, it turns out to be false! Alina Chan, who describes herself as a molecular biologist turned detective (investigating the origins of SARS-CoV-2), “pointed to an online database showing that the WIV had been genetically sequencing the mine virus in 2017 and 2018, analyzing it in a way they had done in the past with other viruses in preparation for running experiments with them.” (Boston Magazine) (Mystery #6)

But now that we know RaTG13 was sequenced and experimented upon in 2017 and 2018, we are still left with the mysteries of why they had claimed to have sequenced it only after the world was hit with the Covid-19 pandemic, and why her close collaborator, Peter Daszak, who for years has funneled money from NIH grants to support the WIV bat coronavirus research, was reporting that the sample had been ignored in a freezer for 6 years.[**3**] Only after the earlier sequencing was revealed did Daszak admit he was wrong. Likewise, it took considerable pressure on *Nature* before the appearance of a December **2020 addendum** to the 2020 article, in which they admit the earlier experimentation. All very mysterious, given that such experimentation would have been expected: their charge was to investigate specimens with pandemic spillover potential, and RaTG13 was described *by them* as having such potential in 2016. So what kind of research were they engaged in?

Some of the experiments — “gain of function” experiments — aimed to create new, more virulent, or more infectious strains of diseases in an effort to predict and therefore defend against threats that might conceivably arise in nature. The term *gain of function* is itself a euphemism; the Obama White House more accurately described this work as ‘experiments that may be reasonably anticipated to confer attributes to influenza, MERS, or SARS viruses such that the virus would have enhanced pathogenicity and/or transmissibility in mammals via the respiratory route.’ The virologists who carried out these experiments have accomplished amazing feats of genetic transmutation, no question, and there have been very few publicized accidents over the years. But there have been some. (NY Magazine)

There was a moratorium on such research in the U.S. in 2014, but funding was restored in 2017. Money from U.S. agencies is funneled through Daszak’s organization, the EcoHealth Alliance, to the WIV research team. [The latest award was cut in April 2020, then restored in August 2020.]

Those engaged in such research aver that it is necessary for disease surveillance systems that alert us when viruses with pandemic potential are making the jump to humans. Maybe so. But the 2016 paper hid the main details that might have been of use for this. The question isn’t whether this kind of gain of function research *could theoretically* be useful, but whether a specific research group, here the WIV-EcoHealth group, has shown itself committed to the transparent behavior necessary to warrant support. It has not.

**4. Falsifying the hypothesis of trusted research**

What we have are strong, independent pieces of evidence falsifying the group’s claim to a good faith commitment to responsibly conduct such research, or even to communicate honestly what is known. Were they reliable partners in pandemic research, then in the face of the real pandemic we are suffering they would have bent over backwards to supply explanations for the conflicting admissions, rather than add more obfuscation. Note that nothing more is required to ground my inference. It’s not a matter of showing a lab error or accidental leak. The evidence that falsifies their being good faith stewards to whom we may look to inform, surveil, and help prevent future pandemics is ample. The onus is on the WIV-EcoHealth research group to come forward with explanations–something one would expect them to be keen to do in order to support continued research into bat coronaviruses. *Until and unless they do, we can’t trust much of the key data coming out of the group.*

Here’s what we *know* about the value of the WIV-EcoHealth research when it comes to preventing and informing about actual pandemics. Deaths which, it turns out, they knew from the start were due to a virus–“We suspected that the patients had been infected by an unknown virus” (2020 Addendum)–were not broadcast, and in fact there was a news blackout about the case. A published paper on the bat viruses found (2016) does not mention the deaths. Then, when a real honest-to-God pandemic from a SARS-like coronavirus comes to light in the city that does major research in the area, the virus is sequenced but given a new name, with no mention of the earlier name, let alone the connection with the miners. No, it’s worse: there is confusion or prevarication amongst the researchers as to when it was sequenced and when the sample crumbled, and deliberate attempts to conceal records, including taking the central WIV database offline, preventing further checks. In each case there are denials that only later, after revelations by independent sleuths, result in about-faces. But having declared one thing, it doesn’t ameliorate the situation when the opposite is conceded only in the face of undeniable demonstrations of its falsity. We are still left with conflicting declarations and no explanation for the earlier, opposite stance.

“These strange and unscientific actions have obscured the origins of the closest viral relatives of SARS-CoV-2, viruses that are suspected to have caused a COVID-like illness in 2012 and which may be key to understanding not just the origin of the COVID-19 pandemic but the future behaviour of SARS-CoV-2.” (Latham and Wilson)

If it weren’t for the Master’s thesis, the admissions that have come forward might never have occurred.

A co-author of an expert guide to investigating outbreak origins, Dr. Filippa Lentzos, said,

We also need to take a hard look in the mirror. It is our own virologists, funders and publishers who are driving and endorsing the practice of actively hunting for viruses and the high-risk research of deliberately making viruses more dangerous to humans. We need to be more open about the heavily vested interests of some of the scientists given prominent platforms to make claims about the pandemic’s origins. [Chan and Ridley 2021]

The WIV research group has gained the knowledge of how to make a virus more transmissible.[**4**],[**5**] One of the existing patents, I read, is for methods that could turn a SARS-related coronavirus into SARS-CoV-2. That knowledge hasn’t helped the world control SARS-CoV-2. Good faith sharing of the earlier research would at least have shown a commitment to transparency and ethical research norms. When it comes to the question of the trust necessary to endorse future research, the known facts here are actually *more* troubling than previous cases of lab leaks that were openly admitted and followed by the adoption of improved methods and clear oversight. If this is how a research group behaves when there’s no association between the lab and the pandemic, how much worse can we expect in the case of an actual lab error?

Share your thoughts, links and corrections in the comments.

Mar 4, 2021: I’m adding a new note [**6**] on the W.H.O. investigation.

**Notes**

**[1]** His supervisor, Professor Qian Chuanyun, worked in the emergency department that treated the men. Other details were found in a PhD thesis by a student of the director of the Chinese Centre for Disease Control and Prevention. The full Master’s thesis can also be read in Latham and Wilson 2020 (No paywall).

**[2]** Details were filled in by independent sleuths throughout the world and “an anonymous Twitter user known as ‘The Seeker’ and a group going by the name of DRASTIC” (Ridley and Chan (2021)). One of the first articles to delineate a possible lab leak is Sirotkin, K. & Sirotkin, D. (2020) https://doi.org/10.1002/bies.202000091.

Rossana Segreto et al. (2020), who established that RaTG13 and 4991 are one and the same, write:

In late July 2020, Zhengli Shi, the leading CoV researcher from WIV, in an email interview asserted the renaming of the RaTG13 sample and unexpectedly declared that the full sequencing of RaTG13 had been carried out as far back as 2018, and not after the SARS‐CoV‐2 outbreak, as stated in [her own joint article in February of 2020].

I make no claims about having identified who first found what, as this is not my research area, but if you have an item you think I should reference, I’ll be glad to look at it. Use the comments. Here’s one sent in a comment yesterday by one of the authors:

Rahalkar, M.C. & Bahulikar, R.A. (2020). Understanding the Origin of ‘BatCoVRaTG13’, a Virus Closest to SARS-CoV-2. *Preprints*.

**[3]** Daszak runs a non-government group called the EcoHealth Alliance, which disburses funds for research into coronaviruses and other pathogens from U.S. agencies to labs throughout the world. A portion of these grants goes to his outfit, and he’s one of the most vocal supporters of their continuation. We might even call the research group the *WIV-EcoHealth Alliance* research group. Understandably, many scientists find conflicts of interest in having Daszak lead enquiries into a possible Covid lab leak. But he continues to be a key player. *Link: https://gmwatch.org/en/news/latest-news/19538-scientists-outraged-by-peter-daszak-leading-enquiry-into-possible-covid-lab-leak.*

The worst fears of conflicts of interest came true upon reading the recent reports on Covid origins. See Mallapaty, S. et al. (2021).

“To find genuinely critical analysis of COVID-19 origin theories one has to go to Twitter, blog posts, and preprint servers. The malaise runs deep when even scientists start to complain that they don’t trust science.” (Latham and Wilson)

**[4]** Another important name at the cutting edge of gain of function work on bat coronaviruses is Ralph Baric (from UNC). He was perhaps the first to show how to transfer viruses from one species to another. “Not only that, but they’d figured out how to perform their assembly seamlessly, without any signs of human handiwork. Nobody would know if the virus had been fabricated in a laboratory or grown in nature. Baric called this the ‘no-see’m method.’” (New York Magazine)

An eye-opening, excellent (< 10 min) **video from leading coronavirologists who know directly of the gain of function experiments**. English subtitles. Link: https://twitter.com/learnfromerror/status/1365124271786369025?s=20

**[5]** Latham and Wilson theorize that the initial virus evolved in the miners themselves during the months-long infection suffered by some of the miners, mimicking the process of serial passaging. This

is a standard virological technique for adapting viruses to new species, tissues, or cell types. It is normally done by deliberately infecting a new host species or a new host cell type with a high dose of virus. This initial viral infection would ordinarily die out because the host’s immune system vanquishes the ill-adapted virus. But, in passaging, before it does die out a sample is extracted and transferred to a new identical tissue, where viral infection restarts. Done iteratively, this technique … intensively selects for viruses adapted to the new host or cell type. …. We propose that, when frozen samples derived from the miners were eventually opened in the Wuhan lab they were already highly adapted to humans to an extent possibly not anticipated by the researchers. One small mistake or mechanical breakdown could have led directly to the first human infection in late 2019.

(However, there’s no evidence that the miners transmitted their virus to those around them; it might be that those around them wore sufficiently protective gear.)

Latham and Wilson’s theory shares analogies with the viral evolution seen in immunocompromised patients https://www.scientificamerican.com/article/covid-variants-may-arise-in-people-with-compromised-immune-systems/.

The same principle underlies the worry about extending the time lag between doses of vaccines. There’s a risk that subimmune individuals, with enough antibodies to slow the virus (perhaps remaining asymptomatic) but not enough to wipe it out, could harbor viral variants. https://www.sciencemag.org/news/2021/01/could-too-much-time-between-doses-drive-coronavirus-outwit-vaccines

For another theory, please see note [**8**] added March 17, 2021.

[**6**] This post does not continue into the recent investigation of Covid origins (organized, but not necessarily endorsed, by W.H.O.), but it’s clear that the facts discussed here are at the heart of the charges that the inquiry was biased and sorely inadequate. (Problems with alternative zoonotic and frozen food hypotheses add much fuel to the fire.) As the investigation was incapable of uncovering a lab accident or leak, it cannot rule out that hypothesis with any kind of severity. See my comment from March 4 for links to articles out today, and a letter from a group of scientists calling for a brand new, international investigation.

[**7**] Added March 11, 2021. This article, “A New Killer Virus in China” (Science 2014), is noteworthy because it describes (what I assume is) a different group, who identified what they called the Mojiang virus that killed the 3 miners. For one thing, it corroborates that it was known from the start that the miners died of a virus, but I don’t know its relation to RaTG13. Was their finding from a bat or a rat? What was the relation between this group and Shi’s group? I’d be grateful to hear from people in the know. https://www.sciencemag.org/news/2014/03/new-killer-virus-china

This 2017 article on (what they’re calling) the Mojiang virus and the miners is also telling: https://www.nature.com/articles/ncomms16060

[**8**] March 17 addition: Today I read of a different theory about a possible accidental recombination in the lab.

“Petrovsky leans towards another potential scenario, namely that SARS-CoV-2 might be evolved from coronaviruses that snuck into lab cultures. Related viruses in the same culture, he explains, such as one optimized for human ACE2 binding and another not, can swap genetic material to create new strains. … Viruses are evolving the whole time and it’s easy for a virus to get into your culture without you knowing it.” Petrovsky and several co-authors speculated in a paper published as a non-peer-reviewed preprint in May of last year as to whether the virus was “completely natural” or whether it originated with “a recombination event that occurred inadvertently or intentionally in a laboratory handling coronaviruses.”

https://undark.org/2021/03/17/lab-leak-science-lost-in-politics/

**Acknowledgement**: I thank Jean Miller for many useful comments, suggestions and corrections on earlier drafts of this post.

**References**

Arbuthnott, G., Calvert, J., & Sherwell, P. (2020). Revealed: Seven year coronavirus trail from mine deaths to a Wuhan lab. *The Sunday Times*, UK (July 4, 2020, The Sunday Times Insight Investigation).

Baker, N. (2021). The Lab Leak Hypothesis. *New York Magazine* (January 4, 2021).

Butler, C., Canard, B., Cap, H., et al. (2021). OPEN LETTER: Call for a Full and Unrestricted International Forensic Investigation into the Origins of COVID-19. Signed by 26 scientists, social scientists and science communicators. *March 4, 2021.*

Chan, A. Tweetorials on Covid-19 origins: https://twitter.com/Ayjchan/status/1320344055230963712

Chan, A. & Ridley, M. (2021). The World Needs a Real Investigation Into the Origins of Covid-19. *The Wall Street Journal* (January 15, 2021).

Ge, XY., Wang, N., Zhang, W. …**Shi, Z-L . **(2016). Coexistence of multiple coronaviruses in several bat colonies in an abandoned mineshaft.

Jacobsen, R. (2020). Could COVID-19 Have Escaped from a Lab? *Boston Magazine *(September 9, 2020).

Latham, J. & Wilson, A. (2020). A proposed Origin for SARS-CoV-2 and the COVID-19 Pandemic. *Independent Science News for Food and Agriculture* website (July 15, 2020).

Mallapaty, S., Maxmen A., & Callaway, E. (2021). “’Major stones unturned’: COVID origin search must continue after WHO report, say scientists”, *Nature* (February 10, 2021).

**[Shi interview, SA]** Qiu, J. (2020). How China’s ‘Bat Woman’ Hunted Down Viruses from SARS to the New Coronavirus. *Scientific American* (June 1, 2020).

Ridley, M. & Chan, A. (2021). Did the Covid-19 virus really escape from a Wuhan lab?. *The Telegraph* (UK) (February 6, 2021).

Segreto R. & Deigin, Y. (2020). The genetic structure of SARS‐CoV‐2 does not rule out a laboratory origin. *Bioessays* (November 17, 2020). Link: https://doi.org/10.1002/bies.202000240

Sirotkin, K. & Sirotkin, D. (2020). Might SARS‐CoV‐2 Have Arisen via Serial Passage through an Animal Host or Cell Culture? *BioEssays*. Link: https://doi.org/10.1002/bies.202000091

Xu, L. (2013). *The Analysis of Six Patients With Severe Pneumonia Caused By Unknown Viruses *(Master’s Thesis). School of Clinical Medicine, Kun Ming Medical University. Translation into English commissioned *by Independent Science News*, completed June 23, 2020. Link: https://assets.documentcloud.org/documents/6981198/Analysis-of-Six-Patients-With-Unknown-Viruses.pdf

Zhou, P., Yang, XL., Wang, XG. …**Shi, Z.** (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. *Nature* **579**, 270–273 (February 3, 2020). Link: https://doi.org/10.1038/s41586-020-2012-7

Zhou, P., Yang, XL., Wang, XG. …**Shi, Z.** (2020). Addendum: A pneumonia outbreak associated with a new coronavirus of probable bat origin. *Nature* **588**, E6 (December 3, 2020). Link: https://doi.org/10.1038/s41586-020-2951-z

**Other relevant resources**

A fascinating video including some of the key bat coronavirus researchers (short 9 min with English subtitles) https://www.youtube.com/watch?v=-kt9pVYgqkI


**Aris Spanos**
Wilson Schmidt Professor of Economics

Virginia Tech

The following guest post (**link to updated PDF**) was written in response to C. Hennig’s presentation at our Phil Stat Wars Forum on 18 February, 2021: “Testing with Models That Are Not True”.

*“Statistical Methods and Scientific Induction”*

*by Sir Ronald Fisher (1955)*

**SUMMARY**

The attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to “decisions” in Wald’s sense, originated in several misapprehensions and has led, apparently, to several more.

The three phrases examined here, with a view to elucidating the fallacies they embody, are:

- “Repeated sampling from the same population”,
- Errors of the “second kind”,
- “Inductive behavior”.

Mathematicians without personal contact with the Natural Sciences have often been misled by such phrases. The errors to which they lead are not only numerical.

To continue reading Fisher’s paper.

**“Note on an Article by Sir Ronald Fisher“**

**by Jerzy Neyman (1956)**

**Summary**

(1) FISHER’S allegation that, contrary to some passages in the introduction and on the cover of the book by Wald, this book does not really deal with experimental design is unfounded. In actual fact, the book is permeated with problems of experimentation. (2) Without consideration of hypotheses alternative to the one under test and without the study of probabilities of the two kinds, no purely probabilistic theory of tests is possible.

(3) The conceptual fallacy of the notion of fiducial distribution rests upon the lack of recognition that valid probability statements about random variables usually cease to be valid if the random variables are replaced by their particular values. The notorious multitude of “paradoxes” of fiducial theory is a consequence of this oversight. (4) The idea of a “cost function for faulty judgments” appears to be due to Laplace, followed by Gauss.

**“Statistical Concepts in Their Relation to Reality****“.**

**by E.S. Pearson (1955)**

Controversies in the field of mathematical statistics seem largely to have arisen because statisticians have been unable to agree upon how theory is to provide, in terms of probability statements, the numerical measures most helpful to those who have to draw conclusions from observational data. We are concerned here with the ways in which mathematical theory may be put, as it were, into gear with the common processes of rational thought, and there seems no reason to suppose that there is one best way in which this can be done. If, therefore, Sir Ronald Fisher recapitulates and enlarges on his views upon statistical methods and scientific induction we can all only be grateful, but when he takes this opportunity to criticize the work of others through misapprehension of their views as he has done in his recent contribution to this *Journal* (Fisher 1955 “Statistical Methods and Scientific Induction” ), it is impossible to leave him altogether unanswered.

In the first place it seems unfortunate that much of Fisher’s criticism of Neyman and Pearson’s approach to the testing of statistical hypotheses should be built upon a “penetrating observation” ascribed to Professor G.A. Barnard, the assumption involved in which happens to be historically incorrect. There was no question of a difference in point of view having “originated” when Neyman “reinterpreted” Fisher’s early work on tests of significance “in terms of that technological and commercial apparatus which is known as an acceptance procedure”. There was no sudden descent upon British soil of Russian ideas regarding the function of science in relation to technology and to five-year plans. It was really much simpler–or worse. *The original heresy, as we shall see, was a Pearson one!…*

Use this link to continue reading, “Statistical Concepts in Their Relation to Reality“.

This is a belated birthday post for R.A. Fisher (17 February, 1890-29 July, 1962)–it’s a guest post from earlier on this blog by Aris Spanos that has gotten the highest number of hits over the years.

**Happy belated birthday to R.A. Fisher!**

**‘R. A. Fisher: How an Outsider Revolutionized Statistics’**

by **Aris Spanos**

Few statisticians will dispute that R. A. Fisher **(February 17, 1890 – July 29, 1962)** is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of *optimal estimation* based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of *optimal testing* in the early 1930s. According to Hald (1998)

“Fisher was a genius who almost single-handedly created the foundations for modern statistical science, without detailed study of his predecessors. When young he was ignorant not only of the Continental contributions but even of contemporary publications in English.” (p. 738)

What is not so well known is that Fisher was the *ultimate outsider* when he brought about this change of paradigms in statistical science. As an undergraduate, he studied mathematics at Cambridge, and then did graduate work in statistical mechanics and quantum theory. His meager knowledge of statistics came from his study of astronomy; see Box (1978). That, however, did not stop him from publishing his first paper in statistics in 1912 (while still an undergraduate) on “curve fitting”, questioning Karl Pearson’s method of moments and proposing a new method that would eventually become the likelihood method in his 1921 paper.

After graduating from Cambridge he drifted into a series of jobs, including subsistence farming and teaching high school mathematics and physics, until his temporary appointment as a statistician at Rothamsted Experimental Station in 1919. During the period 1912-1919 his interest in statistics was driven by his passion for eugenics and a realization that his mathematical knowledge of n-dimensional geometry could be put to good use in deriving finite sample distributions for estimators and tests in the spirit of Gosset’s (1908) paper. Encouraged by his early correspondence with Gosset, he derived the finite sampling distribution of the sample correlation coefficient, which he published in 1915 in *Biometrika*, the only statistics journal at the time, edited by Karl Pearson. To put this result in proper context: Pearson had been working on this problem for two decades and had published, with several assistants, more than a dozen papers approximating the first two moments of the sample correlation coefficient; Fisher derived the relevant distribution, not just the first two moments.

Due to its importance, the 1915 paper provided Fisher’s first skirmish with the ‘statistical establishment’. Karl Pearson would not accept being overrun by a ‘newcomer’ lightly. So he prepared a critical paper with four of his assistants, which became known as “the cooperative study”, questioning Fisher’s result as stemming from a misuse of Bayes’ theorem. He proceeded to publish it in *Biometrika* in 1917 without bothering to let Fisher know before publication. Fisher was furious at K. Pearson’s move and prepared his answer in a highly polemical style, which Pearson promptly refused to publish in his journal. Eventually Fisher was able to publish his answer, after tempering the style, in *Metron*, a brand new statistics journal. As a result of this skirmish, Fisher pledged never to send another paper to *Biometrika*, and declared war on K. Pearson’s perspective on statistics. Fisher not only questioned Pearson’s method of moments as giving rise to inefficient estimators, but also his derivation of the degrees of freedom of his chi-square test. Several highly critical published papers ensued.[i]

Between 1922 and 1930 Fisher did most of his influential work in recasting statistics, including publishing a highly successful textbook in 1925, but the ‘statistical establishment’ kept him ‘in his place’: a statistician at an experimental station. All his attempts to find an academic position, including a position in Social Biology at the London School of Economics (LSE), were unsuccessful (see Box, 1978, p. 202). Being turned down for the LSE position was not unrelated to the fact that the professor of statistics at the LSE was Arthur Bowley (1869-1957), second only to Pearson in the statistical high priesthood.[ii]

Coming of age as a statistician in 1920s England meant being awarded the Guy Medal in gold, silver or bronze, or at least receiving an invitation to present your work to the Royal Statistical Society (RSS). Despite his fundamental contributions to the field, Fisher’s invitation to the RSS would not come until 1934. To put that in perspective, Jerzy Neyman, his junior by some distance, was invited six months earlier! Indeed, one can make a strong case that the statistical establishment kept Fisher away for as long as they could get away with it. By 1933, however, they must have felt that they had to invite Fisher after he accepted a professorship at University College, London. The position was created after Karl Pearson retired and the College decided to split his chair into a statistics position that went to Egon Pearson (Pearson’s son) and a Galton Professorship in Eugenics that was offered to Fisher. To make it worse, Fisher’s offer came with a humiliating clause forbidding him to teach statistics at University College (see Box, 1978, p. 258); the father of modern statistics was explicitly told to keep his views on statistics to himself!

Fisher’s presentation to the Royal Statistical Society, on December 18th, 1934, entitled “The Logic of Inductive Inference”, was an attempt to summarize and explain his published work on recasting the problem of statistical induction since his classic 1922 paper. Bowley was (self?) appointed to move the traditional vote of thanks and open the discussion. After some begrudging thanks for Fisher’s ‘contributions to statistics in general’, he went on to disparage the new approach to statistical inference based on the likelihood function, describing it as abstruse, arbitrary and misleading. His comments were predominantly sarcastic and discourteous, and went as far as accusing Fisher of plagiarism for not acknowledging Edgeworth’s priority on the likelihood function idea (see Fisher, 1935, pp. 55-7). The litany of churlish comments continued with the rest of the old guard: Isserlis, Irwin and the philosopher Wolf (1935, pp. 57-64), who was brought in by Bowley to undermine Fisher’s philosophical discussion of induction. Jeffreys complained about Fisher’s criticisms of the Bayesian approach (1935, pp. 70-2).

To Fisher’s support came … Egon Pearson, Neyman and Bartlett. E. Pearson argued that:

“When these ideas [on statistical induction] were fully understood … it would be realized that statistical science owed a very great deal to the stimulus Professor Fisher had provided in many directions.” (Fisher, 1935, pp. 64-5)

Neyman too came to Fisher’s support, praising Fisher’s path-breaking contributions, and explaining Bowley’s reaction to Fisher’s critical review of the traditional view of statistics as an understandable attachment to old ideas (1935, p. 73).

Fisher, in his reply to Bowley and the old guard, was equally contemptuous:

“The acerbity, to use no stronger term, with which the customary vote of thanks has been moved and seconded … does not, I confess, surprise me. From the fact that thirteen years have elapsed between the publication, by the Royal Society, of my first rough outline of the developments, which are the subject of to-day’s discussion, and the occurrence of that discussion itself, it is a fair inference that some at least of the Society’s authorities on matters theoretical viewed these developments with disfavour, and admitted with reluctance. … However true it may be that Professor Bowley is left very much where he was, the quotations show at least that Dr. Neyman and myself have not been left in his company. … For the rest, I find that Professor Bowley is offended with me for ‘introducing misleading ideas’. He does not, however, find it necessary to demonstrate that any such idea is, in fact, misleading. It must be inferred that my real crime, in the eyes of his academic eminence, must be that of ‘introducing ideas’.” (Fisher, 1935, pp. 76-82)[iii]

In summary, the pioneering work of Fisher, later supplemented by Egon Pearson and Neyman, was largely ignored by the Royal Statistical Society (RSS) establishment until the early 1930s. By 1933 it was difficult to ignore their contributions, published primarily in other journals, and the ‘establishment’ of the RSS decided to display its tolerance of their work by creating ‘the Industrial and Agricultural Research Section’, under the auspices of which the papers by Neyman and Fisher were presented in 1934 and 1935, respectively.[iv]

In 1943, Fisher was offered the Balfour Chair of Genetics at the University of Cambridge. Recognition from the RSS came in 1946 with the Guy medal in gold, and he became its president in 1952-1954, just after he was knighted! Sir Ronald Fisher retired from Cambridge in 1957. The father of modern statistics never held an academic position in statistics!

You can read more in Spanos 2008 (below).

**References**

Bowley, A. L. (1902, 1920, 1926, 1937) *Elements of Statistics*, 2nd, 4th, 5th and 6th editions, Staples Press, London.

Box, J. F. (1978) *The Life of a Scientist: R. A. Fisher*, Wiley, NY.

Fisher, R. A. (1912), “On an Absolute Criterion for Fitting Frequency Curves,” *Messenger of Mathematics*, 41, 155-160.

Fisher, R. A. (1915) “Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population,” *Biometrika,* 10, 507-21.

Fisher, R. A. (1921) “On the ‘probable error’ of a coefficient deduced from a small sample,” *Metron* 1, 2-32.

Fisher, R. A. (1922) “On the mathematical foundations of theoretical statistics,” *Philosophical Transactions of the Royal Society*, A 222, 309-68.

Fisher, R. A. (1922a) “On the interpretation of χ² from contingency tables, and the calculation of P,” *Journal of the Royal Statistical Society*, 85, 87–94.

Fisher, R. A. (1922b) “The goodness of fit of regression formulae and the distribution of regression coefficients,” *Journal of the Royal Statistical Society,* 85, 597–612.

Fisher, R. A. (1924) “The conditions under which χ² measures the discrepancy between observation and hypothesis,” *Journal of the Royal Statistical Society*, 87, 442-450.

Fisher, R. A. (1925) *Statistical Methods for Research Workers*, Oliver & Boyd, Edinburgh.

Fisher, R. A. (1935) “The logic of inductive inference,” *Journal of the Royal Statistical Society* 98, 39-54, discussion 55-82.

Fisher, R. A. (1937), “Professor Karl Pearson and the Method of Moments,” *Annals of Eugenics*, 7, 303-318.

Gosset, W. S. (1908) “The probable error of the mean,” *Biometrika*, 6, 1-25.

Hald, A. (1998) *A History of Mathematical Statistics from 1750 to 1930*, Wiley, NY.

Hotelling, H. (1930) “British statistics and statisticians today,” *Journal of the American Statistical Association*, 25, 186-90.

Neyman, J. (1934) “On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection,” *Journal of the Royal Statistical Society,* 97, 558-625.

Rao, C. R. (1992) “R. A. Fisher: The Founder of Modern Statistical Science,” *Statistical Science*, 7, 34-48.

RSS (Royal Statistical Society) (1934) *Annals of the Royal Statistical Society* 1834-1934, The Royal Statistical Society, London.

Savage, L . J. (1976) “On re-reading R. A. Fisher,” *Annals of Statistics*, 4, 441-500.

Spanos, A. (2008), “Statistics and Economics,” pp. 1057-1097 in *The New Palgrave Dictionary of Economics*, Second Edition. Eds. Steven N. Durlauf and Lawrence E. Blume, Palgrave Macmillan.

Tippett, L. H. C. (1931) *The Methods of Statistics*, Williams & Norgate, London.

[i] Fisher (1937), published a year after Pearson’s death, is particularly acerbic. In Fisher’s mind, Karl Pearson went after a young Indian statistician – totally unfairly – just the way he went after him in 1917.

[ii] Bowley received the Guy Medal in silver from the Royal Statistical Society (RSS) as early as 1895, and became a member of the Council of the RSS in 1898. He was awarded the society’s highest honor, the Guy Medal in gold, in 1935.

[iii] It is important to note that Bowley revised his statistics textbook for the last time in 1937, and predictably, he missed the whole change of paradigms brought about by Fisher, Neyman and Pearson.

[iv] In their centennial volume published in 1934, the RSS acknowledged the development of ‘mathematical statistics’, referring to Galton, Edgeworth, Karl Pearson, Yule and Bowley as the main pioneers, and listed the most important contributions in this sub-field which appeared in its Journal during the period 1909-33, but the three important papers by Fisher (1922a-b; 1924) are conspicuously absent from that list. The list itself is dominated by contributions in vital, commercial, financial and labour statistics (see RSS, 1934, pp. 208-23). There is a single reference to Egon Pearson.

This was first posted on 17 Feb. 2013 here.

**HAPPY BIRTHDAY R.A. FISHER!**

*The sixth meeting of our Phil Stat Forum*:*

**The Statistics Wars
and Their Casualties**

**18 February, 2021**

**TIME: 15:00-16:45 (London); 10-11:45 a.m. (New York, EST)**

**For information about the Phil Stat Wars forum and how to join, click on this link.**

**“Testing with Models that Are Not True”**

**ABSTRACT:** The starting point of my presentation is the apparently popular idea that in order to do hypothesis testing (and more generally frequentist model-based inference) we need to believe that the model is true, and the model assumptions need to be fulfilled. I will argue that this is a misconception. Models are, by their very nature, not “true” in reality. Mathematical results secure favourable characteristics of inference in an artificial model world in which the model assumptions are fulfilled. For using a model in reality we need to ask what happens if the model is violated in a “realistic” way. One key approach is to model a situation in which certain model assumptions of, e.g., the model-based test that we want to apply, are violated, in order to find out what happens then. This, somewhat inconveniently, depends strongly on what we assume, how the model assumptions are violated, whether we make an effort to check them, how we do that, and what alternative actions we take if we find them wanting. I will discuss what we know and what we can’t know regarding the appropriateness of the models that we “assume”, and how to interpret them appropriately, including new results on conditions for model assumption checking to work well, and on untestable assumptions.

**Christian Hennig** has been a Professor in the Department of Statistical Sciences “Paolo Fortunati” at the University of Bologna since November 2018. Hennig’s research interests are cluster analysis, multivariate data analysis including classification and data visualisation, robust statistics, foundations and philosophy of statistics, statistical modelling and applications. He was Senior Lecturer in Statistics at UCL, London, 2005-2018. Hennig studied Mathematics in Hamburg and Statistics in Dortmund. He received his PhD from the University of Hamburg in 1997 and habilitated there in 2005. In 2017 Hennig obtained his Italian habilitation. After having obtained his PhD, he worked as research assistant and lecturer at the University of Hamburg and ETH Zuerich.

Iqbal Shamsudheen and Christian Hennig (2020), “Should we test the model assumptions before running a model-based test?” (PDF)

Mayo D. (2018). “Section 4.8 All Models Are False,” excerpt from *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*, CUP. (pp. 296-301)

Christian Hennig’s **slides:** Testing In Models That Are Not True

**Christian Hennig Presentation**

**Link** to paste in browser: https://philstatwars.files.wordpress.com/2021/02/hennig_presentation.mp4

**Christian Hennig Discussion**

**Link** to paste in browser: https://philstatwars.files.wordpress.com/2021/02/hennig_discussion.mp4

**Mayo’s Memos: **Any info or events that arise that seem relevant to share with y’all before the meeting. Please check back closer to the meeting day.

If you only have time for reading his paper or the slides, I recommend working through his slides.

*Meeting 14 of the general Phil Stat series, which began with the LSE Seminar PH500 on May 21.

**Stephen Senn **Consultant Statistician

Edinburgh, Scotland

During an exchange on Twitter, Lawrence Lynn drew my attention to a paper by Laffey and Kavanagh[1]. This makes an interesting, useful and very depressing assessment of the situation as regards clinical trials in critical care. The authors make various claims that RCTs in this field are not useful as currently conducted. I don’t agree with the authors’ logic here although, perhaps surprisingly, I consider that their conclusion might be true. I propose to discuss this here.

My ‘knowledge’ of critical care is non-existent. The closest I have come to being involved is membership of various data-safety monitoring boards for sepsis. This is not at all the same as having been involved in planning trials in sepsis let alone critical care more widely.

In my opinion, which the reader should check for themselves, the argument of Laffey and Kavanagh has the following steps.

1. The proportion of positive trials in critical care is low. (One out of 20 large trials that concluded recently was positive.)
2. Given that such large trials are preceded by extensive testing, the proportion of effective treatments being tested in these large trials ought to be at least 50%.
3. “On a statistical basis, it is unlikely that most negative RCTs represent fair conclusions.” (p657)
4. Some may say that this low success rate is due to patient numbers being too few.
5. “But if the intervention is not matched to the patients being studied (eg, the biological target is absent) then the limitation is biological—not statistical—and insistence on greater numbers (or more stringent p-values) will have no effect.” (p657)
6. The entry criteria for most trials rely on consensus definitions, which have the high sensitivity you want for screening but lack the specificity you need when recruiting for clinical trials.
7. Hence clinical trials include many patients the treatment could not reasonably be expected to help.
8. This may cause a trial to be a ‘false negative’.
9. Trials in which patients are screened based on mechanism will be more promising.
10. Designers of RCTs and clinical trials groups should ensure that proposed RCTs in critical care identify subgroups of patients that match the specific intervention being tested.

I can agree with a number of these points, but some of them are very debatable. For example, no evidence is given for point 3. Instead, a confusing argument, based on the ‘one successful trial in 20’ statistic, is produced as follows.

For example, if each hypothesis tested had the same a priori probability that was as low as a coin toss, we would expect 50% of RCTs to be positive. The likelihood that 19 out of 20 consecutive such coin tosses would be negative is less than 1 in half a million (1 ÷ 2^{19}). (p657)

This is wrong on several counts. First, a minor point: the probability should not be calculated this way. The probability they calculate applies to the case where 19 trials were run, all of which were failures. For 19 out of 20, we have to allow for the fact that there are 20 possible different selections of one trial from 20, and we perhaps ought to include the more extreme case of 20 out of 20 anyway. The calculation should thus be (20+1) × ½^{19} × ½ = 21/2^{20} ≈ 10/2^{19}. This probability is thus about 10 times higher than the one they calculated and therefore about 1 in 50,000. Still, even so, this is a low probability.
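The corrected tail probability can be checked directly; a minimal sketch (variable names are mine):

```python
from math import comb

# P(at least 19 of 20 independent fair-coin trials are negative)
# = P(exactly 19 negative) + P(all 20 negative).
p_19_neg = comb(20, 19) * 0.5 ** 20  # 20 ways to choose the lone positive trial
p_20_neg = 0.5 ** 20                 # every trial negative
p_tail = p_19_neg + p_20_neg         # = 21 / 2**20

print(p_tail)             # about 2e-05
print(round(1 / p_tail))  # roughly 1 in 50,000
```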

*Figure 1 Expected percentage of positive trials as a function of power for various probabilities of the treatment being effective. *

The second problem, however, is more serious: the calculation takes no account of the power involved. In fact, if half of all null hypotheses are false and the threshold for significance is 2.5% (one-sided), only trials with 97.5% power have a 50% chance of being positive (or an assurance of 50%, to use the technical term proposed by O’Hagan, Stevens and Campbell[2]). The situation is illustrated in Figure 1 for various probabilities of the treatment being effective. A combination of a low probability and a moderate power will yield few significant trials.
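The relationship behind Figure 1 is simple to state: a trial comes out ‘positive’ either because a real effect is detected or as a false positive under the null. A minimal sketch (the function name is mine):

```python
def expected_positive_rate(prior_effective, power, alpha=0.025):
    """Expected proportion of 'positive' (significant) trials.

    A trial is positive either because a real effect is detected
    (probability prior * power) or as a one-sided 2.5% false
    positive under the null (probability (1 - prior) * alpha).
    """
    return prior_effective * power + (1 - prior_effective) * alpha

# Only 97.5% power gives a 50% hit rate when half of hypotheses are true:
print(expected_positive_rate(0.5, 0.975))  # 0.5
# The same prior with moderate (38%) power yields about one positive in five:
print(expected_positive_rate(0.5, 0.38))   # ~0.20
```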

The third problem, however, is that the argument given in points 2 and 7 is weak and inconsistent. It supposes that research in the early stages of development can succeed in identifying treatments that we can be reasonably confident are effective for some patients, while giving us no idea who those patients are.

Quite apart from anything else, the problem with this line of argument is that the ‘responders’ would have to be a small proportion of those included for the power to drop to the sort of level that could explain these results.

*Figure 2 Power for a trial as a function of the proportion of cases capable of responding.*

For Figure 2, I used the following. I took some figures from a recently published protocol for “Low-Molecular Weight Heparin dosages in hospitalized patients with severe COVID-19 pneumonia and coagulopathy not requiring invasive mechanical ventilation (COVID-19 HD)”[3]. It is not a particularly close match for the indication discussed here, but the COVID theme makes it topical and it serves to make a statistical point. The response is binary (clinical worsening or not) and for power calculations rates of 30% and 15% are compared, yielding power in excess of 80% for 150 patients per arm. (I calculate about 88%.) I then supposed that only a percentage of the patients could benefit, the rest having the response probability of the control group. (See chapter 14 of *Statistical Issues in Drug Development* for a discussion of this approach[4].)

The figure plots the power against the proportion of true responders recruited. When this is 100%, the power reaches 88%. However, even when only one in every two patients is capable of benefitting, the power is 38%. If we now turn to Figure 1 we see that if half the treatments work well but in only half the patients (power of 38%), we still expect to see one in five trials being a ‘success’.
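The dilution calculation behind Figure 2 can be sketched with a standard unpooled normal approximation for comparing two proportions; Senn’s exact figures come from the approach in chapter 14 of his book, so the numbers produced here are approximate and the function names are mine:

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control, p_treated, n_per_arm, alpha_one_sided=0.025):
    # Unpooled normal approximation for a two-sample comparison of
    # proportions (an approximation; Senn's exact method may differ).
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    return NormalDist().cdf((p_control - p_treated) / se - z_alpha)

def diluted_treated_rate(prop_responders, p_control=0.30, p_responder=0.15):
    # Non-responders keep the control-group event rate, so the observed
    # treated-arm rate is a mixture of responder and control rates.
    return prop_responders * p_responder + (1 - prop_responders) * p_control

print(power_two_proportions(0.30, diluted_treated_rate(1.0), 150))  # ~0.88
print(power_two_proportions(0.30, diluted_treated_rate(0.5), 150))  # ~0.32 here; Senn reports 38%
```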

Of course, we can always postulate having even larger proportions of non-responders but at what point does it become ludicrous to claim one has a treatment: it works well when it works, only unfortunately there are almost no patients for whom it works?

The sad truth seems to be that the data suggest that effective treatments are not being found in this field. In a sense I agree with the authors’ conclusions. Success will be a matter of finding the right treatments to treat the right patients. I have no quarrel with that. However, my personal hunch is that it will have more to do with finding good drugs than finding the perfect patients. Of course, doing the former may, indeed, require identifying good druggable targets. Nevertheless, when and if the right drugs are found for the right patients, convincing proof will come from a randomised clinical trial. Drug development is easy compared to drug research and I have always worked in the former. Nevertheless, sometimes the unwelcome but true message from development is that research is even harder than supposed.

1. Laffey, J.G. and Kavanagh, B.P., “Negative trials in critical care: why most research is probably wrong,” *Lancet Respir Med*, 2018, **6**(9): 659-660.
2. O’Hagan, A., Stevens, J.W. and Campbell, M.J., “Assurance in clinical trial design,” *Pharmaceutical Statistics*, 2005, **4**(3): 187-201.
3. Marietta, M., et al., “Randomised controlled trial comparing efficacy and safety of high versus low Low-Molecular Weight Heparin dosages in hospitalized patients with severe COVID-19 pneumonia and coagulopathy not requiring invasive mechanical ventilation (COVID-19 HD): a structured summary of a study protocol,” *Trials*, 2020, **21**(1): 574.
4. Senn, S.J., *Statistical Issues in Drug Development*, Statistics in Practice, Hoboken: Wiley, 2007.