Author Archives: Mayo

Falsifying claims of trust in bat coronavirus research: mysteries of the mine (i)(ii)(iii)


Have you ever wondered if people read Master’s (or even Ph.D.) theses a decade out? Whether or not you have, I think you will be intrigued to learn why an obscure Master’s thesis from 2012, translated from Chinese in 2020, is now an integral key to unravelling the puzzle of the global controversy about the mechanism and origins of Covid-19. The Master’s thesis by a doctor, Li Xu [1], “The Analysis of 6 Patients with Severe Pneumonia Caused by Unknown Viruses”, describes 6 patients he helped to treat after they entered a hospital in 2012, one after the other, suffering from an atypical pneumonia contracted while cleaning up after bats in an abandoned copper mine in China. Given the keen interest in finding the origin of the 2002–2003 severe acute respiratory syndrome (SARS) outbreak, Li wrote: “This makes the research of the bats in the mine where the six miners worked and later suffered from severe pneumonia caused by unknown virus a significant research topic”. He and the other doctors treating the mine cleaners hypothesized that their disease was caused by a SARS-like coronavirus, contracted through close proximity to the bats in the mine.

Jonathan Latham and Allison Wilson, scientists at the Bioscience Resource Project in Ithaca, decided Li Xu’s master’s thesis was important enough to translate from Chinese.

The evidence it contains has led us to reconsider everything we thought we knew about the origins of the COVID-19 pandemic. It has also led us to theorise a plausible route by which an apparently isolated disease outbreak in a mine in 2012 led to a global pandemic in 2019. (Latham & Wilson 2020)

They dubbed it the Mojiang Miners’ theory, because the mineshaft is located in Mojiang, in Yunnan province, China, 1,000 miles from Wuhan. One of the mine cleaners from 2012, they speculate, might even have been patient zero of the current pandemic! But except for a brief sketch in note 5, I put that aside for this post and turn to the article from the Times of London (July 4, 2020) that first sparked my interest in the Mojiang mine. Its subtitle is: ‘The world’s closest known relative to the Covid-19 virus was found in 2013 by Chinese scientists in an abandoned mine where it was linked to deaths caused by a coronavirus-type respiratory illness’. For a long time, it was one of the only articles on the mysteries that came to light with this Master’s thesis; now the mine mysteries are mentioned in every critical discussion of Covid-19 origins.

I will likely write updates to this post (following with (i), (ii), etc in the title), and possibly follow-up posts. I started it weeks ago, and as I learned more, I decided it was too much for one post. Please share corrections in the comments.

1. The Mojiang Mine

The Times authors set the scene in their picturesque opening:

In the monsoon season of August 2012 a small team of scientists travelled to southwest China to investigate a new and mysteriously lethal illness. After driving through terraced tea plantations, they reached their destination: an abandoned copper mine where — in white hazmat suits and respirator masks — they ventured into the darkness. Instantly, they were struck by the stench. Overhead, bats roosted. Underfoot, rats and shrews scurried through thick layers of their droppings. It was a breeding ground for mutated micro-organisms and pathogens deadly to human beings. There was a reason to take extra care. Weeks earlier, six men who had entered the mine had been struck down by an illness that caused an uncontrollable pneumonia. Three of them died.

Today [back in July 2020], as deaths from the Covid-19 pandemic exceed half a million and economies totter, the bats’ repellent lair has taken on global significance.

Evidence seen by The Sunday Times suggests that a virus found in its depths — part of a faecal sample that was frozen and sent to a Chinese laboratory for analysis and storage — is the closest known match to the virus that causes Covid-19. (London Times)

The lab to which the sample was sent was the Wuhan Institute of Virology (WIV), a world-renowned site for bat coronavirus research, led by Shi Zhengli, often called “batwoman” in recognition of her years of bat coronavirus research.

The pneumonia the miners were suffering from was deemed sufficiently serious and unusual to immediately call in an acclaimed virologist, Professor Zhong Nanshan, who had led China’s efforts against the first SARS, referred to now as SARS-CoV-1 to distinguish it from SARS-CoV-2, the virus that causes Covid-19.

The Wuhan Institute of Virology (WIV) …was called in to test the four survivors. These produced a remarkable finding: while none had tested positive for Sars, all four had antibodies against another, unknown Sars-like coronavirus. (London Times)

The detailed description of their symptoms and disease progression in the Master’s thesis exactly echoes what we now see in those with Covid-19: high fevers, coughs, difficulty in breathing, and many of the treatments tried are also in sync with those used today, including one found to be one of the most successful: steroids.

Shi Zhengli was in the midst of researching bat caves around 200 miles from the Mojiang mine when her team was alerted to the miners. Given that their main research focus is SARS-related coronaviruses, especially from bats, this was clearly of great interest to them, so they immediately turned to investigate the Mojiang Mine.

Over the next year, the scientists took faecal samples from 276 bats. The samples were stored at minus 80C in a special solution and dispatched to the Wuhan institute, where molecular studies and analysis were conducted. (London Times)

One sample, from a horseshoe bat, was of special interest because it was considered a brand new strain of a SARS-related virus. In a February 2016 article that Shi co-authored, the bat sample was named RaBtCoV/4991. Oddly, the paper, titled “Coexistence of multiple coronaviruses in several bat colonies in an abandoned mineshaft,” makes no mention of the reason the whole study took place: no mention of the miners or the fact that three died from pneumonia contracted from bats in the mine where the sample was found (Mystery #1). But what really raised an alarm for me is the fact that Shi, when asked about the miners (in an interview in the March–April 2020 issue of Scientific American, hereafter SA 2020), averred that the miners were killed by a fungus and not a virus (Mystery #2).

Shi describes the mine as “a flying factory for new viruses” due to finding that often “multiple viral strains had infected a single animal.” While claiming it was a fungus that killed the miners, “she says it would have been only a matter of time before they caught the coronaviruses if the mine had not been promptly shut” (SA 2020). [I was struck to hear she thought they’d be directly infected, since from day 1 there has often been an assumption that an intermediate species was needed.]

2. December 30, 2019 and the current pandemic

All that was pre-SARS-CoV-2. Away at a conference, Shi receives a call on Dec 30, 2019 that there’s a new coronavirus running rampant in Wuhan. Shi recalls the WIV director saying: “Drop whatever you are doing and deal with it now.” Her first thought as she makes her way back to Wuhan: “If coronaviruses were the culprit,” she remembers thinking, “could they have come from our lab?” (SA 2020)

Her musing that the new virus might have come from her lab is, in one sense, unsurprising, given Wuhan contains three labs specializing in the study of bat coronaviruses, hers being the only one at biosafety level 4.

Shi breathed a sigh of relief when the results came back: none of the sequences matched those of the viruses her team had sampled from bat caves. ‘That really took a load off my mind,’ she says. ‘I had not slept a wink for days.’ …The genomic sequence of the virus, eventually named SARS-CoV-2, was 96 percent identical to that of a coronavirus the researchers had identified in horseshoe bats in Yunnan. Their results appeared in a paper published online on February 3, 2020 in Nature.

They dubbed it bat coronavirus RaTG13. In this 2020 article, co-authored by Shi, they write:

RaTG13 is the closest relative of [SARS-CoV-2] … The close phylogenetic relationship to RaTG13 provides evidence that [SARS-CoV-2] may have originated in bats.…On the basis of these findings, we propose that the disease could be transmitted by airborne transmission, although we cannot rule out other possible routes of transmission. (Zhou, Yang,…Shi, Nature 2020 article)

But wait, let’s go back. Why a sigh of relief that SARS-CoV-2 is only 96% identical to one of the bat samples? What about the numerous specimens taken from the Mojiang miners? How close are they to SARS-CoV-2? Frustratingly, to this day we’re never told (Mystery #3). Moreover, while RaTG13 is described as being found in a cave in Yunnan, there is no mention of RaBtCoV/4991. Nor is there a citation of the initial 2016 article describing RaBtCoV/4991, even though it was co-authored by Shi (Mystery #4).

It turns out that RaBtCoV/4991 is identical to RaTG13! However, it required independent groups to sleuth this out [2] (Mystery #5).

In fact, researchers in India and Austria have compared the partial genome of the mine sample that was published in the 2016 paper and found it is a 100% match with the same sequence for RaTG13. The same partial sequence for the mine sample is a 98.7% match with the Covid-19 virus. (London Times)
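As an aside, percent-identity figures like these come from comparing aligned sequences position by position. Here is a minimal sketch of that computation; the short sequences below are invented placeholders, not real viral genomes (genuine comparisons first align genomes of roughly 30,000 bases using dedicated alignment tools, rather than a naive loop):

```python
# Toy percent-identity computation between two aligned sequences.
# The sequences used here are invented placeholders for illustration only.

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which the two sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

print(percent_identity("ATGGTACCTGAA", "ATGGTACCTGAA"))            # identical: 100.0
print(round(percent_identity("ATGGTACCTGAA", "ATGGTACCAGAA"), 1))  # one mismatch in 12: 91.7
```

On this reckoning, a 100% match over a published partial sequence, as reported for RaBtCoV/4991 and RaTG13, means agreement at every compared position of that fragment.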

Why would the 2020 paper describing the closest relative to SARS-CoV-2 fail to mention that it is one and the same as the virus unearthed from the mine where 3 people died, and had already been cited in the 2016 paper, both with Shi as co-author? It’s one thing to rename it, but to fail to note this goes against typical publishing norms.

My initial attitude to the whole business of the origins of Covid-19 was that we’d probably never find out, and anyway, the most important things were finding treatments, prophylactics and vaccines, understanding the mechanism of Covid-19 and, especially, preventing future pandemics. But it became clear that those goals hinge on the very information that was mysteriously being hidden by the research groups funded (by the U.S.) precisely to provide surveillance and monitoring for pandemics. Without being able to pinpoint all the individuals involved, I will just allude to the WIV research group from the time of the Mojiang miners. (See also note 3.)

So what did the WIV research group do with RaBtCoV/4991 in the ensuing years between finding it in 2013 (2016 article) and the revelation in the early pandemic (2020 article)? According to them, not much: it was said to have been stowed away in a freezer and only taken out after cases of Covid-19 appeared in Wuhan at the end of December 2019.

Other scientists find the initial indifference about a new strain of the coronavirus hard to understand. Nikolai Petrovsky, professor of medicine at Flinders University in Adelaide, South Australia, said it was “simply not credible” that the WIV would have failed to carry out any further analysis on RaBtCoV/4991, especially as it had been linked to the deaths of three miners.

‘If you really thought you had a novel virus that had caused an outbreak that killed humans then there is nothing you wouldn’t do — given that was their whole reason for being [there] — to get to the bottom of that, even if that meant exhausting the sample and then going back to get more,’ he said. (London Times)

So it seems the WIV research group failed at “their whole reason for being” there, since the sample simply sat in a freezer for 6 years. Maybe if they had investigated RaBtCoV/4991 in relation to the virus the miners died of they might have prevented the pandemic the world is now struggling under.

Perhaps it was to downplay the fact that they fell down on the job that they opted for a name switch (from RaBtCoV/4991 in 2016 to RaTG13 in 2020), and for the lack of citation of the 2016 paper. Nothing more sinister is suggested or needed for my argument to go through. There is apparently no way to study the sample of RaTG13 further, since it is said to have disintegrated upon being sequenced. (I will just call it RaTG13 in what follows.) Eight other SARS-related bat coronaviruses from the mine remain unpublished, to my knowledge.

3. More Mysteries 

Not only is it incredible that no work had been done on RaTG13 in the ensuing years between its discovery and the SARS-CoV-2 outbreak, it turns out to be false! Alina Chan, who describes herself as a molecular biologist turned detective (into origins of SARS-CoV-2), “pointed to an online database showing that the WIV had been genetically sequencing the mine virus in 2017 and 2018, analyzing it in a way they had done in the past with other viruses in preparation for running experiments with them.” (Boston Magazine) (Mystery #6)

But now that we know RaTG13 was sequenced and experimented upon in 2017 and 2018, we are still left with the mysteries of why they claimed to have sequenced it only after the world was hit with the Covid-19 pandemic, and why her close collaborator, Peter Daszak, who for years has funneled money from NIH grants to support the WIV bat coronavirus research, reported that the sample had been ignored in a freezer for 6 years.[3] Only after the earlier sequencing was revealed did Daszak admit he was wrong. Likewise, it took considerable pressure on Nature before the appearance of a December 2020 addendum to the 2020 article in which they admit the earlier experimentation. All very mysterious, given that such experimentation would have been expected: their charge was to investigate specimens with pandemic spillover potential, and RaTG13 was described by them as having such potential in 2016. So what kind of research were they engaged in?

Some of the experiments — “gain of function” experiments — aimed to create new, more virulent, or more infectious strains of diseases in an effort to predict and therefore defend against threats that might conceivably arise in nature. The term gain of function is itself a euphemism; the Obama White House more accurately described this work as ‘experiments that may be reasonably anticipated to confer attributes to influenza, MERS, or SARS viruses such that the virus would have enhanced pathogenicity and/or transmissibility in mammals via the respiratory route.’ The virologists who carried out these experiments have accomplished amazing feats of genetic transmutation, no question, and there have been very few publicized accidents over the years. But there have been some. (NY Magazine)

There was a moratorium on such research in the U.S. in 2014, but funding was restored in 2017. Money from U.S. agencies is funneled through Daszak’s organization, the EcoHealth Alliance, to the WIV research team. [The latest award was cut in April 2020, then restored in August 2020.]

Those engaged in such research aver that it is necessary to provide disease surveillance systems to alert us if viruses with pandemic potential are making the jump to humans. Maybe so. But the 2016 paper hid the main details that might have been of use for this. The question isn’t whether this kind of gain of function research could theoretically be useful, but rather whether a specific research group, here, WIV-EcoHealth research, has shown itself to be committed to the transparent behavior necessary to warrant support. It has not.

4. Falsifying the hypothesis of trusted research

What we have is strong, independent pieces of evidence to falsify the group’s claim to good faith commitments to responsibly conduct such research, or even communicate honestly what is known. Were they reliable partners in pandemic research, in the face of the real pandemic we are suffering, they would have bent over backwards to supply explanations for the conflicting admissions, rather than add more obfuscation. Note that nothing more is required to ground my inference. It’s not a matter of showing a lab error or accidental leak. The evidence that falsifies their being good faith stewards to whom we may look to inform, surveil, and help prevent future pandemics is ample. The onus would be on the WIV-EcoHealth research group to come forward with explanations–something one would expect them to be keen to do in order to support the continued research into bat coronaviruses.

Here’s what we know about the value of the WIV-EcoHealth research when it comes to preventing and informing about actual pandemics. We find out that deaths which, it turns out, they knew from the start were due to a virus–“We suspected that the patients had been infected by an unknown virus” (2020 Addendum)–were not broadcast; in fact there was a news blackout about the case. A published paper on the bat viruses found (2016) does not mention the deaths. Then, when a real honest-to-God pandemic from a SARS-like coronavirus comes to light in the city that does major research in the area, the virus is sequenced but given a new name, with no mention of the earlier name, let alone the connection with the miners. No, it’s worse: there is confusion or prevarication amongst the researchers as to when the sample was sequenced and when it crumbled, and deliberate attempts to conceal records, including taking the central WIV database offline, preventing further checks. In each case there are denials that only later, after revelations by independent sleuths, result in about-faces. But having declared one thing, it doesn’t ameliorate the situation when the opposite is conceded only in the face of undeniable demonstrations of its falsity. We are still left with conflicting declarations and no explanation for the earlier, opposite stance.

These strange and unscientific actions have obscured the origins of the closest viral relatives of SARS-CoV-2, viruses that are suspected to have caused a COVID-like illness in 2012 and which may be key to understanding not just the origin of the COVID-19 pandemic but the future behaviour of SARS-CoV-2. (Latham and Wilson)

If it weren’t for the Master’s thesis, the admissions that have come forward might never have occurred.

A co-author of an expert guide to investigating outbreak origins, Dr. Filippa Lentzos, said,

We also need to take a hard look in the mirror. It is our own virologists, funders and publishers who are driving and endorsing the practice of actively hunting for viruses and the high-risk research of deliberately making viruses more dangerous to humans. We need to be more open about the heavily vested interests of some of the scientists given prominent platforms to make claims about the pandemic’s origins. [Chan and Ridley 2021]

The WIV research group has gained the knowledge of how to make a virus more transmissible.[4],[5] One of the existing patents, I read, is for methods that could result in turning a SARS-related coronavirus into SARS-CoV-2. That knowledge hasn’t helped the world control SARS-CoV-2. Good faith sharing of the earlier research would at least have shown a commitment to transparency and ethical research norms. When it comes to the question of the trust that is necessary to endorse future research, the known facts here are actually more troubling than previous cases of lab leaks that were openly admitted and followed by the adoption of improved methods and clear oversight. If this is how a research group behaves when there’s no association between the lab and the pandemic, how much worse can we expect in the case of an actual lab error?

Share your thoughts, links and corrections in the comments.

Mar 4, 2021: I’m adding a new note [6] on the W.H.O. investigation.


[1] His supervisor, Professor Qian Chuanyun, worked in the emergency department that treated the men. Other details were found in a PhD thesis by a student of the director of the Chinese Centre for Disease Control and Prevention. The full Master’s thesis can also be read in Latham and Wilson 2020 (No paywall).

[2] Details were filled in by independent sleuths throughout the world and “an anonymous Twitter user known as ‘The Seeker’ and a group going by the name of DRASTIC” (Ridley and Chan 2021). One of the first articles to delineate a possible lab leak is Sirotkin, K. & Sirotkin, D. (2020).

Rossana Segreto et al. (2020), who established that RaTG13 and RaBtCoV/4991 are one and the same, write:

In late July 2020, Zhengli Shi, the leading CoV researcher from WIV, in an email interview asserted the renaming of the RaTG13 sample and unexpectedly declared that the full sequencing of RaTG13 has been carried out as far back as in 2018 and not after the SARS‐CoV‐2 outbreak, as stated in [her own joint article in February of 2020].

I make no claims about having identified who first found what, as this is not my research area, but if you have an item you think I should reference, I’ll be glad to look at it. Use the comments. Here’s one sent in a comment yesterday by one of the authors:

Rahalkar, M.C.; Bahulikar, R.A. Understanding the Origin of ‘BatCoVRaTG13’, a Virus Closest to SARS-CoV-2. Preprints 2020

[3] Daszak runs a non-governmental group called the EcoHealth Alliance, which disburses funds for research into coronaviruses and other pathogens from U.S. agencies to labs throughout the world. A portion of these grants goes to his outfit, and he’s one of the most vocal supporters of their continuation. We might even call the research group the WIV-EcoHealth Alliance research group. Understandably, many scientists find conflicts of interest in having Daszak lead enquiries into a possible Covid lab leak. But he continues to be a key player. Link:

The worst fears of conflicts of interest came true upon reading the recent reports on Covid origins. See Mallapaty, S. et al. (2021).

“To find genuinely critical analysis of COVID-19 origin theories one has to go to Twitter, blog posts, and preprint servers. The malaise runs deep when even scientists start to complain that they don’t trust science.”(Latham and Wilson)

[4] Another important name at the cutting edge of gain of function work on bat coronaviruses is Ralph Baric (from UNC). He was perhaps the first to show how to transfer viruses from one species to another. “Not only that, but they’d figured out how to perform their assembly seamlessly, without any signs of human handiwork. Nobody would know if the virus had been fabricated in a laboratory or grown in nature. Baric called this the “no-see’m method.” (New York Magazine).

An eye-opening, excellent (< 10 min) video from leading coronavirologists who know directly of the gain of function experiments. English subtitles. Link:

[5] Latham and Wilson theorize that the initial virus evolved in the miners themselves during the months-long infections some of them suffered, mimicking the process of serial passaging. This

is a standard virological technique for adapting viruses to new species, tissues, or cell types. It is normally done by deliberately infecting a new host species or a new host cell type with a high dose of virus. This initial viral infection would ordinarily die out because the host’s immune system vanquishes the ill-adapted virus. But, in passaging, before it does die out a sample is extracted and transferred to a new identical tissue, where viral infection restarts. Done iteratively, this technique … intensively selects for viruses adapted to the new host or cell type. ….We propose that, when frozen samples derived from the miners were eventually opened in the Wuhan lab they were already highly adapted to humans to an extent possibly not anticipated by the researchers. One small mistake or mechanical breakdown could have led directly to the first human infection in late 2019.

(However, there is no knowledge that the miners transmitted their virus to others around them.)
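The passaging process described in the quoted passage is, at bottom, iterated selection through population bottlenecks. A toy simulation can make that logic concrete; every number below (fitness values, population sizes, number of passages) is an arbitrary assumption chosen for illustration, not a parameter from any real experiment:

```python
# Cartoon of serial passaging: repeatedly transfer a small sample of a
# replicating viral population to a fresh host, so that variants which
# replicate better steadily rise in frequency. All parameters are invented.
import random

random.seed(0)
FITNESS = {"ill_adapted": 1.0, "better_adapted": 1.5}  # relative replication rates

def regrow(sample, size=1000):
    """Regrow a full population from a sample, weighting by fitness."""
    weights = [FITNESS[v] for v in sample]
    return random.choices(sample, weights=weights, k=size)

def passage(population, bottleneck=100):
    """Transfer a small random sample to a new, identical host."""
    return random.sample(population, bottleneck)

population = ["ill_adapted"] * 900 + ["better_adapted"] * 100
for _ in range(10):
    population = regrow(passage(population))

share = population.count("better_adapted") / len(population)
print(f"better-adapted share after 10 passages: {share:.0%}")
```

With even a modest replication advantage, the better-adapted variant comes to dominate within a handful of passages, which is the sense in which long, persistent infections might mimic deliberate passaging.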

Latham and Wilson’s theory shares analogies with the viral evolution seen in immunocompromised patients.

The same principle underlies the worry about extending the time lag between doses of vaccines. There’s a risk that subimmune individuals with enough antibodies to slow the virus, and perhaps remain asymptomatic, but not enough to wipe it out, could harbor viral variants.

[6] This post does not continue into the recent investigation of Covid origins (organized, but not necessarily endorsed, by W.H.O.), but it’s clear that the facts discussed here are at the heart of the charges that the inquiry was biased and sorely inadequate. (Problems with alternative zoonotic and frozen food hypotheses add much fuel to the fire.) As the investigation was incapable of uncovering a lab accident or leak, it cannot rule out that hypothesis with any kind of severity. See my comment from March 4 for links to articles out today, and a letter from a group of scientists calling for a brand new, international investigation.

Acknowledgement: I thank Jean Miller for many useful comments, suggestions and corrections on earlier drafts of this post.


Arbuthnott, G., Calvert, J., & Sherwell, P. (2020). Revealed: Seven year coronavirus trail from mine deaths to a Wuhan lab. The London Times, UK (July 4, 2020, The Sunday Times Insight Investigation).

Baker, N. (2021) The Lab Leak Hypothesis, New York Magazine (January 4, 2021).

Butler, C., Canard, B., Cap, H., et al. (2021).  OPEN LETTER: Call for a Full and Unrestricted International Forensic Investigation into the Origins of COVID-19. Signed by 26 scientists, social scientists and science communicators. March 4, 2021.

Chan, A. Tweetorials on Covid-19 origins:

Chan, A. & Ridley, M. (2021). The World Needs a Real Investigation Into the Origins of Covid-19. The Wall Street Journal (January 15, 2021).

Ge, XY., Wang, N., Zhang, W., …Shi, Z-L. (2016). Coexistence of multiple coronaviruses in several bat colonies in an abandoned mineshaft. Virol. Sin. 31, 31–40. Link:

Jacobsen, R. (2020). Could COVID-19 Have Escaped from a Lab? Boston Magazine (September 9, 2020).

Latham, J. & Wilson, A. (2020). A Proposed Origin for SARS-CoV-2 and the COVID-19 Pandemic. Independent Science News for Food and Agriculture website (July 15, 2020).

Mallapaty, S., Maxmen A.,  &  Callaway, E. (2021). “’Major stones unturned’: COVID origin search must continue after WHO report, say scientists”, Nature (February 10, 2021).

[Shi interview, SA] Qiu, J. (2020). How China’s ‘Bat Woman’ Hunted Down Viruses from SARS to the New Coronavirus. Scientific American (June 1, 2020).

Ridley, M. & Chan, A. (2021). Did the Covid-19 virus really escape from a Wuhan lab?. The Telegraph (UK) (February 6, 2021).

Segreto R. & Deigin, Y. (2020). The genetic structure of SARS‐CoV‐2 does not rule out a laboratory origin. Bioessays (November 17, 2020). Link:

Sirotkin, K. & Sirotkin, D. (2020). Might SARS‐CoV‐2 Have Arisen via Serial Passage through an Animal Host or Cell Culture?

Xu, L. (2013). The Analysis of Six Patients With Severe Pneumonia Caused By Unknown Viruses (Master’s Thesis). School of Clinical Medicine, Kun Ming Medical University. Translation into English commissioned by Independent Science News, completed June 23, 2020. Link:

Zhou, P., Yang, XL., Wang, XG., …Shi, Z. (2020). A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (February 3, 2020). Link:

Zhou, P., Yang, XL., Wang, XG., …Shi, Z. (2020). Addendum: A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 588, E6 (December 3, 2020). Link:

Other relevant resources

A fascinating video including some of the key bat coronavirus researchers (short, 9 min, with English subtitles)






Categories: covid-19, falsification, science communication | 7 Comments

Aris Spanos: Modeling vs. Inference in Frequentist Statistics (guest post)


Aris Spanos
Wilson Schmidt Professor of Economics
Department of Economics
Virginia Tech

The following guest post (link to PDF) was written in response to C. Hennig’s presentation at our Phil Stat Wars Forum on 18 February, 2021: “Testing With Models That Are Not True”.

Categories: misspecification testing, Spanos, stat wars and their casualties | 11 Comments

R.A. Fisher: “Statistical methods and Scientific Induction” with replies by Neyman and E.S. Pearson

In recognition of Fisher’s birthday (Feb 17), I reblog his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. My favorite is the reply by E.S. Pearson, but all are chock full of gems for different reasons. They are each very short and are worth your rereading. Continue reading

Categories: E.S. Pearson, Fisher, Neyman, phil/history of stat | Leave a comment

R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)



This is a belated birthday post for R.A. Fisher (17 February, 1890-29 July, 1962)–it’s a guest post from earlier on this blog by Aris Spanos that has gotten the highest number of hits over the years. 

Happy belated birthday to R.A. Fisher!

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998) Continue reading

Categories: Fisher, phil/history of stat, Spanos | 2 Comments

Reminder: February 18 “Testing with models that are not true” (Christian Hennig)

The sixth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

18 February, 2021

TIME: 15:00-16:45 (London); 10-11:45 a.m. (New York, EST)

For information about the Phil Stat Wars forum and how to join, click on this link. 


Testing with Models that Are Not True Continue reading

Categories: Phil Stat Forum | Leave a comment

S. Senn: The Power of Negative Thinking (guest post)



Stephen Senn
Consultant Statistician
Edinburgh, Scotland

Sepsis sceptic

During an exchange on Twitter, Lawrence Lynn drew my attention to a paper by Laffey and Kavanagh [1]. This makes an interesting, useful and very depressing assessment of the situation as regards clinical trials in critical care. The authors make various claims that RCTs in this field are not useful as currently conducted. I don’t agree with the authors’ logic here, although, perhaps surprisingly, I consider that their conclusion might be true. I propose to discuss this here. Continue reading

Categories: power, randomization | 5 Comments

February 18 “Testing with models that are not true” (Christian Hennig)

The sixth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

18 February, 2021

TIME: 15:00-16:45 (London); 10-11:45 a.m. (New York, EST)

For information about the Phil Stat Wars forum and how to join, click on this link. 


Testing with Models that Are Not True

Christian Hennig

Continue reading

Categories: Phil Stat Forum | 1 Comment

The Covid-19 Mask Wars : Hi-Fi Mask Asks


Effective yesterday, February 1, it is a violation of federal law not to wear a mask on a public conveyance or in a transit hub, including taxis, trains and commercial trucks (The 11 page mandate is here.)

The “mask wars” are a major source of disagreement and politicizing of science during the current pandemic, but my interest here is not in the clashes between pro- and anti-mask culture warriors, but in the clashing recommendations among science policy officials and scientists wearing their policy hats. A recent Washington Post editorial by Joseph Allen (director of the Healthy Buildings program at the Harvard T.H. Chan School of Public Health) declares “Everyone should be wearing N95 masks now”. In his view: Continue reading

Categories: covid-19 | 27 Comments

January 28 Phil Stat Forum “How Can We Improve Replicability?” (Alexander Bird)

The fifth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

28 January, 2021

TIME: 15:00-16:45 (London); 10-11:45 a.m. (New York, EST)


“How can we improve replicability?”

Alexander Bird 

Continue reading

Categories: Phil Stat Forum | 1 Comment

S. Senn: “Beta testing”: The Pfizer/BioNTech statistical analysis of their Covid-19 vaccine trial (guest post)


Stephen Senn

Consultant Statistician
Edinburgh, Scotland

The usual warning

Although I have researched on clinical trial design for many years, prior to the COVID-19 epidemic I had had nothing to do with vaccines. The only object of these amateur musings is to amuse amateurs by raising some issues I have pondered and found interesting. Continue reading

Categories: covid-19, PhilStat/Med, S. Senn | 16 Comments

Why hasn’t the ASA Board revealed the recommendations of its new task force on statistical significance and replicability?

something’s not revealed

A little over a year ago, the board of the American Statistical Association (ASA) appointed a new Task Force on Statistical Significance and Replicability (under then-president Karen Kafadar) to provide it with recommendations. [Its members are here (i).] You might remember my blogpost at the time, “Les Stats C’est Moi”. The Task Force worked quickly, despite the pandemic, giving its recommendations to the ASA Board early, in time for the Joint Statistical Meetings at the end of July 2020. But the ASA hasn’t revealed the Task Force’s recommendations, and I just learned yesterday that it has no plans to do so*. A panel session I was in at the JSM (P-values and ‘Statistical Significance’: Deconstructing the Arguments) grew out of this episode, and papers from the proceedings are now out. The introduction to my contribution gives you the background to my question, while revealing one of the recommendations (I only know of 2). Continue reading

Categories: 2016 ASA Statement on P-values, JSM 2020, replication crisis, statistical significance tests, straw person fallacy | 7 Comments

Next Phil Stat Forum: January 7: D. Mayo: Putting the Brakes on the Breakthrough (or “How I used simple logic to uncover a flaw in …..statistical foundations”)

The fourth meeting of our New Phil Stat Forum*:

The Statistics Wars
and Their Casualties

January 7, 16:00 – 17:30  (London time)
11 am-12:30 pm (New York, ET)**
**note time modification and date change

Putting the Brakes on the Breakthrough,

or “How I used simple logic to uncover a flaw in a controversial 60-year-old ‘theorem’ in statistical foundations” 

Deborah G. Mayo


Continue reading
Categories: Birnbaum, Birnbaum Brakes, Likelihood Principle | 5 Comments

Midnight With Birnbaum (Remote, Virtual Happy New Year 2020)!

Unlike in the past 9 years since I’ve been blogging, I can’t revisit that spot in the road outside the Elbar Room, looking to get into a strange-looking taxi, to head to “Midnight With Birnbaum”. Because of the pandemic, I refuse to go out this New Year’s Eve, so the best I can hope for is a zoom link that will take me to a hypothetical party with him. (The pic on the left is the only blurry image I have of the club I’m taken to.) I just keep watching my email, to see if a zoom link arrives. My book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST 2018) doesn’t rehearse the argument from my Birnbaum article, but there’s much in it that I’d like to discuss with him. The (Strong) Likelihood Principle–whether or not it is named–remains at the heart of many of the criticisms of Neyman-Pearson (N-P) statistics and statistical significance testing in general. Let’s hope that in 2021 the American Statistical Association (ASA) will finally reveal the recommendations from the ASA Task Force on Statistical Significance and Replicability that the ASA Board itself created one year ago. They completed their recommendations early–back at the end of July 2020–but no response from the ASA has been forthcoming (to my knowledge). As Birnbaum insisted, the “confidence concept” is the “one rock in a shifting scene” of statistical foundations, insofar as there’s interest in controlling the frequency of erroneous interpretations of data. (See my rejoinder.) Birnbaum bemoaned the lack of an explicit evidential interpretation of N-P methods. I purport to give one in SIST 2018. Maybe it will come to fruition in 2021? Anyway, I was just sent an internet link–but it’s not zoom, not Skype, not Webinex, or anything I’ve ever seen before…no time to describe it now, but I’m recording, and the rest of the transcript is live; this year there are some new, relevant additions. Happy New Year! Continue reading

Categories: Birnbaum Brakes, strong likelihood principle | Tags: , , , | Leave a comment

A Perfect Time to Binge Read the (Strong) Likelihood Principle


An essential component of inference based on familiar frequentist notions (p-values, significance and confidence levels) is the relevant sampling distribution (hence the term sampling theory, or my preferred error statistics, since we get error probabilities from the sampling distribution). This feature results in violations of a principle known as the strong likelihood principle (SLP). Roughly stated, the SLP asserts that all the evidential import in the data (for parametric inference within a model) resides in the likelihoods. If accepted, it would render error probabilities irrelevant post data. Continue reading

Categories: Birnbaum, Birnbaum Brakes, law of likelihood | 3 Comments

Cox’s (1958) Chestnut: You should not get credit (or blame) for something you didn’t do


Just as you keep up your physical exercise during the pandemic (sure), you want to keep up with mental gymnastics too. With that goal in mind, and given we’re just a few days from the New Year (and given especially my promised presentation for January 7), here’s one of the two simple examples that will limber you up for the puzzle to ensue. It’s the famous weighing machine example from Sir David Cox (1958)[1]. It is one of the “chestnuts” in the museum exhibits of “chestnuts and howlers” in Excursion 3 (Tour II) of my book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST, 2018). So block everything else out for a few minutes and consider 3 pages from SIST …  Continue reading

Categories: Birnbaum, Statistical Inference as Severe Testing, strong likelihood principle | 4 Comments

Next Phil Stat Forum: January 7: D. Mayo: Putting the Brakes on the Breakthrough (or “How I used simple logic to uncover a flaw in …..statistical foundations”)

The fourth meeting of our New Phil Stat Forum*:

The Statistics Wars
and Their Casualties

January 7, 16:00 – 17:30  (London time)
11 am-12:30 pm (New York, ET)**
**note time modification and date change

Putting the Brakes on the Breakthrough,

or “How I used simple logic to uncover a flaw in a controversial 60-year-old ‘theorem’ in statistical foundations” 

Deborah G. Mayo



ABSTRACT: An essential component of inference based on familiar frequentist (error statistical) notions (p-values, statistical significance and confidence levels) is the relevant sampling distribution (hence the term sampling theory). This results in violations of a principle known as the strong likelihood principle (SLP), or just the likelihood principle (LP), which says, in effect, that outcomes other than those observed are irrelevant for inferences within a statistical model. Now Allan Birnbaum was a frequentist (error statistician), but he found himself in a predicament: he seemed to have shown that the LP follows from uncontroversial frequentist principles! Bayesians, such as Savage, heralded his result as a “breakthrough in statistics”! But there’s a flaw in the “proof”, and that’s what I aim to show in my presentation by means of 3 simple examples:

  • Example 1: Trying and Trying Again
  • Example 2: Two instruments with different precisions
    (you shouldn’t get credit/blame for something you didn’t do)
  • The Breakthrough: Don’t Birnbaumize that data my friend

As in the last 9 years, I will post an imaginary dialogue with Allan Birnbaum at the stroke of midnight, New Year’s Eve, and this will be relevant for the talk.

The Phil Stat Forum schedule is at the blog 

Categories: Birnbaum, Birnbaum Brakes, Likelihood Principle | 1 Comment

The Statistics Debate (NISS) in Transcript Form

I constructed, together with Jean Miller, a transcript from the October 15 Statistics Debate (with me, J. Berger and D. Trafimow, and moderator D. Jeske), sponsored by NISS. It’s so much easier to access the material this way than by listening to the video. Using this link, you can see the words and hear the video at the same time, as well as pause and jump around. Below, I’ve pasted our responses to Question #1. Have fun, and please share your comments.

Dan Jeske: [QUESTION 1] Given the issues surrounding the misuses and abuse of p values, do you think they should continue to be used or not? Why or why not?

Deborah Mayo  03:46

Thank you so much. And thank you for inviting me, I’m very pleased to be here. Yes, I say we should continue to use p values and statistical significance tests. Uses of p values are really just a piece in a rich set of tools intended to assess and control the probabilities of misleading interpretations of data, i.e., error probabilities. They’re the first line of defense against being fooled by randomness, as Yoav Benjamini puts it. If even larger, or more extreme effects than you observed are frequently brought about by chance variability alone, i.e., p value not small, clearly you don’t have evidence of incompatibility with the mere chance hypothesis. It’s very straightforward reasoning. Even those who criticize p values, you’ll notice, will employ them, at least if they care to check the assumptions of their models. And this includes well-known Bayesians such as George Box, Andrew Gelman, and Jim Berger. Critics of p values often allege it’s too easy to obtain small p values. But notice the whole replication crisis is about how difficult it is to get small p values with preregistered hypotheses. This shows the problem isn’t p values, but those selection effects and data dredging. However, the same data-dredged hypothesis can occur in other methods, likelihood ratios, Bayes factors, Bayesian updating, except that now we lose the direct grounds to criticize inferences for flouting error statistical control. The introduction of prior probabilities, which may also be data dependent, offers further researcher flexibility. Those who reject p values are saying we should reject the method because it can be used badly. And that’s a bad argument. We should reject misuses of p values. But there’s a danger of blindly substituting alternative tools that throw out the error control baby with the bad statistics bathwater.
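[Editorial note, not part of the debate: the chance-variability reasoning Mayo describes above can be illustrated with a small simulation. The sample sizes and effect sizes below are hypothetical choices for illustration only.]

```python
import numpy as np

# Sketch of Mayo's point: if effects as large or larger than the observed
# one are frequently produced by chance variability alone, the p value is
# not small, and the data show no incompatibility with the null hypothesis.
rng = np.random.default_rng(0)

def simulated_p_value(observed_mean, n, sigma=1.0, reps=100_000):
    """Proportion of null-generated sample means at least as extreme
    (here: as large) as the observed sample mean."""
    null_means = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)
    return float((null_means >= observed_mean).mean())

# A large observed effect is rarely matched by chance: small p value.
p_big = simulated_p_value(observed_mean=0.5, n=50)

# A tiny observed effect is frequently exceeded by chance: large p value,
# i.e., no evidence of incompatibility with the mere chance hypothesis.
p_small = simulated_p_value(observed_mean=0.05, n=50)
```

With n = 50 and sigma = 1, an observed mean of 0.5 sits about 3.5 null standard errors out, so its simulated p value is tiny, while an observed mean of 0.05 is exceeded by chance over a third of the time.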

Dan Jeske  05:58

Thank you, Deborah, Jim, would you like to comment on Deborah’s remarks and offer your own?

Jim Berger  06:06

Okay, yes. Well, I certainly agree with much of what Deborah said; after all, a p value is simply a statistic. And it’s an interesting statistic that does have many legitimate uses, when properly calibrated. And Deborah mentioned one such case is model checking, where Bayesians freely use some version of p values for model checking. On the other hand, if one interprets this question as, should they continue to be used in the same way that they’re used today? Then my answer would be somewhat different. I think p values are commonly misinterpreted today, especially when they’re used to test a sharp null hypothesis. For instance, a p value of .05 is commonly interpreted by many as indicating the evidence is 20 to one in favor of the alternative hypothesis. And that just isn’t true. You can show, for instance, that if I’m testing a normal mean of zero versus nonzero, the odds of the alternative hypothesis to the null hypothesis can at most be seven to one. And that’s just a probabilistic fact; it doesn’t involve priors or anything. It’s an answer covering all probabilities. And so if a p value of .05 is interpreted as 20 to one, it’s just being interpreted wrongly, and the wrong conclusions are being reached. I’m reminded of an interesting paper that was published some time ago now, which was reporting on a survey that was designed to determine whether clinical practitioners understood what a p value was. The results of the survey were published and were not surprising. Most clinical practitioners interpreted a p value of .05 as something like 20 to one odds against the null hypothesis, which again is incorrect. The fascinating aspect of the paper is that the authors also got it wrong. Deborah pointed out that the p value is the probability under the null hypothesis of the data or something more extreme.
The authors stated that the correct answer was that the p value is the probability of the data under the null hypothesis; they forgot the “more extreme”. So, I love this article, because the scientists who set out to show that their colleagues did not understand the meaning of p values themselves did not understand the meaning of p values.
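[Editorial note, not part of the debate: Berger’s “at most seven to one” figure can be checked with a one-line computation, assuming, as his wording suggests, that he means the maximum likelihood ratio of a normal alternative to a point null at the two-sided .05 cutoff.]

```python
import math

# For a two-sided p value of .05 in a normal-mean test, z = 1.96.
# The likelihood ratio of the best-supported alternative (mu set equal
# to the observed mean) to the null (mu = 0) is exp(z^2 / 2), which
# bounds the odds any alternative can earn over the null from the data.
z = 1.96
max_lr = math.exp(z**2 / 2)  # roughly 6.8, i.e., "at most seven to one"
```

So the data at p = .05 can favor the alternative by at most about 6.8 to 1, well short of the 20-to-1 reading Berger says practitioners commonly give it.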

Dan Jeske  08:42


David Trafimow  08:44

Okay. Yeah. Um, like Deborah and Jim, I’m delighted to be here. Thanks for the invitation. And I partly agree with what both Deborah and Jim said; it’s certainly true that people misuse p values. So, I agree with that. However, I think p values are more problematic than the other speakers have mentioned. And here’s the problem for me. We keep talking about p values relative to hypotheses, but that’s not really true. P values are relative to hypotheses plus additional assumptions. So, if we use the term model to describe the null hypothesis plus additional assumptions, then p values are based on models, not on hypotheses, or only partly on hypotheses. Now, here’s the thing. What are these other assumptions? An example would be random selection from the population, an assumption that is not true in any one of the thousands of papers I’ve read in psychology. And there are other assumptions, a lack of systematic error, linearity, and then we can go on and on; people have even published taxonomies of the assumptions because there are so many of them. See, it’s tantamount to impossible that the model is correct, which means that the model is wrong. And so, what you’re in essence doing, then, is using the p value to index evidence against a model that is already known to be wrong. And even the part about indexing evidence is questionable, but I’ll go with it for the moment. But the point is the model is wrong. And so, there’s no point in indexing evidence against it. So given that, I don’t really see that there’s any use for them. P values don’t tell you how close the model is to being right. P values don’t tell you how valuable the model is. P values pretty much don’t tell you anything that researchers might want to know, unless you misuse them. Anytime you draw a conclusion from a p value, you are guilty of misuse. So, I think the misuse problem is much more subtle than is perhaps obvious at first.
So, that’s really all I have to say at the moment.

Dan Jeske  11:28

Thank you. Jim, would you like to follow up?

Jim Berger  11:32

Yes, so, I certainly agree that assumptions are often made that are wrong. I won’t say that that’s always the case. I mean, I know many scientific disciplines where I think they do a pretty good job; I work with high energy physicists, and they do a pretty good job of checking their assumptions. An excellent job. And they use p values. It’s something to watch out for. But any statistical analysis, you know, can run into this problem. If the assumptions are wrong, it’s going to be wrong.

Dan Jeske  12:09


Deborah Mayo  12:11

Okay. Well, Jim thinks that we should evaluate the p value by looking at the Bayes factor, and when he does, he finds that p values exaggerate the evidence. But we really shouldn’t expect agreement on numbers from methods that are evaluating different things. This is like supposing that if we switch from a height to a weight standard, then if we required six feet with the height standard, we should now require six stone, to use an example from Stephen Senn. On David’s point, I think he’s wrong to worry about the assumptions behind the p value, since significance tests have the fewest assumptions of any method, which is why even Bayesians will say we need to apply them when we need to test our assumptions. And it’s something that we can do, especially with randomized controlled trials, to get the assumptions to work. The idea that we have to misinterpret p values to have them be relevant only rests on supposing that we need something other than what the p value provides.

Dan Jeske  13:19

David, would you like to give some final thoughts on this question?

David Trafimow  13:23

Sure. As far as Jim’s point, and Deborah’s point that we can do things to make the assumptions less wrong: the problem is that the model is either wrong or it isn’t. Now if the model is close, that doesn’t justify the p value, because the p value doesn’t give the closeness of the model. And that’s the problem. We’re not using, for example, a sample mean to estimate a population mean, in which case, yeah, you wouldn’t expect the sample mean to be exactly right; if it’s close, it’s still useful. The problem is that p values aren’t being used to estimate anything. So, if you’re not estimating anything, then you’re stuck with either correct or incorrect, and the answer is always incorrect. You know, this is especially true in psychology, but I suspect it might even be true in physics. I’m not the physicist that Jim is, so I can’t say that for sure.

Dan Jeske  14:35

Jim, would you like to offer Final Thoughts?

Jim Berger  14:37

Let me comment on Deborah’s comment that Bayes factors are just a different scale of measurement. My point was that it seems like people invariably think of p values as something like odds or the probability of the null hypothesis; if that’s the way they’re thinking, because that’s the way their minds reason, I believe we should provide them with odds. And so, I try to convert p values into odds or Bayes factors, because I think that’s much more readily understandable by people.

Dan Jeske  15:11

Deborah, you have the final word on this question.

Deborah Mayo  15:13

I do think that we need a proper philosophy of statistics to interpret p values. But I think also that what’s missing in the reject-p-values movement is this: a major reason for calling on statistics in science is to give us tools to inquire whether an observed phenomenon is a real effect or just noise in the data, and p values have intrinsic properties for this task, if used properly; other methods don’t, and to reject them is to jeopardize this important role. As Fisher emphasized, we need randomized controlled trials precisely to ensure the validity of statistical significance tests. To reject them because they don’t give us posterior probabilities is illicit. In fact, I think that those who claim we want such posteriors need to show, for any way we can actually get them, why. 

You can watch the debate at the NISS website or in this blog post.

You can find the complete audio transcript at this LINK:
[There is a play button at the bottom of the page that allows you to start and stop the recording. You can move about in the transcript/recording by using the pause button and moving the cursor to another place in the dialog and then clicking the play button to hear the recording from that point. (The recording is synced to the cursor.)]

Categories: D. Jeske, D. Trafimow, J. Berger, NISS, statistics debate | 1 Comment

Is it impossible to commit Type I errors in statistical significance tests? (i)


While immersed in our fast-paced, remote, NISS debate (October 15) with J. Berger and D. Trafimow, I didn’t immediately catch all that was said by my co-debaters (I will shortly post a transcript). We had all opted for no practice. But looking over the transcript, I was surprised to see that David Trafimow was indeed saying the answer to the question in my title is yes. Here are some excerpts from his remarks: Continue reading

Categories: D. Trafimow, J. Berger, National Institute of Statistical Sciences (NISS), Testing Assumptions | 29 Comments

S. Senn: “A Vaccine Trial from A to Z” with a Postscript (guest post)


Stephen Senn
Consultant Statistician
Edinburgh, Scotland

Alpha and Omega (or maybe just Beta)

Well actually, not from A to Z but from AZ. That is to say, the trial I shall consider is the placebo-controlled trial of the Oxford University vaccine for COVID-19 currently being run by AstraZeneca (AZ) under protocol AZD1222 – D8110C00001, and which I considered in a previous blog, Heard Immunity. A summary of the design features is given in Table 1. The purpose of this blog is to look a little deeper at features of the trial, and the way I am going to do so is with the help of geometric representations of the sample space, that is to say, the possible results the trial could produce. However, the reader is warned that I am only an amateur in all this. The true professionals are the statisticians at AZ who, together with their life science colleagues in AZ and Oxford, designed the trial. Continue reading

Categories: covid-19, RCTs, Stephen Senn | 14 Comments

Phil Stat Forum: November 19: Stephen Senn, “Randomisation and Control in the Age of Coronavirus?”

For information about the Phil Stat Wars forum and how to join, see this post and this pdf. 

Continue reading

Categories: Error Statistics, randomization | Leave a comment
