April 22 “How an information metric could bring truce to the statistics wars” (Daniele Fanelli)

The eighth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

22 April 2021

TIME: 15:00-16:45 (London); 10:00-11:45 (New York, EDT)

For information about the Phil Stat Wars forum and how to join, click on this link.

“How an information metric could bring truce to the statistics wars”

Daniele Fanelli

Abstract: Both sides of debates on P-values, reproducibility, and other meta-scientific issues are entrenched in traditional methodological assumptions. For example, they often implicitly endorse rigid dichotomies (e.g. published findings are either “true” or “false”, replications either “succeed” or “fail”, research practices are either “good” or “bad”), or make simplifying and monistic assumptions about the nature of research (e.g. publication bias is generally a problem, all results should replicate, data should always be shared).

Thinking about knowledge in terms of information may clear a common ground on which all sides can meet, leaving behind partisan methodological assumptions. In particular, I will argue that a metric of knowledge that I call “K” helps examine research problems in a more genuinely “meta-“ scientific way, giving rise to a methodology that is distinct, more general, and yet compatible with multiple statistical philosophies and methodological traditions.

This talk will present statistical, philosophical and scientific arguments in favour of K, and will give a few examples of its practical applications.

Daniele Fanelli is a London School of Economics Fellow in Quantitative Methodology, Department of Methodology, London School of Economics and Political Science. He graduated in Natural Sciences, earned a PhD in Behavioural Ecology and trained as a science communicator, before devoting his postdoctoral career to studying the nature of science itself – a field increasingly known as meta-science or meta-research. He has been primarily interested in assessing and explaining the prevalence, causes and remedies for problems that may affect research and publication practices, across the natural and social sciences. Fanelli helps answer these and other questions by analysing patterns in the scientific literature using meta-analysis, regression and any other suitable methodology. He is a member of the Research Ethics and Bioethics Advisory Committee of Italy’s National Research Council, for which he developed the first research integrity guidelines, and of the Research Integrity Committee of the Luxembourg Agency for Research Integrity (LARI).


Fanelli D (2019) A theory and methodology to quantify knowledge. Royal Society Open Science – doi.org/10.1098/rsos.181055. (PDF)

4-page background: Fanelli D (2018) Is science really facing a reproducibility crisis, and do we need it to? PNAS – doi.org/10.1073/pnas.1708272114. (PDF)

Slides & Video Links: 

See Phil-Stat-Wars.com

*Meeting 16 of the general Phil Stat series which began with the LSE Seminar PH500 on May 21

Categories: Phil Stat Forum, replication crisis, stat wars and their casualties

A. Spanos: Jerzy Neyman and his Enduring Legacy (guest post)

I am reblogging a guest post that Aris Spanos wrote for this blog on Neyman’s birthday some years ago.   

A. Spanos

A Statistical Model as a Chance Mechanism
Aris Spanos 

Jerzy Neyman (April 16, 1894 – August 5, 1981), was a Polish/American statistician[i] who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing and his theory of Confidence Intervals. (This article was first posted here.)

Neyman: 16 April 1894 – 5 Aug 1981

One of Neyman’s most remarkable, but least recognized, achievements was his adaptation of Fisher’s (1922) notion of a statistical model to render it pertinent for non-random samples. Fisher’s original parametric statistical model Mθ(x) was based on the idea of ‘a hypothetical infinite population’, chosen so as to ensure that the observed data x0:=(x1,x2,…,xn) can be viewed as a ‘truly representative sample’ from that ‘population’:

“The postulate of randomness thus resolves itself into the question, Of what population is this a random sample?” (ibid., p. 313), underscoring that “the adequacy of our choice may be tested a posteriori.” (p. 314)

In cases where data x0 come from sample surveys, or can be viewed as a typical realization of a random sample X:=(X1,X2,…,Xn), i.e. Independent and Identically Distributed (IID) random variables, the ‘population’ metaphor can be helpful in adding some intuitive appeal to the inductive dimension of statistical inference, because one can imagine using a subset of a population (the sample) to draw inferences pertaining to the whole population.

This ‘infinite population’ metaphor, however, is of limited value in most applied disciplines relying on observational data. To see how inept this metaphor is, consider the question: what is the hypothetical ‘population’ when modeling the gyrations of stock market prices? More generally, what is observed in such cases is a certain on-going process and not a fixed population from which we can select a representative sample. For that very reason, most economists in the 1930s considered Fisher’s statistical modeling irrelevant for economic data!

Due primarily to Neyman’s experience with empirical modeling in a number of applied fields, including genetics, agriculture, epidemiology, biology, astronomy and economics, his notion of a statistical model evolved beyond Fisher’s ‘infinite populations’ in the 1930s into Neyman’s frequentist ‘chance mechanisms’ (see Neyman, 1950, 1952):

Guessing and then verifying the ‘chance mechanism’, the repeated operation of which produces the observed frequencies. This is a problem of ‘frequentist probability theory’. Occasionally, this step is labeled ‘model building’. Naturally, the guessed chance mechanism is hypothetical. (Neyman, 1977, p. 99)

From my perspective, this was a major step forward for several reasons, including the following.

First, the notion of a statistical model as a ‘chance mechanism’ extended the intended scope of statistical modeling to include dynamic phenomena that give rise to data from non-IID samples, i.e. data that exhibit both dependence and heterogeneity, like stock prices.

Second, the notion of a statistical model as a ‘chance mechanism’ is not only of metaphorical value: it can be operationalized in the context of a statistical model, formalized by:

Mθ(x) = {f(x;θ), θ∈Θ}, x∈Rn, Θ⊂Rm; m << n,

where the distribution of the sample f(x;θ) describes the probabilistic assumptions of the statistical model. This takes the form of a statistical Generating Mechanism (GM), stemming from  f(x;θ), that can be used to generate simulated data on a computer. An example of such a Statistical GM is:

Xt = α0 + α1Xt-1 + σεt,  t=1,2,…,n

This indicates how one can use pseudo-random numbers for the error term  εt ~NIID(0,1) to simulate data for the Normal, AutoRegressive [AR(1)] Model. One can generate numerous sample realizations, say N=100000, of sample size n in nanoseconds on a PC.
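As a minimal sketch of that simulation step (the parameter values and the use of NumPy here are illustrative, not from the text), one might generate a single realization of the Normal AR(1) model like this:

```python
import numpy as np

# Illustrative parameter values (not from the text)
alpha0, alpha1, sigma, n = 1.0, 0.5, 1.0, 200

rng = np.random.default_rng(0)
eps = rng.standard_normal(n)  # pseudo-random errors: eps_t ~ NIID(0,1)

# Generate one realization of X_t = alpha0 + alpha1*X_{t-1} + sigma*eps_t
x = np.empty(n)
x[0] = alpha0 / (1 - alpha1)  # start at the stationary mean
for t in range(1, n):
    x[t] = alpha0 + alpha1 * x[t - 1] + sigma * eps[t]

print(x[:5])
```

Re-running the loop N times (with fresh pseudo-random errors each time) yields the many sample realizations mentioned above.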

Third, the notion of a statistical model as a ‘chance mechanism’ puts a totally different spin on another metaphor widely used by uninformed critics of frequentist inference. This is the ‘long-run’ metaphor associated with the relevant error probabilities used to calibrate frequentist inferences. The operationalization of the statistical GM reveals that the temporal aspect of this metaphor is totally irrelevant for frequentist inference; remember Keynes’s catch phrase “In the long run we are all dead”? Instead, what matters in practice is its repeatability in principle, not over time! For instance, one can use the above statistical GM to generate the empirical sampling distributions for any test statistic, and thus render operational not only the pre-data error probabilities, like the type I and II error probabilities and the power of a test, but also the post-data probabilities associated with the severity evaluation; see Mayo (1996).
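A sketch of this “repeatability in principle” (again with illustrative parameter values, not ones from the text): replicate the statistical GM above N times and collect the OLS estimates of α1, giving an empirical approximation to the estimator’s sampling distribution, from which error probabilities can be read off.

```python
import numpy as np

# Illustrative values; N replications, each of sample size n
alpha0, alpha1, sigma, n, N = 1.0, 0.5, 1.0, 100, 2000
rng = np.random.default_rng(1)

estimates = np.empty(N)
for i in range(N):
    # one realization from the statistical GM
    x = np.empty(n)
    x[0] = alpha0 / (1 - alpha1)
    for t in range(1, n):
        x[t] = alpha0 + alpha1 * x[t - 1] + sigma * rng.standard_normal()
    # OLS estimate of alpha1 from regressing x_t on x_{t-1}
    A = np.column_stack([np.ones(n - 1), x[:-1]])
    coef, *_ = np.linalg.lstsq(A, x[1:], rcond=None)
    estimates[i] = coef[1]

# 'estimates' approximates the sampling distribution of the alpha1 estimator;
# quantiles of such a distribution yield the relevant error probabilities
print(estimates.mean(), estimates.std())
```

Nothing in this calibration depends on the replications being spread out over time; they are hypothetical repetitions of the chance mechanism.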

I have restored all available links to the following references.

For further discussion on the above issues see:

Spanos, A. (2013), “A Frequentist Interpretation of Probability for Model-Based Inductive Inference,” in Synthese.

Fisher, R. A. (1922), “On the mathematical foundations of theoretical statistics,” Philosophical Transactions of the Royal Society A, 222: 309-368.

Mayo, D. G. (1996), Error and the Growth of Experimental Knowledge, The University of Chicago Press, Chicago.

Neyman, J. (1950), First Course in Probability and Statistics, Henry Holt, NY.

Neyman, J. (1952), Lectures and Conferences on Mathematical Statistics and Probability, 2nd ed. U.S. Department of Agriculture, Washington.

Neyman, J. (1977), “Frequentist Probability and Frequentist Statistics,” Synthese, 36, 97-131.

[i] He was born in an area that was part of Russia.

Categories: Neyman, Spanos

Happy Birthday Neyman: What was Neyman opposing when he opposed the ‘Inferential’ Probabilists?


Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981). I’m posting a link to a quirky paper of his that explains one of the most misunderstood of his positions–what he was opposed to in opposing the “inferential theory”. The paper is Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘ [i] It’s chock full of ideas and arguments. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. It arises on p. 391 of Excursion 5 Tour III of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s a link to the proofs of that entire tour. If you hear Neyman rejecting “inferential accounts” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance as typically thought. Neyman always distinguished his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). You can find quite a lot on this blog searching Birnbaum.

Note: In this article,”attacks” on various statistical “fronts” refers to ways of attacking problems in one or another statistical research program.

What doesn’t Neyman like about Birnbaum’s advocacy of a Principle of Sufficiency S (p. 25)? He doesn’t like that it is advanced as a normative principle (e.g., about when evidence is or ought to be deemed equivalent) rather than a criterion that does something for you, such as control errors. (Presumably it is relevant to a type of context, say parametric inference within a model.) S is put forward as a kind of principle of rationality, rather than one with a rationale in solving some statistical problem.

“The principle of sufficiency (S): If E is specified experiment, with outcomes x; if t = t (x) is any sufficient statistic; and if E’ is the experiment, derived from E, in which any outcome x of E is represented only by the corresponding value t = t (x) of the sufficient statistic; then for each x, Ev (E, x) = Ev (E’, t) where t = t (x)… (S) may be described informally as asserting the ‘irrelevance of observations independent of a sufficient statistic’.”

Ev(E, x) is a metalogical symbol referring to the evidence from experiment E with result x. The very idea that there is such a thing as an evidence function is never explained, but to Birnbaum “inferential theory” required such things. (At least that’s how he started out.) The view is very philosophical and it inherits much from logical positivism and logics of induction. The principle S, and also other principles of Birnbaum, have a normative character: Birnbaum considers them “compellingly appropriate”.
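As a minimal numerical illustration of what (S) asserts (using the textbook Bernoulli example, which is not one discussed in this paper): for IID Bernoulli(θ) trials the sum T = Σx is a sufficient statistic, and the conditional distribution of the full data given T is free of θ, so the data beyond T are “irrelevant” in exactly the sense quoted above.

```python
from itertools import product
from math import comb

def cond_prob(x, theta):
    """P(X = x | T = t) for IID Bernoulli(theta) trials, with T = sum(x)."""
    n, t = len(x), sum(x)
    p_x = theta**t * (1 - theta)**(n - t)                # P(X = x)
    p_t = comb(n, t) * theta**t * (1 - theta)**(n - t)   # P(T = t)
    return p_x / p_t

n = 5
for x in product([0, 1], repeat=n):
    # the conditional distribution given T does not involve theta:
    # it is uniform over the C(n, t) arrangements with the same sum
    assert abs(cond_prob(x, 0.3) - cond_prob(x, 0.8)) < 1e-12
    assert abs(cond_prob(x, 0.3) - 1 / comb(n, sum(x))) < 1e-12
```

Neyman’s complaint, as I read it, is not with this mathematical fact but with elevating it to a free-standing normative principle rather than deriving it from a rationale such as error control.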

“The principles of Birnbaum appear as a kind of substitutes for known theorems,” Neyman says. For example, various authors proved theorems to the general effect that the use of sufficient statistics will minimize the frequency of errors. But if you just start with the rationale (minimizing the frequency of errors, say) you wouldn’t need these “principles” from on high, as it were. That’s what Neyman seems to be saying in his criticism of them in this paper. Do you agree? He has the same gripe concerning Cornfield’s conception of a default-type Bayesian account akin to Jeffreys. Why?

[i] I am grateful to @omaclaran for reminding me of this paper on twitter in 2018.

[ii] Or so I argue in my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, 2018, CUP.

[iii] Do you think Neyman is using “breakthrough” here in reference to Savage’s description of Birnbaum’s “proof” of the (strong) Likelihood Principle? Or is it the other way round? Or neither? Please weigh in.


Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making‘, Revue De l’Institut International De Statistique / Review of the International Statistical Institute, 30(1), 11-27.

Categories: Bayesian/frequentist, Error Statistics, Neyman

Intellectual conflicts of interest: Reviewers


Where do journal editors look to find someone to referee your manuscript (in the typical “double blind” review system in academic journals)? One obvious place to look is the reference list in your paper. After all, if you’ve cited them, they must know about the topic of your paper, putting them in a good position to write a useful review. The problem is that if your paper is on a topic of ardent disagreement, and you argue in favor of one side of the debates, then your reference list is likely to include those with actual or perceived conflicts of interest. After all, if someone has a strong standpoint on an issue of some controversy, and a strong interest in persuading others to accept their side, it creates an intellectual conflict of interest if that person has power to uphold that view. Since your referee is in a position of significant power to do just that, it follows that they have a conflict of interest (COI). A lot of attention is paid to authors’ conflicts of interest, but little to the intellectual or ideological conflicts of interest of reviewers. At most, the concern is with the reviewer having special reasons to favor the author, usually thought to be indicated by having been a previous co-author. We’ve been talking about journal editors’ conflicts of interest of late (e.g., with Mark Burgman’s presentation at the last Phil Stat Forum), and this brings to mind another one.

But is it true that just because a reviewer is put in a position of competing interests (staunchly believing in a position opposed to yours, while under an obligation to provide a fair and unbiased review) that their fairness in executing the latter is compromised? I surmise that your answer to this question will depend on which of two scenarios you imagine yourself in: In the first, you imagine yourself reviewing a paper that argues in favor of a position that you oppose. In the second, you imagine that your paper, which argues in favor of a view, has been sent to a reviewer with a vested interest in opposing that view.

In other words, if the paper argues in favor of a position, call it position X, and you oppose X, I’m guessing you imagine you’d have no trouble giving fair and constructive assessments of arguments in favor of X. You would not dismiss arguments in favor of X, just because you sincerely oppose X. You’d give solid reasons. You’d be much more likely to question if a reviewer, staunchly opposed to position X, will be an unbiased reviewer of your paper in favor of X. I’m not biased, but they are.

I think the truth is that reviewers with a strong standpoint on a controversial issue, are likely to have an intellectual conflict of interest in reviewing a paper in favor of a position they oppose. Recall that it suffices, according to standard definitions of an individual having a COI, that reasonable grounds exist to question whether their judgments and decisions can be unbiased. (For example, investment advisors avoid recommending stocks they themselves own, to avoid a conflict of interest.) If this is correct, does it follow that opponents of a contentious issue should not serve as reviewers of papers that take an opposite stance?  I say no because an author can learn a lot from a biased review about how to present their argument in the strongest possible terms, and how to zero in on the misunderstandings and confusions underlying objections to the view. Authors will almost surely not persuade such a reviewer by means of a revised paper, but they will be in possession of an argument that may enable them to persuade others.

A reviewer who deeply opposes position X will indeed, almost certainly, raise criticisms of a paper that favors X, but it does not follow that they are not objective or valid criticisms. Nevertheless, if all the reviewers come from this group, the result is still an unbalanced and unfair assessment, especially in that–objective or not–the critical assessment is more likely to accentuate the negative. If the position X happens to be currently unpopular, and opposing X the “received” position extolled by leaders of associations, journals, and institutions, then restricting reviewers to those opposed to X would obstruct intellectual progress. Progress comes from challenging the status quo and the tendency of people to groupthink and to jump on the bandwagon endorsed by many influential thought leaders of the day. Thus it would make sense for authors to have an opportunity to point out ahead of time to journal editors–who might not be aware of the particular controversy–the subset of references with a vested intellectual interest against the view for which they are arguing. If the paper is nevertheless sent to those reviewers, a judicious journal editor should weigh very heavily the author’s retorts and rejoinders. [1]

Here’s an example from outside of academia–the origins of the Coronavirus. The president of an organization that is directly involved with and heavily supported by funds for experimenting on coronaviruses, Peter Daszak, has a vested interest in blocking hypotheses of lab leaks or lab errors. Such hypotheses, if accepted, would have huge and adverse effects on that research and its regulation. When he is appointed to investigate Coronavirus origins, he has a conflict of interest. See this post.

Molecular biologist, Richard Ebright, one of the scientists to Call for a Full and Unrestricted International Forensic Investigation into the Origins of COVID-19 claims “the fact that the WHO named Daszak as a member of its mission, and the fact that the WHO retained Daszak as a member of its mission after being informed of his conflicts of interest, make it clear that the WHO study cannot be considered a credible, independent investigation.” (LINK) If all the reviewers of a paper in support of a lab association come from team Daszak, the paper is scarcely being given a fair shake.

Do you agree? Share your thoughts in the comments.

[1] The problem is compounded by the fact that today there are more journal submissions than ever, and with the difficulty in getting volunteers, there’s pressure on the journal editor not to dismiss the views of referees. My guess is that anonymity doesn’t play a big role most of the time.


Categories: conflicts of interest, journal referees

ASA to Release the Recommendations of its Task Force on Statistical Significance and Replicability

The American Statistical Association has announced that it has decided to reverse course and share the recommendations developed by the ASA Task Force on Statistical Significance and Replicability in one of its official channels. The ASA Board created this group [1] in November 2019 “with a charge to develop thoughtful principles and practices that the ASA can endorse and share with scientists and journal editors” (AMSTATNEWS 1 February 2020). Some members of the ASA Board felt that its earlier decision not to make these recommendations public, but instead to leave the group to publish its recommendations on its own, might give the appearance of a conflict of interest: the ASA is obliged to represent the wide variety of methodologies used by its members in widely diverse fields, yet some of its members advocate that practitioners stop using the term “statistical significance” and end the practice of using p-value thresholds in interpreting data [the Wasserstein et al. (2019) editorial]. I think the decision to publicly share the new Task Force recommendations is very welcome, especially given that the Task Force was appointed to avoid just such an apparent conflict of interest. Past ASA President Karen Kafadar noted:

Many of you have written of instances in which authors and journal editors—and even some ASA members—have mistakenly assumed this [Wasserstein et al. (2019)] editorial represented ASA policy. The mistake is understandable: The editorial was co-authored by an official of the ASA.

… To address these issues, I hope to establish a working group that will prepare a thoughtful and concise piece … without leaving the impression that p-values and hypothesis tests…have no role in ‘good statistical practice’. (K. Kafadar, President’s Corner, 2019, p. 4)

Thus the Task Force on Statistical Significance and Replicability was born. Meanwhile, its recommendations remain under wraps. The one principle mentioned in Kafadar’s JSM presentation is that there be a disclaimer on all publications, articles, and editorials authored by ASA staff, making it clear that the views presented are theirs and not the association’s. It is good that we can now count on seeing the original recommendations. Were they only to have appeared in a distinct publication, perhaps in a non-statistics journal, we would never actually know if we were getting to see the original recommendations or some modified version of them.

For a blogpost that provides the background to this episode, see “Why hasn’t the ASA board revealed the recommendations of its new task force on statistical significance and replicability?”


[1] Members of the ASA Task Force on Statistical Significance and Replicability

Linda Young, National Agricultural Statistics Service and University of Florida (Co-Chair)
Xuming He, University of Michigan (Co-Chair)
Yoav Benjamini, Tel Aviv University
Dick De Veaux, Williams College (ASA Vice President)
Bradley Efron, Stanford University
Scott Evans, The George Washington University (ASA Publications Representative)
Mark Glickman, Harvard University (ASA Section Representative)
Barry Graubard, National Cancer Institute
Xiao-Li Meng, Harvard University
Vijay Nair, Wells Fargo and University of Michigan
Nancy Reid, University of Toronto
Stephen Stigler, The University of Chicago
Stephen Vardeman, Iowa State University
Chris Wikle, University of Missouri





Kafadar, K. President’s Corner, “The Year in Review … And More to Come”, AMSTATNEWS 1 December 2019.

“Highlights of the November 2019 ASA Board of Directors Meeting”, AMSTATNEWS 1 January 2020.

Kafadar, K. “Task Force on Statistical Significance and Replicability Created”, AMSTATNEWS 1 February 2020.

Categories: conflicts of interest

The Stat Wars and Intellectual conflicts of interest: Journal Editors


Like most wars, the Statistics Wars continue to have casualties. Some of the reforms thought to improve reliability and replication may actually create obstacles to methods known to improve on reliability and replication. At each meeting of our Phil Stat Forum, “The Statistics Wars and Their Casualties,” I take 5-10 minutes to draw out a proper subset of the casualties associated with the topic of the presenter for the day. (The associated workshop that I have been organizing with Roman Frigg at the London School of Economics (CPNSS) now has a date for a hoped-for in-person meeting in London: 24-25 September 2021.) Of course we’re interested not just in casualties but in positive contributions, though what counts as a casualty and what a contribution is itself a focus of philosophy of statistics battles.

At our last meeting, Thursday, 25 March, Mark Burgman, Director of the Centre for Environmental Policy at Imperial College London and Editor-in-Chief of the journal Conservation Biology, spoke on “How should applied science journal editors deal with statistical controversies?“. His slides are here:  (pdf). The casualty I focussed on is how the statistics wars may put journal editors in positions of conflicts of interest that can get in the way of transparency and avoidance of bias. I presented it in terms of 4 questions (nothing to do with the fact that it’s currently Passover):


D. Mayo’s Casualties: Intellectual Conflicts of Interest: Questions for Burgman


  1. In an applied field such as conservation science, where statistical inferences often are the basis for controversial policy decisions, should editors and editorial policies avoid endorsing one side of the long-standing debate revolving around statistical significance tests?  Or should they adopt and promote a favored methodology?
  2. If editors should avoid taking a side in setting author guidelines and reviewing papers, what policies should be adopted to avoid deferring to the calls of those wanting them to change their author guidelines? Have you ever been encouraged to do so?
  3. If one has a strong philosophical statistical standpoint and a strong interest in persuading others to accept it, does it create a conflict of interest, if that person has power to enforce that philosophy (especially in a group already driven by perverse incentives)? If so, what is your journal doing to take account of and prevent conflicts of interest?
  4. What do you think of the March 2019 editorial in The American Statistician (Wasserstein et al., 2019), which says don’t say “statistical significance” and don’t use predesignated p-value thresholds in interpreting data (e.g., .05, .01, .005)?

(While not an ASA policy document, Wasserstein’s status as ASA executive director gave it a lot of clout. Should he have issued a disclaimer that the article only represents the authors’ views?) [1]

This is the first of some posts on intellectual conflicts of interest that I’ll be writing shortly. [2]

Mark Burgman’s presentation (Link)

D. Mayo’s Casualties (Link)

[1] For those who don’t know the story: Because no disclaimer was issued, the ASA Board appointed a new task force on Statistical Significance and Replicability in 2019 to provide recommendations. These have thus far not been made public. For the background, see this post.

Burgman said that he had received a request to follow the “don’t say significance, don’t use P-value thresholds” policy but, upon considering it with colleagues, they decided against it. Why not include, as part of the journal information shared with authors, a statement that the editors consider it important to retain a variety of statistical methodologies–correctly used–and have explicitly rejected the call to ban any of them (even if such calls come with official association letterhead)?

[2] WordPress has just sprung a radical change on bloggers, and as I haven’t figured it out yet, and my blog assistant is unavailable, I’ve cut this post short.

Categories: Error Statistics

Reminder: March 25 “How Should Applied Science Journal Editors Deal With Statistical Controversies?” (Mark Burgman)

The seventh meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

25 March, 2021

TIME: 15:00-16:45 (London); 11:00-12:45 (New York, NOTE TIME CHANGE TO MATCH UK TIME**)

For information about the Phil Stat Wars forum and how to join, click on this link.

How should applied science journal editors deal with statistical controversies?

Mark Burgman

Mark Burgman is the Director of the Centre for Environmental Policy at Imperial College London, where he holds the Chair in Risk Analysis & Environmental Policy, and is Editor-in-Chief of the journal Conservation Biology. Previously, he was Adrienne Clarke Chair of Botany at the University of Melbourne, Australia. He works on expert judgement, ecological modelling, conservation biology and risk assessment. He has written models for biosecurity, medicine regulation, marine fisheries, forestry, irrigation, electrical power utilities, mining, and national park planning. He received a BSc from the University of New South Wales (1974), an MSc from Macquarie University, Sydney (1981), and a PhD from the State University of New York at Stony Brook (1987). He worked as a consultant ecologist and research scientist in Australia, the United States and Switzerland during the 1980s before joining the University of Melbourne in 1990. He joined CEP in February 2017. He has published over two hundred and fifty refereed papers and book chapters and seven authored books. He was elected to the Australian Academy of Science in 2006.

Abstract: Applied sciences come with different focuses. In environmental science, as in epidemiology, the framing and context of problems is often in crises. Decisions are imminent, data and understanding are incomplete, and ramifications of decisions are substantial. This context makes the implications of inferences from data especially poignant. It also makes the claims made by fervent and dedicated authors especially challenging. The full gamut of potential statistical foibles and psychological frailties are on display. In this presentation, I will outline and summarise the kinds of errors of reasoning that are especially prevalent in ecology and conservation biology. I will outline how these things appear to be changing, providing some recent examples. Finally, I will describe some implications of alternative editorial policies.

Some questions:

*Would it be a good thing to dispense with p-values, either through encouragement or through strict editorial policy?

*Would it be a good thing to insist on confidence intervals?

*Should editors of journals in a broad discipline band together and post common editorial policies for statistical inference?

*Should all papers be reviewed by a professional statistician?

If so, which kind?


Professor Burgman is developing this topic anew, so we don’t have the usual background reading. However, we do have his slides:

*Mark Burgman’s Draft Slides:  “How should applied science journal editors deal with statistical controversies?” (pdf)

*D. Mayo’s Slides: “The Statistics Wars and Their Casualties for Journal Editors: Intellectual Conflicts of Interest: Questions for Burgman” (pdf)

*A paper of mine from the Joint Statistical Meetings, “Rejecting Statistical Significance Tests: Defanging the Arguments”, discusses an episode that is relevant for the general topic of how journal editors should deal with statistical controversies.

Video Links: 

Mark Burgman’s presentation:

D. Mayo’s Casualties:

Please feel free to continue the discussion by posting questions or thoughts in the comments section on this PhilStatWars post.

*Meeting 15 of the general Phil Stat series which began with the LSE Seminar PH500 on May 21

**UK doesn’t change their clock until March 28.

Categories: ASA Guide to P-values, confidence intervals and tests, P-values, significance tests

Pandemic Nostalgia: The Corona Princess: Learning from a petri dish cruise (reblog 1yr)


Last week, giving a long-postponed talk for the NY Metro Area Philosophers of Science Group (MAPS), I mentioned how my book Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP) invites the reader to see themselves on a special interest cruise as we revisit old and new controversies in the philosophy of statistics–noting that I had no idea, in writing the book, that cruise ships would themselves become controversial in just a few years. The first thing I wrote during the early pandemic days last March was this post on the Diamond Princess. The statistics gleaned from the ship remain important resources, and in many ways they haven’t been far off. I reblog it here. Continue reading

Categories: covid-19, memory lane | Leave a comment

March 25 “How Should Applied Science Journal Editors Deal With Statistical Controversies?” (Mark Burgman)

The seventh meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

25 March, 2021

TIME: 15:00-16:45 (London); 11:00-12:45 (New York, NOTE TIME CHANGE)

For information about the Phil Stat Wars forum and how to join, click on this link.

How should applied science journal editors deal with statistical controversies?

Mark Burgman Continue reading

Categories: ASA Guide to P-values, confidence intervals and tests, P-values, significance tests | Tags: , | 1 Comment

Falsifying claims of trust in bat coronavirus research: mysteries of the mine (i)-(iv)


Have you ever wondered if people read Master’s (or even Ph.D.) theses a decade out? Whether or not you have, I think you will be intrigued to learn the story of why an obscure Master’s thesis from 2012, translated from Chinese in 2020, is now a key to unravelling the puzzle of the global controversy about the mechanism and origins of Covid-19. The Master’s thesis by a doctor, Li Xu [1], “The Analysis of 6 Patients with Severe Pneumonia Caused by Unknown Viruses”, describes 6 patients he helped to treat after they entered a hospital in 2012, one after the other, suffering from an atypical pneumonia contracted while cleaning up after bats in an abandoned copper mine in China. Given the keen interest in finding the origin of the 2002–2003 severe acute respiratory syndrome (SARS) outbreak, Li wrote: “This makes the research of the bats in the mine where the six miners worked and later suffered from severe pneumonia caused by unknown virus a significant research topic”. He and the other doctors treating the mine cleaners hypothesized that their illness was caused by a SARS-like coronavirus, contracted through close proximity to the bats in the mine. Continue reading

Categories: covid-19, falsification, science communication | 18 Comments

Aris Spanos: Modeling vs. Inference in Frequentist Statistics (guest post)


Aris Spanos
Wilson Schmidt Professor of Economics
Department of Economics
Virginia Tech

The following guest post (link to updated PDF) was written in response to C. Hennig’s presentation at our Phil Stat Wars Forum on 18 February, 2021: “Testing With Models That Are Not True”. Continue reading

Categories: misspecification testing, Spanos, stat wars and their casualties | 11 Comments

R.A. Fisher: “Statistical methods and Scientific Induction” with replies by Neyman and E.S. Pearson

In recognition of Fisher’s birthday (Feb 17), I reblog his contribution to the “Triad”–an exchange between Fisher, Neyman and Pearson 20 years after the Fisher-Neyman break-up. The other two are below. My favorite is the reply by E.S. Pearson, but all are chock-full of gems for different reasons. They are each very short and worth rereading. Continue reading

Categories: E.S. Pearson, Fisher, Neyman, phil/history of stat | Leave a comment

R. A. Fisher: How an Outsider Revolutionized Statistics (Aris Spanos)



This is a belated birthday post for R.A. Fisher (17 February 1890 – 29 July 1962): a guest post from earlier on this blog by Aris Spanos, the one that has received the highest number of hits over the years.

Happy belated birthday to R.A. Fisher!

‘R. A. Fisher: How an Outsider Revolutionized Statistics’

by Aris Spanos

Few statisticians will dispute that R. A. Fisher (February 17, 1890 – July 29, 1962) is the father of modern statistics; see Savage (1976), Rao (1992). Inspired by William Gosset’s (1908) paper on the Student’s t finite sampling distribution, he recast statistics into the modern model-based induction in a series of papers in the early 1920s. He put forward a theory of optimal estimation based on the method of maximum likelihood that has changed only marginally over the last century. His significance testing, spearheaded by the p-value, provided the basis for the Neyman-Pearson theory of optimal testing in the early 1930s. According to Hald (1998) Continue reading
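As a minimal, hypothetical illustration of the p-value idea mentioned above (not Fisher’s own formulation or data), a two-sided p-value for a simple z-test with known standard deviation can be sketched as:

```python
import math

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a z-test of H0: mean = mu0,
    assuming a known sigma and an (approximately) normal sampling distribution."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF via the error function; p = 2 * P(Z >= |z|)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Example (made-up numbers): sample mean 10.4 vs hypothesized 10, sigma = 1, n = 25
print(round(z_test_p_value(10.4, 10, 1, 25), 4))  # → 0.0455
```

A small p-value indicates that data as discordant with H0 as those observed would be improbable were H0 true; it was this measure that the Neyman-Pearson theory later recast in terms of preset error rates.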

Categories: Fisher, phil/history of stat, Spanos | 2 Comments

Reminder: February 18 “Testing with models that are not true” (Christian Hennig)

The sixth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

18 February, 2021

TIME: 15:00-16:45 (London); 10-11:45 a.m. (New York, EST)

For information about the Phil Stat Wars forum and how to join, click on this link. 


Testing with Models that Are Not True Continue reading

Categories: Phil Stat Forum | Leave a comment

S. Senn: The Power of Negative Thinking (guest post)



Stephen Senn
Consultant Statistician
Edinburgh, Scotland

Sepsis sceptic

During an exchange on Twitter, Lawrence Lynn drew my attention to a paper by Laffey and Kavanagh [1]. It makes an interesting, useful and very depressing assessment of the state of clinical trials in critical care. The authors claim that RCTs in this field, as currently conducted, are not useful. I don’t agree with the authors’ logic here although, perhaps surprisingly, I consider that their conclusion might be true. I propose to discuss this here. Continue reading

Categories: power, randomization | 5 Comments

February 18 “Testing with models that are not true” (Christian Hennig)

The sixth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

18 February, 2021

TIME: 15:00-16:45 (London); 10-11:45 a.m. (New York, EST)

For information about the Phil Stat Wars forum and how to join, click on this link. 


Testing with Models that Are Not True

Christian Hennig

Continue reading

Categories: Phil Stat Forum | 1 Comment

The Covid-19 Mask Wars: Hi-Fi Mask Asks


Effective yesterday, February 1, it is a violation of federal law not to wear a mask on a public conveyance or in a transit hub, including taxis, trains and commercial trucks. (The 11-page mandate is here.)

The “mask wars” are a major source of disagreement and politicization of science during the current pandemic, but my interest here is not in clashes between pro- and anti-mask culture warriors, but in the clashing recommendations among science policy officials and scientists wearing their policy hats. A recent Washington Post editorial by Joseph Allen (director of the Healthy Buildings program at the Harvard T.H. Chan School of Public Health) declares “Everyone should be wearing N95 masks now”. In his view: Continue reading

Categories: covid-19 | 27 Comments

January 28 Phil Stat Forum “How Can We Improve Replicability?” (Alexander Bird)

The fifth meeting of our Phil Stat Forum*:

The Statistics Wars
and Their Casualties

28 January, 2021

TIME: 15:00-16:45 (London); 10-11:45 a.m. (New York, EST)


“How can we improve replicability?”

Alexander Bird 

Continue reading

Categories: Phil Stat Forum | 1 Comment

S. Senn: “Beta testing”: The Pfizer/BioNTech statistical analysis of their Covid-19 vaccine trial (guest post)


Stephen Senn

Consultant Statistician
Edinburgh, Scotland

The usual warning

Although I have researched clinical trial design for many years, prior to the COVID-19 epidemic I had had nothing to do with vaccines. The only object of these amateur musings is to amuse amateurs by raising some issues I have pondered and found interesting. Continue reading

Categories: covid-19, PhilStat/Med, S. Senn | 16 Comments

Why hasn’t the ASA Board revealed the recommendations of its new task force on statistical significance and replicability?

something’s not revealed

A little over a year ago, the board of the American Statistical Association (ASA) appointed a new Task Force on Statistical Significance and Replicability (under then-president Karen Kafadar) to provide it with recommendations. [Its members are here (i).] You might remember my blogpost at the time, “Les Stats C’est Moi”. The Task Force worked quickly, despite the pandemic, giving its recommendations to the ASA Board early, in time for the Joint Statistical Meetings (JSM) at the end of July 2020. But the ASA hasn’t revealed the Task Force’s recommendations, and I just learned yesterday that it has no plans to do so*. A panel session I was in at the JSM (P-values and ‘Statistical Significance’: Deconstructing the Arguments) grew out of this episode, and papers from the proceedings are now out. The introduction to my contribution gives you the background to my question, while revealing one of the recommendations (I only know of 2). Continue reading

Categories: 2016 ASA Statement on P-values, JSM 2020, replication crisis, statistical significance tests, straw person fallacy | 7 Comments
