10 years after the July 4 statistical discovery of the Higgs & the value of negative results


Today marks a decade since the discovery on July 4, 2012 of evidence for a Higgs particle based on a “5 sigma observed effect”. CERN celebrated with a scientific symposium (webcast here). The observed effect refers to the number of excess events of a given type that are “observed” in comparison to the number that would be expected from background alone—which they can simulate in particle detectors. Because the 5-sigma standard refers to a benchmark from frequentist significance testing, the discovery was immediately imbued with controversies that, at bottom, concerned statistical philosophy.
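To make the 5-sigma benchmark concrete, here is a minimal sketch that converts a number of standard deviations into a tail probability. It assumes a standard Normal test statistic and a one-sided (upper-tail) p-value; the function name `one_sided_p` is only illustrative, and none of this is taken from the ATLAS or CMS analyses.

```python
from math import erfc, sqrt

def one_sided_p(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard Normal Z."""
    # erfc keeps precision far out in the tail, where 1 - cdf(z) would lose digits.
    return 0.5 * erfc(z / sqrt(2))

p5 = one_sided_p(5.0)
print(f"5 sigma, one-sided p-value: {p5:.3e}")  # roughly 1 in 3.5 million
```

Under these assumptions, a 5-sigma excess corresponds to a probability of about 3 in 10 million of seeing so large an excess from background alone.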

Just a few days after the big Higgs announcement in 2012, murmurings could be heard among some Bayesian statisticians as well as in the popular press. Why a 5-sigma standard? Do significance tests in high-energy particle (HEP) physics escape the misuses of P values found in the social sciences and other sciences? While the world of physics was toasting the great discovery, there were grumblings back at the International Society of Bayesian Analysis (ISBA), raised by a leading subjective Bayesian Dennis Lindley. A letter that was being sent around to the ISBA list was leaked to me, written by statistician Tony O’Hagan. “Dear Bayesians,” the letter began, “A question from Dennis Lindley prompts me to consult this list in search of answers. We’ve heard a lot about the Higgs boson.”

Why such an extreme evidence requirement? We know from a Bayesian perspective that this only makes sense if (a) the existence of the Higgs boson . . . has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme. (O’Hagan 2012) [1]

Neither of these seemed to be the case in his opinion: “Is the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?” (O’Hagan 2012).

Bad science? It is not bad science at all. In fact, HEP physicists are sophisticated with their statistical methodology—they had seen too many bumps disappear. They want to ensure that before announcing a new particle has been discovered that, at the very least, the results being spurious is given a run for its money. Significance tests, followed by confidence intervals, are methods of choice here for good reason. You can read Lindley’s full letter here.

The last 10 years have been disappointing for the HEP physics community, as they haven’t been able to discover new particles that would knock down the Standard Model (SM) to yield physics beyond the SM, i.e., BSM physics. It is presumed there must be some BSM, given SM’s limits in explaining gravity and dark matter. Yet the Higgs particle that was discovered in 2012 continues to look like a plain-vanilla SM Higgs. In each inquiry, described as “in search for” some form of BSM, the SM null hypothesis holds up–despite massive data. While disappointing physicists, this negative role of significance tests is crucial for blocking wrong pathways for developing new physics.

There’s a short Higgs discussion in Tour III in Excursion 3 (pp. 202-217) of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s an excerpt that links the O’Hagan letter to a 2015 update to the search for BSM: 

Back to O’Hagan and a 2015/2016 Update

O’Hagan published a digest of responses a few days later. When it was clear his letter had not met with altogether enthusiastic responses, he backed off, admitting that he had only been being provocative with the earlier letter. Still, he declares, the Higgs researchers would have been better off avoiding the “ad hoc” 5 sigma by doing a proper (subjective) Bayesian analysis. “They would surely be willing to [announce SM Higgs discovery] if they were, for instance,  99.99 percent certain” [SM Higgs] existed. Wouldn’t it be better to report

Pr(SM Higgs|data) = 0.9999?

Actually, no. Not if it’s taken as a formal probability rather than a chosen way to abbreviate: the reality of the SM Higgs has passed a severe test. Physicists believed in a Higgs particle before building the big billion dollar collider. Given the perfect predictive success of the SM, and its simplicity, such beliefs would meet the familiar standards for plausibility. But that’s very different from having evidence for a discovery, or information about the characteristics of the particle. Many aver they didn’t expect it to have so small a mass, 125 GeV. In fact, given the unhappy consequences some find with this low mass, some researchers may well have gone back and changed their prior probabilities to arrive at something more sensible (more “natural” in the parlance of HEP). Yet, their strong argument from coincidence via significance tests prevented the effect from going away.

O’Hagan/Lindley admit that a subjective Bayesian model for the Higgs would require prior probabilities to scads of high dimensional “nuisance” parameters of the background and the signal; it would demand multivariate priors, correlations between parameters, joint priors, and the ever worrisome Bayesian catchall factor:  Pr(data|not- H*). Lindley’s idea of subjectively eliciting beliefs from HEP physicists is rather unrealistic here.

Now for the update. When the collider restarted in 2015, it had far greater collider energies than before. On December 15, 2015 something exciting happened: “ATLAS and CMS both reported a small ‘bump’ in their data” at a much higher energy level than the Higgs: 750 GeV (compared to 125 GeV) (Cartlidge 2016). “As this unexpected bump could be the first hint of a new massive particle that is not predicted by the Standard Model of particle physics, the data generated hundreds of theory papers that attempt to explain the signal” (ibid.). I believe it was 500.

…Could the Bayesian model wind up in the same place? Not if Lindley/ O’Hagan’s subjective model merely keeps updating beliefs in the already expected parameters. According to Savage, “The probability of ‘something else’ … is definitely very small” (Savage 1962, p. 80). It would seem to require a long string of anomalies before the catchall is made sufficiently probable to start seeking new physics. Would they come up with a particle like the one they were now in a frenzy to explain? Maybe, but it would be a far less efficient way for discovery than the simple significance tests.

It turned out that the promising bump or “resonance” (a great HEP term) disappeared as more data became available, drowning out the significant indications seen in April. Its reality was falsified. The null hypothesis correctly explains the data. While disappointing to physicists, this negative role of significance tests is crucial for denying BSM anomalies are real, and setting upper bounds for these discrepancies with the SM Higgs.

Tomorrow, Run 3 begins with a much more powerful collider, and physicists are optimistic! 


Selected earlier blogposts on the Higgs Discovery:


Mayo, D. (2018). “Experimental Flukes and Statistical Modeling in the Higgs Discovery,” in Isabelle Peschard and Bas van Fraassen (eds.), The Experimental Side of Modeling in Minnesota Studies in the Philosophy of Science, University of Minnesota Press, 189-217.

Categories: Error Statistics | Leave a comment

D. Lakens responds to confidence interval crusading journal editors


In what began as a guest commentary on my 2021 editorial in Conservation Biology, Daniël Lakens recently published a response to a recommendation against using null hypothesis significance tests by the International Society of Physiotherapy Journal Editors. Here are some excerpts from his full article, replies (‘response to Lakens’), links and a few comments of my own.


“….The editors list five problems with p values. First, they state that ‘p values do not equate to a probability that researchers need to know’ because ‘Researchers need to know the probability that the null hypothesis is true given the data observed in their study.’ Regrettably, the editors do not seem to realise that only God knows the probability that the null hypothesis is true given the data observed, and no statistical method can provide it. Estimation will not tell you anything about the probability of hypotheses. Their second point is that a p value does not constitute evidence. Neither do estimates, so their proposed alternative suffers from the same criticism. Third, the editors claim that significant results have a low probability of replicating, and that when a p value between 0.005 and 0.05 is observed, repeating this study would only have a 67% probability of observing a significant result. This is incorrect. The citation to Boos and Stefanski is based on the assumption that multiple p values are available to estimate the average power of the set of studies, and that the studies will have 67% power. It is not possible to determine the probability a study will replicate based on a single p value. Furthermore, well-designed replication studies do not use the same sample size as an earlier study, but are designed to have high power for an effect size of interest. Fourth, the editors argue, without any empirical evidence, that in most clinical trials the null hypothesis must be false. The prevalence of null results makes it doubtful this statement is true in any practical sense. In an analysis of 11,852 meta-analyses from Cochrane reviews, only 5,903 meta-analyses, or 49.8%, found a statistically significant meta-analytic effect. …Finally, the fifth point that ‘Researchers need to know more than just whether an effect does or does not exist’ is correct, but the ‘more than’ is crucial.
It remains important to prevent authors from claiming there is an effect, when they are actually looking at random noise, and therefore, effect sizes complement, but do not replace, hypothesis tests.

The editors recommend to use estimation, and report confidence intervals around estimates. But then the editors write ‘The estimate and its confidence interval should be compared against the “smallest worthwhile effect” of the intervention on that outcome in that population.’ ‘If the estimate and the ends of its confidence interval are all more favourable than the smallest worthwhile effect, then the treatment effect can be interpreted as typically considered worthwhile by patients in that clinical population. If the effect and its confidence interval are less favourable than the smallest worthwhile effect, then the treatment effect can be interpreted as typically considered trivial by patients in that clinical population.’ This is not a description of an estimation approach. The editors are recommending a hypothesis testing approach against a smallest effect size of interest. When examining if the effect is more favorable than the smallest effect size of interest, this is known as a minimum effect test. When examining whether an effect is less favorable than the smallest effect size of interest, this is known as an equivalence test. Both are examples of interval hypothesis tests, where instead of comparing the observed effect against 0 (as in a null-hypothesis test) the effect is compared against a range of values that are deemed theoretically or practically interesting…. Forcing researchers to prematurely test hypotheses against a smallest effect size of interest, before they have carefully established a smallest effect size of interest, will be counterproductive.”
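The point that comparing a confidence interval against a smallest worthwhile effect amounts to a test can be checked with a small sketch. It assumes a Normally distributed estimate with a known standard error; the function names and the numbers are hypothetical, and the single-bound check for "less favourable" is a simplification of a full two-bound equivalence test.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def minimum_effect_p(estimate, se, sesoi):
    """One-sided p-value for H0: effect <= sesoi against H1: effect > sesoi."""
    return 1 - Z.cdf((estimate - sesoi) / se)

def below_sesoi_p(estimate, se, sesoi):
    """One-sided p-value for H0: effect >= sesoi against H1: effect < sesoi."""
    return Z.cdf((estimate - sesoi) / se)

def conf_interval(estimate, se, level=0.95):
    """Two-sided interval: estimate +/- z * se."""
    half = Z.inv_cdf(0.5 + level / 2) * se
    return estimate - half, estimate + half

# Hypothetical numbers: smallest worthwhile effect of 2 units.
sesoi = 2.0
lower, upper = conf_interval(5.0, 1.0)
p_min = minimum_effect_p(5.0, 1.0, sesoi)
print(lower > sesoi, p_min < 0.025)   # both checks agree

lower2, upper2 = conf_interval(1.0, 0.4)
p_below = below_sesoi_p(1.0, 0.4, sesoi)
print(upper2 < sesoi, p_below < 0.025)  # both checks agree
```

In this setup, the whole 95% interval lies above the smallest worthwhile effect exactly when the one-sided minimum effect test rejects at the 0.025 level, and likewise for the interval lying below it.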

These are excellent points, and I would note a few others (even aside from their misdefining p-values). The problem I have with the complaint that p-values aren’t posterior probabilities is not that only God knows–presumably God would know if a claim was true or if an event would occur, or not. The problem is that no one knows how to obtain, interpret, and justify the required prior probability distributions (except in special cases of frequentist priors), and there’s no agreement as to whether they should be regarded as measuring belief (and in what?), or supply “default”, (data-dominant) priors (obtained from one of several rival systems).

However, in their response to Lakens, the editors aver that “there is little point in knowing the probability that the null hypothesis is true”. That’s because they assume it is always false! They fall into the fallacy of thinking that because no models are literally true claims about the world–that’s why they’re called ‘models’–and with enough data a discrepancy from a point null, however small, may be found, it follows that all nulls are false. This is false. (Read proofs of Mayo 2018, Excursion 4 tour IV here). (It would follow, by the way, that all the assumptions needed to get estimations off the ground are false.) Ironically, the second most famous criticisms of statistical significance tests rest on assuming there is a high (spike) prior probability that the null hypothesis is true. (For a recent example–though the argument is quite old–see Benjamin 2018). Thus, many critics of statistical significance tests agree in retiring tests, but disagree as to whether it’s because the null hypothesis is always false or probably true!

Interestingly “probable” and “probability” come from the Latin probare, meaning to try, test, or prove. “Proof” as in “The proof is in the pudding” refers to how well you put something to the test. You must show or provide good grounds for the claim, not just believe strongly.* If we used “probability” this way, it would be very close to my idea of measuring how well or severely tested (or how well shown) a claim is. I discuss this on p. 10 of Mayo (2018), which you can read here. But it’s not our current, informal English sense of probability, as varied as that can be. (I recall that Donald Fraser (2011) blamed this on Dennis Lindley.)[1] In any of the currently used meanings, claims can be strongly believed or even known to be true while being poorly tested by data x.

I don’t generally agree with Stuart Hurlbert on the statistics wars, but he has an apt term for the strident movement to insist that only confidence intervals (CIs) be used, no tests of statistical hypotheses: the CI crusade.[2] The problem I have with “confidence interval crusaders” is that while supplementing tests with confidence intervals (CIs) is good (at least where the model assumptions hold adequately), the “CIs only” crusaders advance the misleading perception that the only alternative is the abusive use of tests which fallaciously take statistical significance as substantive importance and commit all the well-known, hackneyed, fallacies. The “CIs only” crusaders get their condemnation of statistical significance tests off the ground only by identifying them with a highly artificial point null hypothesis test, as if Neyman and Pearson tests (with alternative hypotheses, power etc.) never existed (let alone do they consider the variations discussed by Lakens). But Neyman developed CIs at the same time he and Pearson developed tests. There is a direct duality between tests and intervals: a confidence interval (CI) at level 1 – c consists of parameter values that are not statistically significantly different from the data at significance level c. You can obtain the lower CI bound (at level 1-c) by asking: what parameter value is the data statistically significantly greater than at level c? Lakens is correct that the procedure the editors describe, checking for departures from a chosen value for a “smallest worthwhile effect,” is more properly seen as a test.
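The duality between tests and intervals can be verified numerically. The sketch below assumes the simplest case, a Normal mean with known σ and a two-sided z-test; the sample values are hypothetical. It scans a grid of candidate parameter values, keeps those the test does not reject at level c, and compares their range with the usual 1 – c interval.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def two_sided_p(xbar, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0, known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - Z.cdf(abs(z)))

def confidence_interval(xbar, sigma, n, c):
    """Standard (1 - c) interval: xbar +/- z_{1-c/2} * sigma / sqrt(n)."""
    half = Z.inv_cdf(1 - c / 2) * sigma / sqrt(n)
    return xbar - half, xbar + half

# Hypothetical data summary.
xbar, sigma, n, c = 0.4, 1.0, 25, 0.05
ci_low, ci_high = confidence_interval(xbar, sigma, n, c)

# Candidate values of mu on a grid with step 0.001.
grid = [i / 1000 for i in range(-1000, 2001)]
not_rejected = [m for m in grid if two_sided_p(xbar, m, sigma, n) > c]

print(round(ci_low, 3), round(ci_high, 3))
print(min(not_rejected), max(not_rejected))
```

Up to the grid resolution, the set of values not rejected at level c coincides with the interval at level 1 – c.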

My own preferred reformulation of statistical significance tests–in terms of discrepancies (from a reference) that are well or poorly indicated–is in the spirit of CIs, but improves on them. It accomplishes the goal of CIs (to use the data to infer population effect sizes), while providing them with an inferential, and not merely a “performance” justification. The performance justification of a particular confidence interval estimate is merely that it arose from a method with good performance in a long run of uses. The editors to whom Lakens is replying deride this long-run frequency justification of tests, but fail to say how they get around this justification in using CIs. The erroneous construal of the confidence level as a probability the specific estimate is correct is encouraged. What confidence level should be used? The CI advocates typically stick with .95, but it is more informative to consider several different levels, mathematically akin to confidence distributions. Looking at estimates associated with low confidence levels, e.g., .6, .5, .4 is quite informative, but we do not see this. Moreover, the confidence interval estimate does not tell us, for a given parameter value μ’ in the interval estimate, the answer to questions like: how well (or poorly) warranted is μ > μ’?[3] Again, it’s the poorly warranted (or inseverely tested) claims that are most informative, in my view. And unlike what some seem to think, estimation procedures require the same or more assumptions than do statistical significance tests. That is why simple significance tests (without alternatives) are relied on to test assumptions of statistical models used for estimation.
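As a rough illustration of looking at several confidence levels at once, the sketch below computes one-sided lower bounds at levels .95, .6, .5 and .4, and the severity with which μ > μ’ passes for each bound taken as μ’. It assumes a Normal mean with known σ and uses the severity definition SEV(μ > μ’) = Pr(X̄ ≤ observed x̄; μ = μ’); the numbers and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def lower_bound(xbar, se, level):
    """One-sided lower confidence bound at the given level."""
    return xbar - Z.inv_cdf(level) * se

def severity_mu_greater(xbar, se, mu_prime):
    """SEV(mu > mu') = Pr(Xbar <= observed xbar; mu = mu')."""
    return Z.cdf((xbar - mu_prime) / se)

# Hypothetical data summary.
xbar, sigma, n = 0.4, 1.0, 25
se = sigma / sqrt(n)

results = []
for level in (0.95, 0.6, 0.5, 0.4):
    bound = lower_bound(xbar, se, level)
    sev = severity_mu_greater(xbar, se, bound)
    results.append((level, bound, sev))
    print(f"level {level}: mu > {bound:.3f}, severity {sev:.2f}")
```

In this simple model the severity for μ > μ’ equals the confidence level of the lower bound μ’, so the bounds at levels .5 and .4 pick out claims that are poorly warranted by the data.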

Lakens’ article is here. He is responding to this editorial; their reply to him, which not all the initial authors signed on to, is here.

Please share your constructive comments.

[1] D. Fraser’s (2011) “Is Bayes posterior just quick and dirty confidence?”

[2] Hurlbert and Lombardi 2009, p. 331. While I agree with their rejection of the “CIs only” crusaders, I reject their own crusade pitting Fisherian tests against Neyman and Pearson tests, rejecting the latter.

[3] For a very short discussion of how the severity reinterpretation of statistical tests of hypotheses connects and improves on CIs, see the Appendix of  Mayo 2020. For access to the proofs of the entire book see excerpts on this blog.

*July 4, 2022 See Grieves’ comment and my reply on the roots of these words.

Hurlbert, S. and Lombardi, C. (2009). Final Collapse of the Neyman-Pearson Decision Theoretic Framework and Rise of the NeoFisherian, Annales Zoologici Fennici 46, 311–349.

Mayo, D. Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018)

Categories: stat wars and their casualties, statistical significance tests | 4 Comments




Too little, too late? The “Don’t say significance…” editorial gets a disclaimer (ii)


Someone sent me an email the other day telling me that a disclaimer had been added to the editorial written by the ASA Executive Director and 2 co-authors (Wasserstein et al., 2019) (“Moving to a world beyond ‘p < 0.05′”). It reads:


The editorial was written by the three editors acting as individuals and reflects their scientific views not an endorsed position of the American Statistical Association.

Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, editorial COIs, WSL 2019 | 19 Comments

Philosophy of socially aware data science conference

I’ll be speaking at this conference in Philly tomorrow. My slides are also below.


PDF of my slides: Statistical “Reforms”: Fixing Science or Threats to Replication and Falsification. Continue reading

Categories: Announcement, Philosophy of Statistics, socially aware data science | Leave a comment

D. Mayo & D. Hand: “Statistical significance and its critics: practicing damaging science, or damaging scientific practice?”



Prof. Deborah Mayo, Emerita
Department of Philosophy
Virginia Tech


Prof. David Hand
Department of Mathematics Statistics
Imperial College London

Statistical significance and its critics: practicing damaging science, or damaging scientific practice?  (Synthese)

[pdf of full paper.] Continue reading

Categories: Error Statistics | 3 Comments

Paul Daniell & Yu-li Ko commentaries on Mayo’s ConBio Editorial

I had been posting commentaries daily from January 6, 2022 (on my editorial “The Statistics Wars and Intellectual conflicts of Interest”, Conservation Biology) until Sir David Cox died on January 18, at which point I switched to some memorial items. These two commentaries from what Daniell calls my ‘birthday festschrift’ were left out, and I put them up now. (Links to others are below.)

Continue reading

Categories: Mayo editorial, stat wars and their casualties | 1 Comment

3 Commentaries on my Editorial are being published in Conservation Biology



There are 3 commentaries soon to be published in Conservation Biology on my editorial, “The statistics wars and intellectual conflicts of interest” also published in Conservation Biology. Continue reading

Categories: Mayo editorial, significance tests | Leave a comment

A statistically significant result indicates H’ (μ > μ’) when POW(μ’) is low (not the other way round)–but don’t ignore the standard error


1. New monsters. One of the bizarre facts of life in the statistics wars is that a method from one school may be criticized on grounds that it conflicts with a conception that is the reverse of what that school intends. How is that even to be deciphered? That was the difficult task I set for myself in writing Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP, 2018) [SIST 2018]. I thought I was done, but new monsters keep appearing. In some cases, rather than see how the notion of severity gets us beyond fallacies, misconstruals are taken to criticize severity! So, for example, in the last couple of posts, here and here, I deciphered some of the better known power howlers (discussed in SIST Ex 5 Tour II). I’m linking to all of this tour (in proofs). Continue reading

Categories: power, reforming the reformers, SIST, Statistical Inference as Severe Testing | 16 Comments

Do “underpowered” tests “exaggerate” population effects? (iv)


You will often hear that if you reach a just statistically significant result “and the discovery study is underpowered, the observed effects are expected to be inflated” (Ioannidis 2008, p. 64), or “exaggerated” (Gelman and Carlin 2014). This connects to what I’m referring to as the second set of concerns about statistical significance tests, power and magnitude errors. Here, the problem does not revolve around erroneously interpreting power as a posterior probability, as we saw in the fallacy in this post. But there are other points of conflict with the error statistical tester, and much that cries out for clarification, or else you will misunderstand the consequences of some of today’s reforms. Continue reading

Categories: power, reforming the reformers, SIST, Statistical Inference as Severe Testing | 16 Comments

Join me in reforming the “reformers” of statistical significance tests


The most surprising discovery about today’s statistics wars is that some who set out shingles as “statistical reformers” themselves are guilty of misdefining some of the basic concepts of error statistical tests—notably power. (See my recent post on power howlers.) A major purpose of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP) is to clarify basic notions to get beyond what I call “chestnuts” and “howlers” of tests. The only way that disputing tribes can get beyond the statistics wars is by (at least) understanding correctly the central concepts. But these misunderstandings are more common than ever, so I’m asking readers to help. Why are they more common (than before the “new reformers” of the last decade)? I suspect that at least one reason is the popularity of Bayesian variants on tests: if one is looking to find posterior probabilities of hypotheses, then error statistical ingredients may tend to look as if that’s what they supply.  Continue reading

Categories: power, SIST, statistical significance tests | 2 Comments

Happy Birthday Neyman: What was Neyman opposing when he opposed the ‘Inferential’ Probabilists? Your weekend Phil Stat reading


Today is Jerzy Neyman’s birthday (April 16, 1894 – August 5, 1981). I’m reposting a link to a quirky, but fascinating, paper of his that explains one of the most misunderstood of his positions–what he was opposed to in opposing the “inferential theory”. The paper, from 60 years ago, is Neyman, J. (1962), ‘Two Breakthroughs in the Theory of Statistical Decision Making’ [i]. It’s chock full of ideas and arguments. “In the present paper” he tells us, “the term ‘inferential theory’…will be used to describe the attempts to solve the Bayes’ problem with a reference to confidence, beliefs, etc., through some supplementation …either a substitute a priori distribution [exemplified by the so called principle of insufficient reason] or a new measure of uncertainty” such as Fisher’s fiducial probability. It arises on p. 391 of Excursion 5 Tour III of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Here’s a link to the proofs of that entire tour. If you hear Neyman rejecting “inferential accounts,” you have to understand it in this very specific way: he’s rejecting “new measures of confidence or diffidence”. Here he alludes to them as “easy ways out”. He is not rejecting statistical inference in favor of behavioral performance as is typically thought. It’s amazing how an idiosyncratic use of a word 60 years ago can cause major rumblings decades later. Neyman always distinguished his error statistical performance conception from Bayesian and Fiducial probabilisms [ii]. The surprising twist here is semantical and the culprit is none other than…Allan Birnbaum. Yet Birnbaum gets short shrift, and no mention is made of our favorite “breakthrough” (or did I miss it?). You can find quite a lot on this blog searching Birnbaum. Continue reading

Categories: Bayesian/frequentist, Neyman | Leave a comment

Power howlers return as criticisms of severity


Suppose you are reading about a statistically significant result x that just reaches a threshold p-value α from a test T+ of the mean of a Normal distribution

 H0: µ ≤  0 against H1: µ >  0

with n iid samples, and (for simplicity) known σ.  The test “rejects” H0 at this level & infers evidence of a discrepancy in the direction of H1.

I have heard some people say:

A. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is poor evidence of a discrepancy (from the null) corresponding to µ’.  (i.e., there’s poor evidence that  µ > µ’ ). See point* on language in notes.

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is warranted, or at least not problematic.

I have heard other people say:

B. If the test’s power to detect alternative µ’ is very low, then the just statistically significant x is good evidence of a discrepancy (from the null) corresponding to µ’ (i.e., there’s good evidence that  µ > µ’).

They will generally also hold that if POW(µ’) is reasonably high (at least .5), then the inference to µ > µ’ is unwarranted.

Which is correct, from the perspective of the frequentist error statistical philosophy? Continue reading
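Readers who want to check the two claims numerically can try a small sketch like the following. It assumes test T+ as set up above (H0: µ ≤ 0, known σ, n iid Normal samples), an observed mean sitting exactly at the cutoff, and the severity definition SEV(µ > µ’) = Pr(X̄ ≤ observed x̄; µ = µ’). The values of α, σ, n and the µ’ grid are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal

# Hypothetical settings for test T+: H0: mu <= 0 vs H1: mu > 0.
alpha, sigma, n = 0.025, 1.0, 25
se = sigma / sqrt(n)
cutoff = Z.inv_cdf(1 - alpha) * se  # reject H0 when the sample mean >= cutoff

def power(mu_prime):
    """POW(mu') = Pr(Xbar >= cutoff; mu = mu')."""
    return 1 - Z.cdf((cutoff - mu_prime) / se)

def severity_mu_greater(xbar, mu_prime):
    """SEV(mu > mu') = Pr(Xbar <= observed xbar; mu = mu')."""
    return Z.cdf((xbar - mu_prime) / se)

xbar_obs = cutoff  # a result that just reaches statistical significance
rows = []
for mu_prime in (0.1, 0.2, cutoff, 0.6, 0.8):
    rows.append((mu_prime, power(mu_prime), severity_mu_greater(xbar_obs, mu_prime)))
    print(f"mu' = {mu_prime:.3f}  POW = {rows[-1][1]:.3f}  SEV(mu > mu') = {rows[-1][2]:.3f}")
```

With a just significant result, the severity for µ > µ’ works out to 1 – POW(µ’) in this setup, so the values of µ’ against which the test has low power are the ones for which µ > µ’ is well indicated.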

Categories: Statistical power, statistical tests | 7 Comments

Insevere Tests of Severe Testing (iv)


One does not have evidence for a claim if little if anything has been done to rule out ways the claim may be false. The claim may be said to “pass” the test, but it’s one that utterly lacks stringency or severity. On the basis of this very simple principle, I build a notion of evidence that applies to any error prone inference. In this account, data x are evidence for a claim C only if (and only to the extent that) C has passed a severe test with x.[1] How to apply this simple idea, however, and how to use it to solve central problems of induction and statistical inference requires careful consideration of how it is to be fleshed out. (See this post on strong vs weak severity.) Continue reading

Categories: Error Statistics | 2 Comments

No fooling: The Statistics Wars and Their Casualties Workshop is Postponed to 22-23 September, 2022

The Statistics Wars
and Their Casualties

Postponed to
22-23 September 2022


London School of Economics (CPNSS)

Yoav Benjamini (Tel Aviv University), Alexander Bird (University of Cambridge), Mark Burgman (Imperial College London),
Daniele Fanelli (London School of Economics and Political Science), Roman Frigg (London School of Economics and Political Science), Stephen Guettinger (London School of Economics and Political Science), David Hand (Imperial College London), Margherita Harris (London School of Economics and Political Science), Christian Hennig (University of Bologna), Katrin Hohl (City University London),
Daniël Lakens (Eindhoven University of Technology), Deborah Mayo (Virginia Tech), Richard Morey (Cardiff University), Stephen Senn (Edinburgh, Scotland), Jon Williamson (University of Kent) Continue reading

Categories: Error Statistics | Leave a comment

The AI/ML Wars: “explain” or test black box models?


I’ve been reading about the artificial intelligence/machine learning (AI/ML) wars revolving around the use of so-called “black-box” algorithms–too complex for humans, even their inventors, to understand. Such algorithms are increasingly used to make decisions that affect you, but if you can’t understand, or aren’t told, why a machine predicted your graduate-school readiness, or which drug a doctor should prescribe for you, etc, you’d likely be dissatisfied and want some kind of explanation. Being told the machine is highly accurate (in some predictive sense) wouldn’t suffice. A new AI field has grown up around the goal of developing (secondary) “white box” models to “explain” the workings of the (primary) black box model. Some call this explainable AI, or XAI. The black box is still used to reach predictions or decisions, but the explainable model is supposed to help explain why the output was reached. (The EU and DARPA in the U.S. have instituted broad requirements and programs for XAI.) Continue reading

Categories: machine learning, XAI/ML | 15 Comments

Philosophy of Science Association (PSA) 22 Call for Contributed Papers

PSA2022: Call for Contributed Papers


Twenty-Eighth Biennial Meeting of the Philosophy of Science Association
November 10 – November 13, 2022
Pittsburgh, Pennsylvania


Submissions open on March 9, 2022 for contributed papers to be presented at the PSA2022 meeting in Pittsburgh, Pennsylvania, on November 10-13, 2022. The deadline for submitting a paper is 11:59 PM Pacific Standard Time on April 6, 2022. 

Contributed papers may be on any topic in the philosophy of science. The PSA2022 Program Committee is committed to assembling a program with high-quality papers on a variety of topics and diverse presenters that reflects the full range of current work in the philosophy of science. Continue reading

Categories: Announcement | Leave a comment

January 11 Forum: “Statistical Significance Test Anxiety” : Benjamini, Mayo, Hand

Here are all the slides along with the video from the 11 January Phil Stat Forum with speakers: Deborah G. Mayo, Yoav Benjamini and moderator/discussant David Hand.

D. Mayo                 Y. Benjamini.           D. Hand

Continue reading

Categories: ASA Guide to P-values, ASA Task Force on Significance and Replicability, P-values, statistical significance | 2 Comments

Can’t Take the Fiducial Out of Fisher (if you want to understand the N-P performance philosophy) [i]


R.A. Fisher: February 17, 1890 – July 29, 1962

Continuing with posts in recognition of R.A. Fisher’s birthday, I reblog (with a few new comments) one from a few years ago on a topic that had previously not been discussed on this blog: Fisher’s fiducial probability.

[Neyman and Pearson] “began an influential collaboration initially designed primarily, it would seem to clarify Fisher’s writing. This led to their theory of testing hypotheses and to Neyman’s development of confidence intervals, aiming to clarify Fisher’s idea of fiducial intervals” (D.R. Cox, 2006, p. 195).

Continue reading

Categories: fiducial probability, Fisher, Phil6334/ Econ 6614, Statistics | Leave a comment

R.A. Fisher: “Statistical methods and Scientific Induction” with replies by Neyman and E.S. Pearson

17 Feb 1890-29 July 1962

In recognition of Fisher’s birthday (Feb 17), I reblog what I call the “Triad”–an exchange between  Fisher, Neyman and Pearson (N-P) a full 20 years after the Fisher-Neyman break-up–adding a few new introductory remarks here. While my favorite is still the reply by E.S. Pearson, which alone should have shattered Fisher’s allegations that N-P “reinterpret” tests of significance as “some kind of acceptance procedure”, they are all chock full of gems for different reasons. They are short and worth rereading. Neyman’s article pulls back the cover on what is really behind Fisher’s over-the-top polemics, what with Russian 5-year plans and commercialism in the U.S. Not only is Fisher jealous that N-P tests came to overshadow “his” tests, he is furious at Neyman for driving home the fact that Fisher’s fiducial approach had been shown to be inconsistent (by others). The flaw is glaring and is illustrated very simply by Neyman in his portion of the triad. Further details may be found in my book, SIST (2018) especially pp 388-392 linked to here. It speaks to a common fallacy seen every day in interpreting confidence intervals. As for Neyman’s “behaviorism”, Pearson’s last sentence is revealing. Continue reading

Categories: E.S. Pearson, Fisher, Neyman, phil/history of stat | Leave a comment
