My rationale for the last post is really just to highlight such passages as:
“Particle physicists have agreed, by convention, not to view an observed phenomenon as a discovery until the probability that it be a statistical fluke be below 1 in a million, a requirement that seems insanely draconian at first glance.” (Strassler)….
Even before the dust had settled regarding the discovery of a Standard Model-like Higgs particle, the nature and rationale of the 5-sigma discovery criterion began to be challenged. But my interest now is not in the fact that the 5-sigma discovery criterion is a convention, nor with the choice of 5. It is the understanding of “the probability that it be a statistical fluke” that interests me, because if we can get this right, I think we can understand a kind of equivocation that leads many to suppose that significance tests are being misinterpreted—even when they aren’t! So given that I’m stuck, unmoving, on this bus outside of London for 2+ hours (because of a car accident)—and the internet works—I’ll try to scratch out my point (expect errors, we’re moving now). Here’s another passage…
“Even when the probability of a particular statistical fluke, of a particular type, in a particular experiment seems to be very small indeed, we must remain cautious. …Is it really unlikely that someone, somewhere, will hit the jackpot, and see in their data an amazing statistical fluke that seems so impossible that it convincingly appears to be a new phenomenon?”
A very sketchy nutshell of the Higgs statistics: There is a general model of the detector, and within that model researchers define a “global signal strength” parameter “such that H0: μ = 0 corresponds to the background only hypothesis and μ = 1 corresponds to the Standard Model (SM) Higgs boson signal in addition to the background” (quote from an ATLAS report). The statistical test may be framed as a one-sided test; the test statistic records differences in the positive direction, in standard deviation or sigma units. The interest is not in point-against-point hypotheses, but in finding discrepancies from H0 in the direction of the alternative, and then estimating their values. The improbability of the 5-sigma excess alludes to the sampling distribution associated with such signal-like results or “bumps” (the bumps are formed from observed excess events), fortified with much cross-checking of results:
The probability of observing a result as extreme as 5 sigmas, under the assumption it was generated by background alone, that is, under H0, is approximately 1 in 3,500,000. Alternatively, we hear: the “probability that the results were just a statistical fluke is 1 in 3,500,000”.
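To see where “approximately 1 in 3,500,000” comes from, here is a minimal sketch assuming the test statistic d(X) is well approximated by a standard normal under H0 (an idealization of the actual ATLAS/CMS machinery):

```python
from scipy.stats import norm

# One-sided tail probability of a 5-sigma excess under the
# background-only hypothesis H0 (standard normal approximation).
p_value = norm.sf(5)      # Pr(Z >= 5) for Z ~ N(0, 1)
print(p_value)            # ~2.87e-07
print(1 / p_value)        # ~3.5 million, i.e. "1 in 3,500,000"
```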
Yet many critics have claimed that this fallaciously applies the probability “to the explanation” H0. It is a common allegation, but a careful look shows this is not so. H0 does not say the observed results are due to background alone, although were H0 true (about what’s generating the data), it follows that various results would occur with specified probabilities. Thus we get the sampling distribution of d(X) under H0. For example, the probability of a type I error (false positive) is low:
(1) Pr(d(X) > 5; H0) ≤ .0000003.
These computations are based on simulating what it would be like were H0 true (given a detector model). So particle physicists are not slipping in a posterior probability on H0; it is an ordinary error probability (of a type I error) or significance level. In terms of the corresponding p-value:
(2) Pr(test T produces a p-value < .0000003; H0) < .0000003.
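The “simulating what it would be like were H0 true” can itself be illustrated with a toy Monte Carlo. The single-bin Poisson model and all the numbers below are invented stand-ins for the real detector simulation; the point is only that the relative frequency of signal-like excesses across background-only pseudo-experiments estimates an ordinary error probability, as in (1) and (2), with no prior or posterior on H0 entering anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy background-only (mu = 0) model: one counting bin with a known
# expected background. (Invented numbers, not the ATLAS/CMS model.)
b = 100.0
n_pseudo = 10_000_000

# Pseudo-experiments generated under H0: Poisson-distributed background counts.
counts = rng.poisson(b, size=n_pseudo)

# A crude significance-like statistic: the excess over background in sigma units.
d = (counts - b) / np.sqrt(b)

# Estimated Pr(d >= 3; H0). (3 sigma is used so this toy sample size can
# resolve the tail; the 5-sigma tail would need far more pseudo-experiments.)
# The estimate sits slightly above the Gaussian 3-sigma tail of ~0.0013,
# since Poisson counts are right-skewed.
print(np.mean(d >= 3))
```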
Now the inference that is actually detached from the evidence can be put in any number of ways:
(3) There is strong evidence for (or they have experimentally demonstrated) H: a Higgs (or a Higgs-like) particle.
Granted, inferring (3) relies on an implicit principle of evidence beyond (1) or (2): Data provide evidence for rejecting H0 (just) to the extent that H0 would (very probably) have survived, were it a reasonably adequate description of the process generating the data (with respect to the question). [A variant of the severe or stringent testing requirement for evidence.]
Here, with probability .9999997, the test would generate less impressive bumps than these, under H0. So, very probably H0 would have survived, were μ = 0.
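In the notation of (1), and under the same background-only sampling distribution, that figure is just the complement of the tail area: Pr(d(X) ≤ 5; H0) = 1 – Pr(d(X) > 5; H0) ≥ 1 – .0000003 = .9999997. Once again this is an ordinary sampling probability computed under H0, not a posterior probability of H0.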
So while it is true that some cases in science commit the fallacy of “transposing the conditional” from a low significance level to a low posterior to the null—in many other cases, what’s going on is precisely as in the case of the Higgs. If you go back to examples where the fallacy is alleged with this in mind, I think you will find they mostly evanesce.
Once the null is rejected, confidence intervals take over to check if various parameters agree with the SM predictions. Now the corresponding null hypothesis is the SM Higgs boson H’0 (Cousins, p. 18), and discrepancies from it are probed. It is here that we actually get to the most important role served by statistical significance tests: affording a standard for denying sufficient evidence of a new discovery.
The basic principle here is: an observed difference from a test T does not provide evidence for rejecting H’0 if even larger bumps are fairly easily produced when H’0 is a reasonably adequate description of the process generating the data (with respect to the question).
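As a hedged illustration of that principle (invented numbers, and only one of the considerations noted below): a modest local excess over the SM prediction does not license a “new discovery” claim, because comparable or larger bumps arise fairly easily under H’0 once one accounts for how many places a bump could have shown up.

```python
import numpy as np
from scipy.stats import norm

# Invented numbers: SM-predicted (H'0) event count in one mass bin, and an
# observed count showing a mild excess.
expected_sm = 200.0
observed = 230

# Local significance of the excess over the SM prediction, in sigma units.
z_local = (observed - expected_sm) / np.sqrt(expected_sm)   # about 2.1 sigma

# If ~80 independent bins are searched, how often does the SM alone produce
# at least one bump this large somewhere?
n_bins = 80
p_local = norm.sf(z_local)                    # ~0.017 in this one bin
p_somewhere = 1 - (1 - p_local) ** n_bins     # ~0.75 across all bins
print(z_local, p_local, p_somewhere)          # easily produced, so no "new discovery"
```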
In determining that results do not meet the “new discovery” threshold, it is not merely formal statistics that is involved, but various considerations, such as the fact that an anomalous bump shows up at CMS but not at ATLAS, or that the effect dissipates with increasing data.
Remember the “how well did they do?” scrutiny (http://understandinguncertainty.org/explaining-5-sigma-higgs-how-well-did-they-do), thumbs up or down? The first two are thumbs up:
- “The probability of the background alone fluctuating up by this amount or more is about one in three million.”
- “only one experiment in three million would see an apparent signal this strong in a universe without a Higgs.”
The following two are thumbs down (according to the critic):
- “Both groups said that the likelihood that their signal was a result of a chance fluctuation was less than one chance in 3.5 million.”
- “There is less than a one in a million chance that their results are a statistical fluke.”
But correctly understood, all four are thumbs up. The incorrect one that he states alludes to…
Correction: The correct claim is that, within significance testing, the complement of a “false positive” is a “true negative”. So Pr(test T does not reject H0; H0) = 1 – the corresponding type I error probability.
Expect errors–please note corrections, I’ll update it when I reach land—we’re moving!
Some previous posts on Higgs & 5 sigma standard:
March 17, 2013: Update on Higgs data analysis: statistical flukes (part 1)
March 27, 2013: Higgs analysis and statistical flukes (part 2)
April 4, 2013: Guest Post. Kent Staley: On the Five Sigma Standard in Particle Physics
I totally disagree. I think you’ve described the reasoning correctly, but I can’t see phrasing like,
“Particle physicists have agreed, by convention, not to view an observed phenomenon as a discovery until the probability that it be a statistical fluke be below 1 in a million, a requirement that seems insanely draconian at first glance,”
as anything other than a failure to correctly describe the error statistical warrant for the claim. The claim “the observed phenomenon is a statistical fluke” is equivalent to “the observed phenomenon is not reproducible”, which is a statistical hypothesis, not an event. On its face, the quoted phrasing is talking about the probability of an hypothesis!
To me, your “correctly understood” reads as “if we ignore the plain meaning of the words and look at the math these words attempt (and fail!) to describe”. You yourself are so very careful never to get this mixed up; I don’t know why you would defend this confusion of ideas.
And why do scientists repeatedly make this slip? On this I should not care to dogmatize…