I suppose[ed] this was somewhat of a joke from the ISBA, prompted by Dennis Lindley, but as I [now] accord the actual extent of jokiness to be only ~10%, I’m sharing it on the blog [i]. Lindley (according to O’Hagan) wonders why scientists require so high a level of statistical significance before claiming to have evidence of a Higgs boson. It is asked: “Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?”
Bad science? I’d really like to understand what these representatives from the ISBA would recommend, if there is even a shred of seriousness here (or is Lindley just peeved that significance levels are getting so much press in connection with so important a discovery in particle physics?)
Well, read the letter and see what you think.
On Jul 10, 2012, at 9:46 PM, ISBA Webmaster wrote:
Dear Bayesians,
A question from Dennis Lindley prompts me to consult this list in search of answers.
We’ve heard a lot about the Higgs boson. The news reports say that the LHC needed convincing evidence before they would announce that a particle had been found that looks like (in the sense of having some of the right characteristics of) the elusive Higgs boson. Specifically, the news referred to a confidence interval with 5-sigma limits.
Now this appears to correspond to a frequentist significance test with an extreme significance level. Five standard deviations, assuming normality, means a p-value of around 0.0000005. A number of questions spring to mind.
1. Why such an extreme evidence requirement? We know from a Bayesian perspective that this only makes sense if (a) the existence of the Higgs boson (or some other particle sharing some of its properties) has extremely small prior probability and/or (b) the consequences of erroneously announcing its discovery are dire in the extreme. Neither seems to be the case, so why 5-sigma?
2. Rather than ad hoc justification of a p-value, it is of course better to do a proper Bayesian analysis. Are the particle physics community completely wedded to frequentist analysis? If so, has anyone tried to explain what bad science that is?
3. We know that given enough data it is nearly always possible for a significance test to reject the null hypothesis at arbitrarily low p-values, simply because the parameter will never be exactly equal to its null value. And apparently the LHC has accumulated a very large quantity of data. So could even this extreme p-value be illusory?
If anyone has any answers to these or related questions, I’d be interested to know and will be sure to pass them on to Dennis.
Regards,
Tony
—-
Professor A O’Hagan
Email: a.ohagan@sheffield.ac.uk
Department of Probability and Statistics
University of Sheffield
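As a quick check of the arithmetic in point 1 of the letter, here is a minimal sketch (mine, not anything from the LHC analyses) converting "sigmas" into tail areas under a normal model:

# Convert a z-score of z "sigma" into a normal tail probability.
from math import erfc, sqrt

def normal_tail(z):
    """One-sided upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2.0))

for z in (3, 5):
    p = normal_tail(z)
    print(f"{z}-sigma: one-sided p ~ {p:.2e}, two-sided p ~ {2 * p:.2e}")

For 5 sigma this gives a one-sided p-value of about 2.9 x 10^-7 (two-sided about 5.7 x 10^-7), i.e. roughly the "around 0.0000005" quoted in the letter; particle physicists typically report the one-sided number.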
So given that the Higgs boson does not have such an extremely small prior probability, a proper Bayesian analysis would have enabled evidence of the Higgs long before attaining such an “extreme evidence requirement”. Why has no one tried to explain to these scientists how, with just a little Bayesian analysis, they might have been done last year, or years ago? I take it the Bayesian would also enjoy the simplicity and freedom of not having to adjust to take account of what physicists call “the Look Elsewhere Effect” (LEE).[ii]
Let’s see if there’s a serious follow-up.[iii]
[i] bringing it down from my “Msc Kvetching page” where I’d put it last night.
[ii] For a discussion of how the error statistical philosophy avoids the classic criticisms of significance tests, see Mayo & Spanos (2011) ERROR STATISTICS. Other articles may be found on the link to my publication page.
[iii] O’Hagan just informed me of several replies to his letter at the following:
According to Louis Lyons (“Discovery or fluke: Statistics in Particle Physics”, _Physics Today_ July 2012, pages 45-51):
“Current LHC analyses employ both [frequentist and Bayesian methods]. It is true that particle physicists tend to favor frequentist methods more than most other scientists do. But they often employ Bayesian methods for dealing with nuisance parameters associated with systematic uncertainties.”
http://www.physicstoday.org/resource/1/phtoad/v65/i7/p45_s1
… Not sure if it’s a coincidence or not, but this issue of _Physics Today_ also includes a commentary (by N. David Mermin) promoting a Bayesian interpretation of Quantum Mechanics, and also a book review of _The Theory That Would Not Die: How Bayes’ Rule …_.
Paul: Thanks for your comment. I’m not sure if dealing with nuisance parameters isn’t sometimes deemed “Bayesian” simply because it uses background information about the parameters. This to me doesn’t make it tantamount to giving prior distributions at all. I didn’t read your link yet, just wanted to give a quick reply.
See Larry Wasserman’s blogpost today on “The p-value Police”. http://normaldeviate.wordpress.com/2012/07/11/the-higgs-boson-and-the-p-value-police/
Also, for a smattering of replies to the ISBA news page (presumably most from Bayesians, but see Wasserman’s thoughtful response):
http://bayesian.org/forums/news/3648
The Look Elsewhere Effect is indeed part of the rationale behind the 5-sigma requirement for what particle physicists call “Observation” reports. I think there is also a bit of a hangover from past problems that arose when data selection criteria were not set in a data-blind manner. This was a problem in the first evidence for the top quark found at CDF. Joe Incandela, who is a spokesperson for CMS, was in CDF at the time, and took up the role of convenor for an important analysis group in the top search right after they had gone through some serious internal difficulties over this. Both ATLAS and CMS have serious data-blinding procedures in place, but I think there is a lingering cautiousness that results in a certain discounting of p-values in particle physics. But the rationale for it is a perfectly legitimate frequentist one.
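To see roughly why the LEE pushes the bar up, here is a toy simulation (my own illustration with invented numbers of bins and pseudo-experiments, not the actual ATLAS/CMS procedure) of what happens when one scans many mass bins and reports the most significant bump:

# Toy Look Elsewhere Effect: under the background-only hypothesis, the chance
# that *some* bin among many fluctuates above 3 sigma is far larger than the
# local p-value of a single pre-chosen bin suggests.
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
n_bins = 100        # independent mass bins scanned (an invented number)
n_trials = 50_000   # pseudo-experiments generated under the null

z_local = 3.0
local_p = 0.5 * erfc(z_local / sqrt(2.0))            # ~1.3e-3 for one pre-chosen bin

max_z = rng.standard_normal((n_trials, n_bins)).max(axis=1)
global_p = np.mean(max_z > z_local)                  # chance the *scan* shows a 3-sigma bump

print(f"local p-value (one bin):        {local_p:.2e}")
print(f"global p-value (max of {n_bins}): {global_p:.2e}")

With 100 bins the “3-sigma somewhere” probability comes out around 0.13, which is one way of seeing why a local 5-sigma threshold, together with blinding, is demanded before an Observation claim.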
Hi Kent: Isn’t it strange that essentially the same issue/effect is given so many different names? I thought I’d heard them all, from searching, dredging, fishing, hunting, tuning, peeking, mining, etc., until a few years ago I heard “looking for the pony”, and now, just last week, a new one: “looking elsewhere”. Someone should write an analysis of this (linguistic) phenomenon–what do you think?
I actually saw the Lindley-prompted email earlier this week, too. I agree that the “bad science” line was, indeed, a purposeful combination of funny and rude (we might disagree on the percentages).
Here’s a little bit more on the science and sociology of the discovery, from my perspective as a physicist in a different field:
http://www.andrewjaffe.net/blog/news/000540.html
http://www.andrewjaffe.net/blog/science/000538.html
And I talked about “5 sigma” a few months ago:
http://www.andrewjaffe.net/blog/science/000512.html
I haven’t really thought much about the “look elsewhere effect”. As you’ve gathered, it’s basically a way of dealing with the width of the particle’s profile in the histogram, narrow compared to the regime over which the data give information. I think a Bayesian *would* have a better way (or at least as good a way, within the competing paradigms) of dealing with this. We would first marginalise over any parameters for the height and shape of the distribution to measure the mass, and then there is probably a reasonable model-comparison result to give a Bayesian version of the “significance of detection”. Though I suspect that the better way to look at it (channeling Gelman’s objections to model comparison) would just be to look at the distribution in the continuous mass – cross-section space, where the cross section essentially controls the height of the peak and therefore whether there is a real detection. Indeed, the experiments do produce something that at least looks like this (although I don’t know if it’s actually anything like a likelihood).
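For concreteness, here is a minimal toy of that kind of model comparison (entirely invented numbers, priors and shapes, nothing like the real analyses): binned Poisson counts over a flat background, with the likelihood for a “background plus Gaussian bump” model averaged over a grid of bump strengths and locations.

# Toy model comparison: background-only vs background + Gaussian bump, with the
# signal strength mu and bump location m0 integrated out on a grid (i.e. flat
# priors on the grid) -- a crude stand-in for marginalisation.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
edges = np.linspace(100.0, 160.0, 61)          # toy "mass" bins (invented units)
centers = 0.5 * (edges[:-1] + edges[1:])
bkg = np.full_like(centers, 50.0)              # flat background expectation per bin

def expected(mu, m0, width=2.0):
    """Expected counts: background plus a Gaussian bump of strength mu at m0."""
    return bkg + mu * np.exp(-0.5 * ((centers - m0) / width) ** 2)

data = rng.poisson(expected(mu=30.0, m0=125.0))    # simulated data containing a bump

def averaged_likelihood(mus, m0s):
    """Poisson likelihood of the data averaged over the (mu, m0) grid."""
    return np.mean([poisson.pmf(data, expected(mu, m0)).prod()
                    for mu in mus for m0 in m0s])

L0 = poisson.pmf(data, bkg).prod()                               # background-only
L1 = averaged_likelihood(np.linspace(0, 60, 31), np.linspace(105, 155, 26))
print(f"Bayes factor (signal vs background-only): {L1 / L0:.3g}")

Nothing here should be read as how ATLAS or CMS actually quantify significance; it is just the simplest version of the marginalise-then-compare idea.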
I was pretty sure that there would be a Higgs, at about this mass, before last week. But to reiterate my usual point about Bayesian analysis, I don’t think that’s terribly relevant to what prior one should use in a scientific publication — I want to know what the results would be “as if” there was a much more agnostic prior. But indeed last year’s 2.5 sigma result *was* good enough for many of us.
However, there is a significant caveat: I think that the more general requirement for such a ridiculously high significance is exactly because we don’t really trust our (probabilistic) models for the experiments we perform. Hence the (frequentist!) adage “half of all 3 sigma results are wrong”.
Jaffe: Thanks much for your comments and links to your blog!
“I actually saw the Lindley-prompted email earlier this week, too. I agree that the ‘bad science’ line was, indeed, a purposeful combination of funny and rude (we might disagree on the percentages).”
Hmm, nobody else mentioned the question of how one was to take Lindley’s queries, channeled through O’Hagan. Doubtless it wasn’t intended for frequentists in exile. I don’t recall why I was even on the e-mail list (I think Bernardo); maybe now I’ll be removed.
I know very little about this particle physics experiment (though I’d like to know much more) so I can’t comment on your suggestion for “a Bayesian version of the ‘significance of detection’”. But it seems this further probing is an outgrowth of just the worry that caring about the LEE gives rise to. If an account held that it made no difference at all whether, or how much, searching had occurred, then would the attempt to make up for it (in some way) even occur? Eventually the error would be exposed, but that’s the point of the precaution, or call for more stringent probing, or however one wants to describe it. That’s what I’m reading anyway…
“…general requirement for such a ridiculously high significance is exactly because we don’t really trust our (probabilistic) models…”
Curiously enough, that exact same observation can be used to argue that we should not trust very low p-values (say, less than 0.001) (Fisher argued this!), because such low p-values depend on our model being correct _far out in the tail_, and about that we can never be sure! So quoting a p-value like, say, 10^-10 is just nonsense, because there can be so many small effects _not accounted for in the model_ that contribute way larger probabilities!
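A quick numerical version of that point (the 0.1% mis-modelled component below is purely an assumed toy number):

# Sensitivity of a far-tail p-value to a small unmodelled effect: mix a tiny
# fraction of a wider distribution into the nominal normal model and compare
# the tail probabilities at 6 sigma.
from math import erfc, sqrt

def tail(z, sigma=1.0):
    """P(X > z) for a centered normal with standard deviation sigma."""
    return 0.5 * erfc(z / (sigma * sqrt(2.0)))

z = 6.0
nominal = tail(z)                                        # pure N(0,1): ~1e-9
eps = 1e-3                                               # 0.1% unmodelled component (assumed)
contaminated = (1 - eps) * tail(z) + eps * tail(z, 3.0)  # slightly fatter tail

print(f"nominal 6-sigma p-value:     {nominal:.1e}")
print(f"with 0.1% mis-modelled tail: {contaminated:.1e}")

The tiny admixture dominates the far tail (roughly 2 x 10^-5 instead of 10^-9), which is exactly the sense in which a quoted 10^-10 leans on the model being right where we can least check it.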
on the LEE in particle physics, see:
http://arxiv.org/abs/1005.1891
http://arxiv.org/abs/1201.4604
see also the workshop: “Statistical issues relevant to significance of discovery claims” at BIRS
http://www.birs.ca/events/2010/5-day-workshops/10w5068
==> talk by Eilam Gross: Look Elsewhere Effect
PS: I fully agree with Larry Wasserman’s comment on the Bayes forum (http://bayesian.org/forums/news/3648).
Most statisticians are studying data from social science or biology. In these fields we are far from the level of modelling reached in physics, in particular in particle physics, where the “Standard Model” based on gauge theories allows predictions to be computed with a very high degree of precision [which can reach 10^-11 or more, see http://en.wikipedia.org/wiki/Anomalous_magnetic_dipole_moment ].
As Larry puts it: “The p-value is not illusory. This is not social science where the null is always false and it is only a matter of time until we reject. In this case, the physics defines the null and the alternative clearly.”
Anon: thanks for this, the Banff workshop looks extremely interesting. I too concur with Wasserman’s comment on the Bayes forum page.
One year since the Higgs discovery: http://profmattstrassler.com/2013/07/04/happy-independhiggs-day/