Higgs

Statistics and the Higgs Discovery: 5-6 yr Memory Lane


I’m reblogging a few of the Higgs posts at the 6th anniversary of the 2012 discovery. (The first was in this post.) The following was originally “Higgs Analysis and Statistical Flukes: part 2” (from March, 2013).[1]

Some people say to me: “This kind of [severe testing] reasoning is fine for a ‘sexy science’ like high energy physics (HEP)”–as if their statistical inferences are radically different. But I maintain that this is the mode by which data are used in “uncertain” reasoning across the entire landscape of science and day-to-day learning (at least, when we’re trying to find things out).[2] Even with high level theories, the particular problems of learning from data are tackled piecemeal, in local inferences that afford error control. Granted, this statistical philosophy differs importantly from those that view the task as assigning comparative (or absolute) degrees-of-support/belief/plausibility to propositions, models, or theories.

“Higgs Analysis and Statistical Flukes: part 2”

Everyone was excited when the Higgs boson results were reported on July 4, 2012, indicating evidence for a Higgs-like particle based on a “5 sigma observed effect”. The observed effect refers to the number of excess events of a given type that are “observed” in comparison to the number (or proportion) that would be expected from background alone, and not due to a Higgs particle. This continues my earlier post (part 1). It is an outsider’s angle on one small aspect of the statistical inferences involved. But that, apart from being fascinated by it, is precisely why I have chosen to discuss it: we [philosophers of statistics] should be able to employ a general philosophy of inference to get an understanding of what is true about the controversial concepts we purport to illuminate, e.g., significance levels.

Here I keep close to an official report from ATLAS, in which researchers define a “global signal strength” parameter “such that μ = 0 corresponds to the background only hypothesis and μ = 1 corresponds to the SM Higgs boson signal in addition to the background” (where SM is the Standard Model). The statistical test may be framed as a one-sided test, where the test statistic (which is actually a ratio) records differences in the positive direction, in standard deviation (sigma) units. Reports such as

Pr(Test T would yield at least a 5 sigma excess; H0: background only) = extremely low

are deduced from the sampling distribution of the test statistic, fortified with much cross-checking of results (e.g., by modeling and simulating relative frequencies of observed excesses generated with “Higgs signal +background” compared to background alone).  The inferences, even the formal statistical ones, go beyond p-value reports. For instance, they involve setting lower and upper bounds such that values excluded are ruled out with high severity, to use my term. But the popular report is in terms of the observed 5 sigma excess in an overall test T, and that is mainly what I want to consider here.
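To make the background-vs.-signal comparison concrete, here is a minimal sketch in Python. All the counts (b, s, n_obs) are made-up illustrative numbers, not ATLAS values, and a simple Poisson counting model stands in for the full analysis:

```python
# Minimal sketch (illustrative numbers only): comparing the frequency of
# "signal-like" excesses under background alone vs. signal + background.
from scipy import stats

b, s = 100.0, 55.0   # hypothetical expected background and signal counts
n_obs = 155          # hypothetical observed count in the signal region

# Tail of the sampling distribution under H0 (background alone):
p0 = stats.poisson.sf(n_obs - 1, mu=b)       # Pr(count >= n_obs; H0)
# The same tail under "Higgs signal + background":
p1 = stats.poisson.sf(n_obs - 1, mu=b + s)   # Pr(count >= n_obs; signal + background)

print(f"Pr(>= {n_obs} events; background alone)    = {p0:.1e}")  # tiny
print(f"Pr(>= {n_obs} events; signal + background) = {p1:.2f}")  # around one half
```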

Error probabilities

In a Neyman-Pearson setting, a cut-off cα is chosen pre-data so that the probability of a type I error is low. In general,

Pr(d(X) > cα; H0) ≤ α

and in particular, alluding to an overall test T:

(1) Pr(Test T yields d(X) > 5 standard deviations; H0) ≤ .0000003.

The test at the same time is designed to ensure a reasonably high probability of detecting global strength discrepancies of interest. (I always use “discrepancy” to refer to parameter magnitudes, to avoid confusion with observed differences).
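As a quick check of (1) and of the power consideration, under the usual normal approximation for the test statistic; the 6 sigma shift under μ = 1 is an assumed, illustrative value, not a reported one:

```python
from scipy import stats

c_alpha = 5.0                          # the 5 sigma cut-off, fixed pre-data
alpha = stats.norm.sf(c_alpha)         # Pr(d(X) > c_alpha; H0)
print(f"alpha = {alpha:.1e}")          # ~ 2.9e-07, i.e. the bound in (1)

# Power against an assumed discrepancy: if mu = 1 shifted the mean of d(X)
# up by 6 sigma (an illustrative number), the test would detect it with
power = stats.norm.sf(c_alpha - 6.0)   # Pr(d(X) > c_alpha; mu = 1)
print(f"power = {power:.2f}")          # ~ 0.84: reasonably high
```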

[Notice these are not likelihoods.] Alternatively, researchers can report observed standard deviations (here, the sigmas), or equivalently, the associated observed statistical significance probability, p0. In general,

Pr(P < p0; H0) ≤ p0

and in particular,

(2) Pr(Test T yields P < .0000003; H0) ≤ .0000003.

For test T to yield a “worse fit” with H0 (smaller p-value) due to background alone is sometimes called “a statistical fluke” or a “random fluke”, and the probability of so statistically significant a random fluke is ~0. With the March 2013 results, the 5 sigma difference has grown to 7 sigmas.

So probabilistic statements along the lines of (1) and (2) are standard. They allude to sampling distributions, either of the test statistic d(X), or of the p-value viewed as a random variable. They are scarcely illicit or prohibited. (I return to this in the last section of this post.)
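A small simulation illustrates the p-value’s own sampling distribution: for a continuous test statistic, P is uniform under H0, so Pr(P < p0; H0) ≤ p0, as in (2). The normal model here is purely illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d = rng.normal(size=1_000_000)     # test statistic under H0 (illustrative)
p = stats.norm.sf(d)               # one-sided p-values, one per "experiment"

for p0 in (0.05, 0.001):
    print(f"Pr(P < {p0}; H0) ~ {np.mean(p < p0):.4f}")   # ~ p0 each time
```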

An implicit principle of inference or evidence

Admittedly, the move to taking the 5 sigma effect as evidence for a genuine effect (of the Higgs-like sort) results from an implicit principle of evidence that I have been calling the severity principle (SEV). Perhaps the weakest form applies to a statistical rejection or falsification of the null. (I will deliberately use a few different variations on statements that can be made.)

Data x from a test T provide evidence for rejecting H0 (just) to the extent that H0 would (very probably) have survived, were it a reasonably adequate description of the process generating the data (with respect to the question).

It is also captured by a general frequentist principle of evidence (FEV) (Mayo and Cox 2010) and a variant on the general idea of severity (SEV) (EGEK 1996, Mayo and Spanos 2006).[3]

The sampling distribution is computed under the assumption that the production of observed results is similar to the “background alone”, with respect to relative frequencies of signal-like events. (Likewise for computations under hypothesized discrepancies.) The relationship between H0 and the probabilities of outcomes is an intimate one: the various statistical nulls refer to aspects of general types of data generating procedures (for a taxonomy, see Cox 1958, 1977). “H0 is true” is a shorthand for a very long statement that H0 is an approximately adequate model of a specified aspect of the process generating the data in the context. (This relates to statistical models and hypotheses living “lives of their own”.)

Severity and the detachment of inferences

The sampling distributions serve to give counterfactuals. In this case, they tell us what it would be like, statistically, were the mechanism generating the observed signals similar to H0.[i] While one would want to go on to consider the probability that test T yields so statistically significant an excess under various alternatives to μ = 0, this suffices for the present discussion. Sampling distributions can be used to arrive at error probabilities that are relevant for understanding the capabilities of the test process, in relation to something we want to find out. Since a relevant test statistic is a function of the data and quantities about which we want to learn, the associated sampling distribution is the key to inference. (This is why the bootstrap, and other types of re-sampling, work when one has a random sample from the process or population of interest.)
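A minimal sketch of that parenthetical point, with an arbitrary exponential sample standing in for the process of interest:

```python
import numpy as np

rng = np.random.default_rng(7)
sample = rng.exponential(scale=2.0, size=200)   # one random sample from the process

# Resample the sample itself to approximate the sampling distribution of the mean:
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])
print(f"bootstrap SE of the mean ~ {boot_means.std(ddof=1):.3f}")
# Compare the textbook standard error: sigma / sqrt(n) = 2 / sqrt(200) ~ 0.141
```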

The severity principle, put more generally:

Data from a test T[ii] provide good evidence for inferring H (just) to the extent that H passes severely with x0, i.e., to the extent that H would (very probably) not have survived the test so well were H false.

(The severity principle can also be made out just in terms of relative frequencies, as with bootstrap re-sampling.) In this case, what is surviving is minimally the non-null. Regardless of the specification of a statistical inference, to assess the severity associated with a claim H requires considering H‘s denial: together they exhaust the answers to a given question.
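For the simple one-sided normal test treated in Mayo and Spanos (2006), the severity for claims of the form μ > μ1 can be computed directly. Here is a sketch in which mu0, sigma, n, and the observed mean are all made-up illustrative values; for small μ1 the severity approaches 1, matching the shorthand SEV(H) ~ 1 used below:

```python
from math import sqrt
from scipy import stats

mu0, sigma, n = 0.0, 1.0, 100   # null value, known sd, sample size (illustrative)
xbar_obs = 0.5                  # hypothetical observed sample mean

def severity(mu1: float) -> float:
    """SEV(mu > mu1): the probability the test would have yielded a worse fit
    with H0 (a smaller observed mean) were mu no greater than mu1."""
    return stats.norm.cdf((xbar_obs - mu1) / (sigma / sqrt(n)))

for mu1 in (0.0, 0.2, 0.4, 0.5):
    print(f"SEV(mu > {mu1}) = {severity(mu1):.3f}")
# SEV(mu > 0.0) ~ 1.000; SEV(mu > 0.4) ~ 0.841; SEV(mu > 0.5) = 0.500
```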

Without making such a principle explicit, some critics assume the argument is all about the reported p-value. The inference actually detached from the evidence can be put in any number of ways, and no uniformity is to be expected or needed:

(3) There is strong evidence for H: a Higgs (or a Higgs-like) particle.

(3)’ They have experimentally demonstrated H: a Higgs (or Higgs-like) particle.

Or just, infer H.

Doubtless particle physicists would qualify these statements, but nothing turns on that. ((3) and (3)’ are a bit stronger than merely falsifying the null because certain properties of the particle must be shown. I leave this to one side.)

As always, the mere p-value is a pale reflection of the detailed information about the consistency of results that really fortifies the knowledge of a genuine effect. Nor is the precise improbability level what matters. We care about the inferences to real effects (and estimated discrepancies) that are warranted.

Qualifying claims by how well they have been probed

The inference is qualified by the statistical properties of the test, as in (1) and (2), but that does not prevent detaching (3). This much is shown: they are able to experimentally demonstrate the Higgs particle. They can take that much of the problem as solved and move on to other problems of discerning the properties of the particle, and much else that goes beyond our discussion*. There is obeisance to the strict fallibility of every empirical claim, but there is no probability assigned.  Neither is there in day-to-day reasoning, nor in the bulk of scientific inferences, which are not formally statistical. Having inferred (3), granted, one may say informally, “so probably we have experimentally demonstrated the Higgs”, or “probably, the Higgs exists” (?). Or an informal use of “likely” might arise. But whatever these might mean in informal parlance, they are not formal mathematical probabilities. (As often argued on this blog, discussions on statistical philosophy must not confuse these.)

[We can however write, SEV(H) ~1]

The claim in (3) is approximate and limited–as are the vast majority of claims of empirical knowledge and inference–and, moreover, we can say in just what ways. It is recognized that subsequent data will add precision to the magnitudes estimated, and may eventually lead to new and even entirely revised interpretations of the known experimental effects, models and estimates. That is what cumulative knowledge is about. (I sometimes hear people assert, without argument, that modeled quantities, or parameters, used to describe data generating processes are “things in themselves” and are outside the realm of empirical inquiry. This is silly. Else we’d be reduced to knowing only tautologies and maybe isolated instances as to how “I seem to feel now,” attained through introspection.)

Telling what’s true about significance levels

So we grant the critic that something like the severity principle is needed to move from statistical information plus background (theoretical and empirical) to inferences about evidence and inference (and to what levels of approximation). It may be called lots of other things and framed in different ways, and the reader is free to experiment. What we should not grant the critic is any allegation that there should be, or invariably is, a link from a small observed significance level to a small posterior probability assignment to H0. Worse, (1 – the p-value) is sometimes alleged to be the posterior probability accorded to the Standard Model itself! This is neither licensed nor wanted!

If critics (or the p-value police, as Wasserman called them) maintain that Higgs researchers are misinterpreting their significance levels, correct them with the probabilities in (1) and (2). If they say it is patently obvious that Higgs researchers want to use the p-value as a posterior probability assignment to H0, point out the more relevant and actually attainable[iii] inference that is detached in (3). If they persist that what is really, really wanted is a posterior probability assignment to the inference about the Higgs in (3), ask why. As a formal posterior probability it would require a prior probability on all hypotheses that could explain the data. That would include not just H and H0 but all rivals to the Standard Model, rivals to the data and statistical models, and higher level theories as well. But can’t we just imagine a Bayesian catchall hypothesis? On paper, maybe, but where will we get these probabilities? What do any of them mean? How can the probabilities even be comparable in different data analyses, using different catchalls and different priors?[iv]

Degrees of belief will not do. Many scientists perhaps had (and have) strong beliefs in the Standard Model before the big collider experiments—given its perfect predictive success. Others may believe (and fervently wish) that it will break down somewhere (showing supersymmetry or whatnot); a major goal of inquiry is learning about viable rivals and how they may be triggered and probed. Research requires an open world not a closed one with all possibilities trotted out and weighed by current beliefs. [v] We need to point up what has not yet been well probed which, by the way, is very different from saying of a theory that it is “not yet probable”.

Those prohibited phrases

One may wish to return to some of the condemned phrases of particle physics reports. Take,

“There is less than a one in a million chance that their results are a statistical fluke”.

This is not to assign a probability to the null; it is just one of many ways (perhaps not the best) of putting claims about the sampling distribution: the statistical null asserts that H0: background alone adequately describes the process.

H0 does not assert the results are a statistical fluke, but it tells us what we need to determine the probability of observed results “under H0”. In particular, consider all outcomes in the sample space that are further from the null prediction than the observed, in terms of p-values {x: p < p0}. Even when H0 is true, such “signal-like” outcomes may occur. They are p0-level flukes. Were such flukes generated even with moderate frequency under H0, they would not be evidence against H0. But in this case, such flukes occur a teeny tiny proportion of the time. Then SEV enters: if we are regularly able to generate such teeny tiny p-values, we have evidence of a genuine discrepancy from H0.

I am repeating myself, I realize, in the hopes that at least one phrasing will drive the point home. Nor is it even the improbability that substantiates this; it is the fact that an extraordinary set of coincidences would have to have occurred again and again. To nevertheless retain H0 as the source of the data would block learning. (Moreover, they know that if some horrible systematic mistake were made, it would be detected in later data analyses.)

I will not deny that there have been misinterpretations of p-values, but if a researcher has just described performing a statistical significance test, it would be “ungenerous” to twist probabilistic assertions into posterior probabilities. It would be a kind of “confirmation bias” whereby one insists on finding one sentence among very many that could conceivably be misinterpreted Bayesianly.

Triggering, indicating, inferring

As always, the error statistical philosopher would distinguish different questions at multiple stages of the inquiry. The aim of many preliminary steps is “behavioristic” and performance oriented: the goal being to control error rates on the way toward finding excess events or bumps of interest.

If interested: See statistical flukes (part 3)

The original posts of parts 1 and 2 had around 30 comments each; you might want to look at them:

Part 1: https://errorstatistics.com/2013/03/17/update-on-higgs-data-analysis-statistical-flukes-1/

Part 2: https://errorstatistics.com/2013/03/27/higgs-analysis-and-statistical-flukes-part-2/

*Fisher insisted that to assert a phenomenon is experimentally demonstrable: “[W]e need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result.” (Fisher, Design of Experiments, 1947, p. 14)

2018/2015/2014 Notes

[0] Physicists manage to learn quite a lot from negative results. They’d love to find something more exotic, but the negative results will not go away. A recent article from CERN, “We need to talk about the Higgs”, says: “While there are valid reasons to feel less than delighted by the null results of searches for physics beyond the Standard Model (SM), this does not justify a mood of despondency.”

“Physicists aren’t just praying for hints of new physics, Strassler stresses. He says there is very good reason to believe that the LHC should find new particles. For one, the mass of the Higgs boson, about 125.09 billion electron volts, seems precariously low if the census of particles is truly complete. Various calculations based on theory dictate that the Higgs mass should be comparable to a figure called the Planck mass, which is about 17 orders of magnitude higher than the boson’s measured heft.” The article is here.

[1] My presentation at a Symposium on the Higgs discovery at the Philosophy of Science Association (Nov. 2014) is here.

[2] I have often noted that there are other times where we are trying to find evidence to support a previously held position.

[3] Aspects of the statistical controversy in the Higgs episode occur in Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Mayo 2018).

___________

Original notes:

[i] This is a bit stronger than merely falsifying the null here, because certain features of the particle discerned must also be shown. I leave details to one side.

[ii] Which almost always refers to a set of tests, not just one.

[iii] I sense that some Bayesians imagine P(H) is more “hedged” than to actually infer (3). But the relevant hedging, the type we can actually attain, is given by an assessment of severity or corroboration or the like. Background enters via a repertoire of information about experimental designs, data analytic techniques, mistakes and flaws to be wary of, and a host of theories and indications about which aspects have/have not been severely probed. Many background claims enter to substantiate the error probabilities; others do not alter them.

[iv] In aspects of the modeling, researchers make use of known relative frequencies of events (e.g., rates of types of collisions) that lead to legitimate, empirically based, frequentist “priors” if one wants to call them that.

[v] After sending out the letter, prompted by Lindley, O’Hagan wrote up a synthesis: https://errorstatistics.com/2012/08/25/did-higgs-physicists-miss-an-opportunity-by-not-consulting-more-with-statisticians/

REFERENCES (from March, 2013 post):

ATLAS Collaboration (November 14, 2012), ATLAS Note: “Updated ATLAS results on the signal strength of the Higgs-like boson for decays into WW and heavy fermion final states”, ATLAS-CONF-2012-162. http://cds.cern.ch/record/1494183/files/ATLAS-CONF-2012-162.pdf

Cox, D.R. (1958), “Some Problems Connected with Statistical Inference,” Annals of Mathematical Statistics, 29: 357–72.

Cox, D.R. (1977), “The Role of Significance Tests (with Discussion),” Scandinavian Journal of Statistics, 4: 49–70.

Mayo, D.G. (1996), Error and the Growth of Experimental Knowledge, University of Chicago Press, Chicago.

Mayo, D. G. and Cox, D. R. (2010). “Frequentist Statistics as a Theory of Inductive Inference” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability and the Objectivity and Rationality of Science (D Mayo and A. Spanos eds.), Cambridge: Cambridge University Press: 247-275.

Mayo, D.G., and Spanos, A. (2006), “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction,” British Journal for the Philosophy of Science, 57: 323–357.


3 YEARS AGO (JULY 2014): MEMORY LANE


MONTHLY MEMORY LANE: 3 years ago: July 2014. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1]. Posts that are part of a “unit” or a group count as one. This month there are three such groups: 7/8 and 7/10; 7/14 and 7/23; 7/26 and 7/31.

July 2014

  • (7/7) Winner of June Palindrome Contest: Lori Wike
  • (7/8) Higgs Discovery 2 years on (1: “Is particle physics bad science?”)
  • (7/10) Higgs Discovery 2 years on (2: Higgs analysis and statistical flukes)
  • (7/14) “P-values overstate the evidence against the null”: legit or fallacious? (revised)
  • (7/23) Continued: “P-values overstate the evidence against the null”: legit or fallacious?
  • (7/26) S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)
  • (7/31) Roger Berger on Stephen Senn’s “Blood Simple” with a response by Senn (Guest Posts)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

 



Higgs discovery three years on (Higgs analysis and statistical flukes)


2015: The Large Hadron Collider (LHC) is back in collision mode in 2015[0]. There’s a 2015 update, a virtual display, and links from ATLAS, one of the two detectors at the LHC, here. The remainder is from one year ago (2014). I’m reblogging a few of the Higgs posts at the anniversary of the 2012 discovery. (The first was in this post.) The following was originally “Higgs Analysis and Statistical Flukes: part 2” (from March, 2013).[1]

Some people say to me: “This kind of reasoning is fine for a ‘sexy science’ like high energy physics (HEP)”–as if their statistical inferences are radically different. But I maintain that this is the mode by which data are used in “uncertain” reasoning across the entire landscape of science and day-to-day learning (at least, when we’re trying to find things out).[2] Even with high level theories, the particular problems of learning from data are tackled piecemeal, in local inferences that afford error control. Granted, this statistical philosophy differs importantly from those that view the task as assigning comparative (or absolute) degrees-of-support/belief/plausibility to propositions, models, or theories.

“Higgs Analysis and Statistical Flukes: part 2”

Everyone was excited when the Higgs boson results were reported on July 4, 2012, indicating evidence for a Higgs-like particle based on a “5 sigma observed effect”. The observed effect refers to the number of excess events of a given type that are “observed” in comparison to the number (or proportion) that would be expected from background alone, and not due to a Higgs particle. This continues my earlier post (part 1). It is an outsider’s angle on one small aspect of the statistical inferences involved. But that, apart from being fascinated by it, is precisely why I have chosen to discuss it: we [philosophers of statistics] should be able to employ a general philosophy of inference to get an understanding of what is true about the controversial concepts we purport to illuminate, e.g., significance levels.


A biased report of the probability of a statistical fluke: Is it cheating?

One year ago I reblogged a post from Matt Strassler, “Nature is Full of Surprises” (2011). In it he claims that

[Statistical debate] “often boils down to this: is the question that you have asked in applying your statistical method the most even-handed, the most open-minded, the most unbiased question that you could possibly ask?

It’s not asking whether someone made a mathematical mistake. It is asking whether they cheated — whether they adjusted the rules unfairly — and biased the answer through the question they chose…”

(Nov. 2014): I am impressed (i.e., struck by the fact) that he goes so far as to call it “cheating”. Anyway, here is the rest of the reblog from Strassler, which bears on a number of recent discussions:


“…If there are 23 people in a room, the chance that two of them have the same birthday is 50 percent, while the chance that two of them were born on a particular day, say, January 1st, is quite low, a small fraction of a percent. The more you specify the coincidence, the rarer it is; the broader the range of coincidences at which you are ready to express surprise, the more likely it is that one will turn up.
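Strassler’s two birthday figures are easy to verify; a quick sketch (assuming 365 equally likely birthdays):

```python
n = 23

# Chance that *some* two of n people share a birthday:
p_no_match = 1.0
for k in range(n):
    p_no_match *= (365 - k) / 365
print(f"Pr(some shared birthday) = {1 - p_no_match:.3f}")    # ~ 0.507

# Chance that at least two were born on one *particular* day (say January 1st):
p = 1 / 365
p_two_on_day = 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)
print(f"Pr(two+ born on a given day) = {p_two_on_day:.5f}")  # ~ 0.00175
```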


“Statistical Flukes, the Higgs Discovery, and 5 Sigma” at the PSA

We had an excellent discussion at our symposium yesterday: “How Many Sigmas to Discovery? Philosophy and Statistics in the Higgs Experiments” with Robert Cousins, Allan Franklin and Kent Staley. Slides from my presentation, “Statistical Flukes, the Higgs Discovery, and 5 Sigma”, are posted below (we each only had 20 minutes, so this is clipped, but much came out in the discussion). Even the challenge I read about this morning as to what exactly the Higgs researchers discovered (and I’ve no clue if there’s anything to the idea of a “techni-higgs particle”) would not invalidate* the knowledge of the experimental effects severely tested.

 

*Although, as always, there may be a reinterpretation of the results. But I think the article is an isolated bit of speculation. I’ll update if I hear more.


Philosophy of Science Assoc. (PSA) symposium on Philosophy of Statistics in the Higgs Experiments “How Many Sigmas to Discovery?”


The biennial meeting of the Philosophy of Science Association (PSA) starts this week (Nov. 6-9) in Chicago, together with the History of Science Society. I’ll be part of the symposium:

 

How Many Sigmas to Discovery?
Philosophy and Statistics in the Higgs Experiments

 

on Nov. 8 with Robert Cousins, Allan Franklin, and Kent Staley. If you’re in the neighborhood, stop by.

 

Summary

“A 5 sigma effect!” is how the recent Higgs boson discovery was reported. Yet before the dust had settled, the very nature and rationale of the 5 sigma (or 5 standard deviation) discovery criteria began to be challenged and debated both among scientists and in the popular press. Why 5 sigma? How is it to be interpreted? Do p-values in high-energy physics (HEP) avoid controversial uses and misuses of p-values in social and other sciences? The goal of our symposium is to combine the insights of philosophers and scientists whose work interrelates philosophy of statistics, data analysis and modeling in experimental physics, with critical perspectives on how discoveries proceed in practice. Our contributions will link questions about the nature of statistical evidence, inference, and discovery with questions about the very creation of standards for interpreting and communicating statistical experiments. We will bring out some unique aspects of discovery in modern HEP. We also show the illumination the episode offers to some of the thorniest issues revolving around statistical inference, frequentist and Bayesian methods, and the philosophical, technical, social, and historical dimensions of scientific discovery.

Questions:

1) How do philosophical problems of statistical inference interrelate with debates about inference and modeling in high energy physics (HEP)?

2) Have standards for scientific discovery in particle physics shifted? And if so, how has this influenced when a new phenomenon is “found”?

3) Can understanding the roles of statistical hypotheses tests in HEP resolve classic problems about their justification in both physical and social sciences?

4) How do pragmatic, epistemic and non-epistemic values and risks influence the collection, modeling, and interpretation of data in HEP?

 

Abstracts for Individual Presentations

(1) Unresolved Philosophical Issues Regarding Hypothesis Testing in High Energy Physics
Robert D. Cousins.
Professor, Department of Physics and Astronomy, University of California, Los Angeles (UCLA)

The discovery and characterization of a Higgs boson in 2012-2013 provide multiple examples of statistical inference as practiced in high energy physics (elementary particle physics). The main methods employed have a decidedly frequentist flavor, drawing in a pragmatic way on both Fisher’s ideas and the Neyman-Pearson approach. A physics model being tested typically has a “law of nature” at its core, with parameters of interest representing masses, interaction strengths, and other presumed “constants of nature”. Additional “nuisance parameters” are needed to characterize the complicated measurement processes. The construction of confidence intervals for a parameter of interest θ is dual to hypothesis testing, in that the test of the null hypothesis θ = θ0 at significance level (“size”) α is equivalent to whether θ0 is contained in a confidence interval for θ with confidence level (CL) equal to 1 − α. With CL or α specified in advance (“pre-data”), frequentist coverage properties can be assured, at least approximately, although nuisance parameters bring in significant complications. With data in hand, the post-data p-value can be defined as the smallest significance level α at which the null hypothesis would be rejected, had that α been specified in advance. Carefully calculated p-values (not assuming normality) are mapped onto the equivalent number of standard deviations (“σ”) in a one-tailed test of the mean of a normal distribution. For a discovery such as the Higgs boson, experimenters report both p-values and confidence intervals of interest.
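The p-value-to-sigma mapping Cousins mentions is just the one-tailed normal tail area and its inverse; a minimal sketch:

```python
from scipy import stats

def sigmas_to_p(s: float) -> float:
    """One-tailed normal tail area for an s-sigma effect."""
    return stats.norm.sf(s)

def p_to_sigmas(p: float) -> float:
    """Equivalent number of standard deviations for a one-tailed p-value."""
    return stats.norm.isf(p)

print(f"5 sigma  -> p = {sigmas_to_p(5.0):.1e}")     # ~ 2.9e-07
print(f"p = 0.05 -> {p_to_sigmas(0.05):.2f} sigma")  # ~ 1.64
```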


Higgs discovery two years on (2: Higgs analysis and statistical flukes)

I’m reblogging a few of the Higgs posts, with some updated remarks, on this two-year anniversary of the discovery. (The first was in my last post.) The following was originally “Higgs Analysis and Statistical Flukes: part 2” (from March, 2013).[1]

Some people say to me: “This kind of reasoning is fine for a ‘sexy science’ like high energy physics (HEP)”–as if their statistical inferences are radically different. But I maintain that this is the mode by which data are used in “uncertain” reasoning across the entire landscape of science and day-to-day learning (at least, when we’re trying to find things out).[2] Even with high level theories, the particular problems of learning from data are tackled piecemeal, in local inferences that afford error control. Granted, this statistical philosophy differs importantly from those that view the task as assigning comparative (or absolute) degrees-of-support/belief/plausibility to propositions, models, or theories.


Higgs Discovery two years on (1: “Is particle physics bad science?”)


July 4, 2014 was the two-year anniversary of the Higgs boson discovery. As the world was celebrating the “5 sigma!” announcement, and we were reading about the statistical aspects of this major accomplishment, I was aghast to be emailed a letter, purportedly instigated by Bayesian Dennis Lindley and sent through Tony O’Hagan (to the ISBA). Lindley, according to this letter, wanted to know:

“Are the particle physics community completely wedded to frequentist analysis?  If so, has anyone tried to explain what bad science that is?”

Fairly sure it was a joke, I posted it on my “Rejected Posts” blog for a bit until it checked out [1]. (See O’Hagan’s “Digest and Discussion”.)


The Science Wars & the Statistics Wars: More from the Scientism workshop

Here are the slides from my presentation (May 17) at the Scientism workshop in NYC. (They’re sketchy since we were trying for 25-30 minutes.) Below them are some mini notes on some of the talks.

Now for my informal notes. Here’s a link to the Speaker abstracts; the presentations may now be found at the conference site here. Comments, questions, and corrections are welcome.


Phil6334 Day #7: Selection effects, the Higgs and 5 sigma, Power

Below are slides from March 6, 2014: (a) the 2nd half of “Frequentist Statistics as a Theory of Inductive Inference” (Selection Effects),* and (b) the discussion of the Higgs particle discovery and the controversy over 5 sigma.

We spent the rest of the seminar computing significance levels, rejection regions, and power (by hand and with the Excel program). Here is the updated syllabus (3rd installment).

A relevant paper on selection effects on this blog is here.


Is being lonely unnatural for slim particles? A statistical argument

Being lonely is unnatural, at least if you are a slim Higgs particle (with mass on the order of the type recently discovered)–according to an intriguing statistical argument given by particle physicist Matt Strassler (sketched below). Strassler sets out “to explain the scientific argument as to why it is so unnatural to have a Higgs particle that is ‘lonely’ — with no other associated particles (beyond the ones we already know) of roughly similar mass.”

This in turn is why so many particle physicists have long expected the LHC to discover more than just a single Higgs particle and nothing else… more than just the Standard Model’s one and only missing piece… and why it will be a profound discovery with far-reaching implications if, during the next five years or so, the LHC experts sweep the floor clean and find nothing more in the LHC’s data than the Higgs particle that was found in 2012. (Strassler)

What’s the natural/unnatural intuition here? In his “First Stab at Explaining ‘Naturalness’,” Strassler notes “the word ‘natural’ has multiple meanings.

The one that scientists are using in this context isn’t “having to do with nature” but rather “typical” or “as expected” or “generic”, as in, “naturally the baby started screaming when she bumped her head”, or “naturally it costs more to live near the city center”, or “I hadn’t worn those glasses in months, so naturally they were dusty.”  And unnatural is when the baby doesn’t scream, when the city center is cheap, and when the glasses are pristine. Usually, when something unnatural happens, there’s a good reason……

If you chose a universe at random from among our set of Standard Model-like worlds, the chance that it would look vaguely like our universe would be spectacularly smaller than the chance that you would put a vase down carelessly at the edge of the table and find it balanced, just by accident.

Why would it make sense to consider our universe selected at random, as if each one is equally probable?  What’s the relative frequency of possible people who would have done and said everything I did at every moment of my life?  Yet no one thinks this is unnatural. Nevertheless, it really, really bothers particle physicists that our class of universes is so incredibly rare, or would be, if we were in the habit of randomly drawing universes out of a bag, like blackberries (to allude to C.S. Peirce). Anyway, here’s his statistical argument:

I want you to imagine a theory much like the Standard Model (plus gravity). Let’s say it even has all the same particles and forces as the Standard Model. The only difference is that the strengths of the forces, and the strengths with which the Higgs field interacts with other particles and with itself (which in the end determines how much mass those particles have) are a little bit different, say by 1%, or 5%, or maybe even up to 50%. In fact, let’s imagine ALL such theories… all Standard Model-like theories in which the strengths with which all the fields and particles interact with each other are changed by up to 50%. What will the worlds described by these slightly different equations (shown in a nice big pile in Figure 2) be like?

Among those imaginary worlds, we will find three general classes, with the following properties.

  1. In one class, the Higgs field’s average value will be zero; in other words, the Higgs field is OFF. In these worlds, the Higgs particle will have a mass as much as ten thousand trillion (10,000,000,000,000,000) times larger than it does in our world. All the other known elementary particles will be massless …..
  2. In a second class, the Higgs field is FULL ON.  The Higgs field’s average value, and the Higgs particle’s mass, and the mass of all known particles, will be as much as ten thousand trillion (10,000,000,000,000,000) times larger than they are in our universe. In such a world, there will again be nothing like the atoms or the large objects we’re used to. For instance, nothing large like a star or planet can form without collapsing and forming a black hole.
  3. In a third class, the Higgs field is JUST BARELY ON. Its average value is roughly as small as in our world — maybe a few times larger or smaller, but comparable. The masses of the known particles, while somewhat different from what they are in our world, at least won’t be wildly different. And none of the types of particles that have mass in our own world will be massless. In some of those worlds there can even be atoms and planets and other types of structure. In others, there may be exotic things we’re not used to. But at least a few basic features of such worlds will be recognizable to us.

Now: what fraction of these worlds are in class 3? Among all the Standard Model-like theories that we’re considering, what fraction will resemble ours at least a little bit?

The answer? A ridiculously, absurdly tiny fraction of them (Figure 3). If you chose a universe at random from among our set of Standard Model-like worlds, the chance that it would look vaguely like our universe would be spectacularly smaller than the chance that you would put a vase down carelessly at the edge of the table and find it balanced, just by accident.

In other words, if the Standard Model (plus gravity) describes everything that exists in our world, then among all possible worlds, we live in an extraordinarily unusual one — one that is as unnatural as a vase balanced to within an atom’s breadth of falling off or settling back on to the table. Classes 1 and 2 of universes are natural — generic — typical; most Standard Model-like theories are in those classes. Class 3, of which our universe is an example, includes the possible worlds that are extremely non-generic, non-typical, unnatural. That we should live in such an unusual universe — especially since we live, quite naturally, on a rather ordinary planet orbiting a rather ordinary star in a rather ordinary galaxy — is unexpected, shocking, bizarre. And it is deserving, just like the balanced vase, of an explanation. One certainly has to suspect there might be a subtle mechanism, something about the universe that we don’t yet know, that permits our universe to naturally be one that can live on the edge.

Does it make sense to envision these possible worlds as somehow equally likely? I don’t see it.  How do they know that if an entity of whatever sort found herself on one of the ‘natural’ and common worlds that she wouldn’t manage to describe her physics so that her world was highly unlikely and highly unnatural? Maybe it seems unnatural because, after all, we’re here reporting on it so there’s a kind of “selection effect”.

An imaginary note to the Higgs particle:

Dear Higgs Particle: Not long ago, physicists were happy as clams to have discovered you–you were on the cover of so many magazines, and the focus of so many articles. How much they celebrated your discovery…at first. Sadly, it now appears you are not up to snuff, you’re not all they wanted by a long shot, and I’m reading that some physicists are quite disappointed in you! You’re kind of a freak of nature; you may have been born this way, but the physicists were expecting you to be different, to be, well, bigger, or if as tiny as you are, to at least be part of a group of particles, to have friends, you know, like a social network, else to have more mass, much, much, much more … They’re saying you must be lonely, and that–little particle–is quite unnatural.

Now, I’m a complete outsider when it comes to particle physics, and my ruminations will likely be deemed naïve by the physicists, but it seems to me that the familiar intuitions about naturalness are ones that occur within an empirical universe within which we (humans) have a large number of warranted expectations. When it comes to intuitions about the entire universe, what basis can there possibly be for presuming to know how you’re “expected” to behave, were you to fulfill their intuitions about naturalness? There’s a universe, and it is what it is. Doesn’t it seem a bit absurd to apply the intuitions applicable within the empirical world to the world itself?

 It’s one thing to say there must be a good explanation, “a subtle mechanism” or whatever, but I’m afraid that if particle physicists don’t find the particle they’re after, they will stick us with some horrible multiverse of bubble universes. 

So, if you’ve got a support network out there, tell them to come out in the next decade or so, before they’ve decided they’ve “swept the floor clean”. The physicists are veering into philosophical territory, true, but their intuitions are the ones that will determine what kind of physics we should have, and I’m not at all happy with some of the non-standard alternatives on offer. Good luck, Mayo

Where does the multiverse hypothesis come in? In an article in Quanta, Natalie Wolchover writes:

Physicists reason that if the universe is unnatural, with extremely unlikely fundamental constants that make life possible, then an enormous number of universes must exist for our improbable case to have been realized. Otherwise, why should we be so lucky? Unnaturalness would give a huge lift to the multiverse hypothesis, which holds that our universe is one bubble in an infinite and inaccessible foam. According to a popular but polarizing framework called string theory, the number of possible types of universes that can bubble up in a multiverse is around 10^500. In a few of them, chance cancellations would produce the strange constants we observe. [my emphasis]

Does our universe regain naturalness under the multiverse hypothesis? No. It is still unnatural (if I’m understanding this right). Yet the physicists take comfort in the fact that under the multiverse hypothesis, “of the possible universes capable of supporting life — the only ones that can be observed and contemplated in the first place — ours is among the least fine-tuned.”

God forbid we should be so lucky to live in a universe that is “fine-tuned”![i]

What do you think?


[i] Strassler claims this is a purely statistical argument, not one having to do with origins of the universe.


11th bullet, multiple choice question, and last thoughts on the JSM

I. Apparently I left out the last bullet in my scribbled notes from Silver’s talk. There was an 11th. Someone sent it to me from a blog, Revolution Analytics:

11. Like scientists, journalists ought to be more concerned with the truth rather than just appearances. He suggested that maybe they should abandon the legal paradigm of seeking an adversarial approach and behave more like scientists looking for the truth.

OK. But, given some of the issues swirling around the last few posts, I think it’s worth noting that scientists are not disinterested agents looking for the truth—it’s only thanks to its (adversarial!) methods that they advance upon truth. Question: What’s the secret of scientific progress (in those areas that advance learning)? Answer: Even if each individual scientist were to strive mightily to ensure that his/her theory wins out, the stringent methods of the enterprise force that theory to show its mettle or die (or at best remain in limbo). You might say, “But there are plenty of stubborn hard cores in science”. Sure, and they fail to advance. In those sciences that lack sufficiently stringent controls, the rate of uncorrected spin is as bad as Silver suggests it is in journalism. Think of social psychologist Diederik Stapel setting out to show what is already presumed to be believable. (See here and here, and search this blog.)

There’s a strange irony when the same people who proclaim, “We must confront those all too human flaws and foibles that obstruct the aims of truth and correctness”, turn out to be enablers, by championing methods that enable flaws and foibles to seep through. It may be a slip of logic. Here’s a multiple choice question:

Multiple choice: Circle all phrases that correctly complete the “conclusion”:

Let’s say that factor F is known to obstruct the correctness/validity of solutions to problems, or that factor F is known to adversely impinge on inferences.

(Examples of such factors include: biases, limited information, incentives—of various sorts).

Factor F is known to adversely influence inferences.

Conclusion: Therefore any adequate systematic account of inference should _______

(a) allow F to influence inferences.
(b) provide a formal niche by which F can influence inferences.
(c) take precautions to block (or at least be aware of) the ability of F to adversely influence inferences.
(d) none of the above.

(For an example, see discussion of #7 in previous post.)

II. I may be overlooking sessions (inform me if you know of any), but I would have expected more on the statistics in the Higgs boson discoveries at the JSM 2013, especially given the desire to emphasize the widespread contributions of statistics to the latest sexy science[i]. (At one point I was asked by David Banks, because of my related blog posts (e.g., here), about being part of a session on the five sigma effect in the Higgs boson discovery (not that I’m any kind of expert), but people were already in other sessions. But I’m thinking about something splashy by statisticians in particle physics.) Did I miss?[ii]

III. I think it’s easy to see why lots of people showed up to hear Nate Silver: It’s fun to see someone “in the news”, be it from politics, finance, high tech, acting, TV, or even academics–I, for one, was curious. I’m sure as many would have come out to hear Esther Duflo, Sheryl Sandberg, Fabiola Gianotti, or even Huma Abedin–to list some that happen to come to mind–or any number of others who have achieved recent recognition (and whose work intersects in some way with statistics). It’s interesting that I don’t see pop philosophers invited to give key addresses in yearly philosophy meetings; maybe because philosophers eschew popularity. I may be unaware of some; I don’t attend so many meetings.

IV. Other thoughts: I’ve only been to a handful of “official” statistics meetings. Obviously the # of simultaneous sessions makes the JSM a kind of factory experience, but that’s to be expected. But do people really need to purchase those JSM backpacks? I don’t know how much of the $400 registration fee goes to that, but it seems wasteful…. I saw people tossing theirs out, which I didn’t have the heart to do. Perhaps I’m just showing my outsider status.

V. Montreal: I intended to practice my French, but kept bursting into English too soon. Everyone I met (who lives there) complained about the new money and doing away with pennies in the near future. I wonder if we’re next.

[i] On Silver’s remark (in response to a “tweeted” question) that “data science” is a “sexed-up” term for statistics, I don’t know. I can see reflecting deeply over the foundations of statistical inference, but over the foundations of data analytics?

[ii] You don’t suppose the controversy about particle physics being “bad science” had anything to do with downplaying the Higgs statistics?


What should philosophers of science do? (Higgs, statistics, Marilyn)

Marilyn Monroe not walking past a Higgs boson and not making it decay, whatever philosophers might say.


My colleague, Lydia Patton, sent me this interesting article, “The Philosophy of the Higgs” (from The Guardian, March 24, 2013), when I began the posts on “statistical flukes” in relation to the Higgs experiments (here and here); I held off posting it partly because of the slightly sexist attention-getter pic of Marilyn (in reference to an “irrelevant blonde”[1]), and I was going to replace it, but with what? All the men I regard as good-looking have dark hair (or no hair). But I wanted to take up something in the article around now, so here it is, a bit dimmed. Anyway, apparently MM was not the idea of the author, particle physicist Michael Krämer, but rather a group of philosophers at a meeting discussing philosophy of science and science. In the article, Krämer tells us:

For quite some time now, I have collaborated on an interdisciplinary project which explores various philosophical, historical and sociological aspects of particle physics at the Large Hadron Collider (LHC). For me it has always been evident that science profits from a critical assessment of its methods. “What is knowledge?”, and “How is it acquired?” are philosophical questions that matter for science. The relationship between experiment and theory (what impact does theoretical prejudice have on empirical findings?) or the role of models (how can we assess the uncertainty of a simplified representation of reality?) are scientific issues, but also issues from the foundation of philosophy of science. In that sense they are equally important for both fields, and philosophy may add a wider and critical perspective to the scientific discussion. And while not every particle physicist may be concerned with the ontological question of whether particles or fields are the more fundamental objects, our research practice is shaped by philosophical concepts. We do, for example, demand that a physical theory can be tested experimentally and thereby falsified, a criterion that has been emphasized by the philosopher Karl Popper already in 1934. The Higgs mechanism can be falsified, because it predicts how Higgs particles are produced and how they can be detected at the Large Hadron Collider.

On the other hand, some philosophers tell us that falsification is strictly speaking not possible: What if a Higgs property does not agree with the standard theory of particle physics? How do we know it is not influenced by some unknown and thus unaccounted factor, like a mysterious blonde walking past the LHC experiments and triggering the Higgs to decay? (This was an actual argument given in the meeting!) Many interesting aspects of falsification have been discussed in the philosophical literature. “Mysterious blonde”-type arguments, however, are philosophical quibbles and irrelevant for scientific practice, and they may contribute to the fact that scientists do not listen to philosophers.

I entirely agree that philosophers have wasted a good deal of energy maintaining that it is impossible to solve Duhemian problems of where to lay the blame for anomalies. They misrepresent the very problem by supposing there is a need to string together a tremendously long conjunction consisting of a hypothesis H and a bunch of auxiliaries Ai which are presumed to entail observation e. But neither scientists nor ordinary people would go about things in this manner. The mere ability to distinguish the effects of different sources suffices to pinpoint blame for an anomaly. For some posts on falsification, see here and here*.

The question of why scientists do not listen to philosophers was also a central theme of the recent inaugural conference of the German Society for Philosophy of Science. I attended the conference to present some of the results of our interdisciplinary research group on the philosophy of the Higgs. I found the meeting very exciting and enjoyable, but was also surprised by the amount of critical self-reflection.

