Monthly Archives: November 2014


3 years ago…

MONTHLY MEMORY LANE: 3 years ago: November 2011. I mark in red 3 posts that seem most apt for general background on key issues in this blog.*

  • (11/1) RMM-4: “Foundational Issues in Statistical Modeling: Statistical Model Specification and Validation*” by Aris Spanos, in Rationality, Markets, and Morals (Special Topic: “Statistical Science and Philosophy of Science: Where Do/Should They Meet?”)
  • (11/3) Who is Really Doing the Work?*
  • (11/5) Skeleton Key and Skeletal Points for (Esteemed) Ghost Guest
  • (11/9) Neyman’s Nursery 2: Power and Severity [Continuation of Oct. 22 Post]
  • (11/12) Neyman’s Nursery (NN) 3: SHPOWER vs POWER
  • (11/15) Logic Takes a Bit of a Hit!: (NN 4) Continuing: Shpower (“observed” power) vs Power
  • (11/18) Neyman’s Nursery (NN5): Final Post
  • (11/21) RMM-5: “Low Assumptions, High Dimensions” by Larry Wasserman, in Rationality, Markets, and Morals (Special Topic: “Statistical Science and Philosophy of Science: Where Do/Should They Meet?”) See also my deconstruction of Larry Wasserman.
  • (11/23) Elbar Grease: Return to the Comedy Hour at the Bayesian Retreat
  • (11/28) The UN Charter: double-counting and data snooping
  • (11/29) If you try sometime, you find you get what you need!

*I announced this new, once-a-month feature at the blog’s 3-year anniversary. I will repost and comment on one of the 3-year-old posts from time to time. [I’ve yet to repost and comment on the one from Oct. 2011, but will shortly.] For newcomers, here’s your chance to catch up; for old timers, this is philosophy: rereading is essential!


Oct. 2011

Sept. 2011 (within “All She Wrote (so far)”)












Categories: 3-year memory lane, Bayesian/frequentist, Statistics | Leave a comment

How likelihoodists exaggerate evidence from statistical tests


I insist on point against point, no matter how much it hurts

Have you ever noticed that some leading advocates of a statistical account, say a testing account A, upon discovering that account A is unable to handle an important kind of testing problem that a rival testing account B handles with no trouble at all, will mount an argument that being able to handle that kind of problem is actually a bad thing? They might even argue that testing account B is not a “real” testing account because it can handle such a problem. You have? Sure you have, if you read this blog. But that’s only a subliminal point of this post.

I’ve had three posts recently on the Law of Likelihood (LL): Breaking the [LL] (a), (b), [c], and [LL] is bankrupt. Please read at least one of them for background. All deal with Royall’s comparative likelihoodist account, which some will say only a few people even use, but I promise you that these same points come up again and again in foundational criticisms from entirely other quarters.[i]

An example from Royall is typical: he makes it clear that an account based on the (LL) is unable to handle composite tests, even simple one-sided tests for which account B supplies uniformly most powerful (UMP) tests. He concludes, not that his test comes up short, but that any genuine test or ‘rule of rejection’ must have a point alternative! Here’s the case (Royall, 1997, pp. 19-20):

[M]edical researchers are interested in the success probability, θ, associated with a new treatment. They are particularly interested in how θ relates to the old treatment’s success probability, believed to be about 0.2. They have reason to hope θ is considerably greater, perhaps 0.8 or even greater. To obtain evidence about θ, they carry out a study in which the new treatment is given to 17 subjects, and find that it is successful in nine.

Let me interject at this point that of all of Stephen Senn’s posts on this blog, my favorite is the one where he zeroes in on the proper way to think about the discrepancy we hope to find (the .8 in this example). (See note [ii]) Continue reading
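The arithmetic of Royall’s example is easy to check. A minimal sketch (stdlib Python, binomial likelihoods): with 9 successes in 17 trials, the likelihood ratio for θ = 0.8 over θ = 0.2 works out to exactly 4, since the binomial coefficients cancel.

```python
from math import comb

def binom_lik(theta, x=9, n=17):
    """Binomial likelihood of x successes in n trials at success probability theta."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Likelihood ratio comparing theta = 0.8 against theta = 0.2 on Royall's data:
lr = binom_lik(0.8) / binom_lik(0.2)
print(round(lr, 6))  # 4.0 — the data "favor" theta = 0.8 over theta = 0.2 by a factor of 4
```

(If memory serves, Royall’s own benchmarks treat a ratio of 8 as “fairly strong” and 32 as “strong” evidence, so 4 would count as weak comparative support; take those cutoffs as his conventions, not mine.)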

Categories: law of likelihood, Richard Royall, Statistics | Tags: | 18 Comments

Msc Kvetch: “You are a Medical Statistic”, or “How Medical Care Is Being Corrupted”

A NYT op-ed the other day, “How Medical Care Is Being Corrupted” (by Pamela Hartzband and Jerome Groopman, physicians on the faculty of Harvard Medical School), gives a good sum-up of what I fear is becoming the new normal, even under so-called “personalized medicine”.

WHEN we are patients, we want our doctors to make recommendations that are in our best interests as individuals. As physicians, we strive to do the same for our patients.

But financial forces largely hidden from the public are beginning to corrupt care and undermine the bond of trust between doctors and patients. Insurers, hospital networks and regulatory groups have put in place both rewards and punishments that can powerfully influence your doctor’s decisions.

Continue reading

Categories: PhilStat/Med, Statistics | Tags: | 8 Comments

Erich Lehmann: Statistician and Poet

Erich Lehmann 20 November 1917 – 12 September 2009


Memory Lane 1 Year (with update): Today is Erich Lehmann’s birthday. The last time I saw him was at the Second Lehmann conference in 2004, at which I organized a session on philosophical foundations of statistics (including David Freedman and D.R. Cox).

I got to know Lehmann, Neyman’s first student, in 1997.  One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that).  He told me he was sitting in a very large room at an ASA meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, dark table sat just one book, all alone, shiny red.  He said he wondered if it might be of interest to him!  So he walked up to it….  It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after. Some related posts on Lehmann’s letter are here and here.

That same year I remember having a last-minute phone call with Erich to ask how best to respond to a “funny Bayesian example” raised by Colin Howson. It is essentially the case of Mary’s positive result for a disease, where Mary is selected randomly from a population where the disease is very rare. See for example here. (It’s just like the case of our high school student Isaac). His recommendations were extremely illuminating, and with them he sent me a poem he’d written (which you can read in my published response here*). Aside from being a leading statistician, Erich had a (serious) literary bent. Continue reading

Categories: highly probable vs highly probed, phil/history of stat, Sir David Cox, Spanos, Statistics | Tags: , | Leave a comment

Lucien Le Cam: “The Bayesians Hold the Magic”

Today is the birthday of Lucien Le Cam (Nov. 18, 1924 – April 25, 2000): please see my updated 2013 post on him.


Categories: Bayesian/frequentist, Statistics | Leave a comment

Why the Law of Likelihood is bankrupt–as an account of evidence



There was a session at the Philosophy of Science Association meeting last week where two of the speakers, Greg Gandenberger and Jiji Zhang, had insightful things to say about the “Law of Likelihood” (LL)[i]. Recall from recent posts here and here that the (LL) regards data x as evidence supporting H1 over H0 iff

Pr(x; H1) > Pr(x; H0).

On many accounts, the likelihood ratio also measures the strength of that comparative evidence. (Royall 1997, p.3). [ii]

H0 and H1 are statistical hypotheses that assign probabilities to the random variable X taking value x. As I recall, the speakers limited H1 and H0 to simple statistical hypotheses (as Richard Royall generally does), already restricting the account to rather artificial cases, but I put that to one side. Remember, with likelihoods, the data x are fixed, the hypotheses vary.

1. Maximally likely alternatives. I didn’t really disagree with anything the speakers said. I welcomed their recognition that a central problem facing the (LL) is the ease of constructing maximally likely alternatives: so long as Pr(x; H0) < 1, a maximally likely alternative H1 would be evidentially “favored”. There is no onus on the likelihoodist to predesignate the rival; you are free to search, hunt, post-designate, and construct a best (or better) fitting rival. If you’re bothered by this, says Royall, then this just means the evidence disagrees with your prior beliefs.
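The point about post-designated rivals can be made concrete with a toy computation (an illustrative sketch of my own, not Royall’s, reusing the binomial setup from his medical example):

```python
from math import comb

def binom_lik(theta, x, n):
    """Binomial likelihood of x successes in n trials at success probability theta."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

x, n = 9, 17
h0 = 0.2          # a predesignated null hypothesis
h1_hat = x / n    # an alternative constructed AFTER seeing the data

# The hunted-for rival always wins the likelihood comparison:
print(binom_lik(h1_hat, x, n) > binom_lik(h0, x, n))  # True
```

Whatever the data turn out to be, the alternative θ = x/n fitted after the fact maximizes the likelihood, so the (LL) “favors” it over any predesignated null with Pr(x; H0) < 1; that is exactly the ease of construction at issue.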

After all, Royall famously distinguishes between evidence and belief (recall the evidence-belief-action distinction), and these problematic cases, he thinks, do not vitiate his account as an account of evidence. But I think they do! In fact, I think they render the (LL) utterly bankrupt as an account of evidence. Here are a few reasons. (Let me be clear that I am not pinning Royall’s defense on the speakers[iii], so much as saying it came up in the general discussion[iv].) Continue reading

Categories: highly probable vs highly probed, law of likelihood, Richard Royall, Statistics | 63 Comments

A biased report of the probability of a statistical fluke: Is it cheating?

One year ago I reblogged a post from Matt Strassler, “Nature is Full of Surprises” (2011). In it he claims that

[Statistical debate] “often boils down to this: is the question that you have asked in applying your statistical method the most even-handed, the most open-minded, the most unbiased question that you could possibly ask?

It’s not asking whether someone made a mathematical mistake. It is asking whether they cheated — whether they adjusted the rules unfairly — and biased the answer through the question they chose…”

(Nov. 2014): I am impressed (i.e., struck by the fact) that he goes so far as to call it “cheating”. Anyway, here is the rest of the reblog from Strassler, which bears on a number of recent discussions:

“…If there are 23 people in a room, the chance that two of them have the same birthday is 50 percent, while the chance that two of them were born on a particular day, say, January 1st, is quite low, a small fraction of a percent. The more you specify the coincidence, the rarer it is; the broader the range of coincidences at which you are ready to express surprise, the more likely it is that one will turn up.
Continue reading
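Strassler’s birthday numbers are easy to verify with a quick stdlib sketch (ignoring leap years):

```python
from math import prod

n = 23
# Chance that ANY two of 23 people share a birthday:
p_any = 1 - prod((365 - i) / 365 for i in range(n))

# Chance that at least two were born on one PARTICULAR day, say Jan. 1:
p_none = (364 / 365) ** n                           # nobody born Jan. 1
p_one = n * (1 / 365) * (364 / 365) ** (n - 1)      # exactly one person
p_specific = 1 - p_none - p_one

print(f"{p_any:.3f}")       # ≈ 0.507 — the familiar "50 percent"
print(f"{p_specific:.5f}")  # ≈ 0.0018 — "a small fraction of a percent"
```

The more specific the coincidence (two people sharing Jan. 1 rather than sharing any date), the rarer it is, which is precisely Strassler’s point about how broadly you define the surprising event.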

Categories: Higgs, spurious p values, Statistics | 7 Comments

The Amazing Randi’s Million Dollar Challenge

The NY Times Magazine had a feature on the Amazing Randi yesterday, “The Unbelievable Skepticism of the Amazing Randi.” It described one of the contestants in Randi’s most recent Million Dollar Challenge, Fei Wang:

“[Wang] claimed to have a peculiar talent: from his right hand, he could transmit a mysterious force a distance of three feet, unhindered by wood, metal, plastic or cardboard. The energy, he said, could be felt by others as heat, pressure, magnetism or simply “an indescribable change.” Tonight, if he could demonstrate the existence of his ability under scientific test conditions, he stood to win $1 million.”

Isn’t “an indescribable change” rather vague?

“…The Challenge organizers had spent weeks negotiating with Wang and fine-tuning the protocol for the evening’s test. A succession of nine blindfolded subjects would come onstage and place their hands in a cardboard box. From behind a curtain, Wang would transmit his energy into the box. If the subjects could successfully detect Wang’s energy on eight out of nine occasions, the trial would confirm Wang’s psychic power. …”

After two women failed to detect the “mystic force” the M.C. announced the contest was over.

“With two failures in a row, it was impossible for Wang to succeed. The Million Dollar Challenge was already over.”

You’d think they might have given him another chance or something.

“Stepping out from behind the curtain, Wang stood center stage, wearing an expression of numb shock, like a toddler who has just dropped his ice cream in the sand. He was at a loss to explain what had gone wrong; his tests with a paranormal society in Boston had all succeeded. Nothing could convince him that he didn’t possess supernatural powers. ‘This energy is mysterious,’ he told the audience. ‘It is not God.’ He said he would be back in a year, to try again.”
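As an aside, the stringency of the agreed protocol can be quantified, under an assumption the article doesn’t spell out: suppose each blindfolded subject had a 50-50 chance of “detecting” the energy by sheer guessing (a hypothetical chance level, for illustration only). Then the probability of passing the test by luck, 8 or more hits out of 9, is about 2%:

```python
from math import comb

n, k = 9, 8  # need at least 8 detections out of 9 subjects
# Assumed: each subject guesses right with probability 1/2 (not stated in the article).
p_chance = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
print(f"{p_chance:.4f}")  # ≈ 0.0195 — roughly a 2% chance of passing by guesswork
```

It also makes clear why two failures ended the evening: with at most 7 of 9 successes still possible, the criterion could no longer be met.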

The article is here. If you don’t know who A. Randi is, you should read it.

Randi, much better known during Uri Geller spoon-bending days, has long been the guru to skeptics and fraudbusters, but also a hero to some critical psi believers like I.J. Good. Geller continually sued Randi for calling him a fraud. As such, I.J. Good warned me that I might be taking a risk in my use of “gellerization” in EGEK (1996), but I guess Geller doesn’t read philosophy of science. A post on “Statistics and ESP Research” and Diaconis is here.


I’d love to have seen Randi break out of these chains!


Categories: Error Statistics | Tags: | 3 Comments

“Statistical Flukes, the Higgs Discovery, and 5 Sigma” at the PSA

We had an excellent discussion at our symposium yesterday, “How Many Sigmas to Discovery? Philosophy and Statistics in the Higgs Experiments,” with Robert Cousins, Allan Franklin and Kent Staley. Slides from my presentation, “Statistical Flukes, the Higgs Discovery, and 5 Sigma,” are posted below (we each had only 20 minutes, so this is clipped, but much came out in the discussion). Even the challenge I read about this morning as to what exactly the Higgs researchers discovered (and I’ve no clue if there’s anything to the idea of a “techni-higgs particle”) would not invalidate* the knowledge of the experimental effects severely tested.


*Although, as always, there may be a reinterpretation of the results. But I think the article is an isolated bit of speculation. I’ll update if I hear more.

Categories: Higgs, highly probable vs highly probed, Statistics | 26 Comments

Philosophy of Science Assoc. (PSA) symposium on Philosophy of Statistics in the Higgs Experiments “How Many Sigmas to Discovery?”



The biennial meeting of the Philosophy of Science Association (PSA) starts this week (Nov. 6-9) in Chicago, together with the History of Science Society. I’ll be part of the symposium:


How Many Sigmas to Discovery?
Philosophy and Statistics in the Higgs Experiments


on Nov.8 with Robert Cousins, Allan Franklin, and Kent Staley. If you’re in the neighborhood stop by.



“A 5 sigma effect!” is how the recent Higgs boson discovery was reported. Yet before the dust had settled, the very nature and rationale of the 5 sigma (or 5 standard deviation) discovery criteria began to be challenged and debated both among scientists and in the popular press. Why 5 sigma? How is it to be interpreted? Do p-values in high-energy physics (HEP) avoid controversial uses and misuses of p-values in social and other sciences? The goal of our symposium is to combine the insights of philosophers and scientists whose work interrelates philosophy of statistics, data analysis and modeling in experimental physics, with critical perspectives on how discoveries proceed in practice. Our contributions will link questions about the nature of statistical evidence, inference, and discovery with questions about the very creation of standards for interpreting and communicating statistical experiments. We will bring out some unique aspects of discovery in modern HEP. We also show the illumination the episode offers to some of the thorniest issues revolving around statistical inference, frequentist and Bayesian methods, and the philosophical, technical, social, and historical dimensions of scientific discovery.


1) How do philosophical problems of statistical inference interrelate with debates about inference and modeling in high energy physics (HEP)?

2) Have standards for scientific discovery in particle physics shifted? And if so, how has this influenced when a new phenomenon is “found”?

3) Can understanding the roles of statistical hypotheses tests in HEP resolve classic problems about their justification in both physical and social sciences?

4) How do pragmatic, epistemic and non-epistemic values and risks influence the collection, modeling, and interpretation of data in HEP?


Abstracts for Individual Presentations

(1) Unresolved Philosophical Issues Regarding Hypothesis Testing in High Energy Physics
Robert D. Cousins.
Professor, Department of Physics and Astronomy, University of California, Los Angeles (UCLA)

The discovery and characterization of a Higgs boson in 2012-2013 provide multiple examples of statistical inference as practiced in high energy physics (elementary particle physics). The main methods employed have a decidedly frequentist flavor, drawing in a pragmatic way on both Fisher’s ideas and the Neyman-Pearson approach. A physics model being tested typically has a “law of nature” at its core, with parameters of interest representing masses, interaction strengths, and other presumed “constants of nature”. Additional “nuisance parameters” are needed to characterize the complicated measurement processes. The construction of confidence intervals for a parameter of interest θ is dual to hypothesis testing, in that the test of the null hypothesis θ=θ0 at significance level (“size”) α is equivalent to whether θ0 is contained in a confidence interval for θ with confidence level (CL) equal to 1−α. With CL or α specified in advance (“pre-data”), frequentist coverage properties can be assured, at least approximately, although nuisance parameters bring in significant complications. With data in hand, the post-data p-value can be defined as the smallest significance level α at which the null hypothesis would be rejected, had that α been specified in advance. Carefully calculated p-values (not assuming normality) are mapped onto the equivalent number of standard deviations (“σ”) in a one-tailed test of the mean of a normal distribution. For a discovery such as the Higgs boson, experimenters report both p-values and confidence intervals of interest. Continue reading
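The mapping Cousins describes, between a p-value and the “equivalent number of standard deviations,” is the one-tailed normal-quantile conversion; a minimal sketch using Python’s stdlib:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def sigma_to_p(n_sigma):
    """One-tailed p-value corresponding to an n-sigma effect."""
    return 1 - Z.cdf(n_sigma)

def p_to_sigma(p):
    """Equivalent number of standard deviations for a one-tailed p-value."""
    return Z.inv_cdf(1 - p)

print(f"{sigma_to_p(5):.2e}")        # ≈ 2.87e-07 — the famous "5 sigma" threshold
print(f"{p_to_sigma(2.87e-7):.2f}")  # ≈ 5.00
```

Note this conversion is just a reporting convention: the p-value is calculated carefully from the actual sampling distribution (not assuming normality), and only then translated into σ-units for communication.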

Categories: Error Statistics, Higgs, P-values | Tags: | 18 Comments
