This gets to a distinction I have tried to articulate, between explaining a known effect (like looking for a known object) and searching for an unknown effect (that may well not exist). In the latter, possible effects of “selection” or searching need to be taken into account. Of course, searching for the Higgs is akin to the latter, not the former, hence the joke in the recent New Yorker cartoon.
“Always the last place you look!”
20 thoughts on “Always the last place you look!”

I think this is an example of what I have called ‘The problem of Hermione’
“How do you prove your wife is faithful? The puzzle of proving a wife faithful is precisely the one that Shakespeare has to solve in The Winter’s Tale. Although there have been some revisionist interpretations of this play inviting us to believe Leontes that Hermione is guilty, I am convinced that we are meant to interpret this as a tale of groundless jealous paranoia and since I am not a structuralist I regard authorial intentions as important. So what is the dramatic problem? Shakespeare has to secure a triple ‘conviction’. Through the play he has to convince any lingering doubters in the audience that Hermione is innocent. Within the play he has to convince Leontes of her innocence. Finally, again through the play he has to convince the audience that Leontes is convinced. How can he do this? There is no evidence he can bring. He resorts to an oracle: the oracle of Delphos. The oracle is delivered sealed and then opened: Hermione is innocent and a prophetic threat is delivered, ‘the king shall live without an heir unless that which is lost is found’. Leontes disbelieves the message but then immediately is brought news that his son has died, thus delivering both proof of the oracle’s prophetic power and a judgement on him. A broken man, Leontes realises his folly and repents.” (Dicing with Death, Chapter 10)
There is a huge literature on analysing so-called (clinical) ‘equivalence trials’ which attempts to grapple with variants of this problem. How do you prove that two treatments are identical?
Also similar is the problem of pharmacovigilance, which has been described as looking in a dark cellar for a black cat you hope isn’t there (not an original of mine, alas, and I wish I could find out who first coined it).
According to this page the original quote is:
“A philosopher is a blind man in a dark cellar at midnight looking for a black cat that isn’t there. He is distinguished from a theologian, in that the theologian finds the cat. He is also distinguished from a lawyer, who smuggles in a cat in his overcoat pocket, and emerges to produce it in triumph.” — William L. Prosser, “My Philosophy of Law,” Cornell Law Quarterly, 1942.
Corey, Thanks for the reference!
I had actually intended the cartoon as a deep and serious way to get at the issue of selection effects! By the way, Nietzsche’s answer to the definition of a philosopher in Corey’s comment is this:
“Philosophy, as I have so far understood and lived it, means living voluntarily among ice and high mountains—seeking out everything strange and questionable in existence, everything so far placed under a ban by morality. ….
Every attainment, every step forward in knowledge, follows from courage, from hardness against oneself, from cleanliness in relation to oneself.
I do not refute ideals, I merely put on gloves before them.” (Ecce Homo: How One Becomes What One Is)
All such ideals are a decadent, nay-saying route to artificially heightening the will to power.
That said, I don’t see the depth of difficulty in showing equivalence of two treatments—of course it is never a matter of proof.
I took the cartoon seriously and like it very much.
There are myriad difficulties in ‘proving’ equivalence compared to ‘proving’ a difference. One simple one is to do with the value of blinding. If you come to me and ask me to fake clinical trial data showing that two treatments are equivalent, but without revealing the randomisation code, it’s a piece of cake. All I need to do is generate data from a single distribution. The differences between the two groups of patients created at random will then, indeed, be random. However, if you ask me to generate data ‘proving’ that a new treatment is better than the control, I need to generate data from two distributions, not one, and in consequence I need to know exactly who got what. [A small simulation sketch of this appears after this comment.]
I don’t think anybody would accept proof of the efficacy of a homeopathic treatment by comparing it to a conventional one in a double-blind trial and showing there was no difference. Showing a difference to placebo would be another matter.
See, for example, Senn SJ. Inherent difficulties with active control equivalence studies. Statistics in Medicine 1993; 12: 2367-2375
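A minimal sketch of the blinding asymmetry described in the comment above, under illustrative assumptions of the editor’s (per-arm size, a standard normal outcome, an added “effect” of 0.5); it is not code from the comment. Outcomes drawn from a single distribution look “equivalent” however the hidden randomisation code later splits the patients, whereas faking superiority requires adding an effect to exactly the patients on the new treatment, and hence requires the code.

```python
# Illustrative only: n, the normal outcome model, and the added "effect"
# are assumptions, not anything specified in the comment above.
import numpy as np

rng = np.random.default_rng(1)
n = 100

# "Faking equivalence" without the randomisation code: draw every patient's
# outcome from one distribution. However the blinded allocation later splits
# the patients, the between-arm difference is pure noise.
fake_outcomes = rng.normal(loc=0.0, scale=1.0, size=2 * n)
hidden_allocation = rng.permutation(2 * n)          # the code the faker never sees
arm_a, arm_b = hidden_allocation[:n], hidden_allocation[n:]
diff = fake_outcomes[arm_a].mean() - fake_outcomes[arm_b].mean()
print(f"between-arm difference under faked 'equivalence': {diff:.3f}")

# "Faking superiority" is different: an effect must be added to exactly the
# patients on the new treatment, impossible without knowing who got what.
effect = 0.5
boosted = fake_outcomes.copy()
boosted[arm_a] += effect
diff_sup = boosted[arm_a].mean() - boosted[arm_b].mean()
print(f"between-arm difference under faked superiority: {diff_sup:.3f}")
```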
I took it that the bioequivalent drug, as in a generic variant, has to have the same active ingredient as the original brand-name drug, the same strength, etc. The effectiveness of the original drug is itself established by methods that fall short of proof, but once that’s given, it would seem to require just showing that the second is chemically identical and that its effect differs by no more than some amount determined to be acceptable, perhaps similar to the expected variability of the original as produced in different batches or the like.
Of course, no evidence of a difference is not automatically evidence of no difference: they’d need to argue that, were the differences more than such and such, with high probability one of the checks would have detected this. Clearly this is not the case with comparing a conventional drug with a homeopathic treatment. My current headache might well remain for about the same time whether I take aspirin or Stephen Senn performs a voodoo chant. So this points to a lousy test rather than a necessary limit to the reasoning.
Anyway, you’re the expert here, Stephen, and I know you’re part of the big literature on the statistics of showing bioequivalence (some of which I have read). I imagine you know of cases where inequivalence was found later on?
It is true that the coating of generics can be different, which was the source of my problem some years ago. But clearly, that could have been avoided, and it doesn’t point to any necessary evidential limit. Consider high-precision null experiments in physics (e.g., zero ether drag).
Yes, it would seem that a drug is a drug is a drug, but it isn’t necessarily so, and absorption can differ considerably between formulations. But in any case my blinding point stands. In conventional hypothesis testing you are trying to show that the data cannot reasonably come from one distribution and must come from two (active and placebo, say). In equivalence testing (whether bioequivalence or therapeutic equivalence) you are trying to show that the data cannot come from two distributions and must come from one. These two cases are logically very different in a way that conventional statistical testing does not really address. (Nor does conventional Bayesian modelling.)
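To make the one-distribution-versus-two contrast concrete, here is a minimal sketch, under illustrative assumptions (normal outcomes, 60 patients per arm, an equivalence margin of 0.5), comparing an ordinary two-sided t test of difference with a two one-sided tests (TOST) equivalence analysis. TOST is offered only as one standard way of formalising “the difference lies within a pre-specified margin”, not as the method the comment is endorsing.

```python
# Sketch of the asymmetry: a two-sided t test looks for evidence of a
# difference; TOST looks for evidence that any difference lies inside a
# pre-specified margin. All numbers here are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 60
margin = 0.5                                     # pre-specified equivalence margin
x = rng.normal(0.0, 1.0, n)                      # reference treatment
y = rng.normal(0.1, 1.0, n)                      # new treatment, nearly identical

# Conventional test of difference (H0: the means are equal).
t_diff, p_diff = stats.ttest_ind(x, y)

# TOST: reject both H0: diff <= -margin and H0: diff >= +margin.
diff = y.mean() - x.mean()
se = np.sqrt(x.var(ddof=1) / n + y.var(ddof=1) / n)
df = 2 * n - 2                                   # simple equal-variance approximation
p_lower = stats.t.sf((diff + margin) / se, df)   # H0: diff <= -margin
p_upper = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
p_tost = max(p_lower, p_upper)

print(f"difference test p = {p_diff:.3f}  (small p: data look like two distributions)")
print(f"TOST p           = {p_tost:.3f}  (small p: difference within +/- {margin})")
```

The two procedures reverse the burden: in the first the null hypothesis is “one distribution”, in the second the composite null is “a difference at least as large as the margin”.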
Stephen: I didn’t say a drug is a drug is a drug. And I understand the logical difference between trying to affirm no (relevant) difference and trying to affirm a difference (I teach logic, remember), and I have no reason to question your criticism that one cannot say, about those double-blind studies you object to, that there is a high probability a difference would have shown up were the drugs relevantly inequivalent. That’s your expertise. My point is just that I do not see a necessary obstacle to reasoning that a relevant discrepancy is absent.
So did you work your voodoo? My headache is gone. Or was it the aspirins I took?
OK. Let me put it this way: a key issue of any study is competence. Is it competent to find a difference if it exists? If it finds a difference it is obviously competent to find it. How about if it doesn’t find it? Well, in a Bayesian formulation you can show that IF a study is competent, the more you look (or, if you like, the bigger the study is, other things being equal) and the more you fail to see a difference, the more you can believe there is no difference. However, at the same time, IF you are NOT sure the study is competent, then the more you look and the more you fail to find a difference, the more you come to doubt that the study was competent to find it. [A toy calculation along these lines follows this comment.]
In a game of hunt-the-thimble, if you find it, the competence of your search strategy is pretty irrelevant: the thimble is on your finger. If you fail to find it, all sorts of nagging doubts arise: is there a thimble? Have I overlooked some obvious place? Is my search strategy adequate? Am I stupid?
All biostatisticians working on equivalence trials know this and so does the FDA. That’s why they hate equivalence studies (in particular therapeutic equivalence studies – they are more relaxed about bioequivalence studies). However, with the advent of biosimilars, these issues are being raised again.
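A toy Bayesian calculation along the lines of the competence point above; the priors and per-look detection probabilities are invented (including the extreme simplification that a wholly incompetent study never finds a real difference). With competence uncertain, repeated failures to find a difference push belief in “no difference” up only toward a ceiling, while belief in the study’s competence steadily falls.

```python
# Toy illustration only: priors and detection rates are invented, and
# "incompetent" is caricatured as having zero chance of finding a real difference.
from itertools import product

p_competent = 0.7                    # prior that the study can detect a real difference
p_difference = 0.7                   # prior that a real difference exists
detect = {True: 0.5, False: 0.0}     # per-look detection probability, given a difference

for k in (1, 5, 20):                 # number of looks, all failing to find a difference
    joint = {}
    for comp, diff in product((True, False), repeat=2):
        prior = (p_competent if comp else 1 - p_competent) * \
                (p_difference if diff else 1 - p_difference)
        likelihood = (1 - detect[comp]) ** k if diff else 1.0   # P(k misses | comp, diff)
        joint[(comp, diff)] = prior * likelihood
    z = sum(joint.values())
    post_no_diff = (joint[(True, False)] + joint[(False, False)]) / z
    post_competent = (joint[(True, True)] + joint[(True, False)]) / z
    print(f"after {k:2d} null looks: P(no difference) = {post_no_diff:.2f}, "
          f"P(competent) = {post_competent:.2f}")
```

The ceiling arises because, once “competent and a difference exists” has been ruled out, the remaining probability is split between “no difference” and “incompetent study with a real difference”, which is exactly the nagging doubt described above.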
Stephen: I missed this comment. The cartoon was focused on cases of finding something (also an issue in Glymour’s new post), but I’m equally interested in not finding. Yes, it all comes down to being able to assess how capable the analysis is of finding differences if they exist. I should think your claim holds also for error statisticians: if the study is highly capable of unearthing a difference, and finds none, there’s evidence of no difference. Likewise, if capability to detect is open to doubt, failure to find can indicate failure of capability to detect. Right?
Yes – ish. The problem is that whereas you can increase the power of any study to anything you like, conditional on competence, by racking up the sample size, you can’t address the issue of competence directly. [A back-of-envelope power calculation follows this comment.] But a study that finds a difference is competent to find one. One that doesn’t may or may not be competent to find one, however large it is. If in a double-blind trial I was able to distinguish a homeopathic treatment from placebo, that would be pretty impressive. If I failed to distinguish a homeopathic treatment from an active treatment, however large the trial, however narrow the confidence interval between the two, it would not carry the same conviction as regards the validity of the homeopathic effect.
I don’t know whether this chimes with error statistics but I think it chimes with falsificationism. Falsifying something is just more impressive than ‘corroborating’ it.
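As a back-of-envelope companion to the power point in this exchange, the sketch below (a normal approximation, with an assumed true difference of 0.2 SDs and two-sided alpha of 0.05, all illustrative choices) shows nominal power climbing toward 1 as the per-arm sample size grows. Nothing in the calculation speaks to whether the trial is competent (assay-sensitive); the whole figure is conditional on that assumption.

```python
# Normal-approximation power for a two-arm comparison of means.
# The assumed effect size, SD, and alpha are illustrative, and the whole
# calculation is conditional on the trial being competent to detect the effect.
from scipy.stats import norm

delta, sigma, alpha = 0.2, 1.0, 0.05            # assumed true difference, SD, two-sided alpha
z_crit = norm.ppf(1 - alpha / 2)

for n in (25, 100, 400, 1600):                  # patients per arm
    se = sigma * (2 / n) ** 0.5                 # SE of the difference in means
    power = norm.sf(z_crit - delta / se)        # P(reject | difference = delta)
    print(f"n per arm = {n:4d}: nominal power = {power:.2f} (conditional on competence)")
```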
Stephen: I meant to mention the possible implications for this issue of the generic labeling question we took up in March:
https://errorstatistics.com/2012/03/25/the-new-york-times-goes-to-war-against-generic-drug-manufacturers/
If there are weak grounds for deeming generics bioequivalent, then generics would need their own labels, which would mean their own testing, which would mean no discounted price, and the original drug maker* keeps the patent. That would help a lot of drug companies about to go over the “patent cliff”, but would increase costs for consumers. Has it happened, or does it happen often, that generic drug makers are blamed for failing to meet bioequivalence (on the basis of subsequently discovered effects of the drug)?
*Of course, as Schachtman notes, no current law says this, but it would seem to be a basis for arguing for at least extending the patent.
Mayo,
I don’t believe the generic manufacturer’s failure to establish bioequivalence revives an expired patent, but it does leave the original sponsor/patent holder in a virtual monopoly position.
You might be interested in United States v. Generix Drug Corp., 460 U.S. 453 (1983) [http://www.law.cornell.edu/supremecourt/text/460/453]. There the drug was the drug, but the generic manufacturer had changed the “excipients,” which was enough to allow the FDA/US government to obtain an injunction to shut down the generic manufacture by showing “a reasonable possibility that the safety and effectiveness of [the] generic drug products might be affected by differences between their inactive “excipients” and those found in approved products.”
You can believe that the original patent holder likely dropped the dime on the generic manufacturer, after analyzing the generic product as marketed.
Nathan
Nathan: Thanks for your comment: “I don’t believe the generic manufacturer’s failure to establish bioequivalence revives an expired patent, but it does leave the original sponsor/patent holder in a virtual monopoly position.”
Indeed, that was my point, although I meant it as deliberately drawing out the implications of Senn’s critique or charge. I didn’t mean that they failed to establish it in the usual way, initially, only that some new evidence might have arisen indicating that, while purporting to have shown equivalence, they did not.
But I don’t understand your meaning in saying: “You can believe that the original patent holder likely dropped the dime on the generic manufacturer, after analyzing the generic product as marketed.”
I was referring to the cited case, where the FDA engaged in an enforcement action against the generic manufacturer because of the different binder in the pill. Call me cynical, but the reality is that the FDA is typically very busy, and the original patent holder/sponsor likely bought the generic product and analyzed it carefully to document the difference, and then sent a “public citizen” letter to the FDA to get the enforcement action started. I don’t know that; it’s just my hunch about how things work.
Nathan: Wow! I suspected you might mean that, but wasn’t sure. That suggests, possibly, that companies with the patent should be involved in marketing the generic. I have owned several drug stocks that had an important drug turning generic, and by and large, I found that they work with the generic company in some capacity (not fully revealed), at least to extend their jurisdiction (and keep the stock price from falling too much), but maybe I’m not aware of what their actual deal was. I’ll look the case up at some point. (Of course, that’s quite a while ago.) Thanks!
The so-called proprietary pharmaceutical houses sometimes do have subsidiaries that are in the generic market. The divide between these two business models is not absolute. Of course, you probably have read about any number of “pay to delay” deals between proprietary and generic manufacturers, which are only beginning to come under scrutiny for their potential restraint of trade implications.
Nathan: Yes, I’ve sometimes viewed them favorably, if I own the stock, for example, especially if the drug is valuable medically and I know how many, many years they’ve struggled to convince the FDA. Perhaps it would be better if the same company sold the drug, but more cheaply, once the patent deadline is reached, rather than have another outfit develop a generic and then demonstrate bioequivalence. Particularly if this is as difficult as Senn suggests. I realize this has gone some distance from my cartoon.
If I may say so, those who have not worked on proving equivalence find it all very easy. In fact, because manufacturers of innovator drugs frequently find it necessary to change formulations but dislike the idea of going through a full-scale development to get a formulation switch registered, they often run so-called ‘bridging studies’ themselves to prove equivalence between their own formulations. Such studies not infrequently fail, and I myself have been involved in failed equivalence studies on many occasions. The most spectacular case was when switching from one dry-powder inhaler to another. We showed pretty convincingly that the new formulation of the drug had 1/4 the potency of the old: in fact 24 micrograms of the new was not quite as efficacious as 6 micrograms of the old. It is odd that, in a world that seems to believe there can be a huge difference in quality (and price) between wines using the same grapes, it is assumed that generics are pretty much indistinguishable from innovator drugs.
Stephen: I’m granting your argument as to the difficulty in practice of warranting claims to bioequivalence. This is your field and I trust what you say. I don’t know whether you think they eventually do an adequate job. Even determining the relationship of the potencies would at least permit an adjustment in prescribed usage. Isn’t the ability to find “failed equivalence studies” a good thing, in that sense? Would it be adequate to show that the generic differs from the original by no more than the differences seen when a single manufacturer switches formulations? More than half our drugs are generics, so it would be very important to know if their acceptability in practice is based on poor statistics.