I was just reading a paper by Martin and Liu (2014) in which they allude to the “questionable logic of proving H0 false by using a calculation that assumes it is true” (p. 1704). They say they seek to define a notion of “plausibility” that
“fits the way practitioners use and interpret p-values: a small p-value means H0 is implausible, given the observed data,” but they seek “a probability calculation that does not require one to assume that H0 is true, so one avoids the questionable logic of proving H0 false by using a calculation that assumes it is true” (Martin and Liu 2014, p. 1704).
Questionable? A very standard form of argument is a reductio (ad absurdum) wherein a claim C is inferred (i.e., detached) by falsifying ~C, that is, by showing that assuming ~C entails something in conflict with (if not logically contradicting) known results or known truths [i]. Actual falsification in science is generally a statistical variant of this argument. Supposing H0 in p-value reasoning plays the role of ~C. Yet some aver it thereby “saws off its own limb”!
“[P]aradoxically, when we achieve our goal and successfully reject H0 we will actually be left in complete existential vacuum because during the rejection of H0 NHST ‘saws off its own limb’ (Jaynes, 2003; p. 524): If we manage to reject H0 then it follows that pr(data or more extreme data|H0) is useless because H0 is not true” (Szucs and Ioannidis 2016, p. 15).
Here’s Jaynes (p. 524):
“Suppose we decide that the effect exists; that is, we reject [null hypothesis] H0. Surely, we must also reject probabilities conditional on H0, but then what was the logical justification for the decision? Orthodox logic saws off its own limb.”
But this reasoning would saw off the legs of all hypothetical testing or falsification. The entailment from a provisional hypothesis or model H to x, whether it is statistical or deductive, does not go away after the hypothesis or model H is rejected on grounds that the prediction is not borne out.[i] Such a provisional supposition is called an argumentative or implicationary assumption in logic. Far from being questionable, it is the strongest form of scientific reasoning. When particle physicists deduce the events that would be expected with immensely high probability under H0: background alone (e.g., bumps would disappear with more data), the derivation does not get sawed off when H0 is refuted! The conditional claim remains. And if the statistical test passes an audit (of its assumptions), H0 is statistically falsified. (Search Higgs on this blog.)[ii]
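To make the point concrete, here is a minimal sketch, not anything taken from the papers discussed. It assumes a one-sided test of H0: mu = mu0 for a Normal mean with known sigma; the function name p_value and all the numbers are hypothetical, chosen only for illustration. The tail probability is computed from the H0 model alone, so it is the same number before and after anyone decides to reject H0.

```python
import math
from statistics import NormalDist


def p_value(sample_mean, n, mu0=0.0, sigma=1.0):
    """Pr(sample mean >= observed value; H0), for H0: mu = mu0 with known sigma.

    Illustrative assumption: a one-sided Normal test. The calculation only
    supposes H0 for the sake of argument; it never asserts that H0 is true.
    """
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    return 1.0 - NormalDist().cdf(z)


# Hypothetical observed means from n = 25 observations.
p_small_diff = p_value(0.1, n=25)  # z = 0.5: such differences are common under H0
p_large_diff = p_value(0.6, n=25)  # z = 3.0: such differences are rare under H0

# Suppose we now reject H0 on the strength of the second result.
h0_rejected = p_large_diff < 0.01

# The conditional claim "under H0, a difference this large is rare" has not
# gone anywhere: recomputing it after the rejection gives the same value.
p_after_rejection = p_value(0.6, n=25)

print(p_small_diff, p_large_diff, h0_rejected, p_after_rejection == p_large_diff)
```

The first case is the blocking role of the p-value (no warrant to reject H0); the second is the statistical analogue of a reductio, where the implication from H0 survives the rejection of H0.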
Now I don’t know if the limb-sawing charge is behind Martin and Liu’s claim that finding H0 “false by using a calculation that assumes it is true” is “questionable” (we’re not told), but I know this: If you’re seeing limb-sawing in p-value logic, you’re sawing off the limbs of reductio arguments; since it’s a mistake to saw off reductios, it follows that seeing limb-sawing in p-value logic is a mistake [ii].
Send me your limb-sawers: I’m now collecting other examples of the limb-sawing fallacy; for years I skipped by them with a silent “hah!”, trying to avert my eyes, not wanting to acknowledge how logic can “go on a holiday” even in some discussions by brilliant statisticians. But now I think the problem bears taking seriously, so please send me examples you come across [iii].
[i] Reductio ad absurdum: a form of argument in which one provisionally assumes one or more claims, derives a contradiction from them, and then concludes that at least one of those claims must be false. A reductio argument … is specifically aimed at bringing someone to reject some belief (an arbitrary encyclopedia entry).
[ii] Actually the most important function of the p-value, as I see it, is to block rejections of H0 when the p-value is not small. We reason: if even larger differences than the observed d0 would be produced fairly often even if we supposed H0 adequately describes the data-generating mechanism, then d0 does not warrant rejecting H0.
[iii] The limb-sawing fallacy makes an appearance, but without attribution, in my new book [i] (“Statistical Inference as Severe Testing”, which is currently undergoing a final round of edits). Fans of Jaynes exhorted me not to attach his name to this howler, and I obliged. To their credit, they acknowledged his flaw.
Jaynes, E. T. 2003. Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.
Szucs, D. and Ioannidis, J. 2016. “When null hypothesis significance testing is unsuitable for research: a reassessment.”
Martin, R. and Liu, C. 2014. “A Note on P-Values Interpreted as Plausibilities.” Statistica Sinica, Vol. 24, No. 4 (October 2014), pp. 1703-1716.
*Some other comedy hour posts:
(09/03/11) Overheard at the comedy hour at the Bayesian retreat
(04/04/12) Jackie Mason: Fallacy of Rejection and the Fallacy of Nouvelle Cuisine
(04/28/12) Comedy Hour at the Bayesian Retreat: P-values versus Posteriors
(05/05/12) Comedy Hour at the Bayesian (Epistemology) Retreat: Highly Probable vs Highly Probed
(09/03/12) After dinner Bayesian comedy hour…. (1 year anniversary)
(09/08/12) Return to the comedy hour…(on significance tests)
(04/06/13) Who is allowed to cheat? I.J. Good and that after dinner comedy hour….
(04/27/13) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill, comedy hour)