Heads I win, tails you lose? Meehl and many Popperians get this wrong (about severe tests)!


bending of starlight.

“[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation—in fact with results which everybody before Einstein would have expected. This is quite different from the situation I have previously described, [where] … it was practically impossible to describe any human behavior that might not be claimed to be a verification of these [psychological] theories.” (Popper, CR, p. 36)


Popper lauds Einstein’s General Theory of Relativity (GTR) for sticking its neck out, bravely standing ready to be declared false were the deflection effect not found. The truth is that even if no deflection effect had been found in the 1919 experiments, the failure would have been blamed on the sheer difficulty of discerning so small an effect (the results that were found were quite imprecise). This would have been entirely correct! Yet many Popperians, perhaps Popper himself, get this wrong.[i] Listen to Popperian Paul Meehl (with whom I generally agree).

“The stipulation beforehand that one will be pleased about substantive theory T when the numerical results come out as forecast, but will not necessarily abandon it when they do not, seems on the face of it to be about as blatant a violation of the Popperian commandment as you could commit. For the investigator, in a way, is doing … what astrologers and Marxists and psychoanalysts allegedly do, playing heads I win, tails you lose.” (Meehl 1978, 821)

No, there is a confusion of logic. A successful result may rightly be taken as evidence for a real effect H, even though failing to find the effect need not be taken to refute the effect, or even as evidence against H. This makes perfect sense if one keeps in mind that a test might have had little chance of detecting the effect, even if it existed. The point really reflects the asymmetry of falsification and corroboration. Popperian Alan Chalmers, whose book What is this Thing Called Science? (1999) had at first criticized severity on this score, added an appendix to one of its chapters once I made my case.[i]

For example, one of the sets of eclipse plates from Sobral (the controversial astrographic plates) was so blurred by a change of focus in the telescope that it precluded any decent estimate of the standard error. Had all the eclipse results been like that, the researchers would have announced that no deflection had been found. But this would not constitute evidence that the deflection effect didn’t exist, much less that GTR was false. Even if the deflection effect exists, the probability of failing to detect it with the crude 1919 instruments is high. So discerning no effect would not be evidence of no effect. To think otherwise would be an example of what we may call a fallacy of negative or nonsignificant results. The eclipse tests, not just those of 1919 but all eclipse tests of the deflection effect, failed to give very precise results. Nothing like a stringent estimate of the deflection λ emerged until the field was rescued by radioastronomical data from quasars in the 1960s.

If one wants to go through the gymnastics of how the severity requirement cashes this out: Let H assert the Einstein deflection effect, and not-H, the Einstein effect is absent, or it is smaller than the predicted amount.[ii] Now to have evidence against H here is to have evidence for not-H. So we can just apply the severity criterion to not-H and see what happens. The observed failure to detect H is in accordance with not-H, so the first severity requirement holds, but there’s a high probability of this occurring even if H is true (not-H is false). So there’s poor evidence for not-H, on severity grounds.

By contrast, once instruments were available that could powerfully detect any deflection effect, a no-show would have to be taken as evidence against its existence, and thus against GTR.
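The contrast between the crude and the powerful instrument can be sketched numerically. What follows is only an illustration under stated assumptions, not a reconstruction of the eclipse analyses: it assumes the estimate of the deflection is normally distributed around the true value, that "detecting" the effect means the estimate exceeds 1.645 standard errors, and that the outcome is treated as a simple detect/no-detect result. The two standard errors are hypothetical stand-ins for badly blurred plates and for a precise instrument, and the function names are invented for the example.

```python
from statistics import NormalDist

EINSTEIN_DEFLECTION = 1.75  # arcseconds at the limb of the sun (value from the text)


def prob_no_detection(true_deflection, standard_error, z_cutoff=1.645):
    """Probability that the estimate falls below the detection cutoff.

    Assumption (illustrative only): the estimate is distributed
    Normal(true_deflection, standard_error), and the effect counts as
    "detected" when the estimate exceeds z_cutoff * standard_error.
    """
    cutoff = z_cutoff * standard_error
    return NormalDist(mu=true_deflection, sigma=standard_error).cdf(cutoff)


def severity_for_not_h(standard_error):
    """Severity with which a failure to detect passes not-H.

    Under the detect/no-detect simplification this is the probability
    that the effect WOULD have been detected were H true, i.e.
    1 - P(no detection; H true).
    """
    return 1.0 - prob_no_detection(EINSTEIN_DEFLECTION, standard_error)


# Hypothetical standard errors (arcseconds), chosen only for illustration.
for label, se in [("crude, blurred plates", 1.5), ("precise instrument", 0.1)]:
    miss = prob_no_detection(EINSTEIN_DEFLECTION, se)
    sev = severity_for_not_h(se)
    print(f"{label}: P(no detection; H true) = {miss:.3f}, "
          f"severity for not-H = {sev:.3f}")
```

With the hypothetical crude standard error, a no-show is more likely than not even when the Einstein effect is real (roughly 0.68 here), so not-H passes with low severity. With the hypothetical precise one, a no-show would be practically impossible were H true, so failing to find the effect would count strongly against it.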

Popperian requirements are upheld: you are not free to readily interpret any result as consistent with H, much less as counting in favor of the Einstein prediction. However, failure to find data in accord with prediction H isn’t evidence against H if such a no-show is easy to explain even if the predicted effect holds. (See also this post on Popper and pseudoscience.)

See also chapter 8 of EGEK: Severe Tests and Novel Evidence.

[i] Alan Chalmers at first presented this kind of case as a criticism of my account of severity in his “What is this Thing Called Science?” After I argued my case, he did the very rare thing of amending his book before publication. The amendment came as an Appendix.

Appendix: Happy meetings of theory and experiment. Many agree that the merit of a theory is demonstrated by the extent to which it survives severe tests. However, there is a wide class of cases of confirmation in science that do not fit readily into this picture, unless great care is taken in characterising severity of tests. The cases I have in mind involve significant matches between theory and observation in circumstances where a lack of match would not tell against the theory.

[Several examples follow.]

One common kind of situation in science involves making a novel prediction from a theory in conjunction with some complicated and perhaps dubious auxiliary assumptions. [If the theory] is not confirmed, the problem could as well lie with the auxiliary hypotheses as with the theory. Consequently, it might appear that testing the prediction did not constitute a severe test of the theory. …

Deborah Mayo’s characterisation of severity is able to accommodate these examples. She will ask whether the confirmation would have been likely to occur if the theory were false. Both in the case of my Copernican example and the dislocations example the answer is that they would be very unlikely to occur [were the theories false] … Mayo’s conception of severity is in line with scientific practice. (Chalmers 1999, 210-212).

I heartily recommend Chalmers’ introductory text!



[ii] The famous 1919 eclipse expeditions purported to test Einstein’s new account of gravity against the long-reigning Newtonian theory. According to Einstein’s theory of gravitation, to an observer on earth, light passing near the sun is deflected by an angle, λ, reaching its maximum of 1.75″ for light just grazing the sun, but light deflection would be undetectable on earth with the instruments available in 1919. Although the light deflection of stars near the sun (approximately 1 second of arc) would be detectable, the sun’s glare renders such stars invisible, save during a total eclipse, which “by strange good fortune” would occur on May 29, 1919 (Eddington [1920] 1987, 113).

There were three hypotheses between which “it was especially desirable to discriminate” (Dyson et al. 1923, 291). Each is a statement about a parameter, the deflection of light at the limb of the sun, λ (in arc seconds): λ = 0 (no deflection); λ = 0.87″ (Newton); λ = 1.75″ (Einstein). The Newtonian predicted deflection stems from assuming light has mass and follows Newton’s law of gravity.


Chalmers, A. 1999. What is This Thing Called Science? 3rd edition. Hackett.

Dyson, F. W., A. S. Eddington, and C. Davidson. 1923. “A Determination of the Deflection of Light by the Sun’s Gravitational Field, from Observations Made at the Total Eclipse of May 29, 1919.” Memoirs of the Royal Astronomical Society LXII (1917-1923): 291–333.

Eddington, Arthur. 1987. Space, Time and Gravitation: An Outline of the General Relativity Theory. Cambridge Science Classics Series. Cambridge: Cambridge University Press.

Mayo, Deborah. 2010. “Learning from Error: The Theoretical Significance of Experimental Knowledge.” Edited by Kent Staley. The Modern Schoolman 87 (Experimental and Theoretical Knowledge) (The Ninth Henle Conference in the History of Philosophy) (May): 191–217.

Meehl, Paul. 1978. “Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology.” Journal of Consulting and Clinical Psychology 46: 806–834.

Popper, Karl. 1962. Conjectures and Refutations: The Growth of Scientific Knowledge. New York: Basic Books.



Categories: fallacy of non-significance, philosophy of science, Popper, Severity, Statistics


2 thoughts on “Heads I win, tails you lose? Meehl and many Popperians get this wrong (about severe tests)!”

  1. Steven McKinney

    While the noisy nature of the images and measurements taken in 1919 allowed much debate about what the 1919 findings showed, current examples of gravitational lensing are truly awe-inspiring.

    The PDF einstein_rings_natures_gravitational_lenses.pdf includes discussion of red-shift analysis of light, allowing image decomposition to show the foreground (lensing) object and the lensed object behind it.

    Einstein would have loved this one:


    And hot off the press, just weeks ago, a supernova in a galaxy with a redshift of 1.49 (somewhere out near the edge of the observable universe) lensed by a galaxy cluster at a redshift of 0.54 (about 5 billion light years away)

