No-Pain Philosophy (part 3): A more contemporary perspective

See (Part 2)

See (Part 1)


7. How the story turns out (not well)

This conception of testing, which Lakatos called “sophisticated methodological falsificationism,” takes us quite a distance from the more familiar, if hackneyed, conception of Popper as a simple falsificationist.[i]  It called for warranting a host of different methodological rules, one for each step along the way, in order either to falsify or to corroborate hypotheses.  But it doesn’t end well: neither Popper nor Lakatos had a clue how to warrant such rules as reliable for their tasks.  As for Duhem’s problem, Popperians generally assume it is never really solvable (at least in any interesting scientific tests), and that testing is always done within a large-scale theory or paradigm. The “arrow of modus tollens,” on this view, is always directed at the cluster of theories (the paradigm, disciplinary matrix, or large-scale theory) containing the primary hypothesis H together with the auxiliary hypotheses and background theories. It is further imagined that the paradigm stipulates which parts of the background theories and auxiliaries to blame when anomalies arise, and which portions of the theory must be retained at all costs (the “hard core”).  Even though one might reconstruct episodes in the history of science along these lines, the account fails to provide forward-looking tools for finding things out.  It’s all just a matter of “rational reconstruction” (another of Lakatos’ terms) of intuitively sound scientific episodes.

What about the problem of inferring that you have a genuine anomaly (a falsifying hypothesis)?  This too is left at a very unsatisfactory level, e.g., the anomaly is real if it will not go away. Popper himself, however, thought that, thanks to hypotheses that stand up to severe tests, “we can be reasonably successful in attributing our refutations to definite portions of the theoretical maze. For we are reasonably successful in this—a fact which must remain inexplicable for one who adopts Duhem’s and Quine’s view on the matter” (C & R, 1963, 242).  But that does not mean he supplied an adequate account for warranting this important fact. He did not.

Due to his own “deductivist” language, and the logical empiricist assumptions about theory testing at the time, Popper remained caught up in tangles of language and was unable to cash out required notions. As an example of the former, Popperians say things like: it is warranted to infer (prefer, or believe) H because H has passed a severe test, but there is no justification for H (since “justifying” H would mean showing H to be true or highly probable). As an example of the latter, Popperians will say a hypothesis H must be subjected to severe tests, where a severe test is defined as one that would, with high probability, falsify H if H is false; but they have no answer when asked: how can you warrant the claim that a given test is or would be severe (i.e., that it has a high probability of finding flaws in H, if they exist)?  At most Popper could offer intuitive examples of tests thought to have poor probing power: e.g., tests that require no novel predictions and can at most accommodate known effects.  That is fine, so far as it goes, but intuitive illustrations do not suffice.
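As a rough illustration of how the notion can be cashed out (a toy Normal example of my own, not Popper’s formulation): once a statistical model mediates between hypothesis and data, “the test would with high probability falsify H, if H is false” becomes a computable property of the test, namely the probability of rejection under a specified discrepancy from the null.

```python
import math
from statistics import NormalDist

def rejection_probability(mu1, n, sigma=1.0, alpha=0.05, mu0=0.0):
    """P(test rejects H0: mu <= mu0 | true mean is mu1), for a
    one-sided z-test on the mean of a Normal with known sigma.

    A high value for a discrepant mu1 is just the severe-test
    requirement made precise: the test would, with high
    probability, reject H0 were H0 false by that much."""
    nd = NormalDist()
    se = sigma / math.sqrt(n)
    # Reject when the sample mean exceeds this cutoff:
    cutoff = mu0 + nd.inv_cdf(1 - alpha) * se
    return 1 - nd.cdf((cutoff - mu1) / se)

# With n = 100, the test severely probes a discrepancy of 0.5
# (near-certain rejection if the true mean is 0.5), but is a poor
# probe of a tiny discrepancy of 0.05.
print(rejection_probability(0.5, n=100))   # close to 1
print(rejection_probability(0.05, n=100))  # barely above alpha
```

Nothing in the sketch requires the intuitive appeal Popper was left with: whether a given test is severe against a given discrepancy is settled by calculation within the statistical model.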

8. Roles for Statistics

Yet it is precisely at these points, in solving the problems of sophisticated methodological falsificationism (not that I would use that name), that statistical methods can and do enter. Statistical methods can warrant inferences about genuine, systematic effects, or, alternatively, warrant inferring that all systematic statistical information has been adequately captured by a hypothesized model (Spanos 2007). Statistical methods and models provide roomy niches in between substantive scientific theories, models and hypotheses on the one hand, and a host of more local and intermediate questions on the other, where the effects of different factors may be modeled, probed, and distinguished. Had Popper made use of the statistical testing ideas being developed around the same time (and around the corner), he might have been able to substantiate his account of methodological falsification and to justify his recognition that we do manage to solve our Duhemian problems in practice.[ii]  Statistical models and methods are excellent examples of how we succeed both in inferring genuine anomalies (falsifying hypotheses) and in pinpointing their sources.  The key, however, is the opposite of holism, testing within a paradigm, or any of the “largisms” that have entered the philosophical scene: it lies in entirely local, piecemeal inquiries, split off from the questions they may later serve to answer.  We don’t have to affirm each auxiliary hypothesis; it suffices to distinguish their effects, and/or subtract them out afterwards (see Mayo 1996, EGEK, chapter 1).

9. Jettison the traditional way of formulating “Duhem’s problem”

Let me be clear that I do not advocate that statisticians go back to Popper to obtain illumination for their problems of evidence and inference.  On the contrary, I am rather disappointed that philosophers have not, by and large, offered more realistic replacements for testing accounts that are mostly stuck in a logical empiricist time-warp.  I happen to like Popper’s work (Peirce is better), if only because I can “translate” him; but I never would be discussing Popper here were I not rather surprised to see him coming up in the writing of statisticians of various inclinations. I want to spare them some dead ends, but most importantly, to get us past some straitjacketed ways of talking that have trickled down from logical empiricism.  (Ironically, contemporary philosophers of science, especially in the U.S., almost never make use of Popper.)

Furthermore, I argue that we should reject the pattern of argument associated with Duhem’s problem in the first place (that first premise: if H & A1 & … & An, then O).  Think about it and I believe you’ll agree: scientists, in any interesting test, do not try to form a conjunction of background theories and auxiliary hypotheses from which to derive a particular data set.  (Think of something like going from the general theory of relativity (GTR) to predictions of the timing data in a particular pair of binary pulsars.) Duhemian problems are real, but the very way of putting the problem is silly and has caused much mischief.  But that’s an issue for another time.

Note: I’ve said all this much more clearly and fully in published works, most of which are available through this page.  For a very quick overview on this blog (skip the first few paragraphs on how I couldn’t get my key to work in London), one source is the post of Nov. 5, 2011.

Mayo, D. (1996). Error and the Growth of Experimental Knowledge, University of Chicago Press: Chicago.

Mayo, D. (2006). “Critical Rationalism and Its Failure to Withstand Critical Scrutiny,” in C. Cheyne and J. Worrall (eds.), Rationality and Reality: Conversations with Alan Musgrave, Kluwer Series Studies in the History and Philosophy of Science, Springer: The Netherlands: 63-99.

Popper, K. (1963).  Conjectures and Refutations: The Growth of Scientific Knowledge, Routledge: London, New York.

Spanos, A. (2007) “Curve-Fitting, the Reliability of Inductive Inference and the Error-Statistical Approach,” Philosophy of Science 74(5): 1046-1066.

[i] That Lakatos departed still further from Popper does not mean that we need to go there as well.

[ii] I really do have a letter from Popper telling me he regrets never having learned modern statistics.  It came when I sent him a 1990 paper on severity, asking whether this was not really what he had meant.  I have an earlier letter, less interesting.

Categories: No-Pain Philosophy, philosophy of science


10 thoughts on “No-Pain Philosophy (part 3): A more contemporary perspective”

  1. Mark

    I must confess to being a statistician who has already gone “back to Popper” (I’ve read much of his stuff, much of it more than once), and actually found great solace in his writing, because that’s just not the way we, as modern statisticians, are taught to think. However, I am grateful to you for rescuing “inductive reasoning” for me… I also confess to internal conflict between the Popperian and the Fisherian. That said, I believe that there is a very good reason for statisticians of the frequentist ilk to turn back to Popper AND Fisher… At least in my field of public health, the fundamental concept of probability often seems to have been misplaced. People applying frequentist statistical methods seem happy to use “probability” as a synonym for “uncertainty”, almost as a strange marriage between subjective probability and something like fate. Even Nassim Taleb, whose writings first turned me onto Popper, seems to conflate the two when he talks about “improbable events” (although he does confess to thinking of probability mostly in a “qualitative manner”). Thus, I’m happy to take up Popper’s flag in a stance against probabilism (both of the subjective Bayesian and faux frequentist sorts), but thanks to you (and, ironically, Popper himself in “clocks and clouds”) I now have a better understanding of how frequentist probability does allow for inductive reasoning. So thanks for this.

  2. Thanks so much Mark, I hadn’t noticed this. (I had better check that I’m getting all comments sent to my e-mail.) It’s fine to turn back to him; I’m pleased that statisticians look at philosophy of science altogether. I just didn’t want people to think I was advocating him as directly relevant for statisticians, and find it ironic that some have looked to him, but have taken away mixed messages. I’d be glad to hear more about your examples in public health, and I am gratified if I could enlighten on induction. Obviously, there’s more to the story…please stay tuned.

  3. Mark

    Thanks Deborah! The examples in public health are numerous, so I’ll just mention the ubiquitous “risk factor epidemiology”, whose purpose is to divide populations into risk categories using population-level (i.e., actuarial-type) “probability” models, and then apply these at the individual level (see, e.g., the link I sent). To my mind, these individual risks cannot be construed as “single-event probabilities” in Popper’s sense without making some super-strong (and completely ridiculous) assumptions; they almost seem Bayesian, but I’m pretty sure that no self-respecting Bayesian would claim them as such. They pretty clearly don’t fit into the error-statistics approach either (does anybody ever assess the error rates associated with these predictions? also, note that the concept of a “sampling distribution” seems to be missing from such risk calculators). It seems like inductive reasoning without any objective basis.

  4. Mark:
    “They pretty clearly don’t fit into the error-statistics approach to me (does anybody ever assess the error rates associated with these predictions?)”
    I’m surprised you say that; I’m pretty sure the insurance companies care about error rates! The acceptability of the risk model would seem to depend on its reliably assigning these risk rates across the population.
    Of course that differs from my recommended use of error probabilities in scientific inference, which I admit is different from these actuarial estimates. There it is not just a matter of low long-run error rates: in scientific inferences about a hypothesis or theory, one uses error probabilities (associated with a hypothesis or inference) in order to represent what it would be like were one or another hypothesis correct or incorrect about an aspect of the underlying data-generating mechanism.
    But back to your example: as you say, people do not regard the actuarial risk estimates as “single-case probabilities” (whatever those are), but at most as estimates of the relative frequency of an outcome in different categories. I will look at the paper you sent.

    • Mark

      Deborah, I apologize that my previous message wasn’t sufficiently clear; I certainly do not want to waste your time or take you away from your other writing (which I’m totally digging, by the way… just finished Error and Inference). My use of the term “actuarial-like” made it seem like I was talking about insurance companies (which operate more like casinos, so certainly THEY care about error), but what I really meant was the vast quantity of “public health science” papers and dissertations that assess population-level “risk factors” (with little regard for error, and I’ve admittedly contributed to this literature) and then apply those at the individual level. The link I sent was to a silly individual-level risk calculator based on a population-level model. I can recommend papers if you are interested.

  5. Deborah,

    Some readers of my blog have suggested that the practical import of our two blogs is similar, although our conceptual bases are quite different. Looking at this post, I can see why. I have worked with students of Popper who regard his insights as critical, and the more so the further one gets from controlled lab conditions. But he is stronger at pointing out problems than at offering solutions.

    I have just blogged on JS Mill’s System of Logic, which I suggest provides a good overview of these issues for the consumer of scientific advice. It remains to develop the advice for technologists and scientists, which I agree we sorely need. As for comparing our views, I think I am more interested in cases, such as Mill’s white swans, where ‘severe testing’ is more problematic than I think you would regard as typical.

    What do you think?

  6. Pingback: Misspecification Tests: (part 4) and brief concluding remarks « Error Statistics Philosophy

  7. Pingback: Mayo, Senn, and Wasserman on Gelman’s RMM* Contribution « Error Statistics Philosophy

  8. Pingback: Mayo, Senn, and Wasserman on Gelman’s RMM** Contribution « Error Statistics Philosophy
