Jean Miller here. (I obtained my PhD with D. Mayo in Phil/STS at VT.) Some of us “island philosophers” have been looking to pick our favorite book reviews of EGEK (Mayo 1996; Lakatos Prize 1999) to celebrate its “sweet sixteen” this month. This review, by Dr. Cosma Shalizi (CMU, Stat), has been chosen as the top favorite (in the category of reviews outside philosophy). Below are some excerpts; it was hard to pick, as each paragraph held some new surprise, or unique way to succinctly nail down the views in EGEK. You can read the full review here. Enjoy.
After I’d bungled teaching it enough times to have an idea of what I was doing, one of the first things students in my introductory physics classes learned (or anyway were taught), and which I kept hammering at all semester, was error analysis: estimating the uncertainty in measurements, propagating errors from measured quantities into calculated ones, and some very quick and dirty significance tests, tests for whether or not two numbers agree, within their associated margins of error. I did this for purely pragmatic reasons: it seemed like one of the most useful things we were supposed to teach, and also one of the few areas where what I did had any discernible effect on what they learnt. Now that I’ve read Mayo’s book, I’ll be able to offer another excuse to my students the next time I teach error analysis, namely, that it’s how science really works.
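For readers who have not met classroom error analysis, here is a minimal Python sketch of the two steps the paragraph mentions: propagating uncertainties into a calculated quantity, and a quick check of whether two numbers agree within their margins of error. The function names, the choice of a product as the calculated quantity, the assumption of independent errors, and the two-sigma threshold are all illustrative assumptions, not anything taken from the review.

```python
import math

def propagate_product(a, sigma_a, b, sigma_b):
    """Value and uncertainty of a*b, assuming independent errors.

    Relative uncertainties are added in quadrature (a standard
    first-order approximation, assumed here for illustration).
    """
    value = a * b
    relative = math.hypot(sigma_a / a, sigma_b / b)
    return value, abs(value) * relative

def agree(x, sigma_x, y, sigma_y, k=2.0):
    """Quick-and-dirty check: do x and y differ by no more than
    k combined standard uncertainties? (k=2 is an assumed convention.)"""
    return abs(x - y) <= k * math.hypot(sigma_x, sigma_y)

if __name__ == "__main__":
    print(propagate_product(2.0, 0.1, 3.0, 0.3))
    print(agree(9.81, 0.05, 9.70, 0.04))
    print(agree(9.81, 0.05, 9.60, 0.04))
```

The threshold k is a matter of convention; a stricter or looser cutoff changes how often genuinely equal quantities are flagged as disagreeing.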
I exaggerate her conclusion slightly, but only slightly. Mayo is a dues-paying philosopher of science (literally, it seems), and like most of the breed these days is largely concerned with questions of method and justification, of “ampliative inference” (C. S. Peirce) or “non-demonstrative inference” (Bertrand Russell). Put bluntly and concretely: why, since neither can be deduced rigorously from unquestionable premises, should we put more trust in David Grinspoon’s ideas about Venus than in those of Immanuel Velikovsky? A nice answer would be something like, “because good scientific theories are arrived at by employing thus-and-such a method, which infallibly leads to the truth, for the following self-evident reasons.” A nice answer, but not one which is seriously entertained by anyone these days, apart from some professors of sociology and literature moonlighting in the construction of straw men. In the real world, science is alas fallible, subject to constant correction, and very messy. Still, mess and all, we somehow or other come up with reliable, codified knowledge about the world, and it would be nice to know how the trick is turned: not only would it satisfy curiosity (“the most agreeable of all vices” — Nietzsche), and help silence such people as do, in fact, prefer Velikovsky to Grinspoon, but it might lead us to better ways of turning the trick. Asking scientists themselves is nearly useless: you’ll almost certainly just get a recital of whichever school of methodology we happened to blunder into in college, or impatience at asking silly questions and keeping us from the lab. If this vice is to be indulged in, someone other than scientists will have to do it: namely, the methodologists.
That they have been less than outstandingly successful is not exactly secret. Thus the biologist Peter Medawar, writing on Induction and Intuition in Scientific Thought: “Most scientists receive no tuition in scientific method, but those who have been instructed perform no better as scientists than those who have not. …..
Medawar’s friend Karl Popper achieved (fully deserved) eminence by tenacious insistence on the importance of this point, becoming a sort of Lenin of the philosophy of science. Instead of conferring patents of epistemic nobility, lawdoms and theoryhoods, on certain hypotheses, Popper hauled them all before an Anglo-Austrian Tribunal of Revolutionary Empirical Justice. The procedure of the court was as follows: the accused was blindfolded, and the magistrates then formed a firing squad, shooting at it with every piece of possibly-refuting observational evidence they could find………
Mayo, playing the Jacobin or Bolshevik to Popper’s Girondin or Cadet, thinks she knows what the problem is: for all his can’t-make-an-omelette-without-breaking-eggs rhetoric, Popper is entirely too soft on conjectures.
“Although Popper’s work is full of exhortations to put hypotheses through the wringer, to make them ‘suffer in our stead in the struggle for the survival of the fittest,’ the tests Popper sets out are white-glove affairs of logical analysis. If anomalies are approached with white gloves, it is little wonder that they seem to tell us only that there is an error somewhere and that they are silent about its source. We have to become shrewd inquisitors of errors, interact with them, simulate them (with models and computers), amplify them: we have to learn to make them talk.” [p. 4, reference omitted]
Fortunately, scientists have not only devoted much effort to making errors talk, they have even developed a theory of inquisition, in the form of mathematical statistics, especially the theory of statistical inference worked out by Jerzy Neyman and Egon Pearson in the 1930s. Mayo’s mission is largely to show how this very standard mathematical statistics justifies a very large class of scientific inferences, those concerned with “experimental knowledge,” and to suggest that the rest of our business can be justified on similar grounds. Statistics becomes a kind of applied methodology, as well as the “continuation of experiment by other means.”
Mayo’s key notion is that of a severe test of a hypothesis, one with “an overwhelmingly good chance of revealing the presence of a specific error, if it exists — but not otherwise” (p. 7). More formally (when we can be this formal), the severity of a passing result is the probability that, if the hypothesis is false, our test would have given results which match the hypothesis less well than the ones we actually got do, taking the hypothesis, the evidence used in the test, and the way of calculating fit between hypothesis and evidence to be fixed. [Semi-technical note containing an embarrassing confession.] If a severe test does not turn up the error it looks for, it’s good grounds for thinking that the error is absent. By putting our hypotheses through a battery of severe tests, screening them for the members of our “error repertoire,” our “canonical models of error,” we can come to have considerable confidence that they are not mistaken in those respects. Instead of a method for infallibly or even reliably finding truths, we have a host of methods for reliably finding errors: which turns out to be good enough.
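To make the definition of severity concrete, here is a minimal Python sketch for one simple setting. Everything specific in it is an assumption made for illustration rather than something stated in the review: the data are taken to be normal with known standard deviation, the hypothesis that passes is "the mean exceeds mu1", fit is measured by the sample mean, and the probability is evaluated at the boundary case where the mean equals mu1, the false alternative hardest to tell apart from the hypothesis. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def severity_mean_exceeds(xbar, mu1, sigma, n):
    """Severity with which 'mean > mu1' passes, given observed mean xbar.

    This is the probability of a sample mean no larger than xbar
    (a worse fit to the hypothesis than the one observed) if the true
    mean were only mu1, i.e. if the hypothesis were just barely false.
    """
    standard_error = sigma / math.sqrt(n)
    return normal_cdf((xbar - mu1) / standard_error)

if __name__ == "__main__":
    # 25 observations, sigma = 1, observed mean 0.4, claim: mean > 0
    print(severity_mean_exceeds(xbar=0.4, mu1=0.0, sigma=1.0, n=25))
```

In this sketch the number belongs to the test and the result together: the same observed mean passes "mean > 0" with high severity but would pass "mean > 0.4" with severity of only one half.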
Distributions of experimental outcomes, then, are the key objects for Mayo’s tests, especially the standard Neyman-Pearson statistical tests. The kind of probabilities Mayo, and Neyman and Pearson, use are probabilities of various things happening: meaning that the probability of a certain result, p(A), is the proportion of times A occurs in many repetitions of the experiment, its frequency. This is a very familiar sense of probability; it’s the one we invoke when we say that a fair coin has a 50% probability of coming up heads, that the chance of getting three sixes with fair (six-sided!) dice is 1 in 216, that a certain laboratory procedure will make an indicator chemical change from red to blue 95% of the time when a toxin is present. Or, more to the present point: “the hypothesis is significant at the five percent level” means “the hypothesis passed the test, and the probability of its doing so, if it were false, is no more than five percent,” which means “if the hypothesis is false, and we repeated this experiment many times, we would expect to get results inside our passing range no more than five percent of the time.”
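The frequency reading of probability can be illustrated with the dice example from the paragraph above. This is a small Python sketch, with hypothetical function names and an arbitrary seed and trial count, comparing the exact chance of three sixes with the proportion observed over many simulated repetitions.

```python
import random
from fractions import Fraction

def exact_three_sixes():
    """Exact probability that three fair six-sided dice all show six."""
    return Fraction(1, 6) ** 3

def simulated_frequency(trials, seed=0):
    """Proportion of simulated throws of three dice that are all sixes.

    The seed is fixed only so that the run is reproducible.
    """
    rng = random.Random(seed)
    hits = sum(
        all(rng.randint(1, 6) == 6 for _ in range(3))
        for _ in range(trials)
    )
    return hits / trials

if __name__ == "__main__":
    print(exact_three_sixes())          # 1/216
    print(simulated_frequency(200_000))
```

The simulated proportion will not equal 1/216 exactly in any finite run; on the frequentist reading, the probability is what that proportion settles toward as the repetitions accumulate.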
This interpretation of probability, the “frequentist” interpretation, is not the only one however. Ever since its origins in the seventeenth century, if we are to believe its historians, mathematical probability has oscillated, not to say equivocated, between two interpretations, between saying how often a given kind of event happens, and saying how much credence we should give a given assertion. Now, this is the sort of philosophical question — viz., what the hell is a probability anyway? — which scientists are normally none the worse for ignoring, and normally blithely ignore. But maybe once every hundred years these questions actually affect the course of research, philosophy really does make a difference: the existence of atoms was such a question at the beginning of the century, and the nature of probability is one today. To see why, and why Mayo spends much of her book chastising the opponents of the frequentist interpretation, requires a little explanation.
Modern believers in subjective probability are called Bayesians, after the Rev. Mr. Thomas Bayes, who in 1763 posthumously published a theorem about the calculation of conditional probabilities…..The theorem itself is beyond dispute, being an easy consequence of the definition of a conditional probability, with many useful applications, the classical one being diagnostic testing. The uses to which it has been put are, however, as peculiar as those of any mathematical theorem, even Gödel’s.
In particular, if you think of probabilities as degrees-of-belief, it is tempting, maybe even necessary, to regard Bayes’s theorem as a rule for assessing the evidential support of beliefs. For instance, let A be “Mr. Geller is psychic” and B be “this spoon will bend without the application of physical force.” Once we’ve assigned p(A), p(B), and p(B|A), we can calculate just how much more we ought to believe in Geller’s psychic powers after seeing him bend a spoon without visibly doing so. p(A) and p(B) and sometimes even p(B|A) are, in this view, all reflections of our subjective beliefs, before we examine the evidence. They are called the “prior probabilities,” or even just the “priors.” The prize, p(A|B), is the “posterior,” and regarded as the weight we should give to a hypothesis (A) on the strength of a given piece of evidence (B). As I said, it’s hard to avoid this interpretation if you think of probabilities as degrees-of-belief, and there is a large, outspoken and able school of methodologists and statisticians who insist that this is the way of thinking about probability, scientific inference, and indeed rationality in general: the Bayesian Way.
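The calculation described above is a one-liner: p(A|B) = p(B|A) p(A) / p(B). Here is a minimal Python sketch; the function name and the numbers plugged in for the spoon-bending example are invented purely for illustration and carry no evidential weight of their own.

```python
def posterior(p_a, p_b_given_a, p_b):
    """Bayes's theorem: p(A|B) = p(B|A) * p(A) / p(B)."""
    if not 0.0 < p_b <= 1.0:
        raise ValueError("p(B) must be in (0, 1]")
    joint = p_b_given_a * p_a  # p(A and B)
    if joint > p_b:
        raise ValueError("incoherent inputs: p(A and B) exceeds p(B)")
    return joint / p_b

if __name__ == "__main__":
    # Hypothetical numbers: prior belief in psychic powers 0.001,
    # chance of a bent spoon if psychic 0.9, overall chance of a
    # bent spoon (by whatever means) 0.05.
    print(posterior(p_a=0.001, p_b_given_a=0.9, p_b=0.05))
```

With these made-up inputs the posterior comes out near 0.018, eighteen times the prior but still small; different priors would give a different answer from the same spoon.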
Looked at from a vantage-point along that Way, Neyman-Pearson hypothesis testing is arrant nonsense, involving all manner of irrelevant considerations, when all you need is the posterior. For those of us taking the frequentist (or, as Mayo prefers, error-statistical) perspective, Bayesians want to quantify the unquantifiable and proscribe inferential tools that scientific practice shows are most useful, and are forced to give precise values to perfectly ridiculous quantities, like the probability of getting a certain experimental result if all the hypotheses we can dream up are wrong. For us, to assign a probability to a hypothesis might make sense (in Peirce’s words) “if universes were as plenty as blackberries, if we could put a quantity of them in a bag, shake them well up, draw out a sample and examine them” (Collected Works 2.684, quoted p. 78); as it is, hypotheses are either true or false, a condition quite lacking in gradations. Bayesians not only assign such probabilities, they do so a priori, condensing their prejudices into real numbers between 0 and 1 inclusive; two Bayesians cannot meet without smiling at each other’s priors. True, they can show that, in the limit of presenting an infinite amount of (consistent) evidence, the priors “wash out” (provided they’re “non-extreme,” not 0 or 1 to start with); but it has also been shown that, “for any body of evidence there are prior probabilities in a hypothesis H that, while nonextreme, will result in the two scientists having posterior probabilities in H that differ by as much as one wants” (p. 84n, Mayo’s emphasis). This is discouraging, to say the least, and accords very poorly with the way that scientists actually do come to agree, very quickly, on the value and implications of pieces of evidence.
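Both halves of the point about priors can be seen in a toy calculation. The Python sketch below assumes a deliberately simple setup that is not in the review: a single hypothesis H against its negation, with each consistent observation multiplying the odds on H by a fixed likelihood ratio of 2. The function name and all numbers are hypothetical.

```python
def posterior_from_prior(prior, likelihood_ratio):
    """Posterior probability of H from a non-extreme prior and a
    likelihood ratio p(evidence | H) / p(evidence | not H)."""
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)

if __name__ == "__main__":
    # Washing out: an optimist (0.9) and a sceptic (0.1) converge as
    # consistent evidence accumulates (assumed ratio of 2 per observation).
    for n in (1, 5, 30):
        evidence = 2.0 ** n
        print(n, posterior_from_prior(0.9, evidence),
              posterior_from_prior(0.1, evidence))

    # But for any fixed body of evidence, a sufficiently small (still
    # non-extreme) prior keeps the posterior far from the optimist's.
    fixed_evidence = 2.0 ** 30
    print(posterior_from_prior(1e-12, fixed_evidence))
```

After thirty observations the optimist and the sceptic are practically indistinguishable, yet a third party starting from a prior of one in a trillion still ends up with a posterior of about one in a thousand on exactly the same evidence.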
Bayesian reconstructions of episodes in the history of science, Mayo says, are on a level with claiming that Leonardo da Vinci painted by numbers since, after all, there’s some paint-by-numbers kit which will match any painting you please.
Mayo will have nothing to do with painting by numbers, and wants to trash all the kits she runs across. These do not just litter the Bayesian Way; the whole attempt to find “evidential relation” measures, which will supposedly quantify how much support a given body of evidence provides for a given hypothesis, falls into the dumpster as well. The idea behind them, that the relation between evidence and hypothesis is some kind of a fraction of a deductive implication, can now I think be safely set aside as a nice idea which just doesn’t work. (This is a pity; it is easy to program.) It should be said, as Mayo does, that the severity of a test is not an evidential relation measure, but rather a property of the test, telling us how reliably it picks out a kind of mistake — that it misses it once every hundred tries, or once every other try, or never. (If a hypothesis passes a test on a certain body of evidence with severity 1, it does not mean that the evidence implies the hypothesis, for instance.) Also on the list of science-by-numbers kits to be thrown out are some abuses of Neyman-Pearson tests, the kind of unthinking applications of them that led a physicist of my acquaintance to speak sarcastically of “statistical hypothesis testing, that substitute for thought.” Some of these Mayo lays (perhaps unjustly) at Neyman’s feet, exonerating Pearson; she shows that none of them are necessitated by a proper understanding of the theory of testing……
Aside from my usual querulousness about style… I have only two substantial problems with Mayo’s ideas; or perhaps I just wish she’d pushed them further here than she did. First, they do not seem to distinguish scientific knowledge — at least not experimental knowledge — from technological knowledge, or even really from artisanal know-how. Second, they leave me puzzled about how science got on before statistics. …..
Read the full review: here.