Excursion 1 Tour II: Error Probing Tools vs. Logics of Evidence
Blurb. Core battles revolve around the relevance of a method’s error probabilities. What’s distinctive about the severe testing account is that it uses error probabilities evidentially: to assess how severely a claim has passed a test. Error control is necessary but not sufficient for severity. Logics of induction focus on the relationships between given data and hypotheses–so outcomes other than the one observed drop out. This is captured in the Likelihood Principle (LP). Tour II takes us to the crux of central wars in relation to the Law of Likelihood (LL) and Bayesian probabilism. (1.4) Hypotheses deliberately designed to accord with the data can result in minimal severity. The likelihoodist wishes to oust them via degrees of belief captured in prior probabilities. To the severe tester, such gambits directly alter the evidence by leading to inseverity. (1.5) Stopping rules: If a tester tries and tries again until significance is reached–optional stopping–significance will be attained erroneously with high probability. According to the LP, the stopping rule doesn’t alter evidence. The irrelevance of optional stopping is an asset for holders of the LP, it’s the opposite for a severe tester. The warring sides talk past each other. Continue reading
I’ve been asked if I agree with Regina Nuzzo’s recent note on p-values [i]. I don’t want to be nit-picky, but one very small addition to Nuzzo’s helpful tips for communicating statistical significance can make it a great deal more helpful. Here’s my friendly amendment. She writes: Continue reading
Tour guides in your travels jot down Mementos and Keepsakes from each Tour[i] of my new book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (CUP 2018). Their scribblings, which may at times include details, at other times just a word or two, may be modified through the Tour, and in response to questions from travelers (so please check back). Since these are just mementos, they should not be seen as replacements for the more careful notions given in the journey (i.e., book) itself. Still, you’re apt to flesh out your notes in greater detail, so please share yours (along with errors you’re bound to spot), and we’ll create Meta-Mementos. Continue reading
a road through the jungle
In my talk yesterday at the Philosophy Department at Virginia Tech, I introduced my new book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (Cambridge 2018). I began with my preface (explaining the meaning of my title), and turned to the Statistics Wars, largely from Excursion 1 of the book. After the sum-up at the end, I snuck in an example from the replication crisis in psychology. Here are the slides.
Where you are in the Journey* We’ll move from the philosophical ground ﬂoor to connecting themes from other levels, from Popperian falsiﬁcation to signiﬁcance tests, and from Popper’s demarcation to current-day problems of pseudoscience and irreplication. An excerpt from our Museum Guide gives a broad-brush sketch of the ﬁrst few sections of Tour II:
Karl Popper had a brilliant way to “solve” the problem of induction: Hume was right that enumerative induction is unjustiﬁed, but science is a matter of deductive falsiﬁcation. Science was to be demarcated from pseudoscience according to whether its theories were testable and falsiﬁable. A hypothesis is deemed severely tested if it survives a stringent attempt to falsify it. Popper’s critics denied he could sustain this and still be a deductivist …
Popperian falsiﬁcation is often seen as akin to Fisher’s view that “every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis” (1935a, p. 16). Though scientists often appeal to Popper, some critics of signiﬁcance tests argue that they are used in decidedly non-Popperian ways. Tour II explores this controversy.
While Popper didn’t make good on his most winning slogans, he gives us many seminal launching-oﬀ points for improved accounts of falsiﬁcation, corroboration, science versus pseudoscience, and the role of novel evidence and predesignation. These will let you revisit some thorny issues in today’s statistical crisis in science. Continue reading
My new book, Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars,” you might have discovered, includes Souvenirs throughout (A-Z). But there are some highlights within sections that might be missed in the excerpts I’m posting. One such “keepsake” is a quote from Fisher at the very end of Section 2.1.
These are some of the ﬁrst clues we’ll be collecting on a wide diﬀerence between statistical inference as a deductive logic of probability, and an inductive testing account sought by the error statistician. When it comes to inductive learning, we want our inferences to go beyond the data: we want lift-oﬀ. To my knowledge, Fisher is the only other writer on statistical inference, aside from Peirce, to emphasize this distinction.
In deductive reasoning all knowledge obtainable is already latent in the postulates. Rigour is needed to prevent the successive inferences growing less and less accurate as we proceed. The conclusions are never more accurate than the data. In inductive reasoning we are performing part of the process by which new knowledge is created. The conclusions normally grow more and more accurate as more data are included. It should never be true, though it is still often said, that the conclusions are no more accurate than the data on which they are based. (Fisher 1935b, p. 54)
How do you understand this remark of Fisher’s? (Please share your thoughts in the comments.) My interpretation, and its relation to the “lift-off” needed to warrant inductive inferences, is discussed in an earlier section, 1.2, posted here. Here’s part of that.