Here are the slides from my discussion of Nancy Reid today at BFF4: The Fourth Bayesian, Fiducial, and Frequentist Workshop: May 1-3, 2017 (hosted by Harvard University)

# “Fusion-Confusion?” My Discussion of Nancy Reid: “BFF Four- Are we Converging?”


In my talk I alluded to the analogy: Carnap is to Bayesians as Popper is to frequentists. A questioner began by saying that in social science it’s typical to go from a statistically significant result e to infer one’s favorite explanation H’. He asked: Doesn’t this mean that statistical tests (or maybe only significance tests) are in opposition to Popper? The answer is no, although going from a statistically significant e to H’ is about as glaring a fallacy as one can imagine (unless Ho and H’ exhaust the possibilities). If you understand Popper, statistical tests, and a bit of logic, you will see that non-fallacious uses of tests are Popperian.

1. First: a side trip to underdetermination. The best-known problem with making inferences from data x to hypotheses that might explain them is underdetermination: data underdetermine hypotheses and theories insofar as there’s more than one hypothesis to “explain” them. That is why hypothetico-deductive (HD) inference is inadequate. HD inference goes from “H entails e, and e” to H, which is deductively invalid (it affirms the consequent). The statistical version of HD inference goes from:

H is made more probable by e than it was to start; e is observed, therefore H gets “confirmed” in the sense of getting a Bayesian boost.

The problem of underdetermination is exacerbated with the statistical version of HD: many incompatible hypotheses can be found that get a B-boost from the same e. The problem of underdetermination is NOT a problem with any statistical tools; it’s a fact of life. It IS a problem if your statistical tools inadequately cope with it.
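To see how cheaply a B-boost comes, here is a minimal sketch with made-up numbers: three mutually exclusive hypotheses, two of which entail e. The priors, the likelihoods, and the names `H1`, `H2`, `H3` are purely illustrative assumptions, not anything from the talk.

```python
# Hypothetical priors over three mutually exclusive, jointly exhaustive hypotheses.
priors = {"H1": 0.2, "H2": 0.3, "H3": 0.5}
# Hypothetical P(e | H): H1 and H2 each entail e; H3 makes e unlikely.
likelihoods = {"H1": 1.0, "H2": 1.0, "H3": 0.2}


def posteriors(priors, likelihoods):
    """Apply Bayes' theorem over a finite partition of hypotheses."""
    p_e = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / p_e for h in priors}


post = posteriors(priors, likelihoods)
# A hypothesis gets a "B-boost" when its posterior exceeds its prior.
boosted = [h for h in priors if post[h] > priors[h]]

for h in priors:
    print(f"{h}: prior={priors[h]:.3f}  posterior={post[h]:.3f}")
print("B-boosted by e:", boosted)
```

Both `H1` and `H2` are boosted by the same e even though they are incompatible with each other, so the boost alone cannot single out one of them.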

2. To cope with underdetermination, good tests demand something beyond “confirmation” (in the Bayesian sense of a B-boost). Popper said that e can’t count in support of H’ unless e reports the result of H’ SURVIVING or passing a severe test of the flaws of H’.

The probability of H’ passing such a stringent test, if H’ is false, is low. One may infer the absence of specific flaws in H’ only if, with high probability, the method would have unearthed them were they present, and yet it does not. The probability attaches to the method! It describes the method’s probative capability.
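One way to put numbers on a probability that “attaches to the method” is sketched below. It assumes a setting the post does not spell out: a one-sided Normal test with known σ, where the claim H’ is “μ > μ1” and passing means observing a sample mean at least as large as the one in hand. The function name `severity` and all the numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def severity(xbar, mu1, sigma, n):
    """Probability the method would have yielded a sample mean no larger than
    the observed xbar if mu were only mu1, i.e. if the claim 'mu > mu1' were
    (just) false. A high value means the claim passed a probative check."""
    standard_error = sigma / sqrt(n)
    return NormalDist().cdf((xbar - mu1) / standard_error)


# Hypothetical data summary: observed mean 0.4, sigma = 1, n = 25.
xbar, sigma, n = 0.4, 1.0, 25
for mu1 in (0.0, 0.2, 0.4, 0.6):
    print(f"claim mu > {mu1}: severity = {severity(xbar, mu1, sigma, n):.3f}")
```

The same data warrant “μ > 0” well and “μ > 0.6” poorly: the number describes how capable the method was of exposing each claim as wrong, and it is not a probability assigned to the claim.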

3. Meehl’s criticism. Many accept the Popperian standpoint:

The more stringent and precise is the test of a discrepancy between H and data, the better H is corroborated when no such discrepancy is found. H is NOT assigned a probability, but it’s corroborated.

Their complaint (emphasized by Meehl) is this: Suppose a test of a null hypothesis (no effect) Ho is highly precise or powerful. And suppose (statistically) falsifying Ho is taken to corroborate or support a substantive H’ (which would “explain” x). Then, the highly precise test will make it easier to find support for H’. This is at odds with Popper.

Indeed! And it is also at odds with statistical tests. If you insist these are N-P tests, fine.

If you seek to infer H’, you must stringently probe flaws of H’! You are entitled to infer H’ only if you’ve stringently probed how H’ may be wrong, and didn’t find any flaws.

Corroboration comes from FAILING TO FALSIFY H’, i.e., from H’ surviving.

Meehl was describing a fallacious use of tests.
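The mechanics behind Meehl’s complaint can be checked with a small calculation. This is a sketch under assumptions of my choosing for illustration: a one-sided z-test of Ho: μ ≤ 0 with known σ = 1, level 0.025, and a tiny true discrepancy of 0.05. None of these numbers comes from Meehl.

```python
from math import sqrt
from statistics import NormalDist


def power(delta, n, sigma=1.0, alpha=0.025):
    """Probability that the one-sided z-test of Ho: mu <= 0 rejects
    when the true mean is delta."""
    z = NormalDist()
    cutoff = z.inv_cdf(1 - alpha)  # rejection threshold on the z scale
    return 1 - z.cdf(cutoff - delta * sqrt(n) / sigma)


tiny_effect = 0.05  # hypothetical, substantively trivial discrepancy from Ho
sample_sizes = [100, 1_000, 10_000, 100_000]
powers = [power(tiny_effect, n) for n in sample_sizes]

for n, p in zip(sample_sizes, powers):
    print(f"n={n:>7}: P(reject Ho) = {p:.4f}")
```

As the test gets more precise, rejecting Ho becomes nearly certain even for a trivial discrepancy. If every rejection were counted as support for a substantive H’, precision would make H’ easier to “corroborate”, which is the fallacy; the rejection itself indicates only some discrepancy from Ho.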

4. Popper’s criticism of fallacious uses of significance tests. Rejecting a null hypothesis Ho (of no effect) might entitle you to infer the existence of a genuine discrepancy from Ho. Call that inference e. So long as many explanations for e exist, e underdetermines its explanation, and the corroboration or severity requirement blocks inferring any particular explanation H’. So do statistical tests! Moving from statistical effect e to a particular explanation of e, when that explanation hasn’t been well probed, is a fallacy of significance tests! That move is permitted by Carnapian or other B-boost accounts. That was Popper’s criticism.

Does that mean you can never corroborate a hypothesis H’? No! It means you need to actually test H’ before you can corroborate it! It means you shouldn’t settle for a report that says it’s OK to infer H’ so long as H’ is more probable than Ho, or so long as H’ is more likely than Ho. Those inference methods do not yield corroboration, but only a claim of comparative fit: H’ “explains” the data better than does Ho. That might be interesting to know, but it doesn’t go very far, not so long as there are zillions of other hypotheses that also agree with the data but haven’t been probed at all. They haven’t been ruled out in the slightest.

5. Corroborating H’. Nor does highly corroborating H’ require listing all the possible alternatives to it. Scientists don’t do that. Inference proceeds in stages. H’ may describe an explanation at a certain level of approximation. For example, we don’t have a full theory of prion transmission (in cases of diseases like mad cow), but we can infer, with severity, a theory H’ that explains the transmission by protein misfolding, without DNA. We’ve rejected, with severity, the denial of H’ and learned this much about the mechanism.

In my view, an inference account must also report on what hasn’t been well probed: that’s the source of insights for going further and building higher-level theories.