Here are the slides from my discussion of Nancy Reid today at BFF4: The Fourth Bayesian, Fiducial, and Frequentist Workshop, May 13, 2017 (hosted by Harvard University).
“FusionConfusion?” My Discussion of Nancy Reid: “BFF Four Are we Converging?”
Categories: Bayesian/frequentist, C.S. Peirce, confirmation theory, fiducial probability, Fisher, law of likelihood, Popper
 Tags: Hacking

In my talk I alluded to the analogy: Carnap is to Bayesians as Popper is to frequentists. A questioner began by saying that in social science it’s typical to go from a statistically significant result e to infer one’s favorite explanation H’. He asked: Doesn’t this mean that statistical tests (or maybe only significance tests) are in opposition to Popper? The answer is no, although going from a statistically significant e to H’ is about as glaring a fallacy as one can imagine (unless Ho and H’ exhaust the possibilities). If you understand Popper, statistical tests, and a bit of logic, you will see that nonfallacious uses of tests are Popperian.
1. First, a side trip to underdetermination. The best-known problem with making inferences from data x to hypotheses that might explain them is underdetermination: data underdetermine hypotheses and theories insofar as there is more than one hypothesis to “explain” them. That is why hypothetico-deductive (HD) inference is inadequate. HD inference goes from “H entails e, and e” to “H”, which is deductively invalid (it affirms the consequent). The statistical version of HD inference goes from:
H is made more probable by e than it was to start; e is observed, therefore H gets “confirmed” in the sense of getting a Bayesian boost.
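The invalidity of the deductive HD schema can be checked by brute force. In this minimal sketch, as a simplifying assumption, “H entails e” is modelled as a material conditional; the code enumerates every truth assignment and keeps those where both premises hold but the conclusion H is false.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


# Premises: "H entails e" (simplified here to H -> e) and "e". Conclusion: "H".
# A counterexample makes both premises true while H is false.
counterexamples = [
    (h, e)
    for h, e in product([True, False], repeat=2)
    if implies(h, e) and e and not h
]
print(counterexamples)  # [(False, True)]
```

The single counterexample (H false, e true) is just the situation underdetermination describes: the data obtain while the hypothesis that entails them does not.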
The problem of underdetermination is exacerbated with the statistical version of HD: many incompatible hypotheses can each be found to receive a B-boost from the same e. The problem of underdetermination is NOT a problem with any statistical tools; it’s a fact of life. It IS a problem if your statistical tools cope with it inadequately.
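A toy Bayesian calculation makes the point concrete. In this minimal sketch everything is hypothetical: nine incompatible hypotheses about a coin’s heads-probability, equal priors, and evidence e of 7 heads in 10 tosses. The code applies Bayes’ theorem and lists every hypothesis whose probability went up, i.e. every hypothesis that got a B-boost.

```python
from math import comb


def binomial_likelihood(p: float, heads: int, tosses: int) -> float:
    """Probability of observing `heads` in `tosses` flips if the heads-probability is p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


def posterior(priors: dict[float, float], heads: int, tosses: int) -> dict[float, float]:
    """Bayes' theorem over a finite set of mutually exclusive hypotheses."""
    joint = {p: prior * binomial_likelihood(p, heads, tosses) for p, prior in priors.items()}
    total = sum(joint.values())
    return {p: j / total for p, j in joint.items()}


# Hypothetical setup: nine incompatible hypotheses (p = 0.1, ..., 0.9), equal priors.
hypotheses = [round(0.1 * k, 1) for k in range(1, 10)]
priors = {p: 1 / len(hypotheses) for p in hypotheses}

# Hypothetical evidence e: 7 heads in 10 tosses.
post = posterior(priors, heads=7, tosses=10)

# Every hypothesis whose probability rose gets a B-boost from the same e.
boosted = [p for p in hypotheses if post[p] > priors[p]]
print(boosted)  # [0.5, 0.6, 0.7, 0.8]
```

Four mutually incompatible hypotheses each get a B-boost from the very same e, so the boost by itself cannot tell us which, if any, to infer.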
2. To cope with underdetermination, good tests demand something beyond “confirmation” (in the Bayesian sense of a B-boost). Popper said that e can’t count in support of H’ unless e reports the result of H’ SURVIVING, or passing, a severe test of the flaws in H’.
The probability of H’ passing such a stringent test, if H’ is false, is low. One may infer the absence of specific flaws in H’ only if, with high probability, the method would have unearthed them, and yet it did not. The probability attaches to the method! It describes the method’s probative capability.
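One way to see a probability that “attaches to the method” is a severity calculation for the simplest statistical case: a one-sided test of a Normal mean with known σ that has rejected H0: μ ≤ 0. This is a minimal sketch under those assumptions, not a general recipe; the numbers (σ = 1, n = 100, observed mean 0.2) are hypothetical. For the claim μ > μ1, severity is the probability that the test would have produced a sample mean no larger than the one observed, were μ only μ1.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard Normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def severity_mu_greater(xbar: float, mu1: float, sigma: float, n: int) -> float:
    """Severity of the claim "mu > mu1" after a one-sided test of a Normal mean
    (known sigma) has rejected H0: mu <= mu0.

    Probability the test would have yielded a sample mean no larger than xbar
    if mu were only mu1. It describes the test's capability to detect that
    flaw; it is not a probability assigned to the claim itself.
    """
    standard_error = sigma / sqrt(n)
    return normal_cdf((xbar - mu1) / standard_error)


# Hypothetical numbers: sigma = 1, n = 100, observed mean 0.2 (z = 2 against mu0 = 0).
for mu1 in (0.0, 0.1, 0.2, 0.3):
    sev = severity_mu_greater(xbar=0.2, mu1=mu1, sigma=1.0, n=100)
    print(f"SEV(mu > {mu1}) = {sev:.3f}")
```

Severity falls from about 0.98 for μ > 0 to 0.5 for μ > 0.2, and lower still for μ > 0.3: the same data warrant some claims well and others poorly, and none of these numbers is a posterior probability.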
3. Meehl’s criticism. Many accept the Popperian standpoint:
The more stringent and precise the test of a discrepancy between H and the data, the better H is corroborated when no such discrepancy is found. H is NOT assigned a probability; it is corroborated.
Their complaint (emphasized by Meehl) is this: Suppose a test of a null hypothesis Ho (no effect) is highly precise or powerful. And suppose (statistically) falsifying Ho is taken to corroborate or support a substantive H’ (which would “explain” x). Then the more precise the test, the easier it becomes to find support for H’. This is at odds with Popper.
Indeed! And it is also at odds with statistical tests. If you insist that these are Neyman-Pearson (NP) tests, fine.
If you seek to infer H’, you must stringently probe the flaws of H’! You are entitled to infer H’ only if you’ve stringently probed how H’ may be wrong and found no flaws.
Corroboration comes from FAILING TO FALSIFY H’, i.e., from H’ surviving.
Meehl was describing a fallacious use of tests.
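Meehl’s worry can be seen numerically. The minimal sketch below assumes a one-sided z-test of Ho: μ = 0 with the conventional 1.96 cutoff, σ = 1, and a hypothetical true discrepancy of 0.02σ, a size most researchers would call trivial. It computes the probability of rejecting Ho as the sample size n grows.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard Normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def rejection_probability(delta: float, sigma: float, n: int, z_cut: float = 1.96) -> float:
    """Probability that a one-sided z-test of Ho: mu = 0 rejects (z > z_cut)
    when the true mean is delta."""
    shift = delta * sqrt(n) / sigma  # expected z-statistic when mu = delta
    return 1.0 - normal_cdf(z_cut - shift)


# Hypothetical trivial discrepancy: delta = 0.02 sigma units.
for n in (100, 10_000, 1_000_000):
    print(n, round(rejection_probability(delta=0.02, sigma=1.0, n=n), 3))
```

Rejection goes from about 4% at n = 100 to roughly even odds at n = 10,000 and near certainty at n = 1,000,000. A rejection this easy to obtain has not probed H’ at all, which is why reading it as corroboration of H’ is the fallacy.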
4. Popper’s criticism of fallacious uses of significance tests. Rejecting a null hypothesis Ho (of no effect) might entitle you to infer the existence of a genuine discrepancy from Ho. Call that inference e. So long as many explanations for e exist, e underdetermines its explanation. But the corroboration or severity requirement blocks inferring any particular explanation H’. So do statistical tests! Moving from a statistical effect e to a particular explanation of e, when that explanation hasn’t been well probed, is a fallacy of significance tests! It is permitted by Carnapian or other B-boost accounts. That was Popper’s criticism.
Does that mean you can never corroborate a hypothesis H’? No! It means you need to actually test H’ before you can corroborate it! It means you shouldn’t settle for a report that says it’s OK to infer H’ so long as H’ is more probable than Ho, or so long as H’ is more likely than Ho. Those inference methods do not yield corroboration, but only a claim of comparative fit: H’ “explains” the data better than does Ho. That might be interesting to know, but it doesn’t go very far, not so long as there are zillions of other hypotheses that also agree with the data but haven’t been probed at all. They haven’t been ruled out in the slightest.
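The limits of comparative fit can be sketched with likelihood ratios. Assume hypothetical data giving a Normal sample mean of 0.2 with known standard error 0.1 (a z = 2 result against Ho: μ = 0). Every mean strictly between 0 and 0.4 is then more likely than Ho. The grid of rival values below is arbitrary, chosen only for illustration; a substantive H’ predicting any of these values would share the same comparative advantage.

```python
from math import exp


def likelihood_ratio(mu: float, mu0: float, xbar: float, se: float) -> float:
    """Likelihood of mean mu relative to mean mu0, given an observed Normal
    sample mean xbar with known standard error se."""
    return exp(-((xbar - mu) ** 2 - (xbar - mu0) ** 2) / (2.0 * se**2))


xbar, se, mu0 = 0.2, 0.1, 0.0  # hypothetical data: z = 2 against Ho: mu = 0
grid = [round(0.05 * k, 2) for k in range(1, 8)]  # rival values 0.05, ..., 0.35

# A rival "explains the data better than Ho" when its likelihood ratio exceeds 1.
better_fit = [mu for mu in grid if likelihood_ratio(mu, mu0, xbar, se) > 1.0]
print(better_fit)  # [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35]
```

Every value on the grid beats Ho on likelihood, yet they cannot all be true; a favorable ratio against Ho leaves each rival unprobed.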
5. Corroborating H’. Nor does highly corroborating H’ require listing all the possible alternatives to it. Scientists don’t do that. Inference proceeds in stages. H’ may describe an explanation at a certain level of approximation. For example, we don’t have a full theory of prion transmission (in diseases like mad cow disease), but we can infer, with severity, a theory H’ that explains the transmission by protein misfolding, without DNA. We’ve rejected, with severity, the denial of H’ and learned this much about the mechanism.
In my view, an inference account must also report on what hasn’t been well probed; that’s the source of insights for going further and building higher-level theories.