“Probabilism as an Obstacle to Statistical Fraud-Busting”

Boston Colloquium 2013-2014


“Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?” was my presentation at the 2014 Boston Colloquium for the Philosophy of Science: “Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge.”

As often happens, I never turned these slides into a stand-alone paper, but I have incorporated them into my book (in progress*), “How to Tell What’s True About Statistical Inference”. Background and slides were posted last year.

Slides (draft from Feb 21, 2014) 

Download the 54th Annual Program

Cosponsored by the Department of Mathematics & Statistics at Boston University.

Friday, February 21, 2014
10 a.m. – 5:30 p.m.
Photonics Center, 9th Floor Colloquium Room (Rm 906)
8 St. Mary’s Street

*Seeing a light at the end of the tunnel, finally.
Categories: P-values, significance tests, Statistical fraudbusting, Statistics


7 thoughts on “Probabilism as an Obstacle to Statistical Fraud-Busting”

  1. pp. 52-3 are the first time I’ve called the “we’ve tried (to get tests interpreted correctly)” group on their claim of really trying. Probabilists can’t help it, because they don’t see how error probabilities serve the inferential goal of assessing “how well probed” rather than “how probable”.

  2. Karthik

    Thanks for the slides, Mayo! They are very clear.

    1) My favourite line is: “It’s not so much replication but triangulation that’s required” (pg. 46). I think it is just about the perfect line on the issue.

    2) I had a question about the following line that has been said a few times in different papers:
    “We don’t need an exhaustive list of hypotheses to split off the problem of how well (or poorly) probed a given hypothesis is…” (pg. 25).

    I was thinking about something like the Clever Hans effect, where the most reasonable explanation, uncovered later, was that the horse was simply reacting to the questioner’s body language. Initially, this particular hypothesis wasn’t part of the list of possibilities entertained by the researchers. Wouldn’t the claim accidentally have looked severely tested at first, only for it to turn out later that a crucial counter-hypothesis had been missed? Missing a crucial counter-hypothesis seems to me to invalidate any claim that a particular hypothesis was severely tested. So, in some sense, doesn’t the severe-testing viewpoint also require some acknowledgement of all the possible hypotheses? Otherwise, how could we say that the claim is severely tested? Am I missing something in my understanding of severity here?

    • Karthik:
      You mention blog editor Jean Miller’s favorite example, Hans the horse.
      Your question is one I’ve worked a lot on in general phil science: underdetermination and the problem of rival explanations. A lot of my interest in statistical inference, for purposes of my general work in philosophy of science, is that it helps to see how we can partition answers one “level” at a time, as it were. Philosophers of science, by contrast, especially post-Kuhn, have focused on large-scale theories, holisms, paradigms, and hard cores. The “new experimentalists” (of which I am one) try to correct that.

      In statistics (error statistics, anyway) you see a nice partition of questions: the assumptions of the data, the statistical models, the substantive questions. More than that, the error probabilities apply to local pieces. You don’t need elaborate frameworks to learn something quickly (“jump in and jump out”). So I was able to apply this “piecemeal approach” to general underdetermination problems in philosophy of science.
      Please see:

      Click to access: “Severe Tests, Arguing from Error, and Methodological Underdetermination” (1997)

      Click to access: “Learning from Error” (Henle)

      (1) We don’t require an exhaustive list of alternatives, say to GTR, much less probability assignments to them, in order for “local” claims to pass with severity, such as that the deflection effect is known to enormously high precision. By contrast, I don’t think scientists would ever really mean to say that GTR is true with probability .95 or whatever, however one likes to interpret such claims.

      (2) It is extremely useful to know which problems and domains have not passed severely, since that is the impetus for specifying the characteristics of tests that would be needed. Scientists can be on the lookout for a phenomenon that fits the bill.

      (3) What passes with severity is stable through changes in underlying theory and revised explanations of the cause of an effect. Experimental relativists were ready to replace the mechanism underlying the deflection effect, from Einstein’s theory to, say, Brans-Dicke, had just a few experiments in the 70s worked out differently. Look how ready they are to abandon the Standard Model; but even so, the effects vouchsafed with severity will remain.

      (4) We have to gather what I call a repertoire of errors and mistaken interpretations of data, relevant for canonical types of inquiry. You may recall my posts on ESP research in the 80s (including Diaconis, Suppes, and I.J. Good). The point was to emphasize how background information grows and how it enters into error-statistical inquiry.

      There was a “real effect” with Hans, but its cause was not that the horse could add; it was subtle cues. If you thought you’d ruled out “signaling by the horse trainer” with a warranted severity assessment, you’d be wrong.

  3. Karthik

    I think I understand the claim better now; the very last paragraph left no doubt as to the interpretation for me. Thanks!

    Then should the quote instead be understood as the following?

    “We don’t need an exhaustive list of effect-size/statistical hypotheses to split off the problem of how well (or poorly) probed a given effect-size/statistical hypothesis is…”

    So, it is not about more abstract theoretical hypotheses per se, but statistical hypotheses, and therefore, it is about hypotheses related to effect-sizes. Is that the appropriate understanding then?

    I ask because it is not clear to me (yet) that this is true at all levels. For example, it might not be possible for hypotheses/expectations at a more abstract level, since what is crucially necessary for the quote to be true (I think) is some sort of ordering of all possible hypotheses along a scale (as is possible with effect sizes or discrepancies), but perhaps not at more abstract levels.

    By statistical hypothesis, I mean a hypothesis in terms of discrepancies about the variable of interest.

    • Karthik: I definitely don’t intend to limit it to statistics, but for purposes of the topic of the presentation (which was statistical inference) it could be seen that way. I’m not sure the hypotheses would have to be ordered. My idea is that all (error-prone) learning from data is analogous to statistical inference. This may extend as well to theoretical learning without empirical data.
      That said, I’m not saying your general question, and that of underdetermination in general, isn’t serious. My general philosophy of science, as with any, still has open problems.

      • Karthik

        Thanks again for the clarity (and patience) in the responses!

  4. I just wanted to convey that I’ve received several e-mails on this post from excellent people who simply, as a rule, will not comment on blogs. Many of them (members of all tribes, by the way) shared an interesting sentiment, endorsing my hope, “I don’t know, maybe some good will come of all this,” as a reaction to the editors saying, “The state of the art remains uncertain,” regarding inferential statistical procedures. What my e-mailers meant, as did I, was that maybe the powers that be will realize that foundational issues, and aspects of the philosophy and history of statistics, are rather closer to day-to-day practice than typically thought; further, it was expressed to me that serious attention should be paid. (I know I’m planning some conferences, in my mind at least, for now.)

    As for why many people are reluctant to post comments, even though I allow anonymous comments: I will say that we’ve had many great comments and exchanges on this site, and I’m very grateful for them. I will also acknowledge that blog comments, in general, are not known for being especially constructive. Nate Schachtman paid the blog a big compliment once (on his law blog) when he explained why he didn’t allow comments, but pointed to our blog as one of the exceptions to the low expectations for blog comments.

    I also think that Twitter gives people another outlet; I’ve heard bloggers express the general sentiment that Twitter forums have diminished comments on their blogs.
    When I have time, I’ll go back to “U-Phils” and more elaborate connections to conferences, other blogs, and on-line publications like RMM.
