Overheard at the comedy hour at the Bayesian retreat-2 years on

It’s nearly two years since I began this blog, and some are wondering whether I’ve covered all the howlers thrust our way. Sadly, no. So since it’s Saturday night here at the Elba Room, let’s listen in on one of the more puzzling fallacies, one that I let my introductory logic students spot…

“Did you hear the one about significance testers sawing off their own limbs?

‘Suppose we decide that the effect exists; that is, we reject [null hypothesis] H0. Surely, we must also reject probabilities conditional on H0, but then what was the logical justification for the decision? Orthodox logic saws off its own limb.’”

Ha! Ha! By this reasoning, no hypothetical testing or falsification could ever occur. As soon as H is falsified, the grounds for falsifying disappear! If H: all swans are white, then if I see a black swan, H is falsified. But according to this critic, we can no longer assume the deduced prediction from H! What? The entailment from a hypothesis or model H to x, whether it is statistical or deductive, does not go away after the hypothesis or model H is rejected on grounds that the prediction is not borne out.[i] When particle physicists deduce that the events could not be due to background alone, the statistical derivation (to what would be expected under H: background alone) does not get sawed off when H is denied!
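The deductive version of the point can be checked mechanically. The following is a minimal sketch, not anything from Jaynes or the quoted passage: it assumes that "H entails e" can be modeled, for illustration only, as the material conditional of propositional logic, and the names `implies`, `modus_tollens_valid`, and `rows_after_falsification` are hypothetical.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: a -> b (an illustrative stand-in for 'H entails e')."""
    return (not a) or b


# Modus tollens: from (H -> e) and not-e, infer not-H.
# The argument form is valid if the combined conditional holds in every row.
modus_tollens_valid = all(
    implies(implies(H, e) and (not e), not H)
    for H, e in product([False, True], repeat=2)
)

# Truth-table rows consistent with the premises "H -> e" and "not-e".
# In every such row H is false, yet "H -> e" is still true: rejecting H
# does not retract the conditional that was used to reject it.
rows_after_falsification = [
    (H, e)
    for H, e in product([False, True], repeat=2)
    if implies(H, e) and not e
]

print(modus_tollens_valid)
print(rows_after_falsification)
```

The only row that survives the premises has H false, and in that row the conditional from H to e remains true. Strictly, entailment is a relation between statements that holds whatever the truth value of H, so the material conditional here is a simplification.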

The above quote is from Jaynes (p. 524) writing on the pathologies of “orthodox” tests. How does someone writing a great big book on “the logic of science” get this wrong? To be generous, we may assume that in the heat of criticism, his logic takes a wild holiday. Unfortunately, I’ve heard several of his acolytes repeat this. There’s a serious misunderstanding of how hypothetical reasoning works: 6 lashes, and a pledge not to uncritically accept what critics say, however much you revere them.

Jaynes, E. T. 2003. Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.

[i] Of course there is also no warrant for inferring an alternative hypothesis, unless it is a non-null warranted with severity, even if the alternative entails the existence of a real effect. (Statistical significance is not substantive significance; it is by now cliché. Search this blog for fallacies of rejection.)

A few previous comedy hour posts:

(09/03/11) Overheard at the comedy hour at the Bayesian retreat
(4/4/12) Jackie Mason: Fallacy of Rejection and the Fallacy of Nouvelle Cuisine
(04/28/12) Comedy Hour at the Bayesian Retreat: P-values versus Posteriors

(05/05/12) Comedy Hour at the Bayesian (Epistemology) Retreat: Highly Probable vs Highly Probed
(09/03/12) After dinner Bayesian comedy hour…. (1 year anniversary)
(09/08/12) Return to the comedy hour…(on significance tests)
(04/06/13) Who is allowed to cheat? I.J. Good and that after dinner comedy hour….
(04/27/13) Getting Credit (or blame) for Something You Didn’t Do (BP oil spill, comedy hour)

Categories: Comedy, Error Statistics, Statistics


22 thoughts on “Overheard at the comedy hour at the Bayesian retreat-2 years on”

  1. Corey

    Jaynes somehow failed to notice that his argument proves too much: any Bayesian who did model selection on the basis of Bayes factors or posterior probabilities would be just as culpable of the misdeed. One might even claim that the argument bars one from calculating the normalizing constant of a posterior distribution, since that too involves data probabilities computed on simple statistical hypotheses known/judged to be false.

    • Corey: Thanks for your comment, since I take it you support a lot of his more positive message. It’s good for people to realize they don’t have to take the package (any package) hook, line and sinker*, that they can, and should, think for themselves; and that erroneous claims don’t disappear, even if no one in the group dares to challenge them. All that said, do you think it’s his logic taking a wild holiday, or more than that? Just a thought question, you needn’t answer.

      *I hadn’t known the etymology: based on the idea of a fish so hungry it swallows the hook (the part that catches the fish), the line (the string) and the sinker (a weight attached to the line to keep it under water).

  2. “do you think it’s his logic taking a wild holiday”

    It’s meant as a joke: http://bayes.wustl.edu/etj/articles/inadequacy.pdf (pp 52-53)

    • Paul: Thanks for the link, but I don’t see that he gets the reason that the ridicule is misplaced (e.g., merely that there are cases of matching numbers with Bayesians?). In any event, please explain how you interpret the gist of his joke.
      On a different issue: The bottom of p. 54 is interesting.

      • I interpret it as a dig at the (formal) deductiveness in the orthodox inference: the rejection of H_0 as false instead of as improbable.

        • Paul: But this is another of his mistakes*: We too go from data to hypotheses, just not to their probability (unless specifically warranted), but that does not mean the inference doesn’t use probability. We may argue along the lines of “an argument from coincidence” that there is evidence against the null, and qualify that inference using the probativeness of the test. They infer to the existence of a genuine discrepancy from the background (in the Higgs case) because it’s practically impossible to continually bring about 5 sigma effects under the assumption they are due to background alone. Never mind if they are right in that case, I’m trying to bring out the logic of the argument.

          When we infer warranted claims in all the rest of science and everyday life, we do not add “with a high probability” except as a manner of speaking, i.e., we do not assign a formal probability. It is because claims that are warranted (unwarranted) may properly be said to be “believable” (“unbelievable”) that it’s easy to slide into probability talk, because of the English use of the term, but there’s no actual assignment of a posterior. The statistical probability, in short, in the error statistical approach, is used to quantify the probative capacity of the tool. If we keep triggering statistically significant increased radiation levels in the water in a region in Japan, that’s excellent evidence of increased radiation. There’s a qualification, yes, having to do with the sensitivity of the tools to detect/fail to detect radiation levels. That leads to an inference that includes a magnitude that is (or is not) warranted to infer. We can speak of the reliability of the methods used, etc.
          Sorry to go on, but this is at the heart of what is overlooked when it is thought the choices are to accept as true (full stop) or to assign degrees of formal probability to a claim (however interpreted).
          *Here it is not the logical mistake of this post, simply a false dilemma, but not my main point.
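To put a rough number on “practically impossible” in the 5 sigma remark above, here is a minimal sketch under assumptions that are not in the comment itself: the test statistic is taken to be standard normal under “background alone”, the tail is one-sided, and repeated results are treated as independent. The names `upper_tail`, `p_single`, and `p_two_independent` are hypothetical.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z, computed via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Probability of a single result at least 5 sigma out, under background alone
# (assumed standard normal test statistic, one-sided).
p_single = upper_tail(5.0)

# Probability of two such results in a row, assuming independence (an assumption).
p_two_independent = p_single ** 2

print(f"one 5 sigma result under H0:  {p_single:.3e}")
print(f"two independent such results: {p_two_independent:.3e}")
```

The figure is an error probability of the testing procedure computed under the null. It is not a posterior probability assigned to the null, which is the distinction the comment is drawing.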

  3. Anonymous

    To accept a probability statement is to accept a statement as true. To oust talk of a statement as true, is also to oust “H is probable” as a true statement. The Bayesian does not qualify the statement “H is probable”, it’s just true. In sampling theory, statements are always qualified by effect sizes, variability and computed probabilities of error associated with the method.

    • john byrd

      This important point needs to be taught in a clear and concise manner in stats courses and mainstream discipline-specific methods classes. Regardless of whether you prefer Bayesian methods, you must appreciate that students need to know this distinction. The students are taught error statistics with little attention to underlying philosophy, and taught Bayesian methods with a load of rhetoric such as that in the Jaynes book. I do not teach very much these days, but hire and train a lot of young PhDs. I have yet to receive one who understands these distinctions between the error and Bayesian stats approaches.

    • Anonymous: They also generally imagine the “evidence statement” is given (not assigned a probability). Anyway, to qualify a statement by pointing to the reliability of the method used to arrive at it, is very different from giving it a formal probability. For example, if data x from a small study purports to give evidence that
      H: drug D is effective,
      we naturally say H has not passed a very stringent test with x, but we don’t mean P(H|x) is low—however that might be interpreted.

  4. I don’t see the mistake. There is that awkward, asymmetric deductivism right there in the formal orthodox inference for all to see. Supplementary argument certainly can soften it and deflect the ridicule but it can’t remove it. And for me it’s easy to slide into probability talk because it’s natural, not because it’s just a manner of speaking – a peculiarity of the English language. It’s the natural expression of the reasoning about material claims and hypotheses going on in my head and I do use it in everyday life. Of course I used to accept that adopting the orthodox statistical ways was the right thing to do when it comes time to work quantitatively – with hard data and precisely specified numbers and functions. Like most people, that’s what I was taught is the right and proper way to do “objective” inference in science, but I didn’t like it. It seemed clumsy and contrived even before I found out that it was /unnecessarily/ clumsy and contrived.

    • Paul: I don’t want to equivocate over “mistake”. Supposing the choice is “H is true (full stop)” or “H has a formal probability” is a (mere) false dilemma, whereas the mistake which is the focus of my post is a logical one. And I want to stay on that topic. So, back to it, you’re agreeing that if H entails e, then the entailment doesn’t go away even if not-e is observed and H is falsified. So what’s the joke (in relation to what he says)?

      • Well I believe the essence of the joke is (parodic) logic: the deduction-oriented orthodoxian is imagined to take e to be the set of possible outcomes of some experiment and reason that H entails a splitting of e into acceptance and rejection sets e1, e2: H entails e = (e1 or e2). But, by the identification of rejection and falsification, he also has that e2 entails not H. So H entails (e1 or not H) and, on observing not e1, he is imagined to (be able to) deduce that H entails not H!

        Or something like that.

        • Paul: O my gawd!

          Let H entail e.
          ~e is observed
          ~H is inferred

          (H entails ~H) is logically equivalent to ~H

          so, indeed, if we infer ~H, we infer the logically equivalent (H entails ~H).

          Let e be your e1, and ~e would be your e2, say.

          By the way, we always have
          H entails (e v ~e),
          since the consequent is a tautology, but that doesn’t do any work.
          But I must run off to teach a logic class (seriously)!
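The two logical claims in the reply above can be checked with a truth table. This is a minimal sketch that reads “entails” as the material conditional of propositional logic, which is an assumption made for illustration; the names `implies`, `self_refutation_equiv`, and `tautology_consequent` are hypothetical.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: a -> b."""
    return (not a) or b


# Claim 1: (H -> ~H) has the same truth value as ~H in every row,
# so inferring ~H and inferring (H -> ~H) come to the same thing.
self_refutation_equiv = all(
    implies(H, not H) == (not H) for H in [False, True]
)

# Claim 2: H -> (e v ~e) holds in every row, because the consequent is a
# tautology; it places no constraint on H or on e.
tautology_consequent = all(
    implies(H, e or (not e)) for H, e in product([False, True], repeat=2)
)

print(self_refutation_equiv, tautology_consequent)
```

On this reading, arriving at (H entails ~H) is just another way of stating that H has been rejected. It does not undo the earlier step from H to e.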

  5. “(H entails ~H) is logically equivalent to ~H”

    But (H entails ~H) is the “punchline” – the sawing off of the limb – and going any further is both unnecessary and probably breaking the rules of joke logic (“analysing the joke to death”). The appropriate context, in which the falsification model is already rejected, is also crucial to the joke. For the intended Jaynesian/Bayesian audience,

    “Ha! Ha! By this reasoning, no hypothetical testing or falsification could ever occur. As soon as H is falsified, the grounds for falsifying disappear!”


    “Ha! Ha! By this reasoning, no hypothetical testing could ever occur. As soon as H is falsified, the grounds for testing disappear!”

    • Paul: But it’s fallacious to say: “As soon as H is falsified, the grounds for testing disappear!”

      The entailment, be it deductive or statistical, does not disappear—if H entails e, it doesn’t stop entailing e on account of someone’s determining ~e.

      There is no sawing off of limbs, and there is no “going further”. The fallacy puts a stop to what he claims just as soon as he claims it!

      • Of course it’s fallacious. It’s a joke: a parody; an exaggeration or caricature of orthodox inferential thinking for humorous effect. If it weren’t fallacious there’d never have been any NHST etc. for anyone to laugh at, would there?

    • john byrd

      It appears the joke is on the teller…

  6. If followers of Jaynes agree with Paul (and Jaynes, apparently) that “As soon as H is falsified, the grounds on which the test was based disappear!” (a position that is based on a fallacy), then I’m confused as to how Andrew Gelman can claim to follow Jaynes at all.
    “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive…” (Gelman, 2011, bottom p. 71): http://www.rmm-journal.de/downloads/Article_Gelman.pdf
    Gelman employs significance test-type reasoning to reject a model when the data sufficiently disagree.
    Now, strictly speaking, a model falsification, even to inferring something as weak as “the model breaks down,” is not purely deductive, but Gelman is right to see it as about as close as one can get, in statistics, to a deductive falsification of a model. But where does that leave him as a Jaynesian? Perhaps he’s not one of the ones in Paul’s Jaynes/Bayesian audience who is laughing, but is rather shaking his head?

  7. Corey

    Mayo: Gelman on Jaynes. Jaynes was far from consistent on this point — the die example that Gelman likes was very nearly an exercise in hypothesis testing by p-values.

    • Corey: Of course Jaynes is not consistent; one couldn’t do much of anything if one actually applied the fallacy to one’s own work. Thanks for the link to Gelman on Jaynes; I note one of the comments declared “It is like finding out there is no Santa Clause all over again”. There are some impolite skirmishes by some commentators, you’ll recall, having to do with Jaynes on this blog, but I’m in no mood to link to them.

      One scarcely needs Jaynes to tell us about falsification, but there’s no warrant to infer a particular model that happens to do a better job fitting the data x–at least on x alone. Insofar as there are many alternatives that could patch things up, an inference to one particular alternative fails to pass with severity. I don’t understand how it can be that some of the critics of the (bad) habit of some significance testers to move from rejecting the null to a particular alternative, nevertheless seem prepared to allow this in Bayesian model testing. But maybe they carry out further checks down the road; I don’t claim to really get the methods of correcting Bayesian priors (as part of a model).

      • Corey

        Mayo: The die model is interesting because Jaynes motivates his particular model “patches” by recourse to prior information about die manufacture and physics*. So while I agree that there’s no warrant on the data alone to infer a particular model that happens to fit the data better, I’d still regard Jaynes’s final model as reasonably well warranted by the *entirety* of the information at his disposal.

        *Specifically, that opposite sides add up to seven and that differentiability of the physics implies local linearity in a neighborhood of the “perfect cube” configuration.

  8. Interested people can read Entsophy’s rude dismissal of my criticism of Jaynes over at Gelman’s blog:

    He says he hasn’t read Jaynes, but knows he’s correct. Jaynes’ very big, often repetitive, book on probability theory has lots of interesting probability examples, but it’s overloaded with “chip on his shoulder” polemics, and dismissals of “orthodox statistics”. Clark Glymour wrote, independently of my blog or anything I said: “I started Jaynes’ book once, but found on so many points he was logically inept and dogmatic that I quit.”
