Jean Miller: Happy Sweet 16 to EGEK! (Shalizi Review: “We Have Ways of Making You Talk”)

Jean Miller here.  (I obtained my PhD with D. Mayo in Phil/STS at VT.) Some of us “island philosophers” have been looking to pick our favorite book reviews of EGEK (Mayo 1996; Lakatos Prize 1999) to celebrate its “sweet sixteen” this month. This review, by Dr. Cosma Shalizi (CMU, Stat) has been chosen as the top favorite (in the category of reviews outside philosophy).  Below are some excerpts–it was hard to pick, as each paragraph held some new surprise, or unique way to succinctly nail down the views in EGEK. You can read the full review here. Enjoy.

“We Have Ways of Making You Talk, or, Long Live Peircism-Popperism-Neyman-Pearson Thought!”
by Cosma Shalizi

After I’d bungled teaching it enough times to have an idea of what I was doing, one of the first things students in my introductory physics classes learned (or anyway were taught), and which I kept hammering at all semester, was error analysis: estimating the uncertainty in measurements, propagating errors from measured quantities into calculated ones, and some very quick and dirty significance tests, tests for whether or not two numbers agree, within their associated margins of error. I did this for purely pragmatic reasons: it seemed like one of the most useful things we were supposed to teach, and also one of the few areas where what I did had any discernible effect on what they learnt. Now that I’ve read Mayo’s book, I’ll be able to offer another excuse to my students the next time I teach error analysis, namely, that it’s how science really works.

I exaggerate her conclusion slightly, but only slightly. Mayo is a dues-paying philosopher of science (literally, it seems), and like most of the breed these days is largely concerned with questions of method and justification, of “ampliative inference” (C. S. Peirce) or “non-demonstrative inference” (Bertrand Russell). Put bluntly and concretely: why, since neither can be deduced rigorously from unquestionable premises, should we put more trust in David Grinspoon‘s ideas about Venus than in those of Immanuel Velikovsky? A nice answer would be something like, “because good scientific theories are arrived at by employing thus-and-such a method, which infallibly leads to the truth, for the following self-evident reasons.” A nice answer, but not one which is seriously entertained by anyone these days, apart from some professors of sociology and literature moonlighting in the construction of straw men. In the real world, science is alas fallible, subject to constant correction, and very messy. Still, mess and all, we somehow or other come up with reliable, codified knowledge about the world, and it would be nice to know how the trick is turned: not only would it satisfy curiosity (“the most agreeable of all vices” — Nietzsche), and help silence such people as do, in fact, prefer Velikovsky to Grinspoon, but it might lead us to better ways of turning the trick. Asking scientists themselves is nearly useless: you’ll almost certainly just get a recital of whichever school of methodology we happened to blunder into in college, or impatience at asking silly questions and keeping us from the lab. If this vice is to be indulged in, someone other than scientists will have to do it: namely, the methodologists.

That they have been less than outstandingly successful is not exactly secret. Thus the biologist Peter Medawar, writing on Induction and Intuition in Scientific Thought: “Most scientists receive no tuition in scientific method, but those who have been instructed perform no better as scientists than those who have not. …..

Medawar’s friend Karl Popper achieved (fully deserved) eminence by tenacious insistence on the importance of this point, becoming a sort of Lenin of the philosophy of science. Instead of conferring patents of epistemic nobility, lawdoms and theoryhoods, on certain hypotheses, Popper hauled them all before an Anglo-Austrian Tribunal of Revolutionary Empirical Justice. The procedure of the court was as follows: the accused was blindfolded, and the magistrates then formed a firing squad, shooting at it with every piece of possibly-refuting observational evidence they could find………

Mayo, playing the Jacobin or Bolshevik to Popper’s Girondin or Cadet, thinks she knows what the problem is: for all his can’t-make-an-omelette-without-breaking-eggs rhetoric, Popper is entirely too soft on conjectures.

“Although Popper’s work is full of exhortations to put hypotheses through the wringer, to make them ‘suffer in our stead in the struggle for the survival of the fittest,’ the tests Popper sets out are white-glove affairs of logical analysis. If anomalies are approached with white gloves, it is little wonder that they seem to tell us only that there is an error somewhere and that they are silent about its source. We have to become shrewd inquisitors of errors, interact with them, simulate them (with models and computers), amplify them: we have to learn to make them talk.” [p. 4, reference omitted]

Fortunately, scientists have not only devoted much effort to making errors talk, they have even developed a theory of inquisition, in the form of mathematical statistics, especially the theory of statistical inference worked out by Jerzy Neyman and Egon Pearson in the 1930s. Mayo’s mission is largely to show how this very standard mathematical statistics justifies a very large class of scientific inferences, those concerned with “experimental knowledge,” and to suggest that the rest of our business can be justified on similar grounds. Statistics becomes a kind of applied methodology, as well as the “continuation of experiment by other means.”

Mayo’s key notion is that of a severe test of a hypothesis, one with “an overwhelmingly good chance of revealing the presence of a specific error, if it exists — but not otherwise” (p. 7). More formally (when we can be this formal), the severity of a passing result is the probability that, if the hypothesis is false, our test would have given results which match the hypothesis less well than the ones we actually got do, taking the hypothesis, the evidence used in the test, and the way of calculating fit between hypothesis and evidence to be fixed. [Semi-technical note containing an embarrassing confession.] If a severe test does not turn up the error it looks for, it’s good grounds for thinking that the error is absent. By putting our hypotheses through a battery of severe tests, screening them for the members of our “error repertoire,” our “canonical models of error,” we can come to have considerable confidence that they are not mistaken in those respects. Instead of a method for infallibly or even reliably finding truths, we have a host of methods for reliably finding errors: which turns out to be good enough.


Distributions of experimental outcomes, then, are the key objects for Mayo’s tests, especially the standard Neyman-Pearson statistical tests. The kind of probabilities Mayo, and Neyman and Pearson, use are probabilities of various things happening: meaning that the probability of a certain result, p(A), is the proportion of times A occurs in many repetitions of the experiment, its frequency. This is a very familiar sense of probability; it’s the one we invoke when we say that a fair coin has a 50% probability of coming up heads, that the chance of getting three sixes with fair (six-sided!) dice is 1 in 216, that a certain laboratory procedure will make an indicator chemical change from red to blue 95% of the time when a toxin is present. Or, more to the present point: “the hypothesis is significant at the five percent level” means “the hypothesis passed the test, and the probability of its doing so, if it were false, is no more than five percent,” which means “if the hypothesis is false, and we repeated this experiment many times, we would expect to get results inside our passing range no more than five percent of the time.”

This interpretation of probability, the “frequentist” interpretation, is not the only one however. Ever since its origins in the seventeenth century, if we are to believe its historians, mathematical probability has oscillated, not to say equivocated, between two interpretations, between saying how often a given kind of event happens, and saying how much credence we should give a given assertion. Now, this is the sort of philosophical question — viz., what the hell is a probability anyway? — which scientists are normally none the worse for ignoring, and normally blithely ignore. But maybe once every hundred years these questions actually affect the course of research, philosophy really does make a difference: the existence of atoms was such a question at the beginning of the century, and the nature of probability is one today. To see why, and why Mayo spends much of her book chastising the opponents of the frequentist interpretation, requires a little explanation.

Modern believers in subjective probability are called Bayesians, after the Rev. Mr. Thomas Bayes, who in 1763 posthumously published a theorem about the calculation of conditional probabilities…..The theorem itself is beyond dispute, being an easy consequence of the definition of a conditional probability, with many useful applications, the classical one being diagnostic testing. The uses to which it has been put are, however, as peculiar as those of any mathematical theorem, even Gödel’s.

In particular, if you think of probabilities as degrees-of-belief, it is tempting, maybe even necessary, to regard Bayes’s theorem as a rule for assessing the evidential support of beliefs. For instance, let A be “Mr. Geller is psychic” and B be “this spoon will bend without the application of physical force.” Once we’ve assigned p(A), p(B), and p(B|A), we can calculate just how much more we ought to believe in Geller’s psychic powers after seeing him bend a spoon without visibly doing so. p(A) and p(B) and sometimes even p(B|A) are, in this view, all reflections of our subjective beliefs, before we examine the evidence. They are called the “prior probabilities,” or even just the “priors.” The prize, p(A|B), is the “posterior,” and regarded as the weight we should give to a hypothesis (A) on the strength of a given piece of evidence (B). As I said, it’s hard to avoid this interpretation if you think of probabilities as degrees-of-belief, and there is a large, outspoken and able school of methodologists and statisticians who insist that this is the way of thinking about probability, scientific inference, and indeed rationality in general: the Bayesian Way.
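To make the arithmetic concrete, here is a minimal sketch of the update described above. The numbers are purely hypothetical placeholders (they do not come from the review), and the function name `posterior` is an illustrative assumption.

```python
def posterior(p_a: float, p_b: float, p_b_given_a: float) -> float:
    """Bayes's theorem: p(A|B) = p(B|A) * p(A) / p(B)."""
    for name, p in (("p_a", p_a), ("p_b", p_b), ("p_b_given_a", p_b_given_a)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    if p_b == 0.0:
        raise ValueError("p_b must be positive to condition on B")
    joint = p_b_given_a * p_a  # p(A and B)
    if joint > p_b:
        raise ValueError("incoherent inputs: p(A and B) cannot exceed p(B)")
    return joint / p_b


# Hypothetical priors, chosen only for illustration.
p_a = 0.001          # A: "Mr. Geller is psychic"
p_b = 0.05           # B: "this spoon will bend without visible force"
p_b_given_a = 0.9    # chance of B if A were true

print(f"p(A|B) = {posterior(p_a, p_b, p_b_given_a):.3f}")
```

With these made-up inputs the posterior comes out at roughly 0.018. Changing the priors changes the answer, which is exactly the sensitivity the review goes on to discuss.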

Looked at from a vantage-point along that Way, Neyman-Pearson hypothesis testing is arrant nonsense, involving all manner of irrelevant considerations, when all you need is the posterior. For those of us taking the frequentist (or, as Mayo prefers, error-statistical) perspective, Bayesians want to quantify the unquantifiable and proscribe inferential tools that scientific practice shows are most useful, and are forced to give precise values to perfectly ridiculous quantities, like the probability of getting a certain experimental result if all the hypotheses we can dream up are wrong. For us, to assign a probability to a hypothesis might make sense (in Peirce’s words) “if universes were as plenty as blackberries, if we could put a quantity of them in a bag, shake them well up, draw out a sample and examine them” (Collected Works 2.684, quoted p. 78); as it is, hypotheses are either true or false, a condition quite lacking in gradations. Bayesians not only assign such probabilities, they do so a priori, condensing their prejudices into real numbers between 0 and 1 inclusive; two Bayesians cannot meet without smiling at each other’s priors. True, they can show that, in the limit of presenting an infinite amount of (consistent) evidence, the priors “wash out” (provided they’re “non-extreme,” not 0 or 1 to start with); but it has also been shown that, “for any body of evidence there are prior probabilities in a hypothesis H that, while nonextreme, will result in the two scientists having posterior probabilities in H that differ by as much as one wants” (p. 84n, Mayo’s emphasis). This is discouraging, to say the least, and accords very poorly with the way that scientists actually do come to agree, very quickly, on the value and implications of pieces of evidence.
Bayesian reconstructions of episodes in the history of science, Mayo says, are on a level with claiming that Leonardo da Vinci painted by numbers since, after all, there’s some paint-by-numbers kit which will match any painting you please.

Mayo will have nothing to do with painting by numbers, and wants to trash all the kits she runs across. These do not just litter the Bayesian Way; the whole attempt to find “evidential relation” measures, which will supposedly quantify how much support a given body of evidence provides for a given hypothesis, falls into the dumpster as well. The idea behind them, that the relation between evidence and hypothesis is some kind of a fraction of a deductive implication, can now I think be safely set aside as a nice idea which just doesn’t work. (This is a pity; it is easy to program.) It should be said, as Mayo does, that the severity of a test is not an evidential relation measure, rather is a property of the test, telling us how reliably it picks out a kind of mistake — that it misses it once every hundred tries, or once every other try, or never. (If a hypothesis passes a test on a certain body of evidence with severity 1, it does not mean that the evidence implies the hypothesis, for instance.) Also on the list of science-by-numbers kits to be thrown out are some abuses of Neyman-Pearson tests, the kind of unthinking applications of them that led a physicist of my acquaintance to speak sarcastically of “statistical hypothesis testing, that substitute for thought.” Some of these Mayo lays (perhaps unjustly) at Neyman’s feet, exonerating Pearson; she shows that none of them are necessitated by a proper understanding of the theory of testing……

Aside from my usual querulousness about style… I have only two substantial problems with Mayo’s ideas; or perhaps I just wish she’d pushed them further here than she did. First, they do not seem to distinguish scientific knowledge — at least not experimental knowledge — from technological knowledge, or even really from artisanal know-how. Second, they leave me puzzled about how science got on before statistics. …..

Read the full review:  here.

Categories: philosophy of science, Statistics

33 thoughts on “Jean Miller: Happy Sweet 16 to EGEK! (Shalizi Review: “We Have Ways of Making You Talk”)”

  1. Fisher

    I love this review! I’ve read the original several times over the years and it still seems fresh. The best part about it is that it’s from a Physicist. They seem to be the most intransigent on this subject. I didn’t understand much of Jaynes’s science-by-numbers book on probability, but I definitely had the feeling it can be “safely put aside as a nice idea that doesn’t work” (in Shalizi’s words).

    But what really frustrates me about their warped Bayesian view of probability is that you can show the following in a few lines of trivial mathematics (the kind Bayesians loooove to throw around):

    If both evidence “E” and “not E” are possible (i.e. P(E) is not equal to zero or one) then you get for any hypothesis H:

    P(H|E)-P(H) = -{P(not E)/P(E)}[P(H|not E) – P(H)]

    If you think of the outcomes {E, not E} as a test of H, the only way observing E could significantly increase the probability of H is if there was some possibility of observing “not E” and this outcome significantly REDUCES the probability of H. (the minus sign on the right hand side is key)

    In other words E can only provide strong evidence for H if {E, not E} was a “severe” test of H.

    In the extreme case of a “weak” test, when neither E nor “not E” could have denied H (reduced its probability), there is no possibility that either could have confirmed H, since the above equation then reduces to P(H|E)=P(H|not E)=P(H).

    So even with their illogical philosophy of probability, they should still agree with Mayo. It’s only their ideology that causes them to be blind to their own mathematics.

    It’s odd though how many of these intransigent Bayesian Physicists were good scientists. I guess that there is some truth to the quote from the review:

    “Most scientists receive no tuition in scientific method, but those who have been instructed perform no better as scientists than those who have not”

    There’s no telling what Laplace might have been able to achieve if only he had read EGEK!
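The “few lines of trivial mathematics” behind the identity in the comment above can be written out as follows. This is a sketch of one standard route (the law of total probability); the commenter does not say which derivation was intended. It assumes 0 < P(E) < 1, and the second line uses P(H) = P(H)[P(E) + P(¬E)].

```latex
\begin{align*}
P(H) &= P(H \mid E)\,P(E) + P(H \mid \neg E)\,P(\neg E) \\
0 &= P(E)\,\bigl[P(H \mid E) - P(H)\bigr] + P(\neg E)\,\bigl[P(H \mid \neg E) - P(H)\bigr] \\
P(H \mid E) - P(H) &= -\frac{P(\neg E)}{P(E)}\,\bigl[P(H \mid \neg E) - P(H)\bigr]
\end{align*}
```

If neither bracket is negative, the second line forces both to be zero, which is the “weak test” case P(H|E) = P(H|not E) = P(H).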

    • Fisher: Just one thing on the conditional of yours:
      “In other words E can only provide strong evidence for H if {E, not E} was a “severe” test of H.”
      I see your intuition, but one has to be careful not to equate a Bayesian assessment of P(E|not-H) with what is required to pass H severely (on any Bayesian account I’m aware of). That is, if I’m understanding you, your claim would mean
      (Strong Bayesian evidence for H) entails (H has passed a severe test).

      But they are prepared to hold the antecedent in cases where the consequent does not hold.
      The converse also fails. That doesn’t mean Bayesian philosophers don’t earnestly want to be able to assume that low Bayesian probability of E “on the catch-all” entails H severely passes with E. That is, you may be right to suggest their intuitions would be better satisfied with a severe testing account!

      • Fisher

        Mayo: I think you’re letting the Bayesians off far too easy here. Use the following Bayesian definition for “E is strong evidence for H”:

        (A) P(H|E) >> P(H)

        In other words, P(H) starts out close to zero and ends up with P(H|E) close to 1. This could also serve as an intuitive notion of “E accords with H”.

        Then all that’s needed for {E,not E} to be a severe test by your definition is for the following to hold:

        (B) P(E|not H) is small

        The thing is that A implies B! To see this note:

        P(H)/P(not H) = X << 1 and
        P(not H|E)/P(H|E) = Y <<1

        and from the definition of conditional probability (Bayes theorem effectively)

        P(E|not H) = X*Y*P(E|H) <= X*Y

        so P(E|not H) has to be very close to zero since it's at most the product of two terms each of which is separately close to zero.

        So a Bayesian automatically adheres to the severity principle, whether they know it or not. Hopefully they will have the integrity to admit the truth of the Severity Principle and admit that they've just been doing automatically what Frequentists have been talking about for some time now.
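The step from (A) to the bound on P(E|not H) in the comment above can be spelled out with Bayes's theorem. This is a sketch of the algebra only, using the comment's X and Y and assuming P(E), P(H), P(not H), and P(H|E) are all positive; it takes no position on whether the resulting bound is what severity requires, which is the point disputed in the replies.

```latex
\begin{align*}
\frac{P(E \mid \neg H)}{P(E \mid H)}
  &= \frac{P(\neg H \mid E)\,P(E)/P(\neg H)}{P(H \mid E)\,P(E)/P(H)}
   = \frac{P(\neg H \mid E)}{P(H \mid E)} \cdot \frac{P(H)}{P(\neg H)}
   = Y \cdot X \\
P(E \mid \neg H) &= X\,Y\,P(E \mid H) \;\le\; X\,Y
\end{align*}
```

Equality holds only when P(E|H) = 1.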

        • Fisher:
          “Then all that’s needed for {E,not E} to be a severe test by your definition is for the following to hold:
          (B) P(E|not H) is small”
          False! Not to mention that we could only talk of a likelihood such as (B) within a model that exhausts hypotheses; the Bayesian catchall will not do. I take up the issue of why this is an incorrect view of SEV in many places.

          What I want people to consider is that the very idea that strong evidence for hypothesis H should be construed as a posterior probability assigned to H is wrong, WRONG! I may take this up in a next post.

          • Fisher

            Of course Bayesians are wrong!

            I was looking at whether it made sense even within their world view. I was trying to take your definition from the “Error Statistics” paper. Namely this passage:

            “(S-2)* with very low probability, test T would have produced a result that accords as well as or better with H than x_0 does, if H were false or incorrect.”

            In this case the test has only two outcomes {E, not E}. The only outcome that “accords as well or better with H than E” is actually just “E”. And it having a small probability if H isn’t true, is just P(E|not H)<<1. I'm not sure where I got this wrong.

            This doesn't seem to be a Bayesian vs Frequentist issue, so I look forward to clarification on what (S-2) means.

            • Look Fisher, you might be able to contrive a test that happens to get the “same numbers” (as J. Berger and others have shown with slightly less artificial examples than yours), but the interpretation will still fail to match (SEV is not a posterior prob). Note though that even imagining the highly artificial two outcomes (lumping points together) still doesn’t assure a numerical match because the Bayesian ignores selection effects. But, again, numerical matches are easy—read Shalizi on “painting by numbers”.

              • I should add that if you focus just on cases where it is a given that H has passed severely, then it will seem innocuous to slide into posterior prob talk, and it might even be harmless in many such cases. Things are far more interesting and the differences more profound when there is inseverity or low severity.

              • Fisher

                Mayo, I’m not sure where the uncivil tone comes from, but I had a very innocent mental example when writing the above:

                There are two assembly lines producing a widget. Assembly line 1 has mean m_0, Assembly line 2 has mean m_1. We are testing to determine which assembly line produced a given batch. A sample X is taken.

                H: mean of population =m_0
                not H: mean of population =m_1

                E: test statistics f(x)c

                It should be possible to take (S-2) and turn it into a statement involving just the sampling distributions (i.e. P(E|m_0) and so on). The fact that Bayesians might use these same numbers for their purposes is kind of irrelevant for figuring out what (S-2) means. It seems to be saying P(f(x)<c|m_1)<<1.

                I'm a little shocked to see you call this an artificial example.

              • Fisher

                It should read:

                E: f(x) is less than some cutoff c
                not E: f(x) is greater than or equal to some cutoff c

                to reiterate: at this point we are only dealing with the sampling distribution and there is no mention of a posterior probability.
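For readers who want to see the sampling-distribution quantities in the assembly-line example as numbers, here is a minimal sketch. It assumes, beyond what the comment states, that f(x) is the sample mean of n independent normal measurements with a known standard deviation. Every numeric value and the helper name `prob_below_cutoff` are hypothetical. It only computes P(f(x) < c | m_0) and P(f(x) < c | m_1); whether the latter is what (S-2) requires is the question the thread leaves open.

```python
from math import sqrt
from statistics import NormalDist


def prob_below_cutoff(mean: float, sigma: float, n: int, cutoff: float) -> float:
    """P(sample mean < cutoff) when the population mean is `mean`."""
    standard_error = sigma / sqrt(n)
    return NormalDist(mu=mean, sigma=standard_error).cdf(cutoff)


# Hypothetical values, chosen only for illustration.
m_0, m_1 = 10.0, 12.0   # means of assembly lines 1 and 2
sigma, n = 2.0, 25      # known standard deviation and sample size
c = 11.0                # cutoff defining E: f(x) < c

p_e_given_m0 = prob_below_cutoff(m_0, sigma, n, c)
p_e_given_m1 = prob_below_cutoff(m_1, sigma, n, c)

print(f"P(E | m_0) = {p_e_given_m0:.4f}")
print(f"P(E | m_1) = {p_e_given_m1:.4f}")
```

With these made-up inputs, E is very likely under m_0 (about 0.994) and very unlikely under m_1 (about 0.006).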

          • Corey

            There are occasions in which it is possible to define a catchall hypothesis in a Bayesian context. For example, see Bretthorst, G. Larry, 1996, “An Introduction To Model Selection Using Probability Theory As Logic,” available here. This paper looks at (a simplification of) an analysis identifying airplanes using characteristics of their radar reflection signals. The paper includes a simulation study in which the hypothesis set is 18 known airplanes, the no-airplane hypothesis, and the catchall “unknown airplane” hypothesis.

            The math of the catchall hypothesis is quite interesting. For example, in a certain sense the catchall hypothesis can fit any data — in particular, data simulated using one of the known planes fit the catchall hypothesis slightly better than the correct plane hypothesis! But mathematically, this freedom to fit any data is provided by numerous nuisance parameters; the catchall hypothesis “lives” in a parameter space of much higher dimension than any of the known plane hypotheses. The consequences of this fact are a bit too involved to go into here, but the upshot is that if one of the known plane hypotheses attains a much higher posterior probability than the catchall hypothesis, then it has indeed passed a severe test in Mayo’s error-statistical sense.

            • Corey: thanks for this. I will look at your example. I should be clear that the complement of a test hypothesis is well-defined in the error statistical test (as in N-P tests) by constraining things appropriately. Putting aside formal statistics, an example that comes to mind is Perrin’s analysis of Brownian particles, where he can partition the question into: caused by something inside the liquid medium, or outside—without having to list particular outside causes. That becomes well-testable, and the “catchall” is rejected (with severity) because anything outside would have a certain pattern of coordinated motion, and the opposite is repeatedly found (this is just from memory, to give the idea, it’s in EGEK).
              But here’s the key issue Corey, so maybe you and others will just try this out as a little exercise over the weekend: whenever you feel you must say “hypothesis H attains a much higher posterior probability than its denial” say instead, H is much better tested, (or if you prefer, more highly corroborated) in the sense that the ways H can be wrong here have been well checked and found to be absent. In ordinary day to day claims of evidence, I maintain, that’s what we mean. If it’s well warranted, we may infer it to be the case or approximately the case or the like—that’s not a probability assignment to H! You may try to convert it to probability talk, because “probable” is equivocal in English, but that doesn’t mean the claim that is warranted is properly construed as mathematical probability P(H), whether in the sense of how often H is true, or how strongly we think H is true, or in how many worlds H holds —well, the onus is on you to say what other thing you might mean. After you try this little experiment, the advantages may start to become evident, as the scales fall from your eyes!

    • Fisher: E may pass H with severity even though not-E fails to pass not-H with severity. If a company V’s trials on spinal cord injuries on animals succeeds in getting all of the animals to walk again (E), it will show H: regenerative techniques are capable of undoing spinal cord damage, but, its failure would not be good evidence for not-H, that regenerative techniques are incapable of success here. Consider statistical fallacies of acceptance (with a 0 null): E: statistically significant increased positive observed differences from 0 may stringently pass H: there is a genuine positive discrepancy from 0, but non-significant results need not provide good evidence for 0 (or even very small) discrepancy. (The test may have had a very low power to detect discrepancies even if they were present.) The point is a statistical analogue to the general asymmetry of falsification and confirmation. A single case (e.g., E: an A that is not a B) may severely pass H: there is an anomaly for All A’s are B’s, but failing to observe an anomaly on this test (~E) scarcely passes not-H with severity. Not-H would be: all A are B.

  2. Thanks Jean: I think my favorite remark is his reference to rubies at the end of the review (my favorite stone)!

  3. Corey

    I notice that Shalizi initially mistook severity for power. The short-cut to seeing that these are not the same is to notice that power is calculated pre-data whereas severity is calculated post-data.

  4. Corey: But that still doesn’t tell you what’s going on with SEV, that, for example, in the case where a standard test “rejects the null”, SEV (e.g., for inferring discrepancy d’) goes in the opposite direction of power against d’. And of course, all the computations could be done pre-data, in planning the test. I’ve thought of various ways to clarify, which I can say more about at some point. (We have a little Excel program that makes the switch stand out.)

    • Corey

      That’s right, the shortcut doesn’t explain severity. But statistically knowledgeable people both in favor of and against error statistics have mistaken severity for power. The shortcut can help bounce them from system 1 to system 2 thinking.

      (Any computation can be done pre-data. The point is that one can know the actual power pre-data but can’t know the actual severity pre-data.)
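The pre-data/post-data contrast drawn in the comments above can be sketched numerically. Everything below is an assumption made for illustration and is not stated in this thread: a one-sided test of a normal mean with known sigma (H0: mu <= mu_0 against mu > mu_0), and, after a rejection, severity for the claim "mu > mu_1" taken to be P(sample mean <= observed mean; mu = mu_1). The function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the critical value


def cutoff(mu_0: float, sigma: float, n: int, alpha: float) -> float:
    """Rejection cutoff for the sample mean at significance level alpha."""
    return mu_0 + STD_NORMAL.inv_cdf(1 - alpha) * sigma / sqrt(n)


def power(mu_1: float, mu_0: float, sigma: float, n: int, alpha: float) -> float:
    """P(reject H0; mu = mu_1). Depends only on the design, so it is known pre-data."""
    se = sigma / sqrt(n)
    return 1 - NormalDist(mu_1, se).cdf(cutoff(mu_0, sigma, n, alpha))


def severity_after_rejection(mu_1: float, x_bar_obs: float, sigma: float, n: int) -> float:
    """Assumed form: SEV(mu > mu_1) = P(sample mean <= x_bar_obs; mu = mu_1).
    Needs the observed mean, so it cannot be known before the data arrive."""
    se = sigma / sqrt(n)
    return NormalDist(mu_1, se).cdf(x_bar_obs)


# Hypothetical design and hypothetical observed result (a rejection).
mu_0, sigma, n, alpha = 0.0, 1.0, 100, 0.025
x_bar_obs = 0.3

for mu_1 in (0.1, 0.2, 0.3, 0.4):
    pw = power(mu_1, mu_0, sigma, n, alpha)
    sev = severity_after_rejection(mu_1, x_bar_obs, sigma, n)
    print(f"mu_1={mu_1:.1f}  power={pw:.3f}  severity(mu > mu_1)={sev:.3f}")
```

Under these assumptions, power rises as mu_1 grows while the severity for "mu > mu_1" falls, which is one way to read the remark that the two go in opposite directions after a rejection.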

      • Corey: But don’t forget the mess that is the after-data concept of “shpower”. Putting “post-data” and “power” together in the same sentence is likely to lead to thinking we are into shpower! For those who don’t know what I’m talking about, look it up in this blog.

      • I looked at your “system 1” link. So you’re saying I should help bounce people from an automatic, unconscious way of thinking to a deliberate, critical way–the latter being unique to humans, you say. Ok. Actually, though, I’ve had dogs and cats that performed severe critical scrutiny, as do many of our wild woodland animals out here in the Blue Ridge. (Seriously.) And of course Popper traced it to the amoeba!

        • Corey

          That’s more-or-less it. More precisely, I’m saying that it seems to me that when people read your definition of severity, their system 1 returns “oh, that’s just power” to their conscious minds. Kahneman writes in Thinking, Fast and Slow that in the ordinary course of things, system 1 provides the working model of the environment, and system 2 will only engage when there is a large mismatch between the model’s prediction and sensory input. (Small mismatches are ignored.) My claim is that the shortcut provides a conceptual version of this sort of mismatch, thereby getting the attention of system 2.

          • But I still say that there’s a huge danger in people thinking I mean what some regard as post-data power or shpower. You’d think they hadn’t heard of the whole power analytic movement in psychology (e.g., Cohen) which does not fall into the shpower trap. Even though it fails to be “custom tailored”, it does give a “worst case” severity assessment for non-significant results. (some of this is in the “Neyman’s nursery” posts.)

            • Corey

              It’s a fair point. It’d be worth trying just to see what the results are…

              • Corey: How (to see such results)? Isn’t it better for people just to know what’s being talked about (rather than hoping a hint will do the trick)? For any inference H being entertained or evaluated, on account of its being in accord with the data, how improbable/probable is it that so good an accordance would occur were H (specifiably) false? What erroneous construals have/have not been ruled out well? This is no different from day to day musings, at least mine.

              • Corey

                @Mayo, regarding the shortcut:

                What I mean is: the next time you encounter a statistically competent person confusing severity for power, say to them, “No, severity is not power — you can tell that because the power function can be known before the experiment is complete, but the severity is only known after the experiment. Check the definition again.” Then see if they get the concept of severity right or if they conflate it with observed power.

  5. Guest

    Methinks perhaps ‘Fisher’ is a plant …he seems a bit too ebullient!

    • Guest: Indeed! Likely a male plant who probably even believes he’s a Bayesian!

      • Corey

        That’s a curious way of phrasing it — are you alluding to Senn’s paper? (I’m not Fisher.)

        • Corey: “That’s a curious way of phrasing” what? And where did I say you were Fisher? I admit that the way the comments show up is confusing, I’ll have to look into this, so I’ve tried to indicate to whom I’m speaking. I was talking to the Guest commentator above me, who made the remark about plants. My reply just came out as a kind of system 1 reaction, but then after I reread it, and the Fisher comments, I decided it was likely true.
          I don’t ask people to give me their real names, and I don’t know if I know the vast majority of the commentators anyway.

          • Corey

            It’s curious to say that Fisher probably believes himself to be a Bayesian when one could just say that probably Fisher is a Bayesian.

            You didn’t say I was Fisher — I just thought I’d mention that I’m not, since you know me to be a Bayesian who frequents your blog and since I was jumping into a conversation of Fisher’s true statistical philosophy. I wouldn’t have bothered except that I suspect you don’t know how to check IP addresses of commenters.

            • Corey: ? ? ? ?
              “It’s curious to say that Fisher probably believes himself to be a Bayesian when one could just say that probably Fisher is a Bayesian.”
              Well but these two do not mean the same thing at all! (See Senn’s post!)

              “You didn’t say I was Fisher — I just thought I’d mention that I’m not, since you know me to be a Bayesian who frequents your blog and since I was jumping into a conversation of Fisher’s true statistical philosophy.”

              Do you mean R.A. Fisher or the Cod Fisher? Either way, to what conversation of “Fisher’s true statistical philosophy” are you referring? Not on this post, right? Anyway, I certainly didn’t know you to be a Bayesian—mind’s all made up?

              “I wouldn’t have bothered except that I suspect you don’t know how to check IP addresses of commenters.” You wouldn’t have bothered doing what? And whether or not I care to check IP addresses, I would not assume the Elbian blog people who work very hard for me don’t know how to do something so obvious. This has got to be one of the most puzzling comments, Corey. I’m mystified, and now I must work.

              • Corey

                Maybe my comments aren’t showing up for you under the comments of yours to which they reply. On my screen, it reads:

                Guest: “Methinks perhaps ‘Fisher’ is a plant …he seems a bit too ebullient!”

                Mayo: “Guest: Indeed! Likely a male plant who probably even believes he’s a Bayesian!”

                Corey: “That’s a curious way of phrasing it — are you alluding to Senn’s paper? (I’m not Fisher.)”




                Corey: “Maybe my comments aren’t showing up…”

                Anyway, perhaps you’re not familiar with the fine old internet tradition of sockpuppetry? I was trying to say that the cod Fisher is not my sockpuppet. At certain websites I have frequented sockpuppetry would be almost a default assumption in this situation.

                I thought I was pretty clear about being a Bayesian, but on review of this blog’s earliest posts, I see that I toned down my Bayesian advocacy after the second post in response to your preference for keeping discussions highly focused. Still, it should have been obvious from the way I keep trying to get you to look at Cox’s theorem.

              • Corey

                @Mayo, regarding “mind’s all made up?”

                Like the opinions of any good Bayesian, mine can be shifted by good evidence — specifically, a reason why E.T. Jaynes’s interpretation of Cox’s theorem as justifying Bayesian inference is incorrect.

  6. Fisher: can’t seem to get “reply” to show beneath your comment. As I said, one can lump together pre-data points in the rejection region so that the type 1 and 2 errors are (allegedly) just likelihoods, but (a) so what? (b) Bayesians don’t do this in their tests, and (c) SEV is based on a post-data assessment of the actual data, not a predesignated rejection region. But the thing is, I really am missing what the point is here. You started out saying (sarcastically, it seems) that I am too easy on the Bayesians, and I fail to see how this lumping example demonstrates this.

  7. Corey: Never heard of sockpuppetry before. I’m not the least bit concerned about it on my blog*, so long as readers aren’t overly confused. I’m much more concerned about stock blogs, stock message boards, and “seeking alpha” comments that sometimes seem to show numerous distinct people pumping or derogating a stock, when there aren’t. There are even paid pumpers out there. But the goals of this blog are almost entirely pedagogical, and sometimes people can learn more, and feel freer, by using a blog name. Plus, it can be humorous and creative. However, I will always comment and post as myself.
    *Of course this could change; I’m a novice here on my ragtag blog. I may even stop writing altogether.
