At the start of our seminar, I said that “on weekends this spring (in connection with Phil 6334, but not limited to seminar participants) I will post some of my ‘deconstructions’ of articles”. I began with Andrew Gelman’s note “Ethics and the statistical use of prior information”[i], but never posted my deconstruction of it. So since it’s Saturday night, and the seminar is just ending, here it is, along with related links to Stat and ESP research (including me, Jack Good, Persi Diaconis and Pat Suppes)[ii]. Please share comments, especially in relation to current-day ESP research.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Part 1.

Let me begin with Gelman’s last section: *“A Bayesian wants everybody else to be a non-Bayesian.”* Surely that calls for philosophical deconstruction, if anything does. It seems at the very least an exceptional view. Whether it’s widely held I can’t say (please advise). But suppose it’s true: Bayesians are publicly calling on everybody to use Bayesian methods, even though, deep down, they really, really hope everybody else won’t blend everything together before they can use the valid parts from the data—and they really, really hope that everybody else will provide the full panoply of information about what happened in other experiments, and what background theories are well corroborated, and about the precision of the instruments relied upon, and about other experiments that appear to conflict with the current one and with each other, etc., etc. Suppose that Bayesians actually would prefer, and are relieved to find, that, despite their exhortations, “everybody else” doesn’t report their posterior probabilities (whichever version of Bayesianism they are using) because then they can introduce their own background and figure out what is and is not warranted (in whatever sense seems appropriate).

At first glance, I am tempted to say that I don’t think Gelman himself really believes this statement, taken literally. Since he calls himself a Bayesian, at least of a special sort, then if he is wearing his Bayesian hat when he advocates that others be non-Bayesian, the practice of advocating that others be non-Bayesian would itself be a Bayesian practice (not a non-Bayesian practice). But we philosophers know the danger of suggesting that authors under our scrutiny do not mean what they say—we may be missing their meaning and interpreting their words in a manner that is implausible. Though we may think, through our flawed interpretation, that they cannot possibly mean what they say, what we have done is substitute a straw view for the actual view (the straw man fallacy). (Note: You won’t get that I am mirroring Gelman unless you look at the article that began this deconstruction here.)

Rule #2 of this blog[iii] is to interpret any given position in the most generous way possible; to do otherwise is to weaken our critical evaluation of it. This requires that we try to imagine a plausible reading, taking into account valid background information (e.g., other writings) that might bolster plausibility. This, at any rate, is what we teach in philosophy. So to begin with, what does Gelman actually say in the passage (in Section 4)?

“Bayesian inference proceeds by taking the likelihoods from different data sources and then combining them with a prior distribution (or, more generally, a hierarchical model). The likelihood is key. . . . No funny stuff, no posterior distributions, just the likelihood. . . . I don’t want everybody coming to me with their posterior distribution—I’d just have to divide away their prior distributions before getting to my own analysis. Sort of like a trial, where the judge wants to hear what everybody saw—not their individual inferences, but their raw data.” (p.5)

So if this is what he means by being a non-Bayesian, then his assertion that “a Bayesian wants everybody else to be a non-Bayesian” seems to mean that Bayesians want others to basically report their likelihoods. But again, if Gelman is wearing his Bayesian hat when he advocates others not wear theirs, i.e., be non-Bayesian, then his advising that everybody else not be Bayesian (in the sense of not combining priors and likelihoods), is itself a Bayesian practice (not a non-Bayesian practice). So either Gelman is not wearing his Bayesian hat when he recommends this, or his claim is self-contradictory—and I certainly do not want to attribute an inconsistent position to him. Moreover, I am quite certain that he would not advance any such inconsistent position.

Now, I do have some background knowledge. To ignore it is to fail to supply the most generous interpretation. Our background information—that is, Gelman’s (2011) RMM paper [iv]—tells me that he rejects the classic inductive philosophy that he has (correctly) associated with the definition of Bayesianism found on Wikipedia:

“Our key departure from the mainstream Bayesian view (as expressed, for example, [in Wikipedia]) is that we do not attempt to assign posterior probabilities to models or to select or average over them using posterior probabilities. Instead, we use predictive checks to compare models to data and use the information thus learned about anomalies to motivate model improvements” (p. 71).

So now Gelman’s assertion that “a Bayesian wants everybody else to be a non-Bayesian” makes sense and is not self-contradictory. *Bayesian,* in the term *non-Bayesian,* would mean something like a standard inductive Bayesian (where priors can be subjective or non-subjective). Gelman’s non-standard Bayesian wants everybody else not to be standard inductive Bayesians, but rather, something more akin to likelihoodists. (I don’t know whether he wants only the likelihoods rather than the full panoply of background information, but I will return to this.) If Gelman’s Bayesian is not going to assign posterior probabilities to models, or select or average over them using posterior probabilities, then it’s pretty clear he will not find it useful to hear a report of your posterior probabilities. To allude to his trial analogy, the judge surely doesn’t want to hear your posterior probability in Ralph’s guilt if he doesn’t even think that’s the proper way of couching inferences. Perhaps the judge finds it essential to know whether mistaken judgments of the pieces of evidence surrounding Ralph’s guilt have been well or poorly ruled out. That would be to require an error probabilistic assessment.

But a question might be raised: By “a Bayesian,” doesn’t Gelman clearly mean Bayesians in general, and not just one? And if he means all Bayesians, it would be wrong to think, as I have, that he was alluding to non-standard Bayesians (i.e., those wearing a hat of which Gelman approves). But there is no reason to suppose he means all Bayesians rather than all Bayesians who reject standard, Wiki-style Bayesianism, but instead favor something closer to the view in Gelman 2011, among other places.

Having gotten this far, however, I worry about using the view in Gelman 2011 to deconstruct the passages in the current article, in which, speaking of a Bayesian combining prior distributions and likelihoods, Gelman sounds more like a standard Bayesian. It would not help that he may be alluding to Bayesians in general for purposes of the article, because it is in this article that we find the claim: “A Bayesian wants everybody else to be a non-Bayesian.” So despite my attempts to sensibly deconstruct him, it appears that we are back to the initial problem, in which his claim that a Bayesian wants everybody else to be a non-Bayesian looks self-contradictory or at best disingenuous—and this in a column on ethics in statistics!

**But we are not necessarily led to that conclusion! I’ve pasted part 2 below, and here’s a link to part 3…..**

[i] Gelman, A. “Ethics and the statistical use of prior information”

[ii] The main posts, following the first one, were:

More on using background info (9/15/12)

Statistics and ESP research (Diaconis) (9/22/12)

Insevere tests and pseudoscience (9/25/12)

Levels of inquiry (9/26/12)

[iii] This the Philosopher’s rule of “generous interpretation”, first introduced in this post.

[iv] Gelman, A. (2011). “Induction and Deduction in Bayesian Data Analysis”, *Rationality, Markets, and Morals (RMM)* 2, 67-78.

__________________________

**PART 2**

A Bayesian, Gelman tells us, “wants everybody else to be a non-Bayesian” (p. 5). Despite appearances, the claim need not be seen as self-contradictory, at least if we interpret it most generously, as Rule #2 (of this blog) directs. Whether or not “a Bayesian” refers to all Bayesians or only non-standard Bayesians (i.e., those wearing a hat of which Gelman approves), his meaning might be simply that when setting out on his own inquiry, he doesn’t want your favorite priors (be they beliefs or formally derived constructs) getting in the way. A Bayesian, says Gelman (in this article), is going to make inferences based on “trying to extract information from the data” in order to determine what to infer or believe (substitute your preferred form of output) about some aspect of a population (or mechanism) generating the data, as modeled. He just doesn’t want the “information from the data” muddied by your particular background knowledge. He would only have to subtract out all of this “funny business” to get at your likelihoods. He would only have to “divide away” your prior distributions before getting to his own analysis (p. 5). As in Gelman’s trial analogy (p. 5), he prefers to combine your “raw data,” and your likelihoods, with his own well-considered background information. We can leave open whether he will compute posteriors (at least in the manner he recommends here) or not (as suggested in other work). So perhaps we have arrived at a sensible deconstruction of Gelman, free of contradiction. Whether or not this leaves the texts open to some charge of disingenuousness, I leave entirely to one side.
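The arithmetic behind “dividing away” is just Bayes’s theorem run in reverse: since the posterior is proportional to prior times likelihood, posterior/prior recovers the likelihood up to a normalizing constant. A minimal sketch on a discrete parameter grid (the grid, prior, and data here are all invented for illustration):

```python
import numpy as np

# Hypothetical discrete parameter grid: theta is, say, a coin's bias
theta = np.linspace(0.01, 0.99, 99)

# The researcher's (invented) prior and data: 7 successes in 10 trials
prior = theta**4 * (1 - theta)**4          # an arbitrary informative prior
prior /= prior.sum()
likelihood = theta**7 * (1 - theta)**3     # binomial likelihood, up to a constant

# What the researcher reports: the normalized posterior
posterior = prior * likelihood
posterior /= posterior.sum()

# Gelman's "divide away": posterior / prior gives back the likelihood's shape
recovered = posterior / prior
recovered /= recovered.max()
target = likelihood / likelihood.max()
print(np.allclose(recovered, target))  # True: same shape, up to normalization
```

This only works, of course, if the reported posterior comes with the reported prior; a posterior alone leaves nothing to divide by, which is presumably the complaint.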

Now at this point I wonder: *do Bayesian reports provide the ingredients for such “dividing away”?* I take it that they’d report the priors, which could be subtracted out, but how is the rest of the background knowledge communicated and used? It would seem to include assorted background knowledge of instruments, of claims that had been sufficiently well corroborated to count as knowledge, of information about that which was not previously well tested, of flaws and biases and threats of error to take into account in future designs, etc. (as in our ESP examples 9/22 and 9/25). The evidence for any background assumptions should also be made explicit and communicated (unless it consists of trivial common knowledge).

Doesn’t Gelman’s Bayesian want all this as well? *What form would all this background information take?*

I see no reason to (and plenty of reasons not to) suppose that all the relevant background information for scientific inquiry enters by means of formal prior probability distributions, whether the goal is to interpret what these data, say x, indicate, or to make a more general inference given all the relevant background knowledge in science at the time, say, E. How much less so if one is not even planning to report posterior probabilities. Background information of all types enters qualitatively to arrive at a considered judgment of what is known, and not known, about the question of interest, and what subsequent fruitful questions might be.

In my own take on these matters, even in cases of statistical inference, it is useful to distinguish a minimum of three models (or sets of questions or the like), which I sometimes identify as the (primary) theoretical, statistical, and data models. Recall the “three levels” in my earlier post. *If one is reporting what data x from a given study indicate about the question of interest, one may very likely report something different than when reporting, say, what all available evidence E indicates or warrants.* I concur with Gelman on this. Background information enters in specifying the problem, in collecting and modeling data, in drawing statistical inferences from modeled data, and in linking statistical inferences to substantive scientific questions. There is no one order either—it’s more of a cyclical arrangement.

Does Gelman agree? I am guessing he would, with the possible exception of the role of a Bayesian prior, if any, in his analysis, for purposes of injecting background information. But I’m too unclear on this to speculate.

To a large extent, then, Gelman’s view on the proper entry of background knowledge is in sync with Sir David Cox’s position; although for a minute I thought Gelman was disagreeing with Cox (about background information), this analysis suggests not. Beyond what might be extracted from the snippet from the informal (Cox-Mayo) exchange to which Gelman refers (p. 3), Cox has done at least as much as anyone else I can think of to show us how we might generate, systematize, and organize background information, and how to establish the criteria appropriate for evaluating such information.[i]

But maybe the concordance is not all as rosy as I am suggesting here. After all, in the same article, Gelman gives a convincing example of using background information which leads him to ask:

*“Where did Fisher’s principle go wrong here?”* (p. 3)

To be continued . . .in part 3.

**Check the comments from these earlier posts for some good discussion.**

[i] I give just one old and one new reference:

Cox, D. R., (1958), *Planning of Experiments*. New York: John Wiley and Sons. (1992 Republished by Wiley Classics Library Edition.)

Cox, D. R., and C. A. Donnelly (2011), *Principles of Applied Statistics*, Cambridge: Cambridge University Press.

Gelman’s position seems to be that he doesn’t trust others to put in their priors and would rather put them in himself. OK. This suggests the data report would be likelihoods, and each person then puts in her prior, or maybe each person goes to a Gelman for his prior.

Mayo:

In retrospect I did not express myself clearly. Best would be for each researcher to present his or her full data collection procedure as well as the data themselves, and also his or her model, giving justifications for each step (not just for the “prior” but also for the data model).

My real point (which, again, I did not express so clearly) is that if people just present their inferences without clarity on what information has gone into those inferences, there can be trouble if I want to combine information from different sources. There are settings where the data model is clear and the data information cleanly separates into factors of a likelihood, and that’s what I was thinking of when I spoke of “the likelihood.” But I agree with you that the likelihood function is not in general enough: for one thing, other aspects of the data collection (for example, the stopping rule) can also be relevant; and, more importantly, there is in general no such thing as “the likelihood,” as this “likelihood” typically embodies lots of modeling assumptions which should be understood and justified.

I assume you’re joking when you suggest that a person “goes to a Gelman for his prior.” As I’ve said so many times, there’s nothing so special about the part of the model we call a “prior,” especially in hierarchical models. I do think that “a Gelman” (or should we say “a gelman”) can be helpful in formulating one’s models, just because gelmans do have some experience with such things. My books include much, but not all, of my understanding of statistical modeling.

Finally, I hope your readers will realize that the article you link to has four parts, and your discussion is all about part 4, which happens to be the shortest section of the paper!

Andrew: Thanks for your comment. I do say I start here with the last part, but my deconstruction is on parts 3 and 4. These few pages are fascinating because they bring up so many puzzles about what Bayesian statistical inference really is, or is intended to be. I’ll just consider one.

You wrote in your comment: “As I’ve said so many times, there’s nothing so special about the part of the model we call a “prior,” especially in hierarchical models”.

But I think there must be something special, at least with respect to “the prior” that you have in mind when you say (in the article): “I’d just have to divide away their prior distributions before getting to my own analysis”. I don’t see that you are going to divide away other aspects of their model. I agree with you about keeping prior assessments of claim C separate for purposes of evaluating how well (or badly) it is warranted by the data x from the inquiry at hand. A distinct question might be: given all the well tested claims and theories overall, what can be said about claim C? That’s why I see no disagreement with Cox’s view.

Of course if this is a probabilistic prediction problem, with frequentist-based priors, we might both agree to use the prior rather than keep it separate. But in those cases, I take it, you would not have been talking of a Bayesian wanting everyone else to be a non-Bayesian.

(I think you were going to write an official reply to my deconstruction, although you gave some helpful comments. There’s always an open invite to do so, no $300 fee).

At the very least, it would be good for others to be aware of your recommendation (in your comment) “for each researcher to present his or her full data collection procedure as well as the data themselves, and also his or her model, giving justifications for each step (not just for the “prior” but also for the data model)”.

Mayo:

Regarding the bit about the likelihood, my main point as noted above is that it was misleading for me to speak of “the likelihood” as having some external existence, given that it, like “the prior,” is chosen by the researchers in light of information external to the data.

Andrew: OK. But the fact that something is chosen by researchers in light of info external to the data doesn’t entail, one way or another, whether it has external existence (as opposed to what, by the way? As opposed to being “in a mind”? or representing ideas/beliefs in a mind?). Every concept and model employed in the inquiry is chosen or constructed, but their elements may still refer to entities and processes with “external existence”. There are several different layers of clarification I’d want to try in order to disentangle what you might be saying and meaning. We don’t have to do it now; it’s unlikely to be the best forum anyway.

A simple example would be a researcher wanting to know the performance of a diagnostic test in the Chinese population, having read a study that just reported the positive and negative predictive properties of that test (i.e., posterior probabilities) in the US population, based on disease prevalence there (i.e., the prior probability of disease before the test). The prevalence of the disease in the US is not directly relevant but needs to be _divided_ out to get likelihood ratios that might generalise to China and give good estimates of the positive and negative predictive properties of that test in China, given the prevalence there.
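The arithmetic in that example can be sketched in a few lines: convert the reported predictive values to likelihood ratios by dividing out the US prior odds, then recombine with the local prior. All the numbers below are invented for illustration:

```python
# Invented figures: PPV, NPV, and prevalence reported for the US study,
# and an assumed prevalence for China
ppv, npv, prev_us = 0.90, 0.95, 0.10
prev_cn = 0.02

def odds(p):
    return p / (1 - p)

def prob(o):
    return o / (1 + o)

# "Divide out" the US prior: posterior odds = LR * prior odds, so LR = posterior odds / prior odds
lr_pos = odds(ppv) / odds(prev_us)        # likelihood ratio for a positive result
lr_neg = odds(1 - npv) / odds(prev_us)    # likelihood ratio for a negative result

# Re-apply Bayes with the Chinese prevalence to get local predictive values
ppv_cn = prob(lr_pos * odds(prev_cn))      # P(disease | positive) in China
npv_cn = 1 - prob(lr_neg * odds(prev_cn))  # P(no disease | negative) in China
print(round(ppv_cn, 3), round(npv_cn, 3))  # 0.623 0.99
```

The drop in PPV (0.90 to about 0.62) is exactly the effect of swapping a 10% prior for a 2% prior; the likelihood ratios are the part of the report that travels.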

The more general problem is that when Fisher asked how to summarise individual study results so that others could obtain, from just those summaries, all that they could obtain from access to all the individual data, he suggested (incorrectly) that the likelihood function did that (being the minimal sufficient statistic).

Perhaps he assumed the distribution of such functions could be somehow obtained just from the likelihood function (providing error statistics), but he also missed that the problem is dynamic rather than static. As each study is done, its data may seem to be Normally distributed, but as the collection of study data increases, it might clearly be seen to be, say, LogNormally distributed (within the individual studies, but now with many of them to look at), and now the likelihood functions calculated and saved from assuming a Normal distribution in the past are problematic (deficient).
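One way to see the deficiency: construct two (artificial) samples that share exactly the summary that is sufficient under a Normal model; a LogNormal likelihood still distinguishes them, so a summary saved under the Normal assumption cannot be reused once the model changes. A minimal sketch with invented data:

```python
import numpy as np

# Two artificial samples engineered to share the Normal-sufficient summary
# (mean, sd) exactly, by rescaling the deviations of the second sample
A = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
raw = np.array([0.5, 2.5, 3.0, 3.5, 5.5])
dev = raw - raw.mean()
B = A.mean() + dev * (A.std() / dev.std())

# Same Normal likelihood function would be saved for both samples
assert np.isclose(A.mean(), B.mean()) and np.isclose(A.std(), B.std())

# But the LogNormal likelihood depends on sums of log(x) and log(x)**2,
# which the Normal summary does not determine
print(np.log(A).sum(), np.log(B).sum())  # different: the saved summary is deficient
```

So a summary sufficient under one model need not be sufficient under the model the accumulating data eventually favour, which is the dynamic problem described above.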

Or the likelihood function from assuming just a common treatment effect gets rejected when the individual studies consistently show gender differences. So you need to foresee all possible future study data sets to know what summary of the first study will be sufficient. I identified this in my PhD thesis; I am not sure enough attention has been drawn to it.

Two U-Phil responses to Gelman on this topic are http://errorstatistics.com/2012/10/12/u-phils-hennig-and-aktunc-on-gelman-2012/

I think that my text linked there is not the one commenting on Gelman’s paper that is discussed here; it’s a different topic. I could probably find the right one myself but perhaps the Elba people have faster methods to do that kind of thing?

I commented on a blog called “prior probability”! http://priorprobability.com/about/comment-page-1/#comment-3240

I read Gelman’s article “Ethics and the Statistical Use of Prior Information” and have several concerns and impressions.

I mainly ended the reading asking myself “what really is the point here?”

The title poses a useful and interesting question, one every statistician should consider at length. For example, ethics indeed dictate the use of prior information in court cases, so any statistician considering working on a legal case should understand such issues.

Gelman rightly states that “all statistical methods allow prior information to be used in the design of a study, or in choosing what variables to include and how to transform them, or in the interpretation of results”. Fine. Off to a good start.

The trouble begins in the next sentence. “What distinguishes Bayesian methods is the expression of prior information in the form of probability distributions on parameters in a model. But this is controversial.” Why, oh why, are human beings such “splitters”? Why do we exhibit such a strong penchant to divide our kind into multiple groups, and declare ourselves members of the obviously superior group? Even the language is subtle in its messaging. Bayesian methods are distinguished. Other methods therefore must be less than distinguished.

Gelman discusses another example of ethically inappropriate use of prior information, citing a census example. All well and good, valuable information for any practitioner of the statistical sciences.

Section 2 continues the troubles, in its title: “General arguments for the superiority of a Bayesian or non-Bayesian paradigm”. Now “superiority” implies a ranking, a judgment of “greater than” and “less than”, and as practitioners of a mathematically based science, all statisticians should know that one can only assess greater than, less than, and so also superiority, in one dimension. Greater than is a mathematical relation that only exists in one dimension. Thus judging the superiority of a statistical analytical paradigm requires reduction to one dimension. On what single dimension shall we measure the superiority of the Bayesian paradigm? This is one of the main problematic points in this long, tired and fallacious argument about which statistical paradigm shall rule them all.

The first sentence of Section 2 states “More generally, various statisticians have argued that their preferred methods satisfy certain desirable general properties and thus should be preferred in practice”. Can we rephrase that, and step away from this human penchant to divide and deride? How about “In general, various statisticians have argued that their preferred methods satisfy certain desirable general properties and thus are very useful in practice”? Of course, there are a number of desirable general properties associated with statistical methods useful for understanding natural and other phenomena. We do a disservice by reducing down to a single desirable property with which to unequivocally pick a “winner”.

At this point in the article, I already felt lost. How does this need to harp on the “superiority” of one method fit into the theme of the article, which regards ethics and the use of prior information, which Gelman has already reminded us is practiced in all statistical methods? Bayesianism and frequentism and other statistical paradigms are all subject to the issue of ethical use of prior information. If Bayesianism is to be adjudicated the superior paradigm, I am left wondering “Which Bayesian paradigm? Subjective Bayes? Objective Bayes?”

Gelman now picks a frequentist example from the annals of the particle physicists, confidence intervals regarding physical traits, pointing out the failure in practice of such erroneous methodology. This paragraph ends with the statement “This historical miscalibration should not be taken as an indictment of classical methods – I have no doubt that Bayesian intervals would have similarly poor coverage – rather it is a warning of the dangers of taking a theoretical claim out of the lab and on to the street”. Kudos to Gelman for rightly pointing out that Bayesian intervals would indeed have yielded similarly poor coverage. The issue is not at all whether one uses a frequentist or a Bayesian paradigm, but rather whether the assumptions involved with production of said intervals were reasonable. Unreasonable assumptions will yield a silly outcome for a frequentist or a Bayesian.

The following paragraph contains this nugget: “That said, we do not find the argument compelling that Bayesian methods are uniformly better”. My question to Gelman here is “Who are ‘we'”? Is Gelman included in said “we”?

Frequentist methods and Bayesian methods can effectively shed light on phenomena under investigation – I remain unclear on what the point of Section 2 in this article was.

Section 3 introduces some strawman examples with which to allow further swiping at the frequentist piñata, and to deconstruct Fisher. Gelman speculates on whether Cox would agree with him. I would suggest that Gelman ask Cox whether Cox agrees with him, and report back to us. Cox is still with us, at Oxford, so ask soon. Gelman further speculates that Laplace would agree with him, but with Laplace’s untimely demise, this shall forever remain speculation. It’s an interesting human game, this claiming that some august citizen of the past would surely agree with said opinion. I believe the goal is to try to associate the offered opinion with an undisputed authority figure, in an attempt to imbue the offered opinion with some of the glory of said august historical figure. I’m certain Einstein would agree with me on this.

The concluding sentence to the column states “In any case, I believe we are ethically required to clearly state our assumptions and, to the best of our abilities, explain the rationales for these assumptions and our sources of information.” Very well said, and of course this well expressed belief holds for any statistician, regardless of whether that statistician decided that a Bayesian analysis would be appropriate for a certain legal argument in part of a court case, and decided that a frequentist analysis would be appropriate for another legal argument, even in the same court case.

Bayesian and frequentist and other statistical paradigms are merely tools in the toolchest of a capable statistician. Understanding all of them, and their attendant limitations in this situation or that, should be the goal of a reasoned statistical practitioner, rather than forging a “belief” in the absolute “superiority” of any one of them. The discussion of ethical considerations and the statistical use of prior information would have been better presented by reviewing such issues with regards to multiple statistical paradigms, rather than infusing this necessary discussion with another attempt to prove the superiority of one of them.

Steven McKinney: Thanks for your detailed comment! I agree with the bulk of what you say, but I do think distinctions enter, rightly, in considering foundations of statistics: what are the key goals and how might statistical models and methods be used to accomplish them? The error statistical account that I favor views the main role of probability in scientific inference as determining how well warranted claims are, by controlling and assessing relevant error probabilities. It seems we agree on this. Other accounts may have distinct goals.

I hadn’t seen your name on this blog before, but after a bit of research found some very interesting critical analyses of yours on Retraction Watch. Notably, there’s this in relation to a Bayesian regression technique associated with the retracted Anil Potti work, occasionally mentioned on this blog: http://retractionwatch.com/2012/02/14/the-anil-potti-retraction-record-so-far/#comment-10488

“Given the number of retractions to date involving this methodology, and the general statistical principles violated by the methodology, my guess is that if anyone ever attempts to complete the research outlined in this paper to provide actual scientific evidence testing the unproven assertions, the outcome will be a repudiation of the assertions. Indeed Baggerly and Coombes did perform a “frequentist”-based assessment of this “Bayesian”-based methodology which did exactly that.

…

In truth, most statistical issues have a frequentist-based solution, and a Bayesian-based solution, and these solutions converge (comfortingly) to the same answer as sample sizes grow and grow (the solutions converge, asymptotically, as sample size increases towards infinity). Indeed, in many cases where a frequentist solution differs from a Bayesian solution, much valuable insight into deep statistical philosophical issues has been made.

Thus, the frequentist-based assessment undertaken by Baggerly and Coombes of this Bayesian-based methodology, from which Baggerly and Coombes concluded “Using the scores that we compute, we do not see a large story here” suggest that even if the Bayesian-based methodology was corrected to avoid the statistical no-no of using ALL your data to derive “supergenes” it would not perform well. Baggerly and Coombes’ frequentist-based methods should yield similar outcomes to the Bayesian-based method, especially as sample sizes get large. If they do not (and this is research that I have not ever found published) then we have ourselves one of these frequentist-Bayesian discordant situations that represents a valuable learning opportunity.

An unfounded and unproven assertion in the article cited above* reads “One might suspect that the method just “stores” the given class assignments in the parameters . Indeed this would be the case if one uses binary regression for n samples and n predictors without the additional restrains [sic] introduced by the priors. That this suspicion is unjustified with respect to the Bayesian method can be demonstrated by out-of-sample predictions.” Herein lies the assertion that somehow this Bayesian-based method is not subject to the known statistical phenomenon of overfitting a model to a data set. Until a proper mathematical and computer-simulation analysis of this issue is performed, all of the assertions made in dozens of papers using this methodology remain on shaky grounds.”

Steven McKinney

February 15, 2012

*The source of the article may be found at the Retraction Watch link above.

So can you give us a brief update? What has happened to the project of performing an error statistical analysis of this methodology? Has it been carried out? Why was there such reluctance to pursue the critical inquiry they launched? I was glad to hear that Efron had been instrumental in getting the Baggerly and Coombes paper published.

Steven:

I agree with much of what you say in your note, and it seems that your last two paragraphs essentially express agreement with what I wrote in my article. Somehow I feel that the tone of your note implies disagreement with what I wrote, but the content is essentially in agreement with mine.

I should perhaps clarify one thing: in my section, "General arguments for the superiority of a Bayesian or non-Bayesian paradigm," I was not supporting the idea of "superiority" in any way! Rather, I was relating examples where Bayesians or non-Bayesians had argued for superiority, and I was saying I don't like that way of thinking. I think that you were misled by the title of that section, or perhaps misled by your expectations going into the article, so that you seemed to get the impression I was saying the opposite of what I was actually saying! Hence I wrote this comment, as it's kind of horrible to think that someone read my article and came away with the opposite impression from what I was intending. Just to repeat: I wrote of "general arguments for superiority" because influential people, other than me, make such arguments. I don't buy it, and I don't think a general concept of superiority here makes sense.

For more on this point, please see footnote 1 of this article. (Mayo knows about this article, as it appeared in a journal issue that she edited!) I think I've done as much as anyone in statistics to emphasize that neither Bayesian nor non-Bayesian methods have inherent superiority. So, again, it really bothers me to see someone think I've implied otherwise.

P.S. I do not know Cox directly but I did ask Mayo to send some earlier version of some of this material. I did not hear back from him but that is fine. He is a busy person and has no obligation to respond to me. He wrote what he wanted to write and I wrote what I wanted to write. I don’t think there’s any harm in that. There’s a long tradition of discussions, scientific and otherwise, being carried out in this way, where person A writes something, then person B writes something in reaction, etc. Direct contact between the authors might be desirable but it is certainly not necessary.

P.P.S. You ask of my article: “what really is the point here?” My response is that the article has no single point. It is what it is, and I don’t think I can express its point in fewer words than the article itself. This might bother you, and you might prefer a different sort of article that can be more succinctly summarized, but that is a matter of taste, I think.

Andrew:

“P.S. I do not know Cox directly but I did ask Mayo to send some earlier version of some of this material. I did not hear back from him but that is fine. He is a busy person and has no obligation to respond to me.”

I did send it to him.

” He wrote what he wanted to write and I wrote what I wanted to write.”

Yes, and not only did he write what he wanted to write, he meant what he wanted to mean. And after all is said and done, I fail to see that you’ve shown why he shouldn’t. Further, you seem to agree with him.

Andrew:

“Somehow I feel that the tone of your note implies disagreement with what I wrote, but the content is essentially in agreement with you.”

There’s plenty of reasonable discussion in much of your writing, but your subtle (and sometimes not so subtle) continual boosting of Bayesian methods, and inappropriate disparaging of frequentist methods, needs discussion, so that people not so familiar with statistical foundational issues and the underlying mathematics of statistics can better appreciate when they are being misled.

I went to your webpage listing your publications, and just picked one out that I had not yet read. This one caught my eye, as it deals with a topic I have to handle on a regular basis:

“Why We (Usually) Don’t Have to Worry About Multiple Comparisons”

Journal of Research on Educational Effectiveness, 5: 189–211, 2012

DOI: 10.1080/19345747.2011.618213

This section illustrates the types of statements I find disingenuous, and that only serve to mislead and confuse those trying to understand how statistics helps shed light on natural phenomena:

“A DIFFERENT PERSPECTIVE ON MULTIPLE COMPARISONS

Classical methods typically start with the assumption that the null hypothesis is true—an unhelpful starting point, as we discuss next. Moreover, we argue that classical procedures fail by insufficiently modeling the ensemble of parameters corresponding to the tests of interest (see Louis, 1984). We cannot hope to have the goal of proposing an optimal statistical method for all circumstances. Rather we present an entirely different perspective on the issue and its implications and argue that, when viewed from a Bayesian perspective, many of these problems simply disappear.”

“Simply disappear” – how marvellous! So easy! So convenient!

Perhaps it is unclear what "classical procedures" means in this paragraph. Is Bayesian statistics, with its 250-year history, a classical procedure? I'll use the term "frequentist" instead; correct me if I have guessed incorrectly, but that appears to be what you are discussing.

Of course frequentist procedures fail to model the parameters: that's what Bayesians do, not frequentists. Frequentists model the data, using mathematical models with fixed parameters that characterise a distribution meant to mimic the structure in the data. Estimates of parameters are obtained from frequentist models. Indeed, sometimes too many parameters appear in a model, but sensible statistical practitioners seek simple models with enough parameters to allow the proposed distributions to adequately reflect the structure in the data. All reasonable statisticians do this, Bayesian or otherwise. How this represents a "failure" is beyond me. Standing bridges and skyscrapers, and airplanes not falling from the sky, regularly remind me of the effectiveness of statistical methods in reasonably reflecting natural phenomena, and not all of the statistical methods that keep buildings vertical for decades are Bayesian methods.

Of course we do not believe that the null hypothesis is true, that tau_j exactly equals zero. We posit the scenario that the null hypothesis is true, and assess how well the data fit reasonable models under that scenario. Positing the scenario does not mean we believe it. Sensible statisticians always attempt to throw off the shackles of “belief”, in an attempt to allow changes in understanding of natural phenomena when evidence appears in data that warrant changes in beliefs.

Sensible statisticians try to answer the question "How different from zero should tau_j be before I care about it?" This is context dependent, and rarely an easy question to answer, but thoughtful practitioners, scientists, doctors, and engineers understand how much of a difference from zero is scientifically (not statistically) interesting and meaningful. A difference of scientific or medical or otherwise practical importance is determined, and tests that can reasonably distinguish such differences from zero are devised. If a medical manoeuvre conducted on a million people would extend the happy life of one of them by one day, few people would be motivated to expend large amounts of effort to make such a small change happen. If a medical manoeuvre could extend the lives of one hundred thousand of those million by 5 years or more, plenty of people would join the effort to bring such a manoeuvre into medical practice. We can argue about the middle ground, where we might think the payoff is worth the effort, and indeed such discussions are important. The goalposts may change from time to time, but with a ballpark assessment of what kinds of differences in parameter values we care about, sensible frequentist tests abound, as do sensible Bayesian methods. Without such consideration, how on Earth could a Bayesian figure out how much information, how much data, to collect in order to yield a highest posterior density interval narrower than such a difference of importance, so as to allow sensible resolution of which state the system under study is in: the "not different enough that we care" state or the "different enough that we need to get busy" state?
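The sample-size question at the end of that paragraph has a textbook answer in the simplest setting. For a normal mean with known standard deviation and a flat prior, the 95% HPD interval coincides with the 95% confidence interval, so one back-of-envelope calculation serves both camps (the numbers below are illustrative, not from the discussion):

```python
import math

def n_for_halfwidth(sd, halfwidth, z=1.96):
    """Smallest n so that a 95% interval for a normal mean (known sd)
    has half-width <= halfwidth. With a flat prior, the 95% HPD
    interval is the same interval, so the answer serves frequentist
    and Bayesian alike in this toy setting."""
    return math.ceil((z * sd / halfwidth) ** 2)

# To resolve a practically important shift of half a standard deviation,
# aim for an interval half-width of a quarter of one:
print(n_for_halfwidth(sd=1.0, halfwidth=0.25))   # -> 62 observations
```

The point of the sketch is McKinney's: the "difference we care about" has to be fixed by scientific judgment before either paradigm can tell you how much data to collect.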

This issue is present for frequentists, and Bayesians, and sensible statisticians, and nothing in Bayesian mathematics makes it so simple that these issues "simply disappear". The issues are handled, with differing mechanisms consistent with the paradigm being employed. A discussion of how the mathematics handles the issue would have been useful. A debate about how successfully particular Bayesian or frequentist methods solve the issue of misguided conclusions in the face of multiple comparisons would have been valuable. Perhaps, given your particular mathematical talents, you find this mathematics easy, but I can attest that the biologists I work with do not. Too many less-than-sensible statisticians also have trouble with the math, and with the philosophy. Statements such as "these issues simply disappear" are disingenuous and misleading.
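The scale of the multiple-comparisons problem being debated here is easy to quantify by simulation. This sketch (my own, not from Gelman's paper) runs many studies of m true-null z-tests each and counts how often at least one comes out "significant", with and without a Bonferroni correction:

```python
import math
import random

def family_error_rate(m, alpha, n_sims=2000, seed=0):
    """Fraction of simulated studies (m z-tests, all nulls true) in
    which at least one two-sided p-value falls below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # two-sided p-value of a standard-normal z statistic
        pvals = (math.erfc(abs(rng.gauss(0, 1)) / math.sqrt(2)) for _ in range(m))
        hits += any(p < alpha for p in pvals)
    return hits / n_sims

print(family_error_rate(20, 0.05))       # uncorrected: about 1 - 0.95**20, i.e. ~0.64
print(family_error_rate(20, 0.05 / 20))  # Bonferroni-corrected: back near 0.05
```

Uncorrected, a batch of 20 null tests flags something roughly two times out of three; dividing alpha by m restores the familywise rate to about 5%. Gelman's preferred route in the paper McKinney cites is different: fit a multilevel model and let shrinkage of the estimates tame the same inflation, rather than adjusting alpha.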

I am unclear on how the content of your writings is in agreement with my position that sensible statisticians reasonably consider methods that work, regardless of their frequentist or Bayesian flavour. I am unclear as to how your writings do not support the idea of Bayesian superiority, given the statements I find in many of your writings, such as shown above. If you don’t like that way of thinking, why do I see it so often in your writings?

Perhaps you (and other Bayesian boosters who write with a similar style) do not even realize that you imbue your writings with statements of this tone and nature, but they do no great service in helping others understand how statistical principles help solve important problems. Bayesian methods alone are not going to save the world, and no amount of Madison Avenue sales pitch language or other subtle choice of language will change that.

For the record: Cox has since responded with several e-mails to Gelman.

Mayo:

As explained in my article (and as I discuss in the context of examples), I disagree with Cox’s statement that “when they present their information to the public or government department or whatever, they should absolutely not use prior information.” I think this puts way too strong a burden on the often arbitrary division between “data” (which in practice is often only interpretable in the context of a statistical model which is constructed based on scientific understanding, that is, prior information) and “prior information” (which in practice is often the product of hard data). In all sorts of important problems, prior information is essential in getting you from raw data to the questions of interest to the public or government department or whatever.

As I also wrote in that article, all statistical methods require choices (assumptions, if you will). For example, Cox's proportional hazards regression is one of the most influential statistical methods of the past half-century, but additivity is a prior assumption too! It's just not possible to determine or even validate all one's choices from the data at hand. If you don't want your choices to be based on prior information, what other options do you have? You can rely on convention (using methods that appear in major textbooks and have stood the test of time), or maybe on theory. Both of these meta-foundational approaches have their virtues, but neither is perfect: conventional methods are not necessarily good (as can be seen by noting that for many problems there are multiple conventional methods that give different results), and theory often doesn't help (for example, classical confidence intervals and hypothesis tests are insufficient in problems with noisy data).

I do not think the statement I quoted by Cox is an error, exactly; I just think that in this case he is working with a conceptual model that does not cover many of the problems I’ve worked on in various areas including environmental decision making, political science, and toxicology.

Andrew: I’m afraid you are repeating an entirely false view, that frequentist error statistical methods (or what you are calling conventional methods) use no background other than the “data” from the experiment at hand. Obviously others have followed you into thinking this is so, but it is absurd in the extreme.

You keep forgetting that the background at issue when it comes to testing H consists of claims related to already assuming H is well (severely) tested, however subliminally these background presuppositions enter. That's because we like to avoid begging questions, bias, self-deception, and merely assuming H is well tested when that's what we're trying to figure out! Cox made it abundantly clear that background knowledge is essential in testing H. There's a big difference between:

(1) I believe H, or even, there is excellent overall evidence for H from background,

and

(2) this test T with data x corroborates H (H is well tested by T with x).

A minute ago* I read that they are questioning the recent alleged evidence for gravitational waves (the BICEP2 data). I haven't looked into the details, but it's an excellent example of my point. If you bring in previous evidence (which, for some 40 years, has given excellent evidence for gravitational waves) for purposes of evaluating the BICEP2 experiment, then you are at odds with this critique, which has to do with whether the BICEP2 experiment did a good job distinguishing aspects of polarized gravity waves from polarized stellar dust or whatever. It may not have corroborated the particular inference about an aspect of gravity waves that it was thought to, and this is distinct from whether we have other indirect evidence for the existence of gravity waves.

But never mind that example. I'm very disappointed that you especially would maintain the view that frequentist error theory requires using only the data from the experiment at hand! (Really? In experimental design? In checking assumptions? In specifying effect sizes of interest? In fraud busting? In relating the statistical to the substantive? In meta-analysis?) What is rare, however, is for this background to come in by means of prior probabilities, which I take it is why you yourself say "a Bayesian wants everyone else to be a non-Bayesian". We use the background info that we have and need (if it happens to be an empirical prior, fine, but we still need much more). I guess we've really made no progress in correcting one of the more outlandish howlers.

* http://profmattstrassler.com/2014/05/19/will-bicep2-lose-some-of-its-muscle/#more-7659

Mayo:

I don't understand what you're talking about. I didn't say anything about "frequentist error statistical methods." All I said was that I disagreed with Cox's statement that "when they present their information to the public or government department or whatever, they should absolutely not use prior information." I disagree. I think that in many problems involving the public or government department or whatever, it is a good idea to use prior information; indeed, I think that in many settings it is essentially impossible not to use prior information.

Andrew: All along (as in my deconstruction) I've been saying that you actually agree with Cox, but the implication in your article is that Cox or Fisher or frequentists were doing something you disagreed with in interpreting the import of data, namely, using only the data at hand. Even in your previous comment you say, "It's just not possible to determine or even validate all one's choices from the data at hand. If you don't want your choices to be based on prior information, what other options do you have?"

But nobody said we’re restricted to the data at hand.

In the Cox-Mayo published “conversation”–the one on which your article is commenting–one finds:

http://www.rmm-journal.de/downloads/Article_Cox_Mayo.pdf

“MAYO: I do find it striking that people could say with a straight face that frequentists are not allowed to use any background information in using our methods….

COX: Well, it’s totally ridiculous isn’t it?”

In response, your article asks: where did Fisher go wrong?

If we can clarify this once and for all, we can at least move on. We might make headway in supplying the statistics to accompany something I’ve been trying to do for years, but from a broad philosophical standpoint: distinguish the types of background info and how it should/shouldn’t be used for probing a given question at a given stage of inquiry, the types of selection effects (of which Cox gives a taxonomy), double counting, use-constructing, etc. etc.

Mayo:

I’m not saying that Cox does anything wrong in his statistical analyses. I’m just saying that I disagree with his statement that “when they present their information to the public or government department or whatever, they should absolutely not use prior information.”

Perhaps you are saying that, given that the term “prior information” is not really defined, there’s nothing in that statement for me to disagree with?

In examples where data are sparse, or examples where a lot of prior information is needed to map from data to quantities of interest, I think that the use of prior information is essential, whether or not the public etc is involved.

There were a couple of conversations I had with David in Oxford (2002/3) that might provide some clarification.

The first was about his position that ideally an experiment should stand as much as possible on its own – any need for extraneous information that is required to process (extract) the information in the study should be minimised. I’m fairly sure he would include choice of data generating model and other assumptions in addition to other studies and background knowledge as part of this extraneous or extra-study information.

The second was that, especially when this extraneous information was important, it would not be helpful to public or government department officials to represent it in the form of a probability generating model for unknown parameters (i.e., to process it as prior information). This discussion was in the context of Sander Greenland's multiple bias analysis of electromagnetic fields and childhood leukemia, which used informative priors to represent uncertainty about confounding and other biases. My sense was that he felt they would not be able to process the information adequately if this extraneous information was packaged and dealt with in terms of a probability generating model for unknown parameters in the statistical modeling. But maybe he meant everyone, everywhere, always.

So my take on what David means is “when they present their information to the public or government department or whatever, they should absolutely not use prior information [to represent extraneous information that was essential to interpret the study(ies) in hand, but instead do this by a series of ad-hoc allowances and qualifications].” (_Ad-hoc_, as David used to say, being tomorrow’s good theory.)

My take is that not everyone (including me) can actually adequately put it into a series of ad-hoc allowances and qualifications (at least in a reasonable time frame), as that often requires, amongst other things, a level of mathematics and statistics at which David excelled.

Phaneron0: I think that David Cox is clear as to what he means; Gelman’s position is the one we can’t catch. Either he’s denying that frequentist error statisticians make use of background information or he is not denying this. Either the howler will continue to be propagated or it will stop. I still seek his clarification.

Now, as for your remark (never mind that it’s too late to be blogging, let me try something quickly), it sounds like you are referring to a prior probability:

“My sense was he felt they would not be able to process the information adequately, if this extraneous information was packaged and dealt with in terms of a probability generating model for unknown parameters in the statistical modeling.”

Is the relevant background information successfully packaged in terms of "a probability generating model for unknown parameters"? Or not. Not in general, not typically. [Isn't that why Gelman says a Bayesian wants everyone else to be a non-Bayesian?] And there's much else that would be omitted, even in cases where one could. Instead, as I have discussed in around 4 different posts on background knowledge, all in connection to this Gelman deconstruction (I don't know if you've read them), the altogether crucial background info that we just about always have has to do with info about problems, flaws, and various techniques for data generation, modeling, and linking statistical tests to substantive scientific claims. If you threw out, for example, background info as to the ways people can readily generate impressive-looking statistical results erroneously (be it by multiple testing, cherry-picking, selection effects, stopping rules, or other means), then you will not be able to control and appraise what has been warranted by the data. The error statistician has ready-made niches by which to take this all-important kind of background into account within the inference (e.g., finding p-values spurious, fraud-busting). [This happens also to be the point of Birnbaum's confidence concept in the post I currently have up since May 27.]

Those who deny the relevance of error probabilities of methods do not have these direct ways to take this background information into account.

They may try to match the simple and direct ways the frequentist error statistician takes this background into account, but those gambits, even where they give "agreement of numbers," fall short of capturing the rationale for why we want this background to enter into our appraisal of inference. We error statisticians do, at least.

So I think what Cox is saying is fairly straightforward. If Gelman can be pinned down as to what exactly he’s alleging about the frequentist error statistician’s use of a host of background information, then maybe we can avoid going around and around again. Maybe we can stop hearing the absurd claim that we frequentists do not make use of a slew of background information in statistical inference.

Andrew: This is the kind of thing Cox has in mind, even when it does not involve a formal prior.

“A startling number of forensics analysts are told by prosecutors what they think the result of any given test will be. This isn’t mere prosecutorial mischief; analysts often ask for as much information about the case as possible—including the identity of the suspect—claiming it helps them know what to look for.”

Why forensic science isn’t really science and how it could be killing innocent people | National Post

http://news.nationalpost.com/2014/06/11/why-forensic-science-isnt-really-science-and-how-it-could-be-killing-innocent-people/

Mayo:

I find this extremely frustrating. You write, “Gelman’s position is the one we can’t catch. Either he’s denying that frequentist error statisticians make use of background information or he is not denying this. Either the howler will continue to be propagated or it will stop. I still seek his clarification.” But I never say anything about frequentist error statisticians. You write that I’m “alleging” something about the frequentist error statistician’s use of a host of background information, but I never say anything about this.

I’m just saying that I disagree with Cox’s statement that “when they present their information to the public or government department or whatever, they should absolutely not use prior information.” As you say, what Cox is saying is fairly straightforward. And I disagree with Cox’s straightforward statement.

If you accept that his statement is straightforward, can’t you also accept that my statement is straightforward? Cox said A, I’m saying not-A.

Cox is a great man, but I’m allowed to disagree with something he said, right?

Again, here is my position. Cox said something, I disagree with it. Cox said that, in this sort of setting, people should “absolutely not use prior information.” I disagree with that statement. That is my position. My position is disagreement with the statement that “when they present their information to the public or government department or whatever, they should absolutely not use prior information.”

I’m not saying anything about frequentist error statisticians, indeed I don’t really know what a frequentist error statistician is. Nor am I saying anything about any howlers. I don’t quite know what a howler is either.

But, since you seek my clarification, let me clarify. I disagree with Cox’s statement that “when they present their information to the public or government department or whatever, they should absolutely not use prior information.” This is not a negative comment on Cox’s statistical practice, it’s not a negative comment on Cox’s books, it’s specifically a comment on one thing that Cox said, that I disagree with.

You conclude your comment with the statement, “Maybe we can stop hearing the absurd claim that we frequentists do not make use of a slew of background information in statistical inference.” I don’t know who is making this absurd claim, but it’s not me! I’ve never made such a claim. Rather, I’ve always emphasized that all sorts of statistical analyses use prior information (indeed, in my article I referred to Cox’s hazard regression as an example of this).

But now I’m puzzled: If you agree with me that it’s commonplace for statistical inferences, Bayesian or otherwise, to use prior information, why do you mind that I disagree with Cox’s recommendation that people “absolutely not use prior information”? In all sincerity, I’d think you’d agree with me. I’m puzzled, no joke.

But, in any case, I think my position should be clear, as it's simply the negation of Cox's statement, which you already characterized as "fairly straightforward."

Andrew: Let’s just focus on your position:

Gelman: My position is disagreement with the statement that “when they present their information to the public or government department or whatever, they should absolutely not use prior information.”

What Cox claims is that

one should not include prior opinions and beliefs about the truth of H in evaluating or reporting on what the data x from test T warrant as regards H (in particular, in the form of subjective prior probabilities in H).

So then your disagreement means you DO think that people should include prior opinions and degrees of belief in H (in the form of prior probabilities) in evaluating or reporting on what the data x from test T warrant as regards H.

Is this correct?

We can’t debate all the positions at once in blog comments, so we either do them one at a time, or use a larger and much better forum. I am in favor of the latter.

Mayo:

I am disagreeing with Cox’s statement as printed, “when they present their information to the public or government department or whatever, they should absolutely not use prior information.”

This disagreement is not a comment on what you write about “the truth of H,” “test T,” “subjective prior probabilities,” etc. I don’t see how this is so complicated. As part of an article I wrote, I commented on something that was printed under Cox’s name. It was the printed words I was commenting on. That’s all. Everything else is coming from you, not me. The questions you’re interested in are important, but they’re not what I was writing about. I was reacting to a particular sentence from Cox.

Andrew: Anyone who reads the article, or even that section, can tell that his claim is that you should not combine your subjective degrees of belief with the data. We frequentist philosophers actually hold to taking into account the background of the entire section or article in correctly interpreting the warranted meaning of a claim (not taking it out of context). The interlocutor (me) made this abundantly clear in the article.

Are you claiming you agree with the part about not combining one’s prior degrees of belief, but think you ought to go ahead and report your hunch, “and by the way, although this isn’t shown by the data, I really think in my gut that the substance is quite harmless”. What exactly does it mean to agree with Cox not to combine subjective priors in the report or inference, but favor a separate report of hunches?

Mayo:

This is getting pretty tiring, but . . . I'm responding to what Cox wrote, not to what you think he meant. You were in the room with him during that dialogue. All I had to go on were the printed words.

I wrote nothing about “subjective degrees of belief,” and I did not have “subjective degrees of belief” in mind. Nor was I writing about “frequentists philosophers” or about philosophers at all. Nor was I talking about my “hunch” or my “gut.” All of these words are coming from you!

What I was writing is abundantly clear. It is in my words. I was reacting to a specific statement by Cox. He was quoted as saying, “when they present their information to the public or government department or whatever, they should absolutely not use prior information.” And I disagreed with that statement. His statement said nothing about subjective degrees of belief, hunches, or guts. He was making a statement about “prior information,” and I was disagreeing with that statement. It’s really that simple. I’m disagreeing with something that Cox said (or at least was written under his name). You’re imputing all sorts of things to me about hunches and guts and opinions and degrees of belief and alleging and frequentist error statisticians and all sorts of other things. I neither said nor meant any of these things.

All that happened was that I read the statement, “when they present their information to the public or government department or whatever, they should absolutely not use prior information,” and I disagreed with it.

Oh Andrew…:

Well, Cox's comment occurred in a context that made it pretty clear. So let's see if we can isolate the claims with which you may or may not agree:

(1) In reporting on the evidential import of data x from test T as regards the warrant for claim H, we should not combine prior subjective degrees of belief or opinions about H with the appraisal of the warrant from x (from T).

(2) In presenting information to the public or govt. dept. we may wish to report prior information.

You may indicate how you feel about these, and hopefully we can move on to the main business about the role of background in frequentist statistical inference, whether here or in a different forum. Thanks.

Mayo:

You were in the room with Cox when he said these things and I can believe that there was all sorts of context. All I had to go on were the printed words. These words were: “when they present their information to the public or government department or whatever, they should absolutely not use prior information.” I disagree with those words, for the reasons I’ve already described. It sounds to me like you disagree with those words as written. So in that case I think we’re in agreement.

Andrew: The very next lines (not in front of me) had me asking, "Prior knowledge?" To which he replied: of course, prior knowledge.

I agree with what he said and meant. The most important thing, rather than playing a little game, is to say what you do mean, and if it's the two claims I wrote before, then that at least is clear: don't combine prior subjective opinions with the data, but do not refrain from using (how?) prior information (well warranted, or any kind of hunch?) when presenting information to the govt or public.

In an airport at the moment…..

Mayo:

I’m not “playing a little game.” I’m happy to discuss these deeper issues with you, but it would help if you would address what I write, not what you think I am writing.

In the above discussion, you wrote, “you are repeating an entirely false view, that frequentist error statistical methods (or what you are calling conventional methods) use no background other than the “data” from the experiment at hand.” But I never wrote such a thing, indeed I never said anything at any point about frequentist error statistical methods.

In the above discussion, you wrote, “If Gelman can be pinned down as to what exactly he’s alleging about the frequentist error statistician’s use of a host of background information.” But I never alleged anything, nor did I write anything about any frequentist error statistician.

In the above discussion, you wrote, “So then your disagreement means you DO think that people should include prior opinions and degrees of belief in H (in the form of prior probabilities) in evaluating or reporting on what the data x from test T warrant as regards H.” But I never wrote anything about opinions or degrees of belief or warranting.

In the above discussion, you wrote, “Are you claiming you agree with the part about not combining one’s prior degrees of belief, but think you ought to go ahead and report your hunch, “and by the way, although this isn’t shown by the data, I really think in my gut that the substance is quite harmless”. What exactly does it mean to agree with Cox not to combine subjective priors in the report or inference, but favor a separate report of hunches?” But I never wrote anything about hunches or guts or subjective anything.

I find this exhausting (but I find your work valuable enough that I am willing to continue this conversation, frustrating as it is). It’s hard for me to have a discussion when you keep taking things that I did not say, and acting as if I said them.

David Cox wrote: "when they present their information to the public or government department or whatever, they should absolutely not use prior information." I disagree with those words, for the reasons I've already described. I'd prefer to discuss this, and not discuss hunches, guts, frequentist error statisticians, howlers, or all the other things that I did not write about.

Andrew: Maybe it would be best to consider the two central issues here without going back to the Cox thing at all; that would free us from potential defensiveness over what he said/she said, could have meant/might have intended, reasonable impressions and illicit misimpressions, etc. Or do you insist on pursuing them from that one origin or not at all?

Mayo:

In all seriousness, I’d appreciate you saying that you don’t think my disagreement with the statement, “when they present their information to the public or government department or whatever, they should absolutely not use prior information,” is so unreasonable.

It’s hard for me to have a dialogue when even my simplest statements get twisted. I don’t think you’re doing the twisting on purpose: it’s just that you’ve had so many discussions on the topic of prior information, with different people over the years, that you’re attributing views to me that I’ve never stated and don’t even have. I attribute no ill will on your part whatsoever. Nonetheless, I find it difficult to proceed if a statement as simple as the one I made is taken by you to have so many meanings that aren’t there.

Andrew: I’ve never said the statement was “so unreasonable” (given the two or three things it might mean), but until you tell me which one you mean, it’s hard to know what you think. So please do.

It’s because, in the same article, you suggested that even Cox didn’t really believe it, coupled with the section on “a Bayesian wants everybody else to be a non-Bayesian,” that the article seemed ripe for philosophical deconstruction. Can you see how these three pieces are in some tension?