*“Wonderful examples, but let’s not close our eyes,”* is David J. Hand’s apt title for his discussion of the recent special issue (Feb 2014) of Statistical Science called “Big Bayes Stories” (edited by Sharon McGrayne, Kerrie Mengersen and Christian Robert). For your Saturday night/weekend reading, here are excerpts from Hand, another discussant (Welsh), scattered remarks of mine, along with links to papers and background. I begin with David Hand:

[The papers in this collection] give examples of problems which are well-suited to being tackled using such methods, but one must not lose sight of the merits of having multiple different strategies and tools in one’s inferential armory. (Hand [1])

…. But I have to ask, is the emphasis on ‘Bayesian’ necessary? That is, do we need further demonstrations aimed at promoting the merits of Bayesian methods? … The examples in this special issue were selected, firstly by the authors, who decided what to write about, and then, secondly, by the editors, in deciding the extent to which the articles conformed to their desiderata of being Bayesian success stories: that they ‘present actual data processing stories where a non-Bayesian solution would have failed or produced sub-optimal results.’ In a way I think this is unfortunate. I am certainly convinced of the power of Bayesian inference for tackling many problems, but the generality and power of the method is not really demonstrated by a collection specifically selected on the grounds that this approach works and others fail. To take just one example, choosing problems which would be difficult to attack using the Neyman-Pearson hypothesis testing strategy would not be a convincing demonstration of a weakness of that approach if those problems lay outside the class that that approach was designed to attack.

Hand goes on to make a philosophical assumption that might well be questioned by Bayesians:

One of the basic premises of science is that you must not select the data points which support your theory, discarding those which do not. In fact, on the contrary, one should test one’s theory by challenging it with tough problems or new observations. (This contrasts with political party rallies, where the candidates speak to a cheering audience of those who already support them.) So the fact that the articles in this collection provide wonderful stories illustrating the power of modern Bayesian methods is rather tarnished by the one-sidedness of the story.

This, of course, is the philosophical standpoint reflected in a severe or stringent testing philosophy, and it’s one that I heartily endorse. But it may be a mistake to assume it is universal: there’s an entirely distinct conception of confirmation as gathering data in order to support a position already held [2]. *I don’t mean this at all facetiously.* On the contrary, to suppose the editors of this issue share the testing conception is to implicitly suggest they are engaged in an exercise with questionable scientific standards (“tarnished by the one-sidedness of the story”). Recall my post on “who is allowed to cheat” and optional stopping with I.J. Good? It took some pondering for him to admit a different way of cashing out “allowed to cheat”. Likewise, wearing Bayesian glasses lets me take various Bayesian remarks as other than disingenuous. Hand goes on to offer a tantalizing suggestion:

Or perhaps, if one is going to have a collection of papers demonstrating the power of one particular inferential school, then, in the journalist spirit of balanced reporting, we should invite a series of similar issue containing articles which present actual data processing stories where a nonfrequentist / non-likelihood / non-[fill in your favourite school of inference] solution would have failed or produced sub-optimal results.

On the face of it, it sounds like a great idea! Sauce for the goose and all that…. David Hand is courageous for even suggesting it (deserving an *honorary mention*!), and he’d be an excellent editor of such an imaginary, parallel journal issue. [Share potential names. See [3]] But if X = “a frequentist” approach, it becomes clear, on further thought, that it actually wouldn’t make sense, and frequentists (or, as I prefer, error statisticians) wouldn’t wish to pursue such a thing. Besides, they wouldn’t be allowed–“Frequentist” seems to be some kind of an “F” word in statistics these days–and anyway Bayesian accounts have the latitude to mimic any solution post hoc, if they so desire; if they didn’t concur with the solution, they’d merely deny the claims to superior performance (as sought by the editors of any such imaginary, parallel journal issue). [Yet, perhaps a good example of the kind of article that would work is Fraser’s “Is Bayes Posterior just Quick and Dirty Confidence?” in a 2011 issue of the same journal, also C. Robert’s discussion of Fraser’s quick and dirty confidence.]

Christian Robert explains that the goal was for “a collection of six-page vignettes that describe real cases in which Bayesian analysis has been the only way to crack a really important problem.” Papers should address the question: “Why couldn’t it be solved by other means? What were the shortcomings of other statistical solutions?” I’m not sure what criteria the special editors employed to judge that Bayesian methods were required. According to one of the contributors (Stone) it means the problem required subjective priors. [See Note 4] (I’m a bit surprised at the choice of name for the special issue. Incidentally, the “big” refers to the bigness of the problem, not big data. Not sure about “stories”.)

Yet scientific methods are supposed to be interconnected, fostering both interchecking via multiple lines of evidence as well as building on diverse strategies. I just read of a promising new technique that would allow a blood test to detect infectious prions (as in mad cow disease) in living animals, a first. This will be both scrutinized and built upon by multiple approaches in current prion research. Seeing how the new prion test works, those using other methods will *want* to avail themselves of the new Mad Cow test. Saying Bayesianism is *required*, by contrast, doesn’t obviously suggest that non-Bayesians would wish to go there.

Aside: Robert begins his description of the special issue: “Bayesian statistics is now endemic in many areas of scientific, business and social research”, but does he really mean endemic? (See [5])

All in all, I think Hand gives a strong, generous, positive endorsement, interspersed with some caveats and hesitations:

When presented with fragmentary evidence, for example, one should proceed with caution. In such circumstances, the opportunity for undetected selection bias is considerable. Assumptions about the missing data mechanism may be untestable, perhaps even unnoticed. Data can be missing only in the context of a larger model, and one might not have any idea about what model might be suitable.

Caution is voiced by another discussant, A. H. Welsh:

Another reason a model may be difficult to fit is that it does not describe the data. Forcing it to “fit”, for example by switching to a Bayesian analysis, may not be the best response. It is difficult to check complicated models, particularly hierarchical models with latent variables, measurement error, missing data etc but using an incorrect model may be a concern when the model proves difficult to fit.

Recall, in this connection, this post (on “When Bayesian Inference Shatters”.)

Do you know what would really have been impressive (in my judgement)? A special journal issue replete with articles identifying the most serious flaws, shortcomings, and problems in Bayesian applications; perhaps showing how non-Bayesian methods helped to pinpoint loopholes and improve solutions. Methodological progress is never so sure or so speedy as when subjected to severe criticism. I think people would stand up and really take notice to see Bayesians remove the rose-colored glasses for a bit. What do you think?

[Added 6/22: I see this is equivocal. I had meant that the criticism be self-criticism and that the Bayesians themselves would have vigorously brought out the problems. But mixing in constructive criticism from others would also be of value.]

Here’s some of the rest….

The editors emphasised that they were not looking for ‘argumentative rehashes of the Bayesian versus frequentist debate’. I can only commend them on that. On the other hand, times move on, ideas develop, and understanding deepens, so while ‘argumentative rehashes’ might not be desirable, re-examination from a more sophisticated perspective might be.

I couldn’t agree more as to the need for “a re-examination from a more sophisticated perspective”, and it’s a point very rarely articulated. I hear people quote Neyman and Pearson from like the first few months of exploring a brand new approach and overlook the 70 years of developments in the general frequentist, sampling or (as I prefer) error statistical domain of inference and modeling. ….

An interesting question, perhaps in part sociological, is why different scientific communities tend to favour different schools of inference. Astronomers favour Bayesian methods, particle physicists and psychologists seem to favour frequentist methods. Is there something about these different domains which makes them more amenable to attack by different approaches? In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world… …As an aside, there is also the question of what exactly is meant by ‘Bayesian’. Cox and Donnelly (2011, p144) remark that ‘the word Bayesian, however, became ever more widely used, sometimes representing a regression to the older usage of “flat” prior distributions supposedly representing initial ignorance, sometimes meaning models in which the parameters of interest are regarded as random variables and occasionally meaning little more than that the laws of probability are somewhere invoked.’

Yes that’s another thorny question that remains without a generally accepted answer. I’ve seen it used to simply mean the use of conditional probability anywhere, any time.

Turning to the papers themselves, the Bayesian approach to statistics, with its interpretation of parameters as random variables, has the merit of formulating everything in a consistent manner. Instead of trying to fit together objects of various different kinds, one merely has a single common type of brick to use, which certainly makes life easier.

What is this single brick? Managing to assess everything as a probability brick, when they actually have very different references, isn’t obviously better than recognizing and reporting the differences, possibly synthesizing in some other way. To end up with a remark by Welsh:

One motivation for doing a Bayesian analysis for this problem (and one that is commonly articulated) is that the event in question is unique so it is not meaningful to think about replications. This is not really convincing because hypothetical replications are hypothetical whether they are conceived of for an event that is extremely rare (and in the extreme happens once) or for events that occur frequently.

I concur with Welsh. The study of unique events and fixed hypotheses still involves general types of questions and theories under what I call a repertoire of background. [One might ask, if “the event in question is unique so it is not meaningful to think about replications,” then how does the methodology serve for replicable science?]

Please send any corrections to this draft (i).

**I invite comments, as always, and UPhils for guest blog posting (by July 15), if anyone is interested: error@vt.edu**

[1] The citations come from the Statistical Science posting of future articles (thus final corrected versions could differ), but I am also linking to the published discussion articles.

[2] As even Popper emphasized, even a certain degree of dogmatism has a role, to avoid rejecting a claim too soon. But this is intended to occur within an inquiry that is working hard to find flaws and weaknesses, else it falls far short of being scientific–*for Popper.*

[3] Fab frequentist “tales” (areas)?

[4] I never know whether requiring subjective priors means they required beliefs about weights of evidence, beliefs about frequencies, beliefs about beliefs, or something closer to Christian Robert’s idea that a prior “has nothing to do with ‘reality,’ it is a reference measure that is necessary for making probability statements” (2011, 317-18) in a comment on Don Fraser’s quick and dirty confidence paper.

[5] Endemic

- (of a disease or condition) regularly found among particular people or in a certain area; “areas where malaria is **endemic**”. Denoting an area in which a particular disease is regularly found.
- (of a plant or animal) native or restricted to a certain country or area; “a marsupial **endemic to** northeastern Australia”. Growing or existing in a certain place or region.

Not unless he means to say that “Bayesian statistics is now ‘a disease regularly found among particular people’ in many areas of scientific, business and social research”.

Perhaps it is a Freudian slip, I can’t think of an “e” word that might have been intended, not “epidemic”. Ubiquitous?

Christian Robert is French; his written English is excellent, but I have occasionally encountered similarly infelicitous wording choices in it.

I know, that’s why I didn’t make a big deal about it. But the sentence comes out funny with the endemic replacement.

By the way, do you speak French when in Quebec?

A bit. My French isn’t good enough for real conversations.

It’s true that work selected for this special issue may not have been subjected to the kind of severe criticism that yields speedy methodological progress, but at the very least, it does afford said criticism! Here we have six examples selected and published by enthusiastic promoters of Bayesian methods using the criterion that non-Bayesian solutions would fail or be sub-optimal. Now no one can say that the Bayesian camp has failed to put up or shut up.

These are your targets, non-Bayesians. Shred ’em if you can.

No, Corey, these are not our targets. The discussants already pointed up their hesitations or disagreements, while retaining a polite, encouraging tone.

I haven’t read all the papers, but I did read the two discussed above. The self-criticism I described would have to have come from the authors.

I of course would first counter the “Big Stories” title with “Tall Tales.” Maybe for alliteration purposes, “Non-Testable Tall Tales,” alluding to the discussion we had here in Error Statistics and on Andrew Gelman’s blog a few months ago about the problems of doing statistics from just one realization of a random vector.

Another alliterative one might be “The Contagious Cult of Consistency.” In one of the quotes above, we again see the canard (can we work in that “C” word into the title too?) that Bayesian analysis is CONSISTENT. We’re seeing that word used a lot these days by the Bayesians, a contagion really, and yet I’ve never understood it. Presumably the claim is that inference methods based on the model P(X | theta) are ad hoc and thus inconsistent. Well, if so, why does throwing in an extra factor, P(theta) make it all consistent? The multiplicand in P(theta) x P(X | theta) is still ad hoc, thus still “inconsistent,” right?
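For reference, the product mentioned in the comment above is the numerator of Bayes’ theorem. Written out in standard notation (θ for the parameter and X for the data, as in the comment; the integral form assumes a continuous parameter, which the comment does not specify):

$$
P(\theta \mid X) \;=\; \frac{P(\theta)\,P(X \mid \theta)}{\int P(\theta')\,P(X \mid \theta')\,d\theta'}
$$

Both factors in the numerator are modelling choices: the sampling model P(X | θ) enters the posterior exactly as it enters a likelihood-based analysis, and the prior P(θ) is an additional input.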

Interesting to see mention of the concept of confirmation bias. Even better, how about definition bias? I really don’t think it’s fair that the article defines the term “Bayesian” so broadly. I looked only at the Raymond Carroll article, as I admire a lot of his work, and even then I only skimmed through it. But it seemed to me that all he had was a model like the old random effects ANOVA, or modern Hidden Markov Models. Why should that be considered “Bayesian”? Before someone refers me to Harville’s article in the current issue of the American Statistician, let me beat you to it and announce that yes, I’ve read it (or am in the midst of reading it), and I would make the same statements there. Seems that the Cult has become so strong, indeed so militant, that now everyone wants to be called a Bayesian–leading to my final alliteration, the Bayesian Bandwagon.

Norm: I was actually looking for titles for the imaginary, parallel frequentist journal as in [3], but titles for a more critical or self-critical journal are fine too.

But never mind that, your last remark is important and I’ve seen that too. I once heard a talk on machine learning by Gilbert Harman (a Princeton philosopher) where he said that people like this word, “Bayesian”, for some reason, so he threw it in.

Sticks and stones may break my bones

But words will never hurt me

Corey: I think words can be plenty hurtful, especially when associated with actions that are either career promoting or demoting, or even a climate that is inclusive or discriminatory. I may come back to this.

I had asked Norm for a title off the top of his head (but I had in mind a title for the imaginary journal, along the lines of Note [3]) because he showed skill at such things in the past.

Mayo: Of course words can be hurtful in general. I was just giving a stock response appropriate to the apparent level of discourse. (I was unaware that Norm’s comment was in response to a request for a snappy title that would presumably be followed by actual substantive content.)

Corey: Possibly statisticians–who are, after all, experts–should consider administering one of those questionnaires that students and faculty are so familiar with regarding so-called “climate” issues.

Since I’m neither student nor faculty, I’m not familiar with these questionnaires.

Mayo:

I think that reactions such as yours (that, instead of merely celebrating their methods, Bayesians should be actively criticizing them) are an excellent sign. After all, it is not news that Bayesian methods can solve real applied problems. As David Hand wrote in the passage that you approvingly quote, “I am certainly convinced of the power of Bayesian inference for tackling many problems.”

I think this is a big step forward because as recently as 1997 (ok, that’s not so recent for many of our readers, but it’s recent for you and me), a prominent statistician published an article saying, “when big, real, tough problems need to be solved, there are no Bayesians.” That statement wasn’t true in 1997 and it isn’t true now, but it reflects a formerly influential attitude. In some ways, I think the recent articles that you discuss in your post above, are a response to those old-fashioned anti-Bayesian attitudes. You might be right that, from our current perspective, the relevance of the Bayesian approach for many serious applications is no longer a question, and we can and should move forward.

Andrew: When I take excerpts from someone it must not be supposed that I “approvingly quote” the claim. Were I to just cherry pick, I wouldn’t be giving an approximate sense of what’s being said, and I mention that “I think Hand gives a strong, generous, positive endorsement”, but within it are some different readings.

Were I to “deconstruct Hand’s comment” briefly (as I may do if 1-2 others indicate they will write a U-Phil of 250-500 words on any aspect of this post by mid-July) I would amplify some thorns in the roses. So here goes…

NOTES for a possible deconstruction: The papers in this collection, Hand observes, “give examples of problems which are well-suited to being tackled using such methods”. This, together with some other points, can be taken to mean: Bayesian methods are suited when either (a) there is an empirical prior, or (b) there is “fragmentary evidence” where “one should proceed with caution. In such circumstances, the opportunity for undetected selection bias is considerable” and frequentists wouldn’t dare pretend it was a good thing to fill in the gaps with subjective hunches and act as if all is hunky dory!

Of course the activity of deconstruction can elicit very different interpretations. That’s why they’re deconstructions.

Aside from what he intimates about the success stories themselves, my deconstruction would continue, Hand brings out the fact that the intended goal is itself “tarnished by the one-sidedness of the story” and the selection bias involved. Those who share his (and my) view of science could not be impressed with such a display (even granting that Bayesians hold a different view of science, as I propose).

My deconstruction might even go on to imagine that such a display sends a possible message about a different standard for work that is ‘politically correct’ (likely to be true in all fields, admittedly). Finally, my deconstruction might go on to say that if they had really wanted to promote Bayesian statistical science to those who hold a frequentist (error statistical) testing philosophy, they would have presented examples subjected to severe self-criticism via interrelated frequentist checks.

Deconstructions are supposed to bring out interpretations that are not explicitly intended, but might lurk implicitly below the surface. They require reading into, and, as I said, very different perspectives are possible. That’s their role, as a philosophical exercise. Any kind of analysis, not necessarily a deconstruction, of course, is welcome.

Any takers?

Sorry for not “contributing” to the hypothetical journal, Deborah. I didn’t really have any ideas for that, but did want to comment, given that Hand had thrown down the gauntlet with his collection of “Bayesian miracles.”

Interesting point about the guy who gratuitously added the word “Bayesian” to his research, just because of the growing popularity of Bayesian methods. A few years ago, I knew of an assistant professor who had not been a Bayesian when he was hired, but during a conversation in which senior faculty were enthusiastically endorsing Bayesian methods, the junior faculty member burst out with anxious statements, along the lines of “I’m a Bayesian too!” Maybe he had indeed already had such views earlier, but I found the incident to be scary, given the vulnerability of assistant professors.

Your comment that Bayesians view science differently than frequentists is interesting, because one of the criticisms I’ve always made about the Bayesian approach is that it is antithetical to science. Isn’t one of the central tenets of science to be impartial? Use of subjective priors (as before, when I speak of Bayesian estimators, I am referring only to the use of subjective priors) is by definition not impartial. Of course, use of Bayesian methods just to fit in socially (see above) isn’t scientific either.

Oddly, Sander’s point brings up another criticism I’ve always made of the Bayesian approach. Don’t have any data? No problem–just make up some! That’s one of the reasons I find Hand’s claim to have assembled a set of papers showing Bayesian “success stories” to be quite troubling.

And as I said, Hand’s changing the definition of “Bayesian” is dirty pool–not scientific, if you will.

Actually, I suspect that the varying prevalence of Bayesian methods from one field to another is due more to “social” reasons. An influential person in some field starts using Bayesian methods, and then becomes a Pied Piper, with more and more people following him/her.

Norm: Well, I’m very glad you did contribute, and I agree with everything you say except that Hand is not the editor who collected the success stories; he is a discussant. Your story about the vulnerable junior faculty member should be of concern to the field, it seems to me, especially as there are (some? many?) who move into different fields as grad students, seeing/fearing a divisive climate not found in related areas. I have only anecdotal (but direct) evidence.

It’s heartening to see Hand’s voice of pragmatic reason:

“one must not lose sight of the merits of having multiple different strategies and tools in one’s inferential armoury.”

I could not have said it better and am in complete accord with his and Gelman’s comments…

I will only attempt to extend their comments by answering a question Hand raises:

“Astronomers favour Bayesian methods, particle physicists and psychologists seem to favour frequentist methods. Is there something about these different domains which makes them more amenable to attack by different approaches?”

Yes, something obvious I think: For many astronomers (and most epidemiologists and social scientists) the bulk of pivotal data is observational, in the sense that the putative causes under study and the structure within which they and their putative effects unfold are not controlled by the investigator. Furthermore, key measurements are not only costly, but many observational uncertainty sources may remain uncontrolled and unmeasured regardless of effort and expense made to obtain them (nutrition and diet epidemiology is an extreme case of that, which is why “knowledge” generated by that specialty is so unreliable). This makes error distributions very speculative objects, often indistinguishable in practical terms from subjective priors.

In contrast, particle physics and psychology are built largely around tightly controlled experimental research in which error distributions can be deduced from the physical set-up and the established theory surrounding it, and thus have claim to and can be used for arguably objective calibration of inferences and decisions.

Sander: I promise you the astronomy community is not nearly as heavily applied Bayesian as Hand’s remark seems to indicate. See the presentation of the BICEP2 results for a good example. Bayesian modeling has made inroads within the particle physics community, but they are noticeably smaller than in other physics sub-disciplines.

Section 1 of Tom Loredo’s paper http://arxiv.org/pdf/1208.3036.pdf gives a helpful short history of the adoption of Bayesian methods by physicists over the last few decades. There is a significant amount of calibration and instrument testing that goes on in making astronomical observations as well. The fact that the analysis which includes errors can be done in a frequentist framework does not at all preclude the possibility that a Bayesian analysis can be done using the same data. The resistance to investigating new methods is understandable institutional inertia as much as anything else.

But to really answer this question, we would need to discuss the impacts of: proprietary vs public data, small analysis groups vs large projects, legacy code and cross pollination between groups. And that is a bit outside the scope of a blog comment. I guess I am trying to say that “observational vs experimental” –> “Bayesian vs Frequentist” is neither obvious nor correct, at least for physicists. It’s just too cute an explanation.

West: Thanks for the information, especially appreciated since the astronomy literature is far outside my research area.

Nonetheless, the quote of Hand to which I was replying is “Is there something about these different domains which makes them more amenable to attack by different approaches?” Note that he wrote “more amenable to attack by different approaches,” not that Bayes is more popular in them. So my reply stands, albeit it is based solely on my experience in health and medical science: In examining controlled experiments (like randomized trials) I and others have found little need for Bayesian methods, apart from use of weak priors to improve frequency properties; explanations offered for that non-need include that the data-generating process is relatively simple, and that randomization cuts connections to prior expectations about who gets treated.

In contrast, in purely observational settings the data-generating process is largely unknown and often incredibly complex, making frequentist inference as hypothetical (subjunctive) as Bayesian inference. The latter only takes on the modest goal of avoiding internal inconsistencies, rather than controlling error rates (which in these settings is like controlling cancer). NB: This is not saying that Bayesian inference rescues observational research; it is saying that observational reality can grind away the advantages of frequentist inference over Bayesian inference, since calibration cannot be guaranteed. This isn’t cute speculation, it’s a grim fact of life in my topic areas.

Still, it would not surprise me if this leveling of inference is weakened or inoperative in astronomy, where background theory accounts for much more variation and measurement is far more precise, bringing it more in line with the controlled-trial setting. The same might be said of classical outbreak investigation and control. My grim view stems from specializing in drug and device research, with forays into occupational and environmental research, where the magnitude of uncontrolled data-generation artifacts (not just noise, but bias) can rival the targeted effects and invalidate all formal statistical inferences.
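The earlier remark about using “weak priors to improve frequency properties” can be illustrated with a small sketch. The example is an assumption chosen for illustration, not one the commenter gives: a binomial proportion, where the usual Wald interval is compared with the “plus four” adjustment (add two successes and two failures). The adjusted interval is centred at the posterior mean under a weak Beta(2, 2) prior, although it is still a Wald-style interval and not a Bayesian credible interval. The function names and the choice n = 20, p = 0.1 are hypothetical.

```python
from math import comb, sqrt

Z95 = 1.959964  # two-sided 95% standard normal quantile


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n Bernoulli(p) trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def wald_interval(x, n, z=Z95):
    """Standard Wald interval centred at the sample proportion x/n."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def plus_four_interval(x, n, z=Z95):
    """Wald-style interval centred at (x + 2) / (n + 4), which is the
    posterior mean under a Beta(2, 2) prior (illustrative weak prior)."""
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half


def exact_coverage(interval, n, p):
    """Exact frequentist coverage: total probability of the outcomes x
    whose interval contains the true p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n)
        if lo <= p <= hi:
            total += binom_pmf(x, n, p)
    return total


if __name__ == "__main__":
    n, p = 20, 0.1  # hypothetical small-sample setting
    print("Wald coverage:     ", round(exact_coverage(wald_interval, n, p), 4))
    print("Plus-four coverage:", round(exact_coverage(plus_four_interval, n, p), 4))
```

In this particular setting the adjusted interval’s coverage comes out closer to the nominal 95% than the Wald interval’s, partly because the Wald interval collapses to a single point when x = 0. The comparison is exact (a sum over all outcomes), so no simulation noise is involved; other values of n and p can behave differently.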

Sander: While your explanation may work to explain the preferred inference in medical science and related fields, I do not believe it has the same explanatory power with regards to physics. As I know nothing about your particular area of expertise, I will happily defer to your assessment of it.

Hand starts the paragraph by reflecting, “An interesting question, perhaps in part sociological, is why different scientific communities tend to favour different schools of inference.” He then jumps immediately to an epistemological(?) explanation: that there must be something intrinsic to the nature of the inquiry that explains his observed differentiation. What about the sociology option?!?

And if the assertion that “Astronomers favour Bayesian methods, particle physicists … seem to favour frequentist methods” was incorrect and based on a small unrepresentative sample, wouldn’t that fact make the question “is there something about these different domains which makes them more amenable to attack by different approaches” moot? But ok, while Bayesian methods may not be dominant in astronomy, they do have a significantly higher profile there than in particle physics. From all that I know and have worked on in physics, there is little that makes one method generically preferable over another. Often a problem can be attacked in multiple ways and which is chosen depends on other factors, some of which I enumerated in my previous comment.

Maybe it’s my undergrad history background that makes the sociological explanation far more palatable than the philosophical one. A better question, I think, is “what about the astronomy community compared to particle physics in the 70s (or 90s) made it more acceptable to experiment with Bayesian analysis given that frequentist methods were the standard in both at the time?”

West: I’m happy to leave the sociological question for others to ponder, as I have no knowledge of the sociology of the physical sciences. It may well be very different from health and medicine (where there is far less math skill in general, and often thus willingness to simply fall in line with what is being approved or promoted by statisticians). Nonetheless, in sociology as in epidemiology, complex phenomena are multifactorial and explanations are not mutually exclusive. Thus I still see no reason to discount the possibility that the observational basis for astronomical observations vs. the lab experiment basis for physics is a contributing factor (as it is in health and medical science) regardless of other factors.

I do want to thank you very much for the Loredo article – section 2 is a fine exposition targeting exactly the misconceptions I have to contend with constantly: The mistakes of

1) Equating variability with uncertainty – which seems to be a common adverse side effect of pure frequentist thinking, and can be moderated by switching to a Bayesian perspective and then back again to frequentist analysis;

2) Claiming Bayesian computation is harder than frequentist computation – which, these days, neo-Bayesians promulgate (by pushing MCMC, or “Bayesian cocaine” as some call it) more than frequentists do; and

3) Thinking that good Bayesian analysis is just frequentist analysis with priors tossed in – it takes a lot more contextual study and work to get a good context-summarizing prior than to get a prior that improves frequency behavior (the latter are often available off the shelf).

So Loredo will be useful as a cite, even though his illustrations will be lost on my audience. I have addressed his points here and there in my own writings, but not pulled them together as he does – I suppose I (or someone) should write a parallel piece for my field.

Sander: Glad to be of assistance with the Loredo piece. It’s also nice to know that researchers in other disciplines have similar frustrations.

Sander; I agree with Norm on the Pied Piper effect, but when it comes to psych, my speculation is that they’re in so much trouble with train wrecks and ‘repligates’ that they dare not pile on with yet more flexibility.

The pied-piper effect is an interesting speculative hypothesis (one that explains the rapid spread of significance testing in the wake of Fisher), but unless there is some scholarly study of that phenomenon in the statistics of these fields, or some other supportive data beyond an anecdote, it isn’t more than that. My hypothesis is speculative too, but does point at an easily checked difference between the fields (experimental vs observational), a difference I have found to matter crucially in my own work when deciding whether a Bayesian analysis might be worthwhile. That said, I agree about psych but would go further and point to it as an example of how adherence to a frequentist approach is not enough to save even an experimental science from human frailty.

Nor from pseudoscience.

Andrew; thank you for the link to the Breiman article. I wonder if he’d be convinced of the success stories in the “Big Bayes” issue. I honestly don’t know. I am interested to see him voice a matter that you and I have often discussed: the entry of background.

http://www.stat.columbia.edu/~gelman/stuff_for_blog/breiman.pdf

“The Bayesian claim that priors are the only (or best) way to incorporate domain knowledge into the algorithms is simply not true. Domain knowledge is often incorporated into the structure of the methods used.” (p. 22) Examples from machine learning follow.

His remarks about the distinction between a method’s use of probabilistic models and its being Bayesian also still stand, and help to avoid some confusions. There’s much else of interest in this article and I need to look up the associated contributions when I can.

NOTE ON DRAFT (ii). [Added 6/22: I see this is equivocal. I had meant that the criticism be self-criticism and that the Bayesians themselves would have vigorously brought out the problems. But mixing in constructive criticism from others would also be of value.]

The fanatic is at it again. I wouldn’t know except that someone who keeps up with statsblogs glimpsed his titles, and thought I’d want to know. The man is fixated, driven, desperately seeking to convince someone, anyone, that the death of frequentist statistics, and especially error statistics is “inevitable”—the same line used by Howson and Urbach, and other Bayesian philosophers who warned, cajoled, pressured some of us years ago. (I don’t see Howson, Urbach working in this area at all any more.) Actually predicting the inevitable total dominance of Bayesianism is subdued for this fellow, as compared to his more usual “stake through the heart” invective e.g., of June 15. I’ve already discovered long ago that discussion with him is unconstructive in the extreme. Too bad.

What is inevitable is that error statistical reasoning will live on just so long as there is scientific progress and so long as genuine inquiry is not squelched. (I agree with C.S. Peirce, Popper and others). Keeping science truly alive takes vigilance and is not guaranteed.

(That Bayesians often appear to rely on a different notion of science is discussed in this blogpost.) Anyway, this extremist should try to understand the error statistical philosophy instead of piling on with misdirected, abusive punching and put-downs. Oh, did I mention he was part of the cult of J? That’s the style of unphilosophical rhetoric that J teaches his followers I’m afraid.

To refer to his latest, which as a rule I do not do, and hopefully, won’t do again, the notion of “severity” goes beyond formal statistics to include any case of scientific inference, so I don’t limit the “accordance” or fit measure to statistical ones. When it comes to satisfying severity, the “accordance” notion is already taken care of by dint of the SEV principle.

http://www.statsblogs.com/2014/06/23/mayos-error-statistics-as-a-case-study-in-the-inevitability-of-bayes/

Corey, in defending me, rightly mentions my 3 steps of tests echoing that of E. Pearson’s, captured by sensible test statistics (e.g., http://errorstatistics.com/2013/08/13/blogging-e-s-pearsons-statistical-philosophy/ ) be they likelihood ratios or, for that matter, posterior probabilities.

It’s important to see that satisfying severity differs from applying severity.

For ex. if a high posterior for not-H is very probable under H, the severe tester would deny that the high posterior provided an evidential warrant for not-H.

The inability to control error probabilities results from accepting the likelihood principle (unless it is severely restricted). However, we would not be prevented from applying SEV. For instance, “applying” it to the comic howler of taking an event that is improbable, but has nothing to do with H, as evidence for not-H would fail even the “accordance” requirement. Informally: it would not be improbable because of H; formally, low power. These are corollaries of the severity requirement. This post also cites Fraser who gives other examples that, if put in my terms, would be problems reflected in lack of severity. http://errorstatistics.files.wordpress.com/2014/06/fraser-quick-and-dirty-discussion-c-robert.pdf

Well, I can’t discuss this further here, already too long. I do quite a bit on the blog and in my papers which I invite him to read.

Forecasting the complete demise of frequentist methodology seems a tad out of touch with most of applied stats, by which I mean the stat science of Box, Cox, Efron, and others since, who see roles for all types of statistical methodologies. To be sure there is huge variation in the frequency with which they would deploy Bayesian tools or concepts as part of an analysis, ranging from most of the time for Box to little of the time for Cox; and some of the time for the subsequent generation including Efron (and Gelman and me). Admittedly, one could make the correct technical note that (at least in so-called “regular” problems) realized frequentist intervals are limiting cases of Bayesian posterior intervals as certain priors degenerate to zero and others expand to infinity, making us always Bayesians whether we like it or not, although not in a very aware sense. Still, fanatics are those who declare that one or the other toolkit or way of thinking should never be used, as if they understand what is needed and works best in every conceivable application, or as if only one methodology can be deployed in a real scientific analysis. I see frequentist fanaticism on this blog (and thread) more than I see Bayesian fanaticism, although I suppose that is to balance an excess of Bayesian fanaticism seen elsewhere.

Sander: I strongly object to being branded as “being a Bayesian whether I like it or not”. For me the essence of frequentism is a certain interpretation of the concept of probabilities, and following from that a certain attitude to evaluating the performance of statistical methodology. This is indeed compatible with using Bayesian methods occasionally, but it is not a technicality in the first place and therefore it is rather irrelevant for it which techniques used by frequentists can be approximated in a Bayesian fashion. If the probability of a Bayesian credibility interval is interpreted the Bayesian way, it’s not frequentist. If it’s interpreted in a frequentist/error statistics way, it’s no longer Bayesian. Whether the numbers are (approximately) the same or not doesn’t matter in that respect.

I should probably add that I’m not a frequentist all the time and I occasionally use a Bayesian interpretation of probability. But I’d like to think that in these cases I do this consciously and as long as I do something else, I don’t want to be portrayed as “unconscious Bayesian”.

Christian: You seem to have taken my generic comment too personally, much as if someone objected personally if I had said “people are jealous creatures.” I did not have you in mind as I am not familiar with any of your real analyses. I was instead thinking of those who make much of the mathematical fact I recounted – which I think is worth making much of, for reasons I shall attempt to explain.

There seems to be some kind of disconnect on this blog from what I see in the actual scientific literature (at least in the medical literature): Widespread interpretation (including by many statisticians who would “know better” if pressed on the issue) of confidence intervals (CIs) as if they were posterior intervals (PIs). (NB: I did not say “misinterpretation”!) Cox and Hinkley commented similarly when discussing CIs in their 1974 book, in passing and without alarm (BTW they also discuss Birnbaum’s nontheorem in Ch. 2). Seeing a CI interpreted as if it were a PI in a given context, a constructive reaction is to ask: What conditions make this interpretation roughly valid? This leads one to critically evaluate the interpretation in that context without reflexively branding it as “wrong”.

The general answer in regular cases is pretty simple, as I recounted and both Box and Cox knew. The devil is in the details of whether the required conditions make any contextual sense, and the answer will vary dramatically across context. The exercise provides a Bayesian diagnostic for the frequentist result. I can say that few CIs I see in my topic areas rate a Bayesian pass (there is no way they represent a coherent bet based on actually available information); this does not make them useless, except to fanatic Bayesians. I take this as one of Mayo’s key points, as it was Cox’s long before.
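[Illustrative aside: a minimal sketch of the "general answer in regular cases" for the simplest regular problem I can think of, a normal mean with known standard deviation and a conjugate normal prior. The function names, the data summary (xbar = 1.2, sigma = 2, n = 25) and the prior settings are all assumptions chosen for illustration, not taken from any analysis discussed in this thread. It shows the 95% PI approaching the 95% CI as the prior standard deviation grows.]

```python
from math import sqrt
from statistics import NormalDist

# Two-sided 95% standard normal quantile (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def confidence_interval(xbar, sigma, n):
    """Usual 95% CI for a normal mean when sigma is known."""
    half = Z95 * sigma / sqrt(n)
    return (xbar - half, xbar + half)


def posterior_interval(xbar, sigma, n, prior_mean, prior_sd):
    """95% PI for the same mean under a normal(prior_mean, prior_sd) prior."""
    data_prec = n / sigma**2          # precision contributed by the data
    prior_prec = 1 / prior_sd**2      # precision contributed by the prior
    post_prec = data_prec + prior_prec
    post_mean = (data_prec * xbar + prior_prec * prior_mean) / post_prec
    half = Z95 / sqrt(post_prec)
    return (post_mean - half, post_mean + half)


# Hypothetical data summary, used only for illustration.
ci = confidence_interval(xbar=1.2, sigma=2.0, n=25)
print(f"CI               : ({ci[0]:.3f}, {ci[1]:.3f})")
for prior_sd in (0.5, 2.0, 10.0, 1000.0):
    pi = posterior_interval(1.2, 2.0, 25, prior_mean=0.0, prior_sd=prior_sd)
    print(f"PI, prior sd {prior_sd:>6}: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Reading the CI as a PI in this toy setting amounts to adopting a nearly flat prior; whether such a prior makes contextual sense is the question the diagnostic raises, and the code cannot answer it.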

As Bayesian methods appear more frequently in applications, I also observe PIs being presented as CIs, sometimes even with the title “Bayesian confidence intervals”! Well, that raises the converse question: what conditions would make a PI a roughly valid CI of some sort? Again in regular cases that’s pretty simple to answer abstractly: The PIs become CIs for certain parameters under random-effects models with effects drawn from the prior distributions when the latter represent real random processes such as multilevel sampling. Again, the devil is in the details of whether the required conditions make any contextual sense. The exercise provides a frequentist diagnostic of the Bayesian results. I can say that few Bayesian results I see in my topic areas rate a frequentist pass (there is no way I would expect them to be well calibrated); this does not make them useless, except to fanatic frequentists. I take this as one of Robert’s key points, as it was Lindley’s long before.
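[Illustrative aside: a minimal sketch of the converse diagnostic in the same toy normal-mean model with a conjugate normal prior. All names and numbers are assumptions for illustration. It computes the frequentist coverage of the 95% PI at a fixed true mean, then averages that coverage over true means drawn from the prior, using a simple midpoint rule in place of simulation.]

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()
Z95 = STD.inv_cdf(0.975)


def pi_coverage_at(theta, sigma, n, prior_mean, prior_sd):
    """Repeated-sampling coverage of the 95% PI when the true mean is theta."""
    se = sigma / sqrt(n)
    data_prec, prior_prec = 1 / se**2, 1 / prior_sd**2
    post_prec = data_prec + prior_prec
    w = data_prec / post_prec              # weight on the sample mean
    half = Z95 / sqrt(post_prec)
    shift = (1 - w) * prior_mean
    # The PI covers theta exactly when xbar falls between these two bounds.
    lo = (theta - half - shift) / w
    hi = (theta + half - shift) / w
    return STD.cdf((hi - theta) / se) - STD.cdf((lo - theta) / se)


def pi_coverage_over_prior(sigma, n, prior_mean, prior_sd, steps=4000):
    """Average coverage when theta itself is drawn from the prior."""
    prior = NormalDist(prior_mean, prior_sd)
    a = prior_mean - 8 * prior_sd
    h = 16 * prior_sd / steps
    total = 0.0
    for i in range(steps):
        theta = a + (i + 0.5) * h          # midpoint of the i-th slice
        total += pi_coverage_at(theta, sigma, n, prior_mean, prior_sd) * prior.pdf(theta) * h
    return total


# Hypothetical settings: sigma = 2, n = 25, prior normal(0, 0.5).
print(pi_coverage_at(0.0, 2.0, 25, 0.0, 0.5))     # true mean at the prior centre
print(pi_coverage_at(2.0, 2.0, 25, 0.0, 0.5))     # true mean far out in the prior tail
print(pi_coverage_over_prior(2.0, 25, 0.0, 0.5))  # averaged over the prior
```

In this sketch the PI is calibrated on average only if the true means really behave like draws from the prior; at a fixed mean far from the prior centre its coverage can fall well below the nominal level.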

In both cases, to cut oneself off from either direction of equivalence is to sacrifice an important critical diagnostic for asserted inferences. That making these observations seems to be treated as some kind of heresy by fanatics of both stripes is only a reflection of the fact that scientists are as human as everyone else. But as with any natural phenomenon, a scientist (or any curious person) will pursue the meaning of numerical convergences and divergences, not dismiss it (as it often is on this blog). A pragmatist can rationalize the pursuit as a form of cross-validation. When divergence occurs, we need to pinpoint why and avoid confusing CIs and PIs; when convergence occurs, it warns us that both approaches are depending on what would be seen as contextually equivalent assumptions.

In convergent cases, it can happen that everyone is being misled by their own statistics, regardless of where they stand on the frequentist-Bayes spectrum. In these cases, critical input from outside the current spectrum will be essential to finding error. The problem I see in stat theory and philosophy is that it seems to operate as if there is no “outside” from which to make such critical observations; that strikes me as a recipe for blindness to certain types of error.

Sander: I agree that knowing about equivalence results and trying to find more is helpful, and also to think about what they mean. However, I don’t think that such results are very helpful regarding the fact that there are so many careless interpretations around. If somebody interprets a frequentist CI as posterior interval, I don’t think that the correct reaction to this is to assume that this person is right in some sense because of some more or less deep equivalence results the person is not aware of (as you know, in order to check whether this makes Bayesian sense, one would for example have to think about whether the prior is appropriate, which the person of course didn’t do). I rather think that the person didn’t care much and this should better be pointed out.

By the way, how am I supposed to know that the “we” in “we are always Bayesians whether we like it or not” doesn’t refer to all of us? It pretty much looked like it.

Christian: I don’t see where I said “the correct reaction to this is to assume that this person is right in some sense because of some more or less deep equivalence results the person is not aware of”. I said it is helpful to ask under what conditions a Bayesian interpretation would be correct; that applies regardless of what the writer had in mind or is right in some sense.

The impersonal subjunctive mood is crucial here in making the interpretation a matter of fact, not a matter of personal probability (as Senn once noted in commenting on my views): We can say “As a matter of fact, combining this data model and data with that prior Pr using a Bayes procedure produces the x% CI in question as roughly the x% PI.” In doing so one may or may not find the PI interpretation acceptable, depending on the context (DOC). If the implied (subjunctive) prior Pr turns out to conflict with available information (as often happens), so much the better to know that (and know that the original writer has been misled by treating the CI as a PI), just as it is good to know if the data model conflicts with that information (and thus has misled the writer about confidence as well as posterior probability). The contextual evaluation of these constructs may involve P-values and any other tools in our arsenal.

A converse analysis applies when seeing PI presented as CI; again, the idea is to see what it takes to make that correct in theory, not to accept that it is correct in fact. In doing so one may or may not find the CI interpretation is acceptable, DOC, and from an error-statistics perspective I certainly would want to know when it isn’t!

Again you seem to be reading much more personalistic elements into what I am recommending here than I ever intended. So I must ask for more precision in reading and interpreting what I wrote (and for that matter what anybody wrote including Neyman, Pearson, etc. as it seems to me a lot of fighting with straw men mars the foundational literature, on all sides).

Sander: I think this takes us back to the “duality” exchanges of exactly a year ago. http://errorstatistics.com/2013/06/26/why-i-am-not-a-dualist-in-the-sense-of-sander-greenland/

There’s much we agree on, but you speak of a non-problematic, breezy, easy fluency between the error statistical and Bayesian interpretations, e.g., we just ask “under what conditions a Bayesian interpretation would be correct”? that escapes some of us. Never mind translation between the perspectives, I don’t really know what even this question means or how I’d answer it. I think we’re still stuck there.

Mayo: Stuck we are, since the logical substrate of what I’m saying consists of some math facts which are pretty well known in the stat circles I’m in. In regular problems (a term with technical meaning; suffice to say it covers most of what I do and see in health, medical, and social-science stats) I can back-calculate to priors that make a CI an approximate PI; if those priors make no contextual sense (say, they make it likely that residential fields cause leukemia in all those exposed) then I can deduce that the PI interpretation of the CI also makes no contextual sense. Conversely, I can back-calculate to hierarchical sampling models that make a PI an approximate CI; if those sampling models make no contextual sense (e.g., they say field effects are being drawn from a uniform distribution) then I can deduce that the CI interpretation of the PI also makes no contextual sense. What do these exercises get me? A measure of how misleading is an interpretation of a CI as a PI (which I see often) or a PI as a CI (which I see more and more). This all seems clear to me, so I don’t know what you find problematic (as opposed to uninteresting, distasteful, or some other value divergence), especially since it in no way excludes your methodology.

You may find problematic however my agreement with those who think we can at once improve both error control and betting odds by trying to harmonize our CIs and PIs. Some have attempted to formalize this process with notions of calibrated Bayesianism and the like, adding a calibration axiom or strong repeated-sampling principle to the usual conditions for coherent betting. That runs into problems, as you would expect, but those can be addressed along the lines of the Bayes/nonBayes compromise Good wrote of. A very long story which I must leave aside for lack of time here…

I would however be remiss if I did not add (risking the boomerang of rhetorical flourish): whether the CI and PI are about the same or not, they are both junk if the observation model being used to deduce them is junk. And that model can be perilously close to junk in many applications in health and medical science; I mean not just the model for study data but also the meta-analytic model used for synthesis. Something to ponder the next time one is prescribed a pill to take the rest of life or is advised to go under the knife, at least if one is concerned about whether the risk-benefit information your physician gives you has any sound basis in data, evidence, or statistics.

“A measure of how misleading is an interpretation of a CI as a PI (which I see often) or a PI as a CI (which I see more and more).”

Except that misinterpretation is misleading regardless of how well the numbers look.

Sander: Not sure how many realise the personal health risk of problematic science.

As Xiao-Li Meng put it in his A trio of inference problems that could win you a Nobel Prize in statistics – “useful exercise to imagine ourselves in a situation where our statistical analysis would actually be used to decide the best treatment for a serious disease for a loved one or even for ourselves”

(I agree) You certainly do not want to discourage methods and thinking that might help people get things less wrong there.

Sander: OK, I was a bit polemic, as you had been before. I agree, it may be of some interest to do these analyses. Still, if I want a PI, I’d rather have a proper PI computed with a prior that I like, than to wonder whether somebody who tries to sell me a CI for a PI could accidentally have produced something that is fairly OK as a PI, and the other way round. And as information usually doesn’t come in form of a given prior distribution, problems with having a CI interpreted as a PI don’t only occur if “the data model is in conflict with the information” but also if it is unclear and ambiguous how to translate the available information into a prior, and I may rather not want to have any.

I have seen Bayesians elsewhere arguing that this-or-that CI amounts to a PI with prior X, and because X is not appropriate for some context-dependent reason, the CI is not valid either (which looks connected to something that you just wrote). I don’t buy that. This is not a valid argument if you think about the problem in a frequentist way, because then there is no such thing as a correct/appropriate or inappropriate prior. (I do admit though that I have learnt something about the model and the frequentist method from this exercise, although my conclusion is not the one the Bayesians wanted me to have.)

Christian: If I read your reply correctly we may not be in any logical disagreement, leaving the issue to how much value we find in particular exercises. That value being determined by quirks of personality and context (ours don’t overlap), among other things, I’m glad to see you conclude with “I do admit though that I have learnt something about the model and the frequentist method from this exercise”. That your “conclusion is not the one the Bayesians wanted me to have” only troubles me in this regard: It speaks as if “the Bayesians” is some singular entity, I fear too much like “the Moslems” or “the Christians”. I would not have objected had you written instead “certain Bayesians” or even (if in a polemic mood) “fanatic Bayesians”. How you took my own comment about “we are all Bayesians” shows that I can overlook appearances as well, and your reaction shows how appearances can be important in a charged atmosphere.

Now to be precise about the comment you made about frequentism: “This is not a valid argument if you think about the problem in a frequentist way, because then there is no such thing as a correct/appropriate or inappropriate prior.” I agree that not being anyone’s Bayesian PI does not invalidate a frequentist CI, since the latter’s validity is defined under the data-generating model alone (I seem to recall that was NP’s rationale for introducing the concept). But for a frequentist seeking improved estimators and decisions in a real application, there are appropriate and inappropriate priors to use as error-reducing devices in a given problem, e.g., to fully exploit the “Stein effect” (shrinkage) in estimating the mean of a multivariate-normal population, I want the prior form to be normal, not Cauchy or uniform. In practice, the same observation holds even for single-parameter inference, e.g., for estimating a population mean, a normal prior centered at zero may work fine for MSE reduction relative to the sample mean if its variance is large enough, but could be a disaster if its variance is too small, where “large enough” and “too small” require context to quantify. So for the frequentist who knows how to use priors to improve frequentist methods, there is indeed “appropriateness” to consider (just not in the sense of delivering Bayesian PIs). Do you disagree?
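[Illustrative aside: a minimal sketch of the single-parameter case just described, assuming a normal sampling model with known standard error and a normal prior centered at zero. The function names and the numbers (true mean 1, standard error 0.4) are hypothetical. It compares the mean squared error of the shrunken estimate with that of the plain sample mean.]

```python
def mse_sample_mean(se):
    """MSE of the sample mean: unbiased, so MSE equals its variance."""
    return se**2


def mse_shrunk_mean(theta, se, prior_sd):
    """MSE of the posterior mean under a normal(0, prior_sd) prior.

    The estimator is w * xbar with w = data precision / total precision,
    so its variance is w^2 * se^2 and its bias is -(1 - w) * theta.
    """
    data_prec = 1 / se**2
    prior_prec = 1 / prior_sd**2
    w = data_prec / (data_prec + prior_prec)
    return w**2 * se**2 + (1 - w) ** 2 * theta**2


# Hypothetical truth and design: true mean 1.0, standard error 0.4.
theta, se = 1.0, 0.4
print("sample mean       :", mse_sample_mean(se))
for prior_sd in (5.0, 1.0, 0.1):
    print(f"prior sd {prior_sd:>4}     :", mse_shrunk_mean(theta, se, prior_sd))
```

Under these assumptions the shrunken estimate has smaller MSE than the sample mean exactly when theta^2 < se^2 + 2 * prior_sd^2, so whether a given prior variance is "large enough" depends on how far the true mean could plausibly be from zero, which is contextual information.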

Christian: I could see no “reply” button to your latest reply, so I had to hit the one on mine (to which you replied). Anyway:

We are in complete agreement about MSE and priors.

You make the distinction I often see between science on the one hand and business decision on the other, then seem to echo the common academic view that science can afford to wait for more data. In my research environment, the science/decision distinction is not so clear and big decisions are constantly on the table: Observations are incredibly expensive to obtain, and current findings are used to decide what to obtain next. This is the crucial “science business”, unavoidable in my field (and I suspect in most science today). In addition, in modern health and medical research, current findings are often immediately input to practice and policy decisions. In these settings it is not only natural but imperative to have multiple approaches to the data, ranging from effective summarization of data (by which I mean real data descriptors like tables of counts, not P-values or CIs) all the way to decision-theoretic analyses. Any strong priors in the latter will only appear arbitrary to the extent they are not accompanied by an explanation of how they capture important contextual information (as we agree, some weak priors may of course be rationalized for the improvement they bring to general frequency behavior).

Sander: By “the Bayesians” I just meant those who made the remarks I was referring to, not all Bayesians. I didn’t have over-generalization in mind, I just used the language sloppily.

“So for the frequentist who knows how to use priors to improve frequentist methods, there is indeed “appropriateness” to consider (just not in the sense of delivering Bayesian PIs). Do you disagree?”

No. I agree, and I have come across situations in practice where this was of some use. However, I’d be very careful about being guided by such reasoning too strongly in cases in which there is no real problem with frequentist methodology and one could just improve the MSE a bit by introducing a prior that upweights regions of the parameter space where we may expect the truth (normally for rather imprecise reasons) – I’d rather go for a slightly less precise method than doing something that requires a justification that can only be very imprecise. (In this respect, it is also different whether you give decision support to a business that wants to earn money but can’t wait for years until more evidence is in or whether you want to contribute to the scientific knowledge of humankind, which as a whole in the long run will have better ways of finding out things than that some individuals make up priors that may look arbitrary to others.)

Christian: I’m glad you bring up that style of argument in your second para, as it occurs all over the place, and I don’t think I’ve seen anyone question it–although it appears in the book I’m close to finishing….. Even sensible Box simply declares that since the priors you’d need for the confidence levels to be posteriors are problematic, the CI inference is problematic. It’s a wonderful little fallacy for we logicians.

P.S. Christian: I wonder if your responses to me involve an implicit feeling that I am Bayesian. So, let me reiterate that I am not, nor am I a frequentist (or likelihoodist, or whatever). I might not be offended if called a scientist, but that’s about it.

I’ve read the Bayesian scriptures (DeFinetti, Lindley, Savage, Harold Jeffreys in stats, Howson, Urbach, Earman, Richard Jeffrey in philosophy) as well as frequentist counterparts, but I’ve had little direct contact with pure Bayesians. Having instead taken stats at UC Berkeley (including a course from Neyman) and then been on the UCLA faculty almost my entire career (where biostatistics was dominated by early Berkeley grads all the way through the 1990s), I was often subjected to frequentist extremism and was grateful for the ecumenical writings of Box and Good to invoke in my defense. Nonetheless, in my nominally ‘Bayesian methods for epi’ short course, I have since the 1990s always made a point to say that if I had been in the alternative universe where Bayesians dominated 20th-century stats, I would be giving a ‘frequentist methods for epi’ class instead (and the audience understands why I say that by the time I say it).

Also, for the 20 years I’ve been giving the course I’ve started by saying the controversy is like fighting about whether nails or screws are better for joining wood: any competent carpenter knows how and when to use one or both or neither, so ideological labels like “Bayesian” and “frequentist” are as silly as “hammerist” and “screwist”. I recently saw Larry Wasserman use that quip in his blog (with no indication of its source – which may precede me for all I know, as it seems so obvious) and it always gets knowing laughs from epi and stat audiences. But when I made the same remark at the Philosophy of Science meeting a few years ago it instead elicited gasps and even a few expressions of disbelief, as if I were espousing Christian ecumenicism in Germany during the Thirty Years’ War. This made me appreciate how radically different were the environments in which my thinking and Mayo’s evolved (although a private skirmish I had with Howson in the early 1990s should have informed me).

Sander: I don’t know how this comment wound up in spam, sorry. I wanted to respond to it earlier, then didn’t see it. Ok, well I would have gotten a kick out of hearing the gasps of disbelief at that PhilSci meeting, and am glad maybe you believe me now. It isn’t as if there are more than a teeny tiny handful of what one might call Bayesian philosophers of statistics any more, or philosophers of stat of any brand any more. It’s just that philosophers inherited an old-fashioned view of “X knows that P” in terms of belief, so “uncertain” knowledge becomes degree of belief, and they’re big on Bayesian decision theory. There had been a very vital field of philo of stat that regularly included both philosophers and statisticians when I was starting out, I’m sure you know. My quick view of it: The frequentists were chased off, and philosophers of “confirmation” went back to Carnapian stuff with hints of “objective Bayesianism” thrown in, a bit of causal modeling, and lots of formal stuff on the logic of decision and belief functions. Thus, despite their promises to be “relevant to science”—and other fields of phil sci, e.g., biology, do interact –the work of philosophers of science, of confirmation and “formal epistemology”, with few exceptions, is quite distant from statistical science or statistical practice. For my take on PhilStat and PhilSci: http://www.rmm-journal.de/downloads/Article_Mayo.pdf

Sander: Don’t forget, too, as Wasserman points out, practitioners also tend to assume PIs have a confidence concept construal (don’t have the link), so it works in the other direction as well.

But your main point is exactly in sync with what I say in my post: “Yet scientific methods are supposed to be interconnected, fostering both interchecking via multiple lines of evidence as well as building on diverse strategies.” That’s the very idea that I’m suggesting is at least in tension with Christian Robert’s stated aim in seeking “cases in which Bayesian analysis has been the only way to crack a really important problem” and “couldn’t it be solved by other means”.

As for the alleged “disconnect” of this blog, I think you have a biased read in that I only send you a link of a certain type from time to time, not representative of the blog as a whole. Of course, it’s deliberately unified to include stat aspects of PhilSci, PhilStat, PhilLaw, junk science, etc. and it may well be a lone voice (to my knowledge) for “frequentists in (foundational) exile”.

“Don’t forget, too, as Wasserman points out, practitioners also tend to assume PIs have a confidence concept construal (don’t have the link), so it works in the other direction as well.”

Mayo: SG’s comment is entirely symmetrical in this regard. Did you skim it and miss that?

Corey: I didn’t miss it, I did notice he said about Bayesian PIs “there is no way I would expect them to be well calibrated”, but when I came to write my comment, the first part was the one I had more strongly in mind (I just saw it too on Briggs’ twitter), and it’s the one to which Sander gravitates–based on background knowledge. I wanted to emphasize not just that, sure, there’s also a frequentist long-run “performance” side (like acknowledging there are some who might worry if a 95% PI has 95% “coverage probability”), but that the corroboration or testing imperative may be primary for the case at hand–at least from the perspective of my philosophy. In other words, merely granting a performance concern is not yet to acknowledge the epistemic construal of error probabilities. Still I take the point, thanks.

Thanks Mayo…

You said “practitioners also tend to assume PIs have a confidence concept construal… so it works in the other direction as well.” But I said exactly that further down in my post. So, no disagreement at all on this point.

Re claims like “cases in which Bayesian analysis has been the only way to crack a really important problem” and “couldn’t it be solved by other means”. On the face of it, such claims sound naive at best (as if serious attempts by other means have been conducted, presented and judged to fail by the community at large). As usual, my inclination is to translate into something more credible, like “only tools of Bayesian mathematical form are currently available to apply in this case”, which may or may not be what was meant, and which may or may not be true regardless of what was meant.

As for my “disconnect” comment, I am guilty as charged of overlooking the selection bias. I’ll add that links to other portions of this vast blog have taken me to some less charged portions, like your invaluable (for me) accounts of E.S. Pearson’s divergence from Neyman (who was my first and for many years my only source regarding NP foundations, and whose hostility to Fisher put me off of NP testing from the start).

Sander: Thanks, and please see my remark to Corey acknowledging that you also mention the flip side.

I’m glad that you found the E.S. Pearson material revelatory, and I hope you drop in regularly.

“error statistical reasoning will live on just so long as there is scientific progress and so long as genuine inquiry is not squelched. (I agree with C.S. Peirce”

I do think this is misquoting Peirce: “error statistical reasoning would live on if exhaustive inquiry did not find it deficient” would be more in line with Peirce, with even “error statistical reasoning will live on if exhaustive inquiry did not find it deficient” being more due to one of the kidnappers of pragmatism.

So no one knows or can know, and that is why productive ongoing discourse is required, such as Sander’s and Christian’s comments below; I can’t quite decide which to agree with.

Keith: Thanks for your comment, always glad to hear from a Peircean.

First, this was not given as a quote (in fact putting Peirce down was an afterthought). Second, and most important, Peirce makes it clear how he views scientific induction (severe testing)… [I]nduction, for Peirce, is a matter of subjecting hypotheses to “the test of experiment” (7.182).

The process of testing it will consist, not in examining the facts, in order to see how well they accord with the hypothesis, but on the contrary in examining such of the probable consequences of the hypothesis … which would be very unlikely or surprising in case the hypothesis were not true. (7.231)

This sort of inference it is, from experiments testing predictions based on a hypothesis, that is alone properly entitled to be called induction. (7.206)

See the posts starting: http://errorstatistics.com/2013/09/10/peircean-induction-and-the-error-correcting-thesis-part-i/

My point here is very simple: not just any kind of criticism can be consistent with Peirce’s conception of enabling inquiry. For instance, if “authority” came along to banish a conception, it would not count as having been refuted (or found deficient, to use your word) on scientific grounds. Authority, of course, is the worst of all, for Peirce.

Note too that Peircean induction-as-severe-testing fares better than deduction in error correcting. http://errorstatistics.com/2013/09/10/part-2-peircean-induction-and-the-error-correcting-thesis/

And while we’re on Peirce, he’s clear:

“the Bayesian inverse probability calculation seems forced to rely on subjective probabilities for computing inverse inferences, but “subjective probabilities” Peirce charges “express nothing but the conformity of a new suggestion to our prepossessions, and these are the source of most of the errors into which man falls, and of all the worse of them” (2.777).”

http://errorstatistics.com/2013/09/10/peircean-induction-and-the-error-correcting-thesis-part-i/

Here he’s describing that other view of confirmation I mention in my post. I recall philosopher Henry Kyburg expressing great surprise to hear Peirce using the concept of “subjective probabilities” which he thought came later. (Readers: Peirce is writing late 19th, very early 20th century.) I hadn’t thought of that.

It is too easy to be taken as using Peirce for authority rather than inspiration; I meant to stress Peirce’s view that he thinks he will be found wrong on everything (other than math, as Ramsey put it, the lesser logic of consistency), which would include much of scientific induction (as Ramsey put it, the larger logic of discovery). For instance, in his quip “it is good we die, otherwise we would live long enough to find out we were wrong about everything.” Given he has been dead for 100 years, he likely would be disappointed with the progress.

He was clear about priors based on no experience (indifference), but I do not think priors based on past experience (that try to represent it) that are subject to testing (open to rejection due to brute force surprise), or simply defaults, would be a problem. After all, all that is in question is how to best (temporarily) fix belief, and I believe Sander is pointing to why we want both CIs and PIs to do that.

Keith: I was using Peirce as inspiration and not as authority; but then you said I’d misquoted him, so I explained the broader intuition underlying self-correction. I don’t think Peirce could spoze ever being found “wrong” as regards the self-correcting nature of inquiry… and whether to count this as analytic (I would not) will take us too far into philosophical subtleties.

Are you speaking of Peirce on priors?

Mayo: I was referring to myself in clarifying inspiration versus authority but you have clarified for me that you are using “error statistical reasoning” as an equivalent phrase to “self-correcting nature of inquiry”. So when people were using Bayesian techniques and discovered they had to modify their approach – they were using error statistical reasoning whether they knew it or not ;-)

Peirce, I believe, would have considered the self-correcting nature of inquiry an unjustified assumption, but one that was unavoidable, or a regulative assumption. Also, my comments on priors and Peirce were simply to raise the possibility that he was only aware of priors as a means to represent no experience, indifference, no information, as Stigler has suggested other interpretations were exceptional way back then. Priors represent different things and are used differently these days, and there probably need to be more discussion papers on that topic.

Keith: No, that’s not how I was using those terms; I gave (in an earlier comment) Peirce’s quote from which I obtain his conception of the nature of induction; in his view, it is severe testing. Now he’s very clear that this sort of inductive testing requires the ability to control what we call error probabilities of the test process (trustworthiness of proceeding), and thus, in turn, it requires taking into account features of data generation and hypothesis construction: predesignation and randomization. He loosens these requirements only when you could effectively show the necessary error probability control is attained.

I think you are incorrect to say: “Peirce I believe would have considered self-correcting nature of inquiry an unjustified assumption but one that was unavoidable or a regulative assumption”, since he very explicitly considers it its ESSENTIAL property. I’m surprised you say that, but now we’d get into Peirce exegesis, which is not our topic.

> I’m surprised you say that

Perhaps just our _usual_ miscommunication.

Agree, it is its ESSENTIAL property, so it must unavoidably be assumed true even if it cannot be justified.

The only topic I think might be relevant is the possibility that you are being more certain about Peirce’s writings than he was.

Keith: You wrote: “Agree, it is its ESSENTIAL property, so it must unavoidably be assumed true even if it cannot be justified.”

No, that’s not at all entailed by its being essential. Objects and processes may have essential properties without any assumption of their truth, much less their truth being unavoidable assumptions or any kind of assumptions. Back to Peirce’s case: Peirce does hold induction as severe testing to be justified.

A couple of people have alerted me to this: the same guy who was writing scathing criticisms of frequentism in general and severity in particular has apparently replaced his Entsophy blog with one cursing note (now with a cartoon). But I am removing the link.

Maybe someone who knows him should offer to help him.

An example of a severe indication of a claim H I just read: “It is highly, highly likely that [H] the aircraft was on autopilot, otherwise it could not have followed the orderly path that has been identified through the satellite sightings,” Australian Deputy Prime Minister. I am calling it a mere indication, in and of itself, but my interest is in the reasoning.