“Wonderful examples, but let’s not close our eyes,” is David J. Hand’s apt title for his discussion of the recent special issue (Feb 2014) of Statistical Science called “Big Bayes Stories” (edited by Sharon McGrayne, Kerrie Mengersen and Christian Robert). For your Saturday night/weekend reading, here are excerpts from Hand and from another discussant (Welsh), scattered remarks of mine, and links to papers and background. I begin with David Hand:
[The papers in this collection] give examples of problems which are well-suited to being tackled using such methods, but one must not lose sight of the merits of having multiple different strategies and tools in one’s inferential armory. (Hand)
…. But I have to ask, is the emphasis on ‘Bayesian’ necessary? That is, do we need further demonstrations aimed at promoting the merits of Bayesian methods? … The examples in this special issue were selected, firstly by the authors, who decided what to write about, and then, secondly, by the editors, in deciding the extent to which the articles conformed to their desiderata of being Bayesian success stories: that they ‘present actual data processing stories where a non-Bayesian solution would have failed or produced sub-optimal results.’ In a way I think this is unfortunate. I am certainly convinced of the power of Bayesian inference for tackling many problems, but the generality and power of the method is not really demonstrated by a collection specifically selected on the grounds that this approach works and others fail. To take just one example, choosing problems which would be difficult to attack using the Neyman-Pearson hypothesis testing strategy would not be a convincing demonstration of a weakness of that approach if those problems lay outside the class that that approach was designed to attack.
Hand goes on to make a philosophical assumption that might well be questioned by Bayesians:
One of the basic premises of science is that you must not select the data points which support your theory, discarding those which do not. In fact, on the contrary, one should test one’s theory by challenging it with tough problems or new observations. (This contrasts with political party rallies, where the candidates speak to a cheering audience of those who already support them.) So the fact that the articles in this collection provide wonderful stories illustrating the power of modern Bayesian methods is rather tarnished by the one-sidedness of the story.
This, of course, is the philosophical standpoint reflected in a severe or stringent testing philosophy, and it’s one that I heartily endorse. But it may be a mistake to assume it is universal: there’s an entirely distinct conception of confirmation as gathering data in order to support a position already held.[2] I don’t mean this at all facetiously. On the contrary, to suppose the editors of this issue share the testing conception is to implicitly suggest they are engaged in an exercise with questionable scientific standards (“tarnished by the one-sidedness of the story”). Recall my post on “who is allowed to cheat” and optional stopping with I.J. Good? It took some pondering for him to admit a different way of cashing out “allowed to cheat”. Likewise, wearing Bayesian glasses lets me read various Bayesian remarks as other than disingenuous. Hand goes on to offer a tantalizing suggestion:
Or perhaps, if one is going to have a collection of papers demonstrating the power of one particular inferential school, then, in the journalist spirit of balanced reporting, we should invite a series of similar issue containing articles which present actual data processing stories where a nonfrequentist / non-likelihood / non-[fill in your favourite school of inference] solution would have failed or produced sub-optimal results.
On the face of it, it sounds like a great idea! Sauce for the goose and all that…. David Hand is courageous for even suggesting it (deserving an honorary mention!), and he’d be an excellent editor of such an imaginary, parallel journal issue. [Share potential names. See Note 3.] But if X = “a frequentist” approach, it becomes clear, on further thought, that it actually wouldn’t make sense, and frequentists (or, as I prefer, error statisticians) wouldn’t wish to pursue such a thing. Besides, they wouldn’t be allowed: “Frequentist” seems to be some kind of “F” word in statistics these days. And anyway, Bayesian accounts have the latitude to mimic any solution post hoc, if they so desire; if they didn’t concur with the solution, they’d merely deny the claims to superior performance (as sought by the editors of any such imaginary, parallel journal issue). [Yet perhaps a good example of the kind of article that would work is Fraser’s “Is Bayes Posterior just Quick and Dirty Confidence?” in a 2011 issue of the same journal; see also C. Robert’s discussion of Fraser’s quick and dirty confidence.]
Christian Robert explains that the goal was “a collection of six-page vignettes that describe real cases in which Bayesian analysis has been the only way to crack a really important problem.” Papers should address the questions: “Why couldn’t it be solved by other means? What were the shortcomings of other statistical solutions?” I’m not sure what criteria the special editors employed to judge that Bayesian methods were required. According to one of the contributors (Stone), it means the problem required subjective priors. [See Note 4.] (I’m a bit surprised at the choice of name for the special issue. Incidentally, the “big” refers to the bigness of the problem, not big data. Not sure about “stories”.)
Yet scientific methods are supposed to be interconnected, fostering both interchecking via multiple lines of evidence and building on diverse strategies. I just read of a promising new technique that would allow a blood test to detect infectious prions (as in mad cow disease) in living animals, a first. This will be both scrutinized and built upon by multiple approaches in current prion research. Seeing how the new prion test works, those using other methods will want to avail themselves of the new mad cow test. Saying Bayesianism is required, by contrast, doesn’t obviously suggest that non-Bayesians would wish to go there.
Aside: Robert begins his description of the special issue with “Bayesian statistics is now endemic in many areas of scientific, business and social research”, but does he really mean endemic? (See Note 5.)
All in all, I think Hand gives a strong, generous, positive endorsement, interspersed with some caveats and hesitations:
When presented with fragmentary evidence, for example, one should proceed with caution. In such circumstances, the opportunity for undetected selection bias is considerable. Assumptions about the missing data mechanism may be untestable, perhaps even unnoticed. Data can be missing only in the context of a larger model, and one might not have any idea about what model might be suitable.
Caution is voiced by another discussant, A. H. Welsh:
Another reason a model may be difficult to fit is that it does not describe the data. Forcing it to “fit”, for example by switching to a Bayesian analysis, may not be the best response. It is difficult to check complicated models, particularly hierarchical models with latent variables, measurement error, missing data etc., but using an incorrect model may be a concern when the model proves difficult to fit.
Recall, in this connection, my post “When Bayesian Inference Shatters.”
Do you know what would really have been impressive (in my judgement)? A special journal issue replete with articles identifying the most serious flaws, shortcomings, and problems in Bayesian applications, perhaps showing how non-Bayesian methods helped to pinpoint loopholes and improve solutions. Methodological progress is never so sure or so speedy as when subjected to severe criticism. I think people would stand up and really take notice if Bayesians removed their rose-colored glasses for a bit. What do you think?
[Added 6/22: I see this is equivocal. I had meant that the criticism be self-criticism and that the Bayesians themselves would have vigorously brought out the problems. But mixing in constructive criticism from others would also be of value.]
Here’s some of the rest….
The editors emphasised that they were not looking for ‘argumentative rehashes of the Bayesian versus frequentist debate’. I can only commend them on that. On the other hand, times move on, ideas develop, and understanding deepens, so while ‘argumentative rehashes’ might not be desirable, re-examination from a more sophisticated perspective might be.
I couldn’t agree more as to the need for “re-examination from a more sophisticated perspective”, and it’s a point very rarely articulated. I hear people quote Neyman and Pearson from, like, the first few months of exploring a brand new approach, overlooking the 70 years of developments in the general frequentist, sampling or (as I prefer) error statistical domain of inference and modeling. ….
An interesting question, perhaps in part sociological, is why different scientific communities tend to favour different schools of inference. Astronomers favour Bayesian methods, particle physicists and psychologists seem to favour frequentist methods. Is there something about these different domains which makes them more amenable to attack by different approaches? In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world… …As an aside, there is also the question of what exactly is meant by ‘Bayesian’. Cox and Donnelly (2011, p144) remark that ‘the word Bayesian, however, became ever more widely used, sometimes representing a regression to the older usage of “flat” prior distributions supposedly representing initial ignorance, sometimes meaning models in which the parameters of interest are regarded as random variables and occasionally meaning little more than that the laws of probability are somewhere invoked.’
Yes, that’s another thorny question that remains without a generally accepted answer. I’ve seen it used simply to mean the use of conditional probability anywhere, any time.
Turning to the papers themselves, the Bayesian approach to statistics, with its interpretation of parameters as random variables, has the merit of formulating everything in a consistent manner. Instead of trying to fit together objects of various different kinds, one merely has a single common type of brick to use, which certainly makes life easier.
What is this single brick? Managing to assess everything as a probability brick, when the things being assessed actually have very different referents, isn’t obviously better than recognizing and reporting the differences, and possibly synthesizing them in some other way. To end with a remark by Welsh:
One motivation for doing a Bayesian analysis for this problem (and one that is commonly articulated) is that the event in question is unique so it is not meaningful to think about replications. This is not really convincing because hypothetical replications are hypothetical whether they are conceived of for an event that is extremely rare (and in the extreme happens once) or for events that occur frequently.
I concur with Welsh. The study of unique events and fixed hypotheses still involves general types of questions and theories under what I call a repertoire of background. [One might ask, if “the event in question is unique so it is not meaningful to think about replications,” then how does the methodology serve for replicable science?]
Please send any corrections to this draft (i).
I invite comments, as always, and UPhils for guest blog posting (by July 15), if anyone is interested: firstname.lastname@example.org
[1] The citations come from the Statistical Science posting of future articles (thus final corrected versions could differ), but I am also linking to the published discussion articles.
[2] As even Popper emphasized, a certain degree of dogmatism has a role, to avoid rejecting a claim too soon. But this is intended to occur within an inquiry that is working hard to find flaws and weaknesses; otherwise, for Popper, it falls far short of being scientific.
[3] Fab frequentist “tales” (areas)?
[4] I never know whether requiring subjective priors means they required beliefs about weights of evidence, beliefs about frequencies, beliefs about beliefs, or something closer to Christian Robert’s idea that a prior “has nothing to do with ‘reality,’ it is a reference measure that is necessary for making probability statements” (2011, 317-18) in a comment on Don Fraser’s quick and dirty confidence paper.
[5] endemic:
- (of a disease or condition) regularly found among particular people or in a certain area: “areas where malaria is endemic”; also, denoting an area in which a particular disease is regularly found.
- (of a plant or animal) native or restricted to a certain country or area: “a marsupial endemic to northeastern Australia”; also, growing or existing in a certain place or region.