I am posting two U-Phils I received in relation to the 9/12 call on Andrew Gelman’s (2012) “Ethics and the statistical use of prior information.”
A Deconstruction of Gelman by Mayo in 3 parts:
(10/5/12) Part 1: “A Bayesian wants everybody else to be a non-Bayesian”
(10/7/12) Part 2: Using Prior Information
(10/9/12) Part 3: Beauty and the Background Knowledge
Comments on “How should prior information enter in statistical inference”
Christian Hennig
Department of Statistical Science
University College London
Reading the blog entries on this topic, the Cox-Mayo Conversation, and the linked paper by Gelman, I appreciate the valuable thoughts in both, which to me all make sense, each specifying situations in which prior information is better kept out of the analysis, or better entered in the Bayesian way.
Thinking more about the issue, however, I find both the frequentist and the Bayesian approach seriously wanting in this respect (and I don’t have a better one myself either).
A difference between the approaches seems to be that Cox/Mayo look more at the analysis of data in an isolated situation, whereas Gelman writes more about conclusions drawn not only from analysing a particular data set, but from aggregating all the available information.
Cox/Mayo do not advocate ignoring prior knowledge, but they prefer to keep it out of the process of actually analysing the data. Mayo talks of a piecemeal approach in which results from different data analyses can be put together in order to get an overall picture.
The problem is that the frequentists don’t have an in-built machinery for combining evidence. Think, for example, of a question like “Is improved street lighting good for reducing crime?” A local government may collect some data and hire a statistician to analyse them. There may be various existing observational studies, conducted under varying side conditions, and lots of informal experience. There are related questions in opinion surveys, and there should be quite a bit of relevant information in studies that were originally set up to address different questions. So there is a lot of knowledge, but it is not unified, and much of it is only marginally relevant. Still, it is more than just an unfounded expert opinion (which Cox is understandably wary of incorporating into the analysis) to add to the results from the possibly rather limited amount of data that the local government has collected itself.
A frequentist can look at all the results but doesn’t have a formalism to put them together. There is meta-analysis, but meta-analysis has a rather restricted scope; results need to be sufficiently standardised before they can be aggregated that way.
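To make the standardisation requirement concrete, here is a minimal sketch of fixed-effect inverse-variance pooling, the simplest form of meta-analytic aggregation; the study effects and standard errors are made up purely for illustration, and the point is only that each study must already report an effect estimate with a standard error on a common scale before any pooling is possible.

```python
import math

# Hypothetical effect estimates (on a common scale) with standard errors
# from three studies; the numbers are invented for illustration only.
studies = [
    {"effect": -0.30, "se": 0.12},
    {"effect": -0.10, "se": 0.20},
    {"effect": -0.25, "se": 0.15},
]

# Fixed-effect inverse-variance pooling: weight each study by 1 / se^2.
weights = [1.0 / s["se"] ** 2 for s in studies]
pooled = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

print(f"pooled effect = {pooled:.3f} (se = {pooled_se:.3f})")
# The pooling only works because every input is already an effect estimate
# with a standard error on the same scale -- the standardisation noted above.
```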
The Bayesian approach looks superior at first sight. All different chunks of knowledge can be used in order to design an appropriate prior. Priors are versatile so that in principle all kinds of background information are allowed.
Unfortunately, only in the rarest cases does information come in a form that can be easily translated into a prior. The Bayesian approach tells the researcher what to do with the prior but not how to arrive at it. In the given situation, the Bayesian probably has as hard a time constructing the prior as the frequentist has afterwards in making overall sense of all the results.
Using subjective judgement to construct a prior in such a situation may indeed be useful if a real decision has to be made and a loss function is readily available. The prior is then a tool for generating a decision out of a lot of imprecise information of dubious reliability, which may still be better than ignoring it. However, if no immediate decision is required, there is no strong reason to assign much authority to such a prior. Aggregating all the available information into a prior will strongly affect that information, particularly by mixing it up with considerations of mathematical convenience (the existing information will rarely determine the parametric form of the prior, for example). I’d steer clear of this unless there is a strong positive reason for doing it.
The advantage of the frequentist approach, according to Cox/Mayo, is a neater separation of considerations regarding the design, the model choice, and what is contributed by the data themselves (although a Bayesian can look at the likelihood alone, too, as Gelman mentions). As argued by Cox, this is also what is desired in many practical situations. Following Mayo, it at least enables the results to be presented clearly, together with the assumptions and unresolved knowledge gaps.
One shouldn’t, however, be distracted from realising that frequentist tools for aggregating information are largely missing (using the information only for study design is not the same as aggregating it), and the Bayesian ones are rather weak when facing the typically complex kinds of existing information. I’m not sure whether this is a problem that can be solved in a better way, though.
***
Comments on “How should prior information enter in statistical inference”
Emrah Aktunc
Graduate School of Social Sciences and Humanities
Koç University
Istanbul, Turkey
I think Gelman commits two different types of fallacies. The first one is a category mistake; I don’t understand how he can say something like “prior information is stronger than the data.” How can these be compared? They are different types of entities; to me it sounds like comparing a hammer with the framing of a house…
It seems to me that the second fallacy is a simple fallacy of false dichotomy. The thinking seems to be like this: either you use background information in the form of a “prior” (informative or not) or you don’t (or can’t) use background information at all. I think it is a serious mistake to assume that these two options exhaust the possibilities in using background information in experimental research.
I’m currently working on the influence of traditional cognitive psychology on fMRI research. In this field, background knowledge (theoretical as well as experimental) coming from cognitive psychology is relied on to a great extent. In fMRI, most research questions studied are chosen on the basis of theoretical knowledge provided by cognitive psychology. For example, when cognitive neuroscientists want to study different memory systems (or types), they mostly start from cognitive psychological theories of memory and rely on the distinctions these theories provide; they investigate, say, whether there are any neural differences between declarative and procedural memory. Also, in order to investigate such a question, they design their experiments on the basis of prior experimental knowledge, which, again, has accumulated in behavioral experiments done by cognitive psychologists. All these background theories and this experimental knowledge help cognitive neuroscientists choose what to study and define their research programs. Perhaps more importantly, the background knowledge from cognitive psychology enables researchers to identify the ways in which they can study these topics by designing the correct type of cognitive stimuli and tasks, as well as helping them identify and remedy possible flaws in their experimental designs. The repository of previous research on declarative and procedural memory is rich in well-proven means, in terms of tasks and stimuli, to study these cognitive phenomena of interest while minimizing or ruling out experimental flaws and errors.
The crucial thing to note here is this: nowhere in these stages of cognitive neuroscientific research does anybody speak of, or have a need for, “priors.” Even though there is no talk of a “prior,” the way in which fMRI studies build on background knowledge from cognitive psychology provides a fine example of how background knowledge can and should be used in experimental research. Without the background knowledge coming from behavioral cognitive psychology, it would take decades longer even to start designing experiments in cognitive neuroscience. In a sense, each single fMRI study can be thought of as a hammer, while the cognitive psychological background knowledge provides the framing for the house of cognitive neuroscience. This relates to the false dichotomy: who can say that fMRI researchers do not use prior knowledge because they don’t use it in the form of prior probabilities? The background theories of cognitive psychology are heavily relied on in cognitive neuroscience, but not in the form of priors. In addition, it would be a category mistake to compare the results of one fMRI experiment with the whole of the background knowledge coming from traditional cognitive psychology and to say one is “stronger” than the other…
Thank you Christian and Emrah, I will post my reactions to the U-Phils next time.
Emrah,
You can judge the relative importance of prior info and data by seeing how much they affect final conclusions. Both Bayesians and frequentists do this.
It’s fair to say that every applier of statistics always utilizes background information other than a prior (if a prior is even used at all). No serious statistician, least of all Gelman, has ever denied this.
Guest: We regularly hear that the prior is the best way to summarize and incorporate prior information, and that not using a prior is a liability for non-Bayesians. If Bayesians deny this, they lose their main selling point.
Reader:
I do not think that in Bayesian Data Analysis we say that our method is “best”; rather, we demonstrate how to do our method (Bayesian data analysis = model building + inference + model checking), and we let readers decide from there what is best for them.
I don’t really care if you think that Bayesians are “losing their main selling point.” That might have been an issue 20 years ago, but it’s not such an issue now. I don’t need to sell Bayesianism; it sells itself. Which is good: I can devote my efforts to trying to understand it better (and to clear up misconceptions such as appear here).
Andrew:
As I’ve been saying through many of my posts, I think you are most definitely an exception, but then again, I don’t know any other Bayesians who think that a Bayesian wants everybody else to be a non-Bayesian. Do you?
Like Reader, I too find Bayesian textbooks claiming that the use of Bayesian priors is the best way to incorporate background information (you don’t need me to give quotes).
You may not be selling that notion, but that’s what many think they’re buying. (Remember the Stephen Senn paper and discussion? https://errorstatistics.com/2012/01/15/mayo-philosophizes-on-stephen-senn-how-can-we-cultivate-senns-ability/).
I hope to hear from other Bayesians that they agree with the guest that “every applier of statistics always utilizes background information other than a prior (if a prior is even used at all)”.
I was responding to this by Emrah:
“The thinking seems to be like this: either you use background information in the form of a “prior” (informative or not) or you don’t (or can’t) use background information at all.”
As far as I know, no one has ever held this absurd position. It’s completely impossible to defend, since there is always other information being used beyond a prior or data. Even the choice of variables to include, let alone the choice of sampling distribution, is guided by such background info. Emrah is attacking (at length) a complete straw man.
Emrah:
1. You write, “nowhere in these stages of cognitive neuroscientific research does anybody speak, or has a need, of ‘priors.’” I had the impression that there are researchers in cognitive neuroscience who use priors. Just for laffs, I went to Google Scholar and searched on *”cognitive neuroscience” “prior distribution”*. There were 377 hits, including papers that had been cited 227 times, 416 times, 153 times, 271 times, 101 times, etc. But that’s ok, I guess they are working in a different stage of cognitive neuroscience than you are.
2. You write, “I don’t understand how [Gelman] can say something like ‘prior information is stronger than the data.’ How can these be compared? They are different types of entities…” I recommend you read chapter 2 of Bayesian Data Analysis. The concept of prior distributions as equivalent data is an old old idea, very standard in statistics. For example, if I have a prior distribution under which a probability is 0.485 with a prior standard deviation of 0.005, this corresponds approximately to a certain Beta(alpha, beta) distribution, which in turn is roughly equivalent to an experiment with alpha+beta binary data points (the arithmetic is sketched after this comment). If you haven’t heard of this, that’s fine, it just means you’re way ignorant of Bayesian statistics, and you’re probably not the right person to go around thinking you’re finding fallacies in my reasoning here.
3. You write, “The thinking seems to be like this: either you use background information in the form of a ‘prior’ (informative or not) or you don’t (or can’t) use background information at all.” As Guest notes, I never said this or anything close to this.
Here you’re committing the particularly irritating (to me) fallacy of disagreeing with something I never said, and then calling it a fallacy. That’s really obnoxious, and I’d appreciate it if you’d never do it again, to me or anybody else. Thank you.
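Here is a minimal sketch of the prior-as-equivalent-data arithmetic in Gelman’s point 2, using standard moment-matching for a Beta distribution; the numbers simply redo his 0.485 ± 0.005 example and are illustrative only.

```python
# Match a Beta(alpha, beta) prior to a stated prior mean and standard deviation,
# then read off alpha + beta as the approximate number of equivalent binary
# observations (the "prior as equivalent data" idea referred to above).

def beta_from_mean_sd(mean, sd):
    """Return (alpha, beta) for a Beta distribution with the given mean and sd."""
    var = sd ** 2
    # For Beta(a, b): var = mean * (1 - mean) / (a + b + 1), so:
    n = mean * (1 - mean) / var - 1          # n = alpha + beta
    return mean * n, (1 - mean) * n

alpha, beta = beta_from_mean_sd(0.485, 0.005)
print(f"alpha = {alpha:.1f}, beta = {beta:.1f}")
print(f"equivalent sample size ~ {alpha + beta:.0f} binary observations")
# Mean 0.485 with sd 0.005 gives roughly Beta(4845, 5145), i.e. a prior that
# carries about as much information as ~10,000 binary data points.
```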
Professor Gelman:
Honestly, I did not expect such an angry response to my criticisms but that’s all right, I guess we all feel strongly about our opinions and that is good. Below are my first responses.
1. Just because some researchers in cognitive neuroscience use priors does not mean they are using the correct methods. Of course, the same can be said about those using frequentist methods. Indeed, functional neuroimaging is still a new field, and a lot more work has to be done to find out the most useful statistical methods for it. This is also a field in which it is especially easy to commit methodological errors due to its immense complexity. What really matters in this endeavor is how these errors can be identified and controlled for; and I think this is better done within the error-statistical context, using the notions of error probabilities and severe tests. (You can see my dissertation for the details of how this can be done; you can easily find it if you google me. 🙂)
2. One thing that I wanted to call attention to is that, especially in our day and age, the question of the use of background knowledge in experimental research is much wider than the question of whether priors should be used. E.g., in cognitive neuroscience, knowledge from cognitive psychology and neurobiology enters into the design of experiments, as does knowledge from physics and physiology, because fMRI works on the basis of the physics of magnetic resonance and the physiology of hemodynamics. It appears to me that, as we try to make sense of how all this affects cognitive neuroscientific experiments, the notion of a prior cannot do justice to the complexity of the question. This complex picture can be dealt with in the frequentist error-statistical context using a hierarchy of models of inquiry (e.g., primary models, experimental models, and data models), where different contributions to the design of an experiment can be identified and analyzed with respect to any errors they may introduce. The question of the use of background knowledge in experimental science is a wider question than the one on the use of priors, and the error-statistical approach has the right conceptual machinery to tackle it.
3. In my comment, I said: “The thinking seems to be like this: either you use background information in the form of a “prior” (informative or not) or you don’t (or can’t) use background information at all.” This was the honest impression I got from your piece. As Mayo notes, I am not the only person who got this impression. You say that this is not what you think and of course I believe you. But instead of bashing me, perhaps a more constructive approach would be to go back to your own writing and revise it accordingly to prevent future readers from getting this impression.
Thank you.
Emrah:
You write, “I did not expect such an angry response to my criticisms but that’s all right, I guess we all feel strongly about our opinions and that is good.”
You still don’t seem to understand. I have no problem being criticized. But I don’t like being criticized for something I _didn’t_ say. Yes, that really really annoys me. I understand you made a mistake, that you quickly read my article and thought I said something that I didn’t actually say. I don’t think you were being malicious, I just think you made a mistake. But it was a mistake in which you attributed to me a particularly stupid attitude. That annoys me. I prefer to be criticized for what I say, not what I don’t say. It’s fine for you to state your position without being so quick to attribute fallacies to others.
Let me just say something to highlight what seems central to me:
The real question isn’t: Do Bayesians maintain that any and all relevant background information enters inference through prior probabilities? The answer, we may agree, is no. The real question is whether Bayesians maintain
(a) that the use of Bayesian priors is a good if not the best way to incorporate background information, and
(b) that accounts which do not incorporate priors, e.g., ordinary frequentist significance tests, run into serious problems on account of not being able to do so.
Now Gelman may well be an exception, but (a) and (b) are standard. Recall one of my ESP posts, taken up precisely to discuss background knowledge (Sept. 26, 2012).
https://errorstatistics.com/2012/09/26/levels-of-inquiry/
I have repeatedly read and been told, again and again, that I cannot avoid inferring evidence of ESP without computing a low prior degree of belief in ESP, and I can show texts. I have very patiently explained my response to the challenge here, the nature of the prior information that is really required, the repertoire of errors gleaned from magician observers, etc.
If there are Bayesians out there (other than Gelman) prepared to deny (a) or (b), I’d be very glad to hear from them (I haven’t awarded an honorable mention in a while). Have I convinced no one?
Anyone using a Bayesian network incorporates some of the background knowledge in the directed acyclic graph, which is not a prior distribution (a minimal sketch appears below).
I agree that simply assigning a low prior probability to ‘ESP’ when evaluating the results of an experiment is a pretty weak scientific approach. (Or have I misunderstood your complaint?)
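A minimal sketch of the Bayesian-network point above; the variables and edges are hypothetical (borrowing Hennig’s street-lighting example), and the point is only that the graph structure, not a prior distribution, carries this part of the background knowledge.

```python
# Hypothetical Bayesian-network structure: background knowledge is encoded in
# the directed acyclic graph (which variables are taken to depend on which),
# before any prior distribution over parameters is specified.
dag = {
    "street_lighting": [],
    "police_patrols": [],
    "crime_rate": ["street_lighting", "police_patrols"],
    "reported_incidents": ["crime_rate"],
}

# The graph alone fixes the factorisation of the joint distribution, e.g.
# reported_incidents is modelled as independent of street_lighting given
# crime_rate -- no prior probabilities appear anywhere above.
for node, parents in dag.items():
    given = ", ".join(parents) if parents else "-"
    print(f"P({node} | {given})")
```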