I am posting two U-Phils I received in relation to the 9/12 call on Andrew Gelman’s (2012) “Ethics and the statistical use of prior information”
A Deconstruction of Gelman by Mayo in 3 parts:
(10/5/12) Part 1: “A Bayesian wants everybody else to be a non-Bayesian”
(10/7/12) Part 2: Using Prior Information
(10/9/12) Part 3: Beauty and the Background Knowledge
Comments on “How should prior information enter in statistical inference”
Christian Hennig
Department of Statistical Science
University College London
Reading the blog entries on this topic, the Cox–Mayo conversation, and the linked paper by Gelman, I appreciate the valuable thoughts in both, which all make sense to me: each specifies situations in which prior information is better kept out of the analysis, or better brought in the Bayesian way.
Thinking more about the issue, however, I find both the frequentist and the Bayesian approach seriously wanting in this respect (and I don’t have a better one myself either).
A difference between the approaches seems to be that Cox/Mayo focus on the analysis of data in an isolated situation, whereas Gelman writes about conclusions drawn not only from analysing a particular data set but from aggregating all the information available.
Cox/Mayo do not advocate ignoring prior knowledge, but they prefer to keep it out of the process of actually analysing the data. Mayo talks of a piecemeal approach in which results from different data analyses can be put together to form an overall picture.
The problem is that frequentists don’t have built-in machinery for combining evidence. Think, for example, of a question like “Is improved street lighting good for reducing crime?” A local government may collect some data and hire a statistician to analyse them. There may already be various observational studies, carried out under varying side conditions, and lots of informal experience. There are related questions in opinion surveys, and there should be quite a bit of relevant information in studies originally set up to address different questions. So there is a lot of knowledge, but it is not unified, and much of it is only marginally relevant. Still, it is more than just an unfounded expert opinion (which Cox is understandably wary of incorporating into the analysis) to add to the results of the possibly rather limited amount of data that the local government has collected itself.
A frequentist can look at all the results but doesn’t have a formalism to put them together. There is meta-analysis, but meta-analysis has a rather restricted scope; results need to be sufficiently standardised before they can be aggregated this way.
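To make concrete what that restricted formalism does provide: when studies are standardised enough to report comparable effect estimates with standard errors, a fixed-effect meta-analysis simply pools them by inverse-variance weighting. Here is a minimal sketch; the effect sizes and standard errors are purely hypothetical numbers for the street-lighting example, not results from any real study.

```python
# Minimal fixed-effect (inverse-variance) meta-analysis sketch.
# All effect estimates and standard errors below are hypothetical.

def fixed_effect_meta(effects, std_errors):
    """Pool study effects, weighting each by its inverse variance."""
    weights = [1.0 / se**2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    pooled_se = (1.0 / total_w) ** 0.5
    return pooled, pooled_se

# Three hypothetical standardised studies of the lighting effect
# (negative values = reduction in crime):
effects = [-0.20, -0.35, -0.10]
ses = [0.10, 0.25, 0.15]

pooled, pooled_se = fixed_effect_meta(effects, ses)
print(pooled, pooled_se)
```

The point of the sketch is Hennig’s: the formalism only works once every input has been forced into the same standardised form, which is exactly what the heterogeneous mix of observational studies, surveys, and informal experience resists.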
The Bayesian approach looks superior at first sight. All different chunks of knowledge can be used in order to design an appropriate prior. Priors are versatile so that in principle all kinds of background information are allowed.
Unfortunately, only in the rarest cases does information come in a form that can be easily translated into a prior. The Bayesian approach tells the researcher what to do with the prior, but not how to arrive at it. In the given situation, the Bayesian probably has as hard a time constructing the prior as the frequentist has afterwards, making overall sense of all the results.
Using subjective judgement to construct a prior in such a situation may indeed be useful if a real decision has to be made with a readily available loss function. The prior is then a tool for generating a decision out of a lot of imprecise information of dubious reliability, which may still be better than ignoring it. However, if no immediate decision is required, there is no strong reason to assign much authority to such a prior. Aggregating all the available information into a prior will distort that information strongly, particularly by mixing it up with considerations regarding mathematically convenient priors (the existing information will rarely determine the parametric form of the prior, for example). I’d steer clear of this unless there is a strong positive reason for doing it.
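The “mathematically convenient prior” worry can be illustrated with the most convenient choice of all, a conjugate normal prior on a normal mean: once the background knowledge has been squeezed into that form, the posterior mean is fixed by a precision-weighted formula, whether or not the normal shape reflected the knowledge. A minimal sketch, with entirely hypothetical numbers:

```python
# Sketch of a conjugate (mathematically convenient) prior combining
# with data: normal prior on a normal mean with known data variance.
# All numbers are hypothetical.

def normal_posterior(prior_mean, prior_var, data_mean, data_var, n):
    """Posterior mean and variance under a conjugate normal prior."""
    prior_prec = 1.0 / prior_var        # precision contributed by the prior
    data_prec = n / data_var            # precision contributed by n observations
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var

# Vague background knowledge forced into a normal form:
post_mean, post_var = normal_posterior(
    prior_mean=-0.3, prior_var=0.04,     # "effect is around -0.3"
    data_mean=-0.1, data_var=1.0, n=50,  # local data: 50 obs, mean -0.1
)
print(post_mean, post_var)
```

Everything the diverse background information contributes has been reduced to two numbers, a prior mean and a prior variance, chosen as much for tractability as for fidelity to what is actually known.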
The advantage of the frequentist approach, according to Cox/Mayo, is a neater separation of considerations regarding the design, the model choice, and what is contributed by the data themselves (although a Bayesian can look at the likelihood alone too, as Gelman mentions). As Cox argues, this is also what is desired in many practical situations. Following Mayo, it at least enables the results to be presented clearly, together with assumptions and unresolved knowledge gaps.
One shouldn’t, however, be distracted from realising that frequentist tools for aggregating information are largely missing (using the information for study design only is not the same as aggregation), and that the Bayesian ones are rather weak when facing the typically complex kinds of existing information. I’m not sure whether this is a problem that can be solved in a better way, though.
Comments on “How should prior information enter in statistical inference”
Emrah Aktunç
Graduate School of Social Sciences and Humanities
I think Gelman commits two different types of fallacies. The first one is a category mistake; I don’t understand how he can say something like “prior information is stronger than the data.” How can these be compared? They are different types of entities; to me it sounds like comparing a hammer with the framing of a house…
It seems to me that the second fallacy is a simple fallacy of false dichotomy. The thinking seems to be like this: either you use background information in the form of a “prior” (informative or not) or you don’t (or can’t) use background information at all. I think it is a serious mistake to assume that these two options exhaust the possibilities in using background information in experimental research.
I’m currently working on the influence of traditional cognitive psychology on fMRI research. In this field, background knowledge (theoretical as well as experimental) coming from cognitive psychology is relied on to a great extent. In fMRI, most research questions studied are chosen on the basis of theoretical knowledge which has been provided by cognitive psychology. To give an example, when cognitive neuroscientists want to study different memory systems (or types) they mostly start from cognitive psychological theories of memory and rely on the distinctions these theories provide. For example, they investigate to see whether there are any neural differences between declarative versus procedural memory. Also, in order to investigate this question, they design their experiments on the basis of prior experimental knowledge, which, again, has accumulated in behavioral experiments done by cognitive psychologists. All these background theories and experimental knowledge help cognitive neuroscientists choose what to study and define their research programs. Perhaps more importantly, the background knowledge from cognitive psychology enables researchers to identify the ways in which they can study these topics by designing the correct type of cognitive stimuli and tasks as well as helping them identify and remedy possible flaws in their experimental designs. The repository of previous research on declarative and procedural memory is rich in well-proven means, in terms of tasks and stimuli, to study these cognitive phenomena of interest while minimizing or ruling out any experimental flaws and errors.
The crucial thing to note here is this: nowhere in these stages of cognitive neuroscientific research does anybody speak, or has a need, of “priors.” Even though there is no talk of a “prior”, the way in which fMRI studies build on background knowledge from cognitive psychology provides a fine example of how background knowledge can and should be used in experimental research. Without the background knowledge coming from behavioral cognitive psychology it would take decades longer to even start designing experiments in cognitive neuroscience. In a sense, each single fMRI study can be thought of as a hammer while the cognitive psychological background knowledge provides the framing for the house of cognitive neuroscience. This relates to the false dichotomy: who can say that fMRI researchers do not use prior knowledge because they don’t use it in the form of prior probabilities? The background theories of cognitive psychology are heavily relied on in cognitive neuroscience but not in the form of priors. In addition, it would be a category mistake to compare the results of one fMRI experiment with the whole of background knowledge coming from traditional cognitive psychology and to say one is “stronger” than the other…
Thank you Christian and Emrah, I will post my reactions to the U-Phils next time.
You can judge the relative importance of prior information and data by seeing how much they affect final conclusions. Both Bayesians and frequentists do this.
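The comment above suggests a simple sensitivity check: hold the data fixed, vary how strongly the prior is weighted, and watch how far the conclusion moves. A minimal sketch in the normal-normal setting, with hypothetical numbers:

```python
# Sketch of a prior-sensitivity check: vary the prior's precision
# (its weight) and see how much the posterior mean moves.
# Normal-normal model; all numbers are hypothetical.

def posterior_mean(prior_mean, prior_prec, data_mean, data_prec):
    """Precision-weighted average of prior mean and data mean."""
    return (prior_prec * prior_mean + data_prec * data_mean) / (
        prior_prec + data_prec
    )

data_mean, data_prec = 2.0, 10.0  # hypothetical sample summary
prior_mean = 0.0                  # prior centred away from the data

for prior_prec in (0.1, 10.0, 1000.0):
    pm = posterior_mean(prior_mean, prior_prec, data_mean, data_prec)
    print(prior_prec, round(pm, 3))
```

With a weak prior the conclusion tracks the data; with a very precise prior the data barely matter. How far the answer shifts across such variations is one way to quantify which input is doing the work.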
It’s fair to say that everyone who applies statistics always utilizes background information other than a prior (if a prior is even used at all). No serious statistician, least of all Gelman, has ever denied this.