Andrew Gelman sent me an interesting note of his, “Ethics and the statistical use of prior information” [i]. In section 3 he comments on some of David Cox’s remarks in a conversation we recorded,
“A Statistical Scientist Meets a Philosopher of Science: A Conversation between Sir David Cox and Deborah Mayo,” published in Rationality, Markets and Morals [iii]. (Section 2 has some remarks on L. Wasserman.)
This was part of a highly informal, frank, and entirely unscripted conversation, with minimal editing from the tape recording [ii]. It was first posted on this blog on Oct. 19, 2011. A related, earlier discussion on Gelman’s blog is here.
I want to open this for your informal comments (“U-Phil,” ~750 words, by September 25) [iv]. (Send to email@example.com.)
Before I give my own “deconstruction” of Gelman on the relevant section, I will post a bit of background to the question of background. For starters, here’s the relevant portion of the conversation:
COX: Deborah, in some fields foundations do not seem very important, but we both think foundations of statistical inference are important; why do you think that is?
MAYO: I think because they ask about fundamental questions of evidence, inference, and probability. I don’t think that foundations of different fields are all alike; because in statistics we’re so intimately connected to the scientific interest in learning about the world, we invariably cross into philosophical questions about empirical knowledge and inductive inference.
COX: One aspect of it is that it forces us to say what it is that we really want to know when we analyze a situation statistically. Do we want to put in a lot of information external to the data, or as little as possible? It forces us to think about questions of that sort.
MAYO: But key questions, I think, are not so much a matter of putting in a lot or a little information. …What matters is the kind of information, and how to use it to learn. This gets to the question of how we manage to be so successful in learning about the world, despite knowledge gaps, uncertainties and errors. To me that’s one of the deepest questions and it’s the main one I care about. I don’t think a (deductive) Bayesian computation can adequately answer it.
COX: There’s a lot of talk about what used to be called inverse probability and is now called Bayesian theory. That represents at least two extremely different approaches. How do you see the two? Do you see them as part of a single whole? Or as very different?
MAYO: It’s hard to give a single answer, because of a degree of schizophrenia among many Bayesians. ….[I]n reality default Bayesians seem to want it both ways. They say: ‘All I’m trying to do is give you a prior to use if you don’t know anything. But of course if you do have prior information, by all means, put it in.’ It’s an exercise that lets them claim to be objective, while inviting you to put in degrees of belief, if you have them. …
COX: Yes, Fisher’s resolution of this issue in the context of the design of experiments was essentially that in designing an experiment you do have all sorts of prior information, and you use that to set up a good experimental design. Then when you come to analyze it, you do not use the prior information. In fact you have very clever ways of making sure that your analysis is valid even if the prior information is totally wrong. If you use the wrong prior information you just got an inefficient design, that’s all.
MAYO: What kind of prior? Not prior probability?
COX: No, prior information, for example, a belief that certain situations are likely to give similar outcomes, or a belief that studying this effect is likely to be interesting. There would be informal reasons as to why that is the case that would come into the design, but it does not play any part in the analysis, in his view, and I think that is, on the whole, a very sound approach. Prior information is always there. It might be totally wrong, but the investigator must believe something; otherwise he or she wouldn’t be studying the issue in the first place.
COX: There are situations where it is very clear that whatever a scientist or statistician might do privately in looking at data, when they present their information to the public or government department or whatever, they should absolutely not use prior information, because the prior opinions on some of these prickly issues of public policy can often be highly contentious with different people with strong and very conflicting views.
MAYO: But they should use existing knowledge.
COX: Knowledge yes. Prior knowledge will go into constructing the model in the first place or even asking the question or even finding it at all interesting. It’s not evidence that should be used if let’s say a group of surgeons claim we are very, very strongly convinced, maybe to probability 0.99, that this surgical procedure works and is good for patients, without inquiring where the 0.99 came from. It’s a very dangerous line of argument. But not unknown.
COX: Similar issues arise in public policy on education or criminology, or things like that. There are often very strong opinions expressed that if converted into prior probabilities would give different people very high prior probabilities to conflicting claims. That’s precisely what the scientist doesn’t want.
MAYO: Yes, I agree. …..
COX: I have often been connected with government decision-making. The idea that we would present people’s opinions unbacked by evidence would have been treated as ludicrous. We were there as scientists to supposedly provide objective information about the issue. Of course I know there is difficulty with the idea of total objectivity but at least it should connect with truth, to the goal of getting it right.
MAYO: The evidential report should be constrained by the world, by what is actually the case.
MAYO: I do find it striking that people could say with a straight face that we frequentists are not allowed to use any background information in using our methods. I have asked them to show me a book that says that, but they have not produced any. I don’t know if this is another one of those secrets shared only by the Bayesian Brotherhood.
COX: Well it’s totally ridiculous isn’t it.
MAYO: Then again, I suppose we don’t see statistical texts remedying this in a way that makes it conspicuous, that acknowledges this criticism and emphasizes that frequentists never advocated doing inference from a blank slate, but that you need to put together pieces, combine other tests and well-probed hypotheses. (We emphasize this in Cox and Mayo 2010.)
COX: Yes, you have to look at all the evidence but the main purpose of statistical analysis is to clarify what it is reasonable to learn from the specific set of limited data. It is a limited objective.
Read the full “conversation” in Rationality, Markets and Morals.
[i] From Gelman’s series on ethics in statistics (5th in the series): “Ethics and the statistical use of prior information.” Gelman has allowed me to link to this article here.
[ii] Recorded June 2011. This is a small portion of a much longer (unpublished) “conversation”; while it began with me as interviewer, here David Cox demonstrates how to be an effective interlocutor.
[iii] RMM Vol. 2, 2011, 103–114. Special Topic: Statistical Science and Philosophy of Science, edited by Deborah G. Mayo, Aris Spanos and Kent W. Staley. http://www.rmm-journal.de/ This special volume grew out of a conference I organized, with others, in June 2010 at the LSE.
[iv] Some links to earlier U-Phils may be found here.
Ordinary comments are of course welcome.
“Then when you come to analyze it, you do not use the prior information. In fact you have very clever ways of making sure that your analysis is valid even if the prior information is totally wrong. If you use the wrong prior information you just got an inefficient design, that’s all.”
Curiously, the “luckiness” procedures discussed briefly on this blog by Peter Gruenwald can be described thus: you *can and do* make use of the prior information. In fact you have very clever ways of making sure that your analysis is valid even if the prior information is totally wrong. If you use the wrong prior information you just have a less powerful procedure, that’s all.
Just for my own enjoyment, I’m currently working out “luckiness” confidence interval procedures that minimize prior expected CI length. The math is quite fun — the optimization gives rise to a somewhat unusual problem in the calculus of variations.