Head of the Methodology and Statistics Group,
Competence Center for Methodology and Statistics (CCMS), Luxembourg
This paradox is clearly inspired by and in a sense is just another form of Philip Dawid’s selection paradox. See my paper in The American Statistician for a discussion of this. However, I rather like this concrete example of it.
Imagine that you are about to carry out a Bayesian analysis of a new treatment for rheumatism. However, just to avoid various complications, I am going to assume that you are looking at a potential side effect of the treatment. I am going to take the effect on diastolic blood pressure (DBP) as the example of a side effect one might look at.
Now, to be truly Bayesian I think that you ought to have a look at a long list of previous treatments for rheumatism but time is short and this is not always so easy. So instead you argue like this.
- I know from the results of the WHO Monica project that the standard deviation of DBP is about 11 mmHg in a general population.
- I have no prior opinion as to whether anti-rheumatics as a class have a beneficial or harmful effect on DBP.
- I think that large effects on DBP, whether harmful or beneficial, are rather improbable for a drug designed to treat rheumatism.
- I believe the data are approximately Normal.
- I am going to use a conjugate prior for the effect of treatment with mean 0 and standard deviation 4 mmHg. This makes very large beneficial or harmful effects unlikely but still allows reasonable play for the data. It means that the prior variance is 16 mmHg² compared to a data variance that I expect to be about 120 mmHg². So as soon as I have treated 8 subjects, the variance of the data mean (about 15 mmHg²) should be smaller than the prior variance, and I will actually be weighting the data more than the prior from that point on. This seems about reasonable to me.
You can choose different figures if you want but here I am attempting to apply a standard Bayesian analysis in a reasonably honest manner.
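The weighting argument in the last bullet can be checked numerically. What follows is a minimal sketch, assuming the data variance of 120 mmHg² is treated as known and using a made-up observed mean of 5 mmHg purely for illustration. The function name `posterior_normal` is my own and does not come from any library.

```python
def posterior_normal(prior_mean, prior_var, data_mean, data_var, n):
    """Conjugate Normal update for a mean, with the data variance assumed known.

    Returns the posterior mean, the posterior variance and the weight
    given to the data mean (the prior mean gets one minus that weight).
    """
    prior_precision = 1.0 / prior_var
    data_precision = n / data_var          # precision of the mean of n subjects
    post_var = 1.0 / (prior_precision + data_precision)
    weight_data = data_precision * post_var
    post_mean = weight_data * data_mean + (1.0 - weight_data) * prior_mean
    return post_mean, post_var, weight_data


PRIOR_MEAN = 0.0     # mmHg, from the list above
PRIOR_VAR = 16.0     # mmHg^2, i.e. standard deviation 4 mmHg
DATA_VAR = 120.0     # mmHg^2, roughly 11 mmHg squared
OBSERVED_MEAN = 5.0  # mmHg, hypothetical value for illustration only

for n in (1, 4, 7, 8, 16):
    post_mean, post_var, w = posterior_normal(
        PRIOR_MEAN, PRIOR_VAR, OBSERVED_MEAN, DATA_VAR, n
    )
    print(f"n={n:2d}  var(data mean)={DATA_VAR / n:6.2f}  "
          f"weight on data={w:.3f}  posterior mean={post_mean:.2f}")
```

Because 120/16 = 7.5, the weight on the data first exceeds one half at n = 8, which is the point made in the list.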
A few days after you have decided on your prior distribution, a colleague announces very excitedly that the collaborative data summary project they have been working on, jointly sponsored by the FDA and the EMA and involving a huge data collection exercise going many years back into the archives of dozens of sponsors, has now concluded. They are now in a position to make a statement about the distribution of true effects on DBP in anti-rheumatics (and indeed a host of other drugs), whether or not they came to market. (In doing this, of course, they have avoided the naïve error of using the observed variation amongst drugs, since different drugs will have been measured with different precision and none of them with infinite precision.) Would you like to make some use of this?
You now make a disturbing discovery. In the framework you set up you can make no use of this. This is because it is only (potentially) relevant to your prior probability distribution, and although this prior probability distribution is not very informative about the new drug, it is 100% informative about itself. There was no prior distribution for your prior distribution. As soon as you know that mean effects over all drugs, past, possible and future, are distributed as N(0, 16 mmHg²), you can give the probability of the true effect of a random drug falling between any limits you like. Imagine the task of doing this empirically: you would need dozens, if not hundreds and perhaps thousands, of true effects of various pharmaceuticals, fed into either a histogram or some sort of density estimation approach, to get anywhere near what your Normal distribution says.
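To see how much the fixed prior already claims to know, here is a small sketch, assuming Python 3.8 or later for `statistics.NormalDist`. The helper `prob_between` and the particular limits are my own choices for illustration.

```python
from statistics import NormalDist

# The prior from the list above: mean 0 mmHg, standard deviation 4 mmHg.
prior = NormalDist(mu=0.0, sigma=4.0)


def prob_between(lo, hi, dist=prior):
    """Prior probability that a random drug's true effect lies in [lo, hi]."""
    return dist.cdf(hi) - dist.cdf(lo)


# Arbitrary limits in mmHg, chosen only to illustrate the point.
for lo, hi in [(-4, 4), (-8, 8), (2, 6), (10, 12)]:
    print(f"P({lo} <= effect <= {hi}) = {prob_between(lo, hi):.4f}")
```

Every one of these probabilities is stated to as many decimal places as you care to print, without a single drug having been examined.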
The problem is thus that prior distributions that are fairly uninformative about the given parameter we are trying to estimate are infinitely informative about themselves. To deal with that requires a higher level of the hierarchy and, as Jack Good was wont to point out, dealing with this honestly is harder than many suppose.
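The structure of the problem can be written out as a sketch. The symbols are assumptions for this illustration: mu for the true effect of the new drug, tau and gamma for the prior mean and variance (the same letters used in the discussion further down), sigma² for the data variance treated as known, and p(tau, gamma) for a hyperprior whose form the argument above leaves open.

```latex
\begin{align*}
y_i \mid \mu &\sim N(\mu, \sigma^2), \qquad \sigma^2 \approx 120\ \text{mmHg}^2, \quad i = 1, \dots, n,\\
\mu &\sim N(\tau, \gamma), \qquad \tau = 0,\ \gamma = 16\ \text{mmHg}^2 \ \text{(both fixed)},\\
\mu \mid \bar{y} &\sim N\!\left(
  \frac{\tau/\gamma + n\bar{y}/\sigma^2}{1/\gamma + n/\sigma^2},\
  \frac{1}{1/\gamma + n/\sigma^2}\right).
\end{align*}
% A further level would treat tau and gamma as unknown:
\begin{align*}
(\tau, \gamma) &\sim p(\tau, \gamma).
\end{align*}
```

With tau and gamma fixed as constants, no external information about the distribution of drug effects can enter the posterior for mu. Only the extra level, with tau and gamma given a distribution of their own, would let such information in.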
1. Dawid AP. Selection paradoxes of Bayesian inference. In Multivariate Analysis and its Applications, Anderson TW, Fang KT, Olkin I (eds), 1994.
2. Senn S. A note concerning a selection “Paradox” of Dawid’s. American Statistician 2008; 62: 206-210.
3. Good IJ. Good Thinking: The Foundations of Probability and its Applications. University of Minnesota Press: Minneapolis, 1983.
Stephen: thanks for this. I’m not familiar with this paradox, and not quite catching the meaning of the prior being 100% informative about itself.
Consider first the distribution of measured blood pressure ‘effects’ for individual patients. I am claiming that these have mean mu (where mu is unknown) with variance (approximately) 120 mmHg squared. (If mu is zero the drug will have no effect on blood pressures, on average at least.) What can I say about mu? I can’t say what mu is for sure. If I knew, there would be nothing to learn about this drug. I am going to use the data to learn but I am also going to use some prior ‘information’. As prior information I am going to use a distribution of presumed effects of drugs. I regard mu as a random realisation from this population of effects of drugs. This ‘prior’ distribution has mean tau = 0 and variance gamma = 16 mmHg squared. This variance expresses my prior uncertainty about mu. But what expresses my uncertainty about tau, or, for that matter, gamma? Nothing. Tau just is and so is gamma. I can’t learn about them.
Stephen: OK but if the prior distribution is uninformative about the given parameter we are trying to estimate, then why are priors deemed important background information by Bayesians? Would this new info override the initial conjugate prior?