Today is George Barnard’s 101st birthday. In honor of this, I reblog an exchange between Barnard, Savage (and others) on likelihood vs probability. The exchange is from pp 79-84 (of what I call) “The Savage Forum” (Savage, 1962).[i] Six other posts on Barnard are linked below: 2 are guest posts (Senn, Spanos); the other 4 include a play (pertaining to our first meeting), and a letter he wrote to me.
BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important.
SAVAGE: Surely, as you say, we cannot always enumerate hypotheses so completely as we like to think. The list can, however, always be completed by tacking on a catch-all ‘something else’. In principle, a person will have probabilities given ‘something else’ just as he has probabilities given other hypotheses. In practice, the probability of a specified datum given ‘something else’ is likely to be particularly vague–an unpleasant reality. The probability of ‘something else’ is also meaningful of course, and usually, though perhaps poorly defined, it is definitely very small. Looking at things this way, I do not find probabilities unnormalizable, certainly not altogether unnormalizable.
Whether probability has an advantage over likelihood seems to me like the question whether volts have an advantage over amperes. The meaninglessness of a norm for likelihood is for me a symptom of the great difference between likelihood and probability. Since you question that symptom, I shall mention one or two others. …
On the more general aspect of the enumeration of all possible hypotheses, I certainly agree that the danger of losing serendipity by binding oneself to an over-rigid model is one against which we cannot be too alert. We must not pretend to have enumerated all the hypotheses in some simple and artificial enumeration that actually excludes some of them. The list can however be completed, as I have said, by adding a general ‘something else’ hypothesis, and this will be quite workable, provided you can tell yourself in good faith that ‘something else’ is rather improbable. The ‘something else’ hypothesis does not seem to make it any more meaningful to use likelihood for probability than to use volts for amperes.
Let us consider an example. Off hand, one might think it quite an acceptable scientific question to ask, ‘What is the melting point of californium?’ Such a question is, in effect, a list of alternatives that pretends to be exhaustive. But, even specifying which isotope of californium is referred to and the pressure at which the melting point is wanted, there are alternatives that the question tends to hide. It is possible that californium sublimates without melting or that it behaves like glass. Who dare say what other alternatives might obtain? An attempt to measure the melting point of californium might, if we are serendipitous, lead to more or less evidence that the concept of melting point is not directly applicable to it. Whether this happens or not, Bayes’s theorem will yield a posterior probability distribution for the melting point given that there really is one, based on the corresponding prior conditional probability and on the likelihood of the observed reading of the thermometer as a function of each possible melting point. Neither the prior probability that there is no melting point, nor the likelihood for the observed reading as a function of hypotheses alternative to that of the existence of a melting point enter the calculation. The distinction between likelihood and probability seems clear in this problem, as in any other.
BARNARD: Professor Savage says in effect, ‘add at the bottom of list H1, H2,…”something else”’. But what is the probability that a penny comes up heads given the hypothesis ‘something else’. We do not know. What one requires for this purpose is not just that there should be some hypotheses, but that they should enable you to compute probabilities for the data, and that requires very well defined hypotheses. For the purpose of applications, I do not think it is enough to consider only the conditional posterior distributions mentioned by Professor Savage.
LINDLEY: I am surprised at what seems to me an obvious red herring that Professor Barnard has drawn across the discussion of hypotheses. I would have thought that when one says this posterior distribution is such and such, all it means is that among the hypotheses that have been suggested the relevant probabilities are such and such; conditionally on the fact that there is nothing new, here is the posterior distribution. If somebody comes along tomorrow with a brilliant new hypothesis, well of course we bring it in.
BARTLETT: But you would be inconsistent because your prior probability would be zero one day and non-zero another.
LINDLEY: No, it is not zero. My prior probability for other hypotheses may be ε. All I am saying is that conditionally on the other 1 – ε, the distribution is as it is.
BARNARD: Yes, but your normalization factor is now determined by ε. Of course ε may be anything up to 1. Choice of letter has an emotional significance.
LINDLEY: I do not care what it is as long as it is not one.
BARNARD: In that event two things happen. One is that the normalization has gone west, and hence also this alleged advantage over likelihood. Secondly, you are not in a position to say that the posterior probability which you attach to an hypothesis from an experiment with these unspecified alternatives is in any way comparable with another probability attached to another hypothesis from another experiment with another set of possibly unspecified alternatives. This is the difficulty over likelihood. Likelihood in one class of experiments may not be comparable to likelihood from another class of experiments, because of differences of metric and all sorts of other differences. But I think that you are in exactly the same difficulty with conditional probabilities just because they are conditional on your having thought of a certain set of alternatives. It is not rational in other words. Suppose I come out with a probability of a third that the penny is unbiased, having considered a certain set of alternatives. Now I do another experiment on another penny and I come out of that case with the probability one third that it is unbiased, having considered yet another set of alternatives. There is no reason why I should agree or disagree in my final action or inference in the two cases. I can do one thing in one case and other in another, because they represent conditional probabilities leaving aside possibly different events.
LINDLEY: All probabilities are conditional.
BARNARD: I agree.
LINDLEY: If there are only conditional ones, what is the point at issue?
PROFESSOR E.S. PEARSON: I suggest that you start by knowing perfectly well that they are conditional and when you come to the answer you forget about it.
BARNARD: The difficulty is that you are suggesting the use of probability for inference, and this makes us able to compare different sets of evidence. Now you can only compare probabilities on different sets of evidence if those probabilities are conditional on the same set of assumptions. If they are not conditional on the same set of assumptions they are not necessarily in any way comparable.
LINDLEY: Yes, if this probability is a third conditional on that, and if a second probability is a third, conditional on something else, a third still means the same thing. I would be prepared to take my bets at 2 to 1.
BARNARD: Only if you knew that the condition was true, but you do not.
GOOD: Make a conditional bet.
BARNARD: You can make a conditional bet, but that is not what we are aiming at.
WINSTEN: You are making a cross comparison where you do not really want to, if you have got different sets of initial experiments. One does not want to be driven into a situation where one has to say that everything with a probability of a third has an equal degree of credence. I think this is what Professor Barnard has really said.
BARNARD: It seems to me that likelihood would tell you that you lay 2 to 1 in favour of H1 against H2, and the conditional probabilities would be exactly the same. Likelihood will not tell you what odds you should lay in favour of H1 as against the rest of the universe. Probability claims to do that, and it is the only thing that probability can do that likelihood cannot.
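A small numerical sketch may make the last few exchanges concrete. It is not from the Forum: the function names, the equal priors on H1 and H2, and every number below are assumptions chosen for illustration. It shows that the odds of H1 against H2 stay at the likelihood ratio (2 to 1 here, given equal priors) whatever is assumed about ‘something else’, whereas the probability of H1 against "the rest of the universe" depends on both ε and the likelihood assigned to the catch-all, which is the quantity Barnard says we do not know.

```python
from fractions import Fraction


def conditional_posterior(priors, likelihoods):
    """Posterior over the named hypotheses, conditional on one of them being true."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def full_posterior(priors, likelihoods, eps, catch_all_likelihood):
    """Posterior over the named hypotheses plus a catch-all 'something else'.

    eps is the prior probability of 'something else'; catch_all_likelihood is
    the probability of the data given 'something else'. Both are assumptions
    the analyst has to supply.
    """
    named = [(1 - eps) * p * l for p, l in zip(priors, likelihoods)]
    other = eps * catch_all_likelihood
    total = sum(named) + other
    return [x / total for x in named], other / total


# Hypothetical inputs: the data are twice as probable under H1 as under H2.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(2, 10), Fraction(1, 10)]

print("conditional:", conditional_posterior(priors, likelihoods))

# Vary the prior mass and the likelihood given to 'something else'.
for eps, c in [(Fraction(1, 100), Fraction(1, 10)),
               (Fraction(1, 10), Fraction(3, 10)),
               (Fraction(1, 2), Fraction(9, 10))]:
    (p1, p2), p_other = full_posterior(priors, likelihoods, eps, c)
    print(f"eps={eps}, P(data|else)={c}: P(H1)={p1}, odds H1:H2={p1 / p2}, P(else)={p_other}")
```

With unequal priors on H1 and H2 the conditional odds would be the prior odds times the likelihood ratio, so the agreement with the bare likelihood ratio in this sketch depends on the equal-prior assumption.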
You can read the rest of pages 78-103 of the Savage Forum here.
HAPPY BIRTHDAY GEORGE!
*Six other Barnard links on this blog:
Aris Spanos: Comment on the Barnard and Copas (2002) Empirical Example
Stephen Senn: On the (ir)relevance of stopping rules in meta-analysis
Posts by Mayo:
Barnard, Background Information, and Intentions
Statistical Theater of the Absurd: Stat on a Hot tin Roof
George Barnard’s 100th Birthday: We Need More Complexity and Coherence in Statistical Education
Letter from George Barnard on the Occasion of my Lakatos Award
Links to a scan of the entire Savage forum may be found at: https://errorstatistics.com/2013/04/06/who-is-allowed-to-cheat-i-j-good-and-that-after-dinner-comedy-hour/
This reminds me of many old discussions about the reference class problem, Fisher-Behrens, etc. It also came up in computational linguistics and language learnability with regard to “poverty of the stimulus” arguments and the innateness hypothesis.
The Peircean point would be that we cannot escape the abductive factor in statistical inference or scientific inquiry generally.
On his 101st birthday I wanted to record three interactions I had with George Barnard, which I remember with great pleasure.
In 1986 I was fortunate enough with my Ciba-Geigy colleagues Amy Racine and Hugo Flühler and my future PhD supervisor Adrian Smith to present a read paper to the Royal Statistical Society entitled “Bayesian methods in practice: experience in the pharmaceutical industry”. At the time it was the practice that the presenters of a read paper were invited to a dinner after the presentation and discussion by the Statistical Dinner Club. The tradition began in 1839. The club still exists but no longer performs the same role.
I was seated opposite George Barnard at the dinner and he asked me why I had become interested in applying Bayesian approaches to pharmaceutical problems. One of the topics covered in our paper was estimation of the median lethal dose, the LD50, in animal toxicology studies. I told him that my motivation had begun because in many practical problems, including the estimation of the LD50, traditional approaches often did not provide sensible solutions. I then quoted RA Fisher.
Fisher had provided an appendix to a paper by Chester Bliss in 1935 giving for the first time a method for determining the maximum likelihood estimates of the parameters of a probit model. Bliss had told the story of Fisher developing the method to account for those groups with 0 and 100 percent deaths because “When a biologist believes there is information in an observation, it is up to the statistician to get it out”. (The story is reminiscent of Feynman’s conclusion during the Challenger Disaster Commission that one shouldn’t ignore temperature data from shuttle flights in which there were no problems with the O-rings. Those data were relevant, rather like Sherlock Holmes’ “The Dog That Didn’t Bark”.)
George’s response was “Young man don’t quote Fisher at me!!”
My second interaction with George was in the correspondence columns of the Royal Statistical Society’s News & Notes during 1992. The exchange began with a self-important letter of mine responding to an Opinion piece by a future colleague, Nigel Smeeton, on Confidence Intervals.
AP Grieve Letter to the Editor of RSS News & Notes. July 1992. Confidence Intervals
Would Nigel Smeeton have us believe that we are likely to have greater success in explaining the concepts behind confidence intervals (CI) to clinicians than we have had in explaining the concepts that lie behind hypothesis tests and p-values? It is 43 years since the recognised beginning of the modern era in clinical statistics; 43 years since Bradford Hill was successful in introducing Fisherian ideas into medical research, yet 43 years in which clinicians have still not grasped that a p-value does not represent the probability that the null hypothesis is true. Is it our fault as a profession for not explaining, is it their fault for not understanding, or is it the fault of the concepts themselves?
The mental gymnastics necessary when the classical definition of a CI is accepted, is magnificently illustrated by an editorial in the Annals of Internal Medicine which deals with an estimate of the proportion of patients showing complete response to treatment for ovarian cancer (LE Braitman, Annals of Internal Medicine, 108, 296-298):
“. . the proper interpretation of confidence intervals requires that we consider a large number of hypothetical random samples (each of the same size). Then “95% confidence” means that approximately 95% of the 95% confidence intervals from these random samples would include the unknown true value, and about 5% would not. Because the true fraction in the population is unknown, it is impossible to tell if the 95% confidence interval of 28% to 55% that was obtained from the observed sample data actually included the true fraction. Strictly speaking, we cannot even tell how likely the 95% confidence interval of 28% to 55% is to include the unknown true fraction. Nevertheless, the usual interpretation is that we are 95% confident that the unknown true value is between 28% and 55%”.
Magnificent. But surely not logical. The clear conclusions which I draw from this passage, and I would not claim to be the first, are that the probability statements involved in CI’s are statements concerning the procedure of calculating the intervals and that the “usual interpretation”, although not supported by the CI procedure, is precisely the form of statement that users of CI’s would like to make. Herein lies the problem. Classical statistics provides inferential statements which are, in my view, not in the form in which scientists wish to have them and it therefore seems to me that it is the concepts themselves which are at fault, as are we for not putting forward alternative concepts, which are available, and which do meet their needs. From my perspective, the recent campaign to supplant the p-value and to replace it by CI’s has been conducted in a conceptual vacuum. Whether we can successfully persuade clinicians to change their habits and to use CI’s in preference to p-values, and I believe the campaign has been a success, begs the question as to whether we should. If we are unable to educate clinicians, then merely persuading them to use CI’s rather than p-values is to replace the unthinking use of one technique with that of another. Indeed, it is not at all clear that we will have achieved anything since one of the items of campaign propaganda was to point out that a CI was an inverted […] propaganda recognised that the p-value itself contains information not contained in the CI so that both are necessary.
Being a card-carrying Bayesian I view these machinations with detached amusement. But should I?
In the following edition of News & Notes George teased me and told me off.
GA Barnard Letter to the Editor of RSS News & Notes. August 1992. Confidence Intervals
Andrew Grieve may have missed a sale for the ICI book on Statistical Methods in Research and Production. He could have recommended that his Annals of Internal Medicine editorial writer read section 4.1 where, in the middle of page 59, he would find the terse, clear, and accurate sentence: “The limits x̄ ± 3σ/√n are known as the 99.7 percent confidence limits for µ, and the confidence coefficient of 99.7 percent reflects the fact that, of every thousand such assertions we make, only three, on average, will be incorrect.” Confusing references to repeated random samples of the same size are quite unnecessary.
Andrew’s self-description as a card-carrying Bayesian prompts me to ask whether he routinely points out to his clients that none of the posterior probability statements he might suggest they make need be acceptable to anyone else, though they may share the client’s model and his data. Should he do so, I would be interested in the reactions especially of those who may have to deal with committees on the safety of medicines.
As the very grateful dedicatee of a book by a bevy of Bayesians I suppose I might call myself an honorary Bayesian. There are problems where we cannot do without Bayesian assumptions. In such cases we do well to bear in mind Student’s view, expressed in a letter to Fisher dated 3rd April 1922: “When I was in the lab in 1907 I tried to work out variants of Bayes with a priori probabilities other than G=C [Editorial note: this means a uniform prior] but I soon convinced myself that with ordinary sized samples one’s a priori hypothesis made a fool of the actual sample… and since then have refused to use any other hypothesis than the one that leads to your likelihood… Then each piece of evidence can be considered on its own merits.”
George A Barnard
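The frequency reading of the sentence Barnard quotes from the ICI book can be checked with a short simulation. This is only a sketch under stated assumptions: normally distributed data with known σ, and a function name, seed, and parameter values invented for the illustration. It counts how often the assertion "µ lies within x̄ ± 3σ/√n" turns out to be correct over many independent samples.

```python
import random
import statistics


def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of trials in which xbar +/- 3*sigma/sqrt(n) contains mu.

    Assumes independent normal observations with known sigma.
    """
    rng = random.Random(seed)
    half_width = 3 * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(xbar - mu) <= half_width:
            hits += 1
    return hits / trials


# Hypothetical settings; under the assumptions the long-run fraction should be
# close to 0.997 whatever the true mu is.
for mu in (0.0, 10.0, -250.0):
    print(mu, coverage(mu=mu, sigma=2.0, n=25, trials=20000))
```

The simulated fraction describes the procedure of making such assertions, not any single interval, which is the distinction the letters on either side of this one turn on.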
In my reply I was able to use the same source as support for my views.
AP Grieve Letter to the Editor of RSS News & Notes. September 1992. Confidence Intervals
George Barnard mildly admonishes me for not having read page 59 of the ICI publication Statistical Methods in Research and Production and for failing to recommend the confidence interval (CI) definition to be found there. I admit it. Unfortunately, I have to admit that I did not read page 81 either. Had I done so I would have found support for my position in the statement that:
‘. . there is an essential incongruity in attempting to apply frequency-ratio concepts of probability to the outcome of unique events; any probability measure in such circumstances can only describe the strength of belief, or the confidence with which we are prepared to make a particular assertion.’
We could of course argue about whether a clinical trial is a unique event. One might, for example, wish to imbed an individual trial in a series of trials with the same treatment and say therefore that it is not a unique event. Such a series of trials could form the basis of a meta-analysis/overview of the treatment. Within which series of meta-analyses would one wish to imbed that particular meta-analysis for the purpose of making probability statements?
Further support to my position is given by this extract from a footnote on page 81 where the authors are commenting on the indistinguishability of the CI for a normal mean from an integrated likelihood approach, that is Bayes with an improper, uniform prior:
‘In this instance, however, it was also possible to attach a frequency ratio interpretation to the confidence coefficient by considering as the “event”, the making of an assertion and not the occurrence of a particular value in connection with any one assertion.’
Again emphasising that probability statements associated with CI’s concern the calculation procedure and not the particular results.
The second issue which George Barnard raises is crucial. David Spiegelhalter and Laurence Freedman have identified three groups of individuals, each with their own motivations, who interact during the complex development process which culminates in the implementation of a new medical treatment. They term these groups the experimenters, the reviewers and the consumers. The objective of the experimenters, among whom are pharmaceutical companies and research organisations, is to influence the consumers, who are the clinicians. They do this by providing them with information which is “sanitised” to ensure objectivity by the reviewers, who are the journal editors and regulatory authorities, whom Sir David Cox has called the “last holders of absolute power”. The statistician’s job is not over when the last analysis, Bayesian or not, is performed since consideration has to be given to the transmission of information to these different groups of “remote clients.”
The problem is to determine what is the appropriate method of transmitting information to remote clients. This issue is not new; in fact the term “remote clients” comes from the title of a 1963 Econometrica paper by Clifford Hildreth in which he examines the difficulty of transmitting information to vaguely known clients, whose use of the information may extend long after the statistician’s work has been completed. He considers what parcels of information can be efficiently transmitted to remote clients and lists a number, among which are the data, the likelihood and posterior distributions for a series of representative prior distributions. I personally lean towards the last of these three parcels, but it may be that we need to consider providing more than one of the parcels. Indeed, at the LSE meeting on Ethical and Methodological Issues in Clinical Trials last year, I was surprised by the degree of unanimity among Bayesians and frequentists to the suggestion that in journal articles reporting on clinical trials the Results section should contain the data, or the likelihood, and that the Discussion was the proper place for the posterior analysis.
As far as what I would say to my clients goes, I think that it should be pointed out to them that they are not the only clients of the analysis, and that other more remote clients, with different perspectives and motivations, may well interpret the results differently, indeed it would be surprising if they did not. But I do not believe that it is solely a problem for a Bayesian.
My third interaction with George was at the last meeting of the ICI Mathematics and Statistics Panel, shortly after the above exchange, in November 1992. The ICI Panel had produced Statistical Methods in Research and Production, referred to above, and had run for many years until the biological divisions of ICI demerged to form Zeneca in the early 1990s. I sat next to George at lunch and we had an interesting exchange on Bayes, Fisher, confidence, p-values and ….
In none of these exchanges did George talk down to me, nor denigrate my ideas. He was charming, instructive, supportive, everything one could wish from a senior member of the profession and teacher.