G.A. Barnard’s 105th Birthday: The Bayesian “catch-all” factor: probability vs likelihood


G. A. Barnard: 23 September 1915 – 30 July 2002

Yesterday was statistician George Barnard’s 105th birthday. To acknowledge it, I reblog an exchange among Barnard, Savage, and others on likelihood vs probability. The exchange is from pp. 79-84 of (what I call) “The Savage Forum” (Savage, 1962).[i] A portion appears on p. 420 of my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP). Six other posts on Barnard are linked below, including two guest posts (Senn, Spanos), a play (pertaining to our first meeting), and a letter Barnard wrote to me in 1999.

 ♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠♠

BARNARD:…Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires in its application to the probability of hypotheses that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so we ought to allow that in addition to the hypotheses that we really consider we should allow something that we had not thought of yet, and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point, that I think while I agree with a lot of the technical points, I would prefer that this is talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively, is important.

SAVAGE: Surely, as you say, we cannot always enumerate hypotheses so completely as we like to think. The list can, however, always be completed by tacking on a catch-all ‘something else’. In principle, a person will have probabilities given ‘something else’ just as he has probabilities given other hypotheses. In practice, the probability of a specified datum given ‘something else’ is likely to be particularly vague – an unpleasant reality. The probability of ‘something else’ is also meaningful of course, and usually, though perhaps poorly defined, it is definitely very small. Looking at things this way, I do not find probabilities unnormalizable, certainly not altogether unnormalizable.

Whether probability has an advantage over likelihood seems to me like the question whether volts have an advantage over amperes. The meaninglessness of a norm for likelihood is for me a symptom of the great difference between likelihood and probability. Since you question that symptom, I shall mention one or two others. …

On the more general aspect of the enumeration of all possible hypotheses, I certainly agree that the danger of losing serendipity by binding oneself to an over-rigid model is one against which we cannot be too alert. We must not pretend to have enumerated all the hypotheses in some simple and artificial enumeration that actually excludes some of them. The list can however be completed, as I have said, by adding a general ‘something else’ hypothesis, and this will be quite workable, provided you can tell yourself in good faith that ‘something else’ is rather improbable. The ‘something else’ hypothesis does not seem to make it any more meaningful to use likelihood for probability than to use volts for amperes.

Let us consider an example. Off hand, one might think it quite an acceptable scientific question to ask, ‘What is the melting point of californium?’ Such a question is, in effect, a list of alternatives that pretends to be exhaustive. But, even specifying which isotope of californium is referred to and the pressure at which the melting point is wanted, there are alternatives that the question tends to hide. It is possible that californium sublimates without melting or that it behaves like glass. Who dare say what other alternatives might obtain? An attempt to measure the melting point of californium might, if we are serendipitous, lead to more or less evidence that the concept of melting point is not directly applicable to it. Whether this happens or not, Bayes’s theorem will yield a posterior probability distribution for the melting point given that there really is one, based on the corresponding prior conditional probability and on the likelihood of the observed reading of the thermometer as a function of each possible melting point. Neither the prior probability that there is no melting point, nor the likelihood for the observed reading as a function of hypotheses alternative to that of the existence of a melting point enter the calculation. The distinction between likelihood and probability seems clear in this problem, as in any other.

BARNARD: Professor Savage says in effect, ‘add at the bottom of list H1, H2,…“something else”’. But what is the probability that a penny comes up heads given the hypothesis ‘something else’? We do not know. What one requires for this purpose is not just that there should be some hypotheses, but that they should enable you to compute probabilities for the data, and that requires very well defined hypotheses. For the purpose of applications, I do not think it is enough to consider only the conditional posterior distributions mentioned by Professor Savage.

LINDLEY: I am surprised at what seems to me an obvious red herring that Professor Barnard has drawn across the discussion of hypotheses. I would have thought that when one says this posterior distribution is such and such, all it means is that among the hypotheses that have been suggested the relevant probabilities are such and such; conditionally on the fact that there is nothing new, here is the posterior distribution. If somebody comes along tomorrow with a brilliant new hypothesis, well of course we bring it in.

BARTLETT: But you would be inconsistent because your prior probability would be zero one day and non-zero another.

LINDLEY: No, it is not zero. My prior probability for other hypotheses may be ε. All I am saying is that conditionally on the other 1 – ε, the distribution is as it is.

BARNARD: Yes, but your normalization factor is now determined by ε. Of course ε may be anything up to 1. Choice of letter has an emotional significance.

LINDLEY: I do not care what it is as long as it is not one.

BARNARD: In that event two things happen. One is that the normalization has gone west, and hence also this alleged advantage over likelihood. Secondly, you are not in a position to say that the posterior probability which you attach to an hypothesis from an experiment with these unspecified alternatives is in any way comparable with another probability attached to another hypothesis from another experiment with another set of possibly unspecified alternatives. This is the difficulty over likelihood. Likelihood in one class of experiments may not be comparable to likelihood from another class of experiments, because of differences of metric and all sorts of other differences. But I think that you are in exactly the same difficulty with conditional probabilities just because they are conditional on your having thought of a certain set of alternatives. It is not rational in other words. Suppose I come out with a probability of a third that the penny is unbiased, having considered a certain set of alternatives. Now I do another experiment on another penny and I come out of that case with the probability one third that it is unbiased, having considered yet another set of alternatives. There is no reason why I should agree or disagree in my final action or inference in the two cases. I can do one thing in one case and other in another, because they represent conditional probabilities leaving aside possibly different events.

LINDLEY: All probabilities are conditional.

BARNARD: I agree.

LINDLEY: If there are only conditional ones, what is the point at issue?

PROFESSOR E.S. PEARSON: I suggest that you start by knowing perfectly well that they are conditional and when you come to the answer you forget about it.

BARNARD: The difficulty is that you are suggesting the use of probability for inference, and this makes us able to compare different sets of evidence. Now you can only compare probabilities on different sets of evidence if those probabilities are conditional on the same set of assumptions. If they are not conditional on the same set of assumptions they are not necessarily in any way comparable.

LINDLEY: Yes, if this probability is a third conditional on that, and if a second probability is a third, conditional on something else, a third still means the same thing. I would be prepared to take my bets at 2 to 1.

BARNARD: Only if you knew that the condition was true, but you do not.

GOOD: Make a conditional bet.

BARNARD: You can make a conditional bet, but that is not what we are aiming at.

WINSTEN: You are making a cross comparison where you do not really want to, if you have got different sets of initial experiments. One does not want to be driven into a situation where one has to say that everything with a probability of a third has an equal degree of credence. I think this is what Professor Barnard has really said.

BARNARD: It seems to me that likelihood would tell you that you lay 2 to 1 in favour of H1 against H2, and the conditional probabilities would be exactly the same. Likelihood will not tell you what odds you should lay in favour of H1 as against the rest of the universe. Probability claims to do that, and it is the only thing that probability can do that likelihood cannot.
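
For readers who want the exchange in symbols, here is a minimal editorial sketch (the notation H1, …, Hk, C, ε and x is mine, not the forum’s). Let H1, …, Hk be the hypotheses actually under consideration, let C be the catch-all ‘something else’ with prior probability P(C) = ε, and let x be the data. Bayes’s theorem gives

$$P(H_i \mid x) = \frac{P(x \mid H_i)\,P(H_i)}{\sum_{j=1}^{k} P(x \mid H_j)\,P(H_j) + P(x \mid C)\,\varepsilon}.$$

Savage’s position is that P(x | C) and ε, however vague, are meaningful in principle, so the normalization survives; Barnard’s objection is that P(x | C) cannot be computed from any well-defined model, so the denominator is unavailable; Lindley’s fallback is to report only the posterior conditional on ‘nothing new’,

$$P(H_i \mid x, \neg C) = \frac{P(x \mid H_i)\,P(H_i)}{\sum_{j=1}^{k} P(x \mid H_j)\,P(H_j)},$$

which never requires P(x | C), at the price Barnard presses: posteriors from different analyses are then conditional on different sets of entertained alternatives and need not be comparable.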

You can read the rest of pages 78-103 of the Savage Forum here.

 HAPPY BIRTHDAY GEORGE!

References

[i] Savage, L. J. (1962), “Discussion”, in G. A. Barnard and D. R. Cox (eds.), The Foundations of Statistical Inference: A Discussion, London: Methuen, p. 76.
 
 

 



10 thoughts on “G.A. Barnard’s 105th Birthday: The Bayesian “catch-all” factor: probability vs likelihood”

  1. In 1982 I joined ICI in the UK as an applied statistician specialising in agricultural research. ICI at that time was partitioned into eight divisions which, in total, employed quite a lot of statisticians. Twice a year we attended the maths and stats panel meetings, in May and November. The first day featured a series of presentations on a theme and was open to anyone. This was followed by a dinner in the evening, and the panel itself, attended by one member from each division, met the following day.

    George Barnard was a regular invitee to the presentations and the dinner, so I got to know him quite well. He would have been in his late 60s at the time and I remember him as a very nice man and very pleasant company.

  2. Michael J Lew

    I don’t understand why we need to worry about a “catch-all” hypothesis or the unknown possibilities in practice. The only way to calculate a likelihood is to combine the observations with a statistical model (a collection of mathematical and data assumptions). The only hypotheses available for consideration are those hypotheses that correspond to the possible values of the parameter(s) of interest within that statistical model.

    If the model is well specified then the hypothesis space is well defined and there is no room for a catch-all. If the model is not sufficiently specified then the values that it generates will not be likelihoods.
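
    A small numerical sketch of this point (an editorial illustration, not from the comment above; it assumes a binomial model for 7 heads in 10 tosses, using NumPy/SciPy):

        import numpy as np
        from scipy.stats import binom

        # Hypothetical data: 7 heads in 10 tosses, modelled as binomial with
        # unknown probability of heads, theta.
        x, n = 7, 10
        thetas = np.linspace(0.0, 1.0, 101)   # the model's parameter space
        lik = binom.pmf(x, n, thetas)         # likelihood of each theta given the data

        # Relative support between two hypotheses inside the model:
        print(lik[50] / lik[70])              # theta = 0.5 versus theta = 0.7

        # There is no entry for 'something else': a hypothesis outside the
        # binomial model has no sampling distribution here, hence no likelihood.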

    • Michael: I thought the whole reason likelihoods are appealing (certainly this is the case for Barnard) is that they do not require the exhaustive list of hypotheses and priors that the posterior requires. Isn’t that Barnard’s whole point, and the point of other likelihoodists? Else, why stop at the ‘half way house’, as Savage called it, and not go all Bayesian?

      • Michael J Lew

        I know that we’ve discussed this before, but you really should consider how it would even be possible to calculate a likelihood for a hypothesis that is not a point within the parameter space of the statistical model. It’s not possible. Any argument that ignores that mathematical reality is entirely moot no matter how interesting it might seem to historians and philosophers.

        It may be surprising that Barnard, Savage, Pearson, and the others did not realise the model-restricted existence of likelihoods, but the evidence shows that they missed it. They may be the heavyweight heroes of statistics, but there is no reason for us, today, to persist with their fundamental error.

        • Yes, the likelihoods attach to points; the catch-all factor is something the likelihoodist avoids but the Bayesian has to deal with. I understand what I think your point is regarding likelihoods, but it’s different from the catch-all point that Barnard is making, right?

          • Michael J Lew

            Actually, a Bayesian avoids having to deal with the catch-all as well (within a model) because the likelihood function defines the space over which the posterior probabilities of hypotheses exist. If there is an interesting hypothesis that does not occur within the parameter space of the model, then the Bayesian (and likelihoodlum) will need to choose a model that allows such a hypothesis. Just as a frequentist would need to do.

            Barnard’s catch-all point is ill-founded, as is the idea that we need to be concerned that there’s always a hypothesis that what happened had to happen. They do not occur within well-specified models and so they are outside of the scope of statistical inquiry.

            • Michael:
              I disagree; I think his point is fundamental. All hypotheses not already considered get probability 0 for the Bayesian. They will always have probability 0. One has to start over. That’s Bayesian incoherent. But Barnard raises a serious problem even for getting to that point. The frequentist never has to exhaust the space of models but can ask any question about a given model and learn how to create new ones. There’s no summing up to a posterior.
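
              [In symbols, and on the assumption that the overlooked hypothesis H is carried in the prior with P(H) = 0, rather than lying outside the model altogether as the reply below argues, Bayes’s theorem keeps it at zero: $P(H \mid x) = P(x \mid H)\,P(H)/P(x) = 0$ whenever $P(H) = 0$ and $P(x) > 0$.]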

              • Michael J Lew

                Nope. That’s not right. Any hypothesis that is not part of the parameter space does not get probability zero; it is simply undefined. It is necessarily unspecified by the model that generates the probabilities of all of the hypotheses that make up the parameter space.

                One does not have to start over with the model that fails to contain the hypothesis, but instead one has to start over by choosing a different model and that will entail different probabilities for the other hypotheses.

                There is no such thing as “Bayesian incoherent” across alternative models. The probabilities given by multiple models sum to more than unity but that is an irrelevancy because they are only constrained to unity _within_ a model.

                There is nothing special about how frequentists treat (or choose) their models and there is nothing special about how Bayesians do it. (Although I cannot resist the snide remark that both will often choose their models without appropriate consideration…)

                The frequentist may not sum up to a posterior, but they do sum up to a decision, a statement, or an action. One approach is not intrinsically better than the other; it is only potentially more relevant to the inferential objectives of the analyst and the experimental or observational particulars.

                • You are disagreeing, then, with Barnard, Savage and M. Lew of the past. Duly recorded.
                  But what you say about frequentist error statisticians isn’t so, because we are not summing up a posterior probability. Comparativism is the way others get around the problem; error statisticians instead reject “probabilism” (as defined in SIST) and view statistical inference in terms of tests with error probability control.

                  • Michael J Lew

                    “You are disagreeing, then, with”… Well, no news there. I am quite explicit about the mistake of the elders, and my opinions are updated as my understanding improves. (I am happy to say that discussions here have helped with that.)

                    I have two questions.

                    1. What exactly do you mean by “what you say about frequentist error statisticians isn’t so”? I ask this because I do not think that you have adequately dealt with the possibility (strong!) that I’m right about the lack of influence of hypotheses outside the parameter space.

                    2. What part of “Lew of the past” do you think that I’m disagreeing with? I have no doubt that I have held different opinions in the past, but my realisation that likelihoods cover the entire parameter space of a model occurred many years ago and I’ve been trying to make that point clear in comments on this blog on many occasions. The fact that posterior probabilities of hypotheses outside the parameter range are undefined rather than zero is a pretty obvious consequence and so I expect that I’ve not made statements to the contrary for many years.
