G.A. Barnard: The “catch-all” factor: probability vs likelihood

 

Barnard: 23 Sept. 1915 – 9 Aug. 2002

With continued acknowledgement of Barnard’s birthday on Friday, Sept. 23, I reblog an exchange on catch-all probabilities from “The Savage Forum” (pp. 79-84, Savage 1962), with some new remarks.[i]

BARNARD: …Professor Savage, as I understand him, said earlier that a difference between likelihoods and probabilities was that probabilities would normalize because they integrate to one, whereas likelihoods will not. Now probabilities integrate to one only if all possibilities are taken into account. This requires, in its application to the probability of hypotheses, that we should be in a position to enumerate all possible hypotheses which might explain a given set of data. Now I think it is just not true that we ever can enumerate all possible hypotheses. … If this is so, we ought to allow that, in addition to the hypotheses that we really consider, we should allow something that we had not thought of yet; and of course as soon as we do this we lose the normalizing factor of the probability, and from that point of view probability has no advantage over likelihood. This is my general point: while I agree with a lot of the technical points, I would prefer that this be talked about in terms of likelihood rather than probability. I should like to ask what Professor Savage thinks about that, whether he thinks that the necessity to enumerate hypotheses exhaustively is important.

SAVAGE: Surely, as you say, we cannot always enumerate hypotheses so completely as we like to think. The list can, however, always be completed by tacking on a catch-all ‘something else’. In principle, a person will have probabilities given ‘something else’ just as he has probabilities given other hypotheses. In practice, the probability of a specified datum given ‘something else’ is likely to be particularly vague – an unpleasant reality. The probability of ‘something else’ is also meaningful of course, and usually, though perhaps poorly defined, it is definitely very small. Looking at things this way, I do not find probabilities unnormalizable, certainly not altogether unnormalizable.

Whether probability has an advantage over likelihood seems to me like the question whether volts have an advantage over amperes. The meaninglessness of a norm for likelihood is for me a symptom of the great difference between likelihood and probability. Since you question that symptom, I shall mention one or two others. …

On the more general aspect of the enumeration of all possible hypotheses, I certainly agree that the danger of losing serendipity by binding oneself to an over-rigid model is one against which we cannot be too alert. We must not pretend to have enumerated all the hypotheses in some simple and artificial enumeration that actually excludes some of them. The list can however be completed, as I have said, by adding a general ‘something else’ hypothesis, and this will be quite workable, provided you can tell yourself in good faith that ‘something else’ is rather improbable. The ‘something else’ hypothesis does not seem to make it any more meaningful to use likelihood for probability than to use volts for amperes.

Let us consider an example. Off hand, one might think it quite an acceptable scientific question to ask, ‘What is the melting point of californium?’ Such a question is, in effect, a list of alternatives that pretends to be exhaustive. But, even specifying which isotope of californium is referred to and the pressure at which the melting point is wanted, there are alternatives that the question tends to hide. It is possible that californium sublimates without melting or that it behaves like glass. Who dare say what other alternatives might obtain? An attempt to measure the melting point of californium might, if we are serendipitous, lead to more or less evidence that the concept of melting point is not directly applicable to it. Whether this happens or not, Bayes’s theorem will yield a posterior probability distribution for the melting point given that there really is one, based on the corresponding prior conditional probability and on the likelihood of the observed reading of the thermometer as a function of each possible melting point. Neither the prior probability that there is no melting point, nor the likelihood for the observed reading as a function of hypotheses alternative to that of the existence of a melting point enter the calculation. The distinction between likelihood and probability seems clear in this problem, as in any other.
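To make the structure of Savage’s melting-point calculation explicit, here is the conditional posterior he describes, in symbols (a sketch; write M for the hypothesis that a melting point exists, m for its value, and x for the observed thermometer reading):

$$
\pi(m \mid x, M) \;=\; \frac{\pi(m \mid M)\, f(x \mid m, M)}{\int \pi(m' \mid M)\, f(x \mid m', M)\, dm'}
$$

As Savage says, neither the prior probability $P(\lnot M)$ that there is no melting point, nor a likelihood $f(x \mid \lnot M)$ for the reading under alternatives to $M$, enters this calculation; everything is conditional on $M$.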

BARNARD: Professor Savage says in effect, ‘add at the bottom of the list H1, H2, … “something else”’. But what is the probability that a penny comes up heads given the hypothesis ‘something else’? We do not know. What one requires for this purpose is not just that there should be some hypotheses, but that they should enable you to compute probabilities for the data, and that requires very well defined hypotheses. For the purpose of applications, I do not think it is enough to consider only the conditional posterior distributions mentioned by Professor Savage.
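Barnard’s objection can be put in symbols (a sketch, with $H_1,\dots,H_k$ the enumerated hypotheses and $C$ the catch-all ‘something else’):

$$
P(H_i \mid x) \;=\; \frac{P(H_i)\, f(x \mid H_i)}{\sum_{j=1}^{k} P(H_j)\, f(x \mid H_j) \;+\; P(C)\, f(x \mid C)}
$$

The normalizing denominator needs $f(x \mid C)$, the probability of the data given ‘something else’, and it is precisely this term that Barnard says we are in no position to compute. Lindley’s move in what follows amounts to reporting only the conditional distribution $P(H_i \mid x, \lnot C)$, which sidesteps $f(x \mid C)$ at the price Barnard then presses.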

LINDLEY: I am surprised at what seems to me an obvious red herring that Professor Barnard has drawn across the discussion of hypotheses. I would have thought that when one says this posterior distribution is such and such, all it means is that among the hypotheses that have been suggested the relevant probabilities are such and such; conditionally on the fact that there is nothing new, here is the posterior distribution. If somebody comes along tomorrow with a brilliant new hypothesis, well of course we bring it in.

BARTLETT: But you would be inconsistent because your prior probability would be zero one day and non-zero another.

LINDLEY: No, it is not zero. My prior probability for other hypotheses may be ε. All I am saying is that conditionally on the other 1 – ε, the distribution is as it is.

BARNARD: Yes, but your normalization factor is now determined by ε. Of course ε may be anything up to 1. Choice of letter has an emotional significance.

LINDLEY: I do not care what it is as long as it is not one.

BARNARD: In that event two things happen. One is that the normalization has gone west, and hence also this alleged advantage over likelihood. Secondly, you are not in a position to say that the posterior probability which you attach to an hypothesis from an experiment with these unspecified alternatives is in any way comparable with another probability attached to another hypothesis from another experiment with another set of possibly unspecified alternatives. This is the difficulty over likelihood. Likelihood in one class of experiments may not be comparable to likelihood from another class of experiments, because of differences of metric and all sorts of other differences. But I think that you are in exactly the same difficulty with conditional probabilities, just because they are conditional on your having thought of a certain set of alternatives. It is not rational, in other words. Suppose I come out with a probability of a third that the penny is unbiased, having considered a certain set of alternatives. Now I do another experiment on another penny and I come out of that case with the probability one third that it is unbiased, having considered yet another set of alternatives. There is no reason why I should agree or disagree in my final action or inference in the two cases. I can do one thing in one case and another in another, because they represent conditional probabilities leaving aside possibly different events.
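A minimal numerical sketch of Barnard’s penny point (all p values, priors, and data here are hypothetical, chosen only for illustration): the posterior probability that a penny is unbiased is conditional on which alternatives happened to be enumerated, so the same figure from two experiments need not license the same inference.

```python
# Sketch: posterior that p = 0.5, conditional on the enumerated hypothesis set.
from scipy.stats import binom

def posterior_fair(heads, n, alternatives, priors):
    ps = [0.5] + alternatives                      # 'unbiased' plus the alternatives considered
    likes = [binom.pmf(heads, n, p) for p in ps]   # likelihood of the data under each hypothesis
    joint = [pr * lk for pr, lk in zip(priors, likes)]
    return joint[0] / sum(joint)                   # normalized only over the listed hypotheses

n, heads = 20, 14
# Experiment A: 'unbiased' vs a single alternative p = 0.7, equal priors
print(posterior_fair(heads, n, [0.7], [0.5, 0.5]))
# Experiment B: same data, but the enumerated alternatives are p = 0.6 and p = 0.9
print(posterior_fair(heads, n, [0.6, 0.9], [1/3, 1/3, 1/3]))
```

The two runs condition on different alternative sets, so even if they returned the same number, that number would not mean the same thing; this is Barnard’s complaint.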

LINDLEY: All probabilities are conditional.

BARNARD: I agree.

LINDLEY: If there are only conditional ones, what is the point at issue?

PROFESSOR E.S. PEARSON: I suggest that you start by knowing perfectly well that they are conditional and when you come to the answer you forget about it.

BARNARD: The difficulty is that you are suggesting the use of probability for inference, and this makes us able to compare different sets of evidence. Now you can only compare probabilities on different sets of evidence if those probabilities are conditional on the same set of assumptions. If they are not conditional on the same set of assumptions they are not necessarily in any way comparable.

LINDLEY: Yes, if this probability is a third conditional on that, and if a second probability is a third, conditional on something else, a third still means the same thing. I would be prepared to take my bets at 2 to 1.

BARNARD: Only if you knew that the condition was true, but you do not.

GOOD: Make a conditional bet.

BARNARD: You can make a conditional bet, but that is not what we are aiming at.

WINSTEN: You are making a cross comparison where you do not really want to, if you have got different sets of initial experiments. One does not want to be driven into a situation where one has to say that everything with a probability of a third has an equal degree of credence. I think this is what Professor Barnard has really said.

BARNARD: It seems to me that likelihood would tell you that you lay 2 to 1 in favour of H1 against H2, and the conditional probabilities would be exactly the same. Likelihood will not tell you what odds you should lay in favour of H1 as against the rest of the universe. Probability claims to do that, and it is the only thing that probability can do that likelihood cannot.
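Barnard’s closing distinction, in symbols (a sketch): for a specified pair of hypotheses, posterior odds are just prior odds times the likelihood ratio,

$$
\frac{P(H_1 \mid x)}{P(H_2 \mid x)} \;=\; \frac{P(H_1)}{P(H_2)} \cdot \frac{f(x \mid H_1)}{f(x \mid H_2)},
$$

so the comparison needs no catch-all. But the absolute posterior $P(H_1 \mid x)$, the odds on $H_1$ against ‘the rest of the universe’, brings back the catch-all term $f(x \mid C)$ in the denominator; that is the one thing probability claims to deliver that likelihood cannot.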

[i] Anyone who thinks we really want a Bayesian probability assignment to a hypothesis must come to grips with the fact that it depends on having a catch-all factor (of all possible hypotheses that could explain the data) and the probability of the data given “something else”. Barnard is telling Savage that this is unrealistic, and that when something new enters, the original probability assessments are wrong. In their attempts to make the “catch-all factor” disappear, most probabilists appeal to comparative assessments: likelihood ratios or Bayes factors. So, Barnard was saying, back then, probability has no advantage over likelihood. Several key problems remain: (i) the appraisal is always relative to the choice of alternative, and this allows “favoring” one or the other hypothesis without being able to say there is evidence for either; (ii) although the hypotheses are not exhaustive, many give priors to the null and alternative that sum to 1; (iii) the ratios do not have the same evidential meaning in different cases (what’s high? 10, 50, 800?); and (iv) there is a lack of control of the probability of misleading interpretations, except with predesignated point-against-point hypotheses or in special cases (this is why Barnard later rejected Likelihoodism). You can read the rest of pages 78-103 of the Savage Forum here. This exchange was first blogged here. Share your comments.

References

Savage, L. J. (1962). “Discussion,” in The Foundations of Statistical Inference: A Discussion (G. A. Barnard and D. R. Cox, eds.). London: Methuen, p. 76.

Links to a scan of the entire Savage Forum may be found here.


6 thoughts on “G.A. Barnard: The “catch-all” factor: probability vs likelihood”

  1. Michael Lew

    Statistical evaluations always take place within a defined statistical model, and within a statistical model the alternatives are fully defined so that there is no “something else” within the model. The possible existence of “something else” is therefore a problem related to model selection.

    A philosopher might become attached to hypotheses that are not expressible within a particular statistical model, or perhaps within any statistical model. Such hypotheses cannot be evaluated statistically. Barnard falls into error in the second sentence: “Now probabilities integrate to one only if all possibilities are taken into account.” If he meant, as I think he did, that all possible, thinkable, known and unknown possibilities have to be taken into account then he was wrong. The Bayesian posterior is a product of a statistical model and only the set of probabilities within that model need to be integrated for the probabilities to integrate to unity.

    It is disappointing that the greats of the past got confused about this, but they did. Lindley seems to have come closest to realising what is going on, but didn’t reach clarity. When was the concept of a statistical model formalised?

    • Michael: Sure, but you’re only agreeing with Barnard that in genuine scientific learning, when we seek brand new hypotheses, the Bayesian posterior that excludes them is no longer relevant. If one keeps updating on the given model, one isn’t directed to the improved alternative. That’s why posterior probabilism fails. So, at that stage anyway, Barnard was going comparative likelihoodist, as I thought you recommend.
      Sorry to be slow in responding, I’m under it at the moment.

      • Michael Lew

        Sure, the posterior from a now irrelevant statistical model becomes irrelevant, but so does a severity function from that same irrelevant model.

        If the hypothesis in question is not a parameter value within a statistical model then it is immune to statistical interrogation, whether that interrogation is in a Bayesian form, a likelihood form, or an error rate form.

          • Michael: No, that’s actually not true, though I should refer you to papers rather than try to answer it here. When we severely rule out, say, a deflection effect < l, we rule out all theories, known and unknown, that would conflict with that. Our assessment is never of the entire theory (nor is it limited to a comparison, though). We can infer, with severity, variants of a theory and in so doing rule out infinitely many theories not yet thought up. If you have to give a posterior to the entire theory, you need the catch-all. If you give it a low prior, as Savage recommends, then it might take quite a while before it gets big enough that you’re directed to start looking for new theories. What passes severely remains, even through reinterpretations of the theory. We split off and test severely any piece we choose, without needing a catch-all, let alone a probability for it. With comparative accounts, you may say you avoid a catch-all, but I don’t see how you falsify: you can give a falsification rule, but would you ever dare to use one? Royall says you don’t even obtain evidence against (or for) one hypothesis, because the two being compared aren’t exhaustive. (A quick sketch of a severity calculation follows the links below.)

          I'm sorry this is quick since I have visitors all week, but underdetermination is a very complex topic:

          These discuss underdetermination

          Click to access Learning%20from%20Error%20Henle.pdf

          Click to access (1997)%20Severe%20Tests,%20Arguing%20from%20Error,%20and%20Methodological%20Underdetermination.pdf
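For readers unfamiliar with the severity computation mentioned above, a minimal sketch under simple assumptions (a one-sided Normal test of μ ≤ μ0 with known σ; all numbers are hypothetical): the severity with which the claim μ > μ1 passes is the probability of a result less extreme than the one observed, computed as if μ1 were the true value. No catch-all enters, and every theory, thought of or not, that entails μ ≤ μ1 is ruled out to that degree.

```python
# Sketch of a severity calculation for claims of the form mu > mu1,
# after observing sample mean xbar in a one-sided Normal test (known sigma).
from math import sqrt
from scipy.stats import norm

def severity_mu_greater(mu1, xbar, sigma, n):
    # SEV(mu > mu1) = P(Xbar <= observed xbar), computed under mu = mu1
    return norm.cdf((xbar - mu1) / (sigma / sqrt(n)))

# Hypothetical numbers: n = 100, sigma = 10, observed mean 2.0 (so SE = 1.0)
for mu1 in [0.0, 1.0, 1.5, 2.0]:
    print(f"SEV(mu > {mu1}) = {severity_mu_greater(mu1, 2.0, 10.0, 100):.3f}")
```

Here the claim μ > 0 passes with high severity (about 0.977), while μ > 2.0 passes with severity only 0.5; no list of alternatives, exhaustive or otherwise, is needed for the assessment.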

          • Michael Lew

            Surely if you can “infer, with severity, variants of a theory and in so doing rule out infinitely many theories not yet thought up” you do so by recalculating the severity from the data in the new model that contains the newly thought up theories (hypotheses, I presume). If that is the case then I can do the same thing with a likelihood function to get a new Bayesian posterior.

            Can you give an example, please?

          • Michael Lew

            “With comparative accounts, you may say you avoid a catchall, but I don’t see how you falsify: you can give a falsification rule: but would you ever dare to do so?” Do I have to falsify, and if I do is it necessary that I falsify on the basis of a statistical result rather than a thoughtful and principled analysis of all of the available information (which is a super-set of the statistical result)?
