“The subjective Bayesian theory as developed, for example, by Savage … cannot solve the deceptively simple but actually intractable old evidence problem, whence as a foundation for a logic of confirmation at any rate, it must be accounted a failure.” (Howson, (2017), p. 674)

What? Did the “old evidence” problem cause Colin Howson to recently abdicate his decades-long position as a leading subjective Bayesian? It seems to have. I was so surprised to come across this in a recent perusal of *Philosophy of Science* that I wrote to him to check if it is really true. (It is.) I thought perhaps it was a different Colin Howson, or the son of the one who co-wrote 3 editions of Howson and Urbach, *Scientific Reasoning: The Bayesian Approach*, espousing hard-line subjectivism since 1989.[1] I am not sure which of the several paradigms of non-subjective or default Bayesianism Howson endorses (he’d argued for years, convincingly, against any one of them), nor how he handles various criticisms of them (Kass and Wasserman 1996); I put that aside. Nor have I yet worked through his rather complex paper to the extent necessary. What about the “old evidence” problem, made famous by Clark Glymour (1980)? What is it?

Consider Jay Kadane, a well-known subjective Bayesian statistician. According to Kadane, the probability statement Pr(d(**X**) ≥ 1.96) = .025

“is a statement about d(**X**) before it is observed. After it is observed, the event {d(**X**) ≥ 1.96} either happened or did not happen and hence has probability either one or zero” (2011, p. 439).
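Kadane’s .025 checks out under the usual assumption (not stated in the quote, but standard for this example) that d(**X**) is a standard normal test statistic, as in a one-sided z-test. A minimal sketch using only Python’s standard library:

```python
import math

def normal_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p = normal_tail(1.96)
print(round(p, 4))  # 0.025
```

Before observation, this is a genuine probability about the random variable d(**X**); Kadane’s point is that after observing d(**X**) = 1.96 there is no such probability left to assign, other than 0 or 1.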

Knowing d₀ = 1.96 (the specific value of the test statistic d(**X**)), Kadane is saying, there’s no more *uncertainty* about it.* But would he really give it probability 1? If the probability of the data *x* is 1, Glymour argues, then Pr(*x*|*H*) also is 1, but then Pr(*H*|*x*) = Pr(*H*)Pr(*x*|*H*)/Pr(*x*) = Pr(*H*), so there is no boost in probability for a hypothesis or model arrived at after *x*. So does that mean known data don’t supply evidence for *H*? (Known data are sometimes said to violate *temporal novelty*: data are temporally novel only if the hypothesis or claim of interest came first.) If *x* has probability 1, this seems to be blocked. That’s the old evidence problem. Subjective Bayesianism is faced with the old evidence problem if known evidence has probability 1, or so the argument goes.
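Glymour’s argument is just Bayes’ theorem with Pr(*x*) = Pr(*x*|*H*) = 1 plugged in. A minimal numerical sketch (the prior of 0.3 is an arbitrary illustration):

```python
def posterior(prior_H, lik_x_given_H, pr_x):
    """Bayes' theorem: Pr(H|x) = Pr(H) * Pr(x|H) / Pr(x)."""
    return prior_H * lik_x_given_H / pr_x

# Glymour's point: once x is known, Pr(x) = 1 and Pr(x|H) = 1,
# so the posterior collapses to the prior -- no boost for H.
print(posterior(0.3, 1.0, 1.0))  # 0.3
```

Whatever the prior, the posterior equals it, so conditioning on old evidence can never raise the probability of *H*.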

*What’s the accepted subjective Bayesian solution to this?* (I’m really asking.) One attempt is to subtract out, or try to, the fact that *x* is known, and envision being in a context prior to knowing *x*. That’s not very satisfactory or realistic, in general. Subjective Bayesians in statistics, I assume, just use the likelihoods and don’t worry about this: known data are an instance of a general random variable **X**, and you just use the likelihood once it’s known that {**X** = *x*}. But can you do this and also hold, with Kadane, that it’s an event with probability 1? I’ve always presumed that the problem was mainly for philosophers who want to assign probabilities to statements in a language, rather than focusing on random variables and their distributions, or statistical models (a mistake in my opinion). I also didn’t think subjective Bayesians in statistics were prepared to say, with Kadane, that an event has probability 1 after it’s observed or known. Yet if probability measures your uncertainty in the event, Kadane seems right.

*So how does the problem of old evidence get solved by subjective Bayesian practitioners?* I asked Kadane years ago, but did not get a reply.
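The statistical practice just described can be sketched in a toy example (the binomial model and the two rival hypotheses are my own hypothetical illustration, not anyone’s published example): once {**X** = *x*} is known, only the likelihoods under each hypothesis enter the update, and Pr(*x*) appears only as a normalizing constant computed from the model, never as a degree of belief in the already-observed event.

```python
from math import comb

def binom_lik(x, n, p):
    """Likelihood of observing x successes in n trials when the success rate is p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical toy setup: X ~ Binomial(n=10, p), two rival values of p, equal priors.
n, x_obs = 10, 7
hypotheses = {"H1: p=0.5": 0.5, "H2: p=0.8": 0.8}
prior = {"H1: p=0.5": 0.5, "H2: p=0.8": 0.5}

liks = {h: binom_lik(x_obs, n, p) for h, p in hypotheses.items()}
norm = sum(prior[h] * liks[h] for h in hypotheses)  # Pr(x): a computed constant, not a belief
post = {h: prior[h] * liks[h] / norm for h in hypotheses}
print({h: round(pr, 3) for h, pr in post.items()})
```

On this way of proceeding the question of assigning the observed event probability 1 simply never comes up, which is the puzzle: it seems to conflict with Kadane’s remark that, once observed, the event has probability one or zero.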

Any case where the data are known prior to constructing or selecting a hypothesis to accord with them would, strictly speaking, count as a case of old evidence, or so it seems.** The best-known cases in philosophy allude to a known phenomenon, such as Mercury’s perihelion, as evidence for Einstein’s GTR. (The perihelion was long known as anomalous for Newton, yet GTR’s predicting it, without adjustments, is widely regarded as evidence for GTR.)[2] You can read some attempted treatments by philosophers in Howson’s paper; I discuss Garber’s attempt in Chapter 10 of Mayo 1996 [EGEK], 10.2.[3] *I’d like to hear from readers, regardless of statistical persuasion, how it’s handled in practice* (or why it’s deemed unproblematic).

But wait, are we sure it isn’t also a problem for non-subjective or default Bayesians? In this paradigm (and there are several varieties), the prior probabilities in hypotheses are not taken to express degrees of belief but are given by various formal assignments, so as to have minimal impact on the posterior probability. Although the holy grail of finding “uninformative” default priors has been given up, default priors are at least supposed to ensure that the data dominate in some sense.[4] A true-blue subjective Bayesian like Kadane is unhappy with non-subjective priors. Rather than quantifying prior beliefs, such priors are viewed as primitives, conventions, or references for obtaining posterior probabilities. How are they to be interpreted? It’s not clear, but let’s put this aside to focus on the “old evidence” problem.

*OK, so how do subjective Bayesians get around the old evidence problem?*

*I thank Jay Kadane for noticing I used the inequality in my original post 11/27/17. I haven’t digested his general reaction yet, stay tuned.

**There’s a place where Glymour (or Glymour, Scheines, Spirtes, and Kelly 1987) slyly argues that, strictly speaking, the data are always known by the time you appraise some model, or so I seem to recall. But I’d have to research that or ask him.

[1] I’ll have to add a footnote to my new book (*Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*, CUP, 2018), as I allude to him as a subjectivist Bayesian philosopher throughout.

[2] I argue that the reason it was good evidence for GTR is precisely because it was long known, and yet all attempts to explain it were ad hoc, so that they failed to pass severe tests. The deflection effect, by contrast, was new and no one had discerned it before, let alone tried to explain it. Note that this is at odds with the idea that *novel* results count more for a theory H when H (temporally) *pre*dicts them, than when H accounts for known results. (Here the known perihelion of Mercury is thought to be better evidence for GTR than the novel deflection effect.) But the issue isn’t the novelty of the results, it’s how well-tested H is, or so I argue. (EGEK Chapter 8, p. 288).

[3] I don’t see that the newer attempts avoid the key problem in Garber’s. I’m not sure if Howson is rescinding the remark I quote from him in EGEK, p. 333. Here he was trying to solve it by subtracting the data out from what’s known.

[4] Some may want to use “informative” priors as well, but their meaning/rationale is unclear. Howson mentions Wes Salmon’s style of Bayesianism in this paper, but Salmon was a frequentist.

REFERENCES

-Glymour, C. (1980), *Theory and Evidence,* Princeton University Press. I’ve added a link to the relevant chapter, “Why I am Not a Bayesian” (from Fitelson resources). The relevant pages are 85-93.

-Howson, C. (2017), “Putting on the Garber Style? Better Not”, *Philosophy of Science* 84 (October 2017): 659–676.

-Kass, R. and Wasserman, L. (1996), “The Selection of Prior Distributions By Formal Rules”, JASA 91: 1343-70.

*Further References to Solutions (to this or Related problems): *

-Garber, D. (1983), “Old Evidence and Logical Omniscience in Bayesian Confirmation Theory”, in *Minnesota Studies in the Philosophy of Science*, ed. J. Earman, 99–131. Minneapolis: University of Minnesota Press.

-Hartmann, S. and Fitelson, B. (2015), “A New Garber-Style Solution to the Problem of Old Evidence”, *Philosophy of Science* 82(4): 712–717.

-Seidenfeld, T., Schervish, M., and Kadane, J. (2012), “What Kind of Uncertainty Is That?”, *Journal of Philosophy*: 516–533.