“The subjective Bayesian theory as developed, for example, by Savage … cannot solve the deceptively simple but actually intractable old evidence problem, whence as a foundation for a logic of confirmation at any rate, it must be accounted a failure.” (Howson 2017, p. 674)

What? Did the “old evidence” problem cause Colin Howson to recently abdicate his decades-long position as a leading subjective Bayesian? It seems to have. I was so surprised to come across this in a recent perusal of *Philosophy of Science* that I wrote to him to check if it is really true. (It is.) I thought perhaps it was a different Colin Howson, or the son of the one who co-wrote 3 editions of Howson and Urbach: *Scientific Reasoning: The Bayesian Approach*, espousing hard-line subjectivism since 1989.[1] I am not sure which of the several paradigms of non-subjective or default Bayesianism Howson endorses (he’d argued for years, convincingly, against any one of them), nor how he handles various criticisms (Kass and Wasserman 1996); I put that aside. Nor have I yet worked through his rather complex paper to the extent necessary. What about the “old evidence” problem, made famous by Clark Glymour (1980)? What is it?

Consider Jay Kadane, a well-known subjective Bayesian statistician. According to Kadane, the probability statement: Pr(d(**X**) ≥ 1.96) = .025

“is a statement about d(**X**) before it is observed. After it is observed, the event {d(**X**) ≥ 1.96} either happened or did not happen and hence has probability either one or zero” (2011, p. 439).
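For concreteness, the pre-data probability in Kadane's statement can be checked numerically. This is only a minimal sketch: it assumes that d(**X**) has a standard Normal distribution under the null hypothesis, which the quoted passage does not state, and the function name `upper_tail` is purely illustrative.

```python
import math

def upper_tail(z: float) -> float:
    """Pr(Z >= z) for a standard Normal Z (the assumed null distribution of d(X))."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Before the data are observed: probability that d(X) reaches the 1.96 cutoff.
p_before = upper_tail(1.96)
print(round(p_before, 3))  # prints 0.025
```

On Kadane's reading, this number describes d(**X**) only before observation; once the value is known, the same event is assigned probability one or zero.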

Knowing d₀ = 1.96 (the specific value of the test statistic d(**X**)), Kadane is saying, there’s no more *uncertainty* about it.* But would he really give it probability 1? If the probability of the data **x** is 1, Glymour argues, then Pr(**x**|*H*) also is 1, but then Pr(*H*|**x**) = Pr(*H*)Pr(**x**|*H*)/Pr(**x**) = Pr(*H*), so there is no boost in probability for a hypothesis or model arrived at after **x**. So does that mean known data don’t supply evidence for *H*? (Known data are sometimes said to violate *temporal novelty*: data are temporally novel only if the hypothesis or claim of interest came first.) If the data have probability 1, any such boost seems to be blocked. That’s the old evidence problem. Subjective Bayesianism is faced with the old evidence problem if known evidence has probability 1, or so the argument goes.
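Written out step by step, the argument attributed to Glymour runs as follows. This is only a restatement in the notation of the post; the proviso Pr(*H*) > 0 is added so that the conditional probability is defined.

```latex
\begin{align*}
\Pr(x) = 1 \;&\Longrightarrow\; \Pr(x \mid H) = 1
      \qquad (\text{provided } \Pr(H) > 0),\\[4pt]
\Pr(H \mid x) \;&=\; \frac{\Pr(H)\,\Pr(x \mid H)}{\Pr(x)}
      \;=\; \frac{\Pr(H)\cdot 1}{1} \;=\; \Pr(H).
\end{align*}
```

The posterior equals the prior, so on this accounting known data can give *H* no boost.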

*What’s the accepted subjective Bayesian solution to this?* (I’m really asking.) One attempt is to subtract out, or try to, the fact that **x** is known, and envision being in a context prior to knowing **x**. That’s not very satisfactory or realistic, in general. Subjective Bayesians in statistics, I assume, just use the likelihoods and don’t worry about this: known data are an instance of a general random variable **X**, and you just use the likelihood once it’s known that {**X** = **x**}. But can you do this and also hold, with Kadane, that it’s an event with probability 1? I’ve always presumed that the problem was mainly for philosophers who want to assign probabilities to statements in a language, rather than focusing on random variables and their distributions, or statistical models (a mistake in my opinion). I also didn’t think subjective Bayesians in statistics were prepared to say, with Kadane, that an event has probability 1 after it’s observed or known. Yet if probability measures your uncertainty in the event, Kadane seems right.

*So how does the problem of old evidence get solved by subjective Bayesian practitioners?* I asked Kadane years ago, but did not get a reply.

Any case where the data are known prior to constructing or selecting a hypothesis to accord with them would, strictly speaking, count as a case of old evidence, or so it seems.** The most well-known cases in philosophy allude to a known phenomenon, such as Mercury’s perihelion, as evidence for Einstein’s GTR. (The perihelion was long known as anomalous for Newton, yet GTR’s predicting it, without adjustments, is widely regarded as evidence for GTR.)[2] You can read some attempted treatments by philosophers in Howson’s paper; I discuss Garber’s attempt in Chapter 10, Mayo 1996 [EGEK], 10.2.[3] *I’d like to hear from readers, regardless of statistical persuasion, how it’s handled in practice* (or why it’s deemed unproblematic).

But wait, are we sure it isn’t also a problem for non-subjective or default Bayesians? In this paradigm (and there are several varieties), the prior probabilities in hypotheses are not taken to express degrees of belief but are given by various formal assignments, so as to have minimal impact on the posterior probability. Although the holy grail of finding “uninformative” default priors has been given up, default priors are at least supposed to ensure the data dominate in some sense.[4] A true blue subjective Bayesian like Kadane is unhappy with non-subjective priors. Rather than quantify prior beliefs, non-subjective priors are viewed as primitives or conventions or references for obtaining posterior probabilities. How are they to be interpreted? It’s not clear, but let’s put this aside to focus on the “old evidence” problem.

*OK, so how do subjective Bayesians get around the old evidence problem?*

*I thank Jay Kadane for noticing I used the inequality in my original post 11/27/17. I haven’t digested his general reaction yet, stay tuned.

**There’s a place where Glymour (or Glymour, Scheines, Spirtes, and Kelly 1987) slyly argues that, strictly speaking, the data are always known by the time you appraise some model, or so I seem to recall. But I’d have to research that or ask him.

[1] I’ll have to add a footnote to my new book (*Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*, CUP, 2018), as I allude to him as a subjectivist Bayesian philosopher throughout.

[2] I argue that the reason it was good evidence for GTR is precisely because it was long known, and yet all attempts to explain it were ad hoc, so that they failed to pass severe tests. The deflection effect, by contrast, was new and no one had discerned it before, let alone tried to explain it. Note that this is at odds with the idea that *novel* results count more for a theory H when H (temporally) *pre*dicts them, than when H accounts for known results. (Here the known perihelion of Mercury is thought to be better evidence for GTR than the novel deflection effect.) But the issue isn’t the novelty of the results, it’s how well-tested H is, or so I argue. (EGEK Chapter 8, p. 288).

[3] I don’t see that the newer attempts avoid the key problem in Garber’s. I’m not sure if Howson is rescinding the remark I quote from him in EGEK, p. 333. Here he was trying to solve it by subtracting the data out from what’s known.

[4] Some may want to use “informative” priors as well, but their meaning/rationale is unclear. Howson mentions Wes Salmon’s style of Bayesianism in this paper, but Salmon was a frequentist.

REFERENCES

-Glymour, C. (1980), *Theory and Evidence*, Princeton University Press. I’ve added a link to the relevant chapter, “Why I am Not a Bayesian” (from Fitelson resources). The relevant pages are 85-93.

-Howson, C. (2017), “Putting on the Garber Style? Better Not”, *Philosophy of Science* 84 (October 2017), pp. 659–676.

-Kass, R. and Wasserman, L. (1996), “The Selection of Prior Distributions by Formal Rules”, *JASA* 91: 1343-70.

*Further References to Solutions (to this or Related Problems):*

-Garber, D. (1983), “Old Evidence and Logical Omniscience in Bayesian Confirmation Theory”, in J. Earman (ed.), *Minnesota Studies in the Philosophy of Science*, Minneapolis: University of Minnesota Press, pp. 99–131.

-Hartmann, S. and Fitelson, B. (2015), “A New Garber-Style Solution to the Problem of Old Evidence”, *Philosophy of Science* 82 (4): 712–17.

-Seidenfeld, T., Schervish, M., and Kadane, J. (2012), “What Kind of Uncertainty Is That?”, *Journal of Philosophy*, pp. 516-533.

What about this? http://fitelson.org/probability/howson_oe.pdf

Enrique: That appeals to “subtracting out” the known data. This is what Howson accepted before he recently rejected the “counterfactual” solution he previously supported. I think in formal statistical settings, this is doable, by viewing the data as a general instance of the data generation procedure (as frequentists do). But in historical cases, as Glymour argues in some detail (I’ve linked to the relevant pages of his chapter on the blog now), this isn’t tenable. Subtracting out in statistics may avoid the old evidence problem but, in my view, it introduces a new one: data-dependent selections don’t alter the evidential import of x on the selected H, at least not without adding some additional, non-standard stipulations. In other words, the result is the reverse problem, where using known data to reach H never matters. (It’s not the time element, it’s the manner in which the data are used to construct or select H.)

This is the view Howson champions in the paper you cite.

This is very helpful. Let me digest this. I will report back soon …

“What’s the accepted subjective Bayesian solution to this?” I have no idea. But I do know what is the complete and generalisable solution to this. The solution is pretty easy, but requires clarity concerning the relationship between the observed data and probabilities.

The data exist in the real world, but the probabilities exist only within a statistical model. Therefore we are not dealing with the probability of the actual real world data, but the probability within the model of data like the actual real world data, where likeness is assessed on the scale of the test statistic.

No problemmo! That’s all we need to deal with the ‘problem’ of ‘old evidence’.

Michael: Then probability is not measuring an agent’s “uncertainty”.

Does a falling apple support Newton’s theory of gravitation?

Laurie: It does if all that’s required is a B-boost.

I’ve added 3 references to this or related problems (the first 2 are cited in the paper, the third is one Kadane directed me to). The issues they take up are special cases of, or side routes to, the old evidence problem. The key issue is the problem of assigning probabilities < 1 to logical or mathematical truths, say H entails x. (They will correct me if I’m wrong.) Logical omniscience would require them to be assigned probability 1. The Garber-style solutions alter the probability calculus so that mathematical truths aren’t assigned probability 1. The Seidenfeld et al. (2012) paper, as I see it (I do not claim to have worked through it), develops a way to measure degrees of Bayesian incoherence. Let’s just separate the matter of how a Bayesian should assign probabilities to mathematical claims (the opening of their paper gives a fascinating discussion of Savage by the way) from the general problem of old evidence. The general issue is still whether known evidence, be it Mercury’s perihelion, or the data x used in building a model H to explain x, counts as evidence for GTR and H respectively.

Garber-style accounts are also Bayesian incoherent, as is default/non-subjective Bayesianism, such as the one Howson now embraces. Garber assumes that if data x are used in building a hypothesis H to explain them, that is, if x is not use-novel, then H does not get Bayesian support from x. This is sometimes called the use-novelty requirement: for x to count in support of H, x should not have been used in constructing or selecting H. I say requiring use-novelty across the board is a mistake, and that’s the real issue: to distinguish cases where non-use-novel data do, and do not, provide evidence for, or a good test of, H. This is an especially important issue in today’s world of data-driven science. My last paper on novel or surprising facts was in 2013 and is linked to here: https://errorstatistics.com/2013/12/15/surprising-facts-about-surprising-facts/

ABSTRACT: A common intuition about evidence is that if data x have been used to construct a hypothesis H, then x should not be used again in support of H. It is no surprise that x fits H, if H was deliberately constructed to accord with x. The question of when and why we should avoid such ‘‘double-counting’’ continues to be debated in philosophy and statistics. It arises as a prohibition against data mining, hunting for significance, tuning on the signal, and ad hoc hypotheses, and as a preference for predesignated hypotheses and ‘‘surprising’’ predictions. I have argued that it is the severity or probativeness of the test, or lack of it, that should determine whether a double-use of data is admissible. I examine a number of surprising ambiguities and unexpected facts that continue to bedevil this debate.

Apples fall to the ground and planets move in ellipses around the sun. All this was known before Newton’s theory. Hooke and others proposed a form of attraction between bodies to account for these known facts. Again, all before Newton. Then comes Newton’s theory. It was applied and did indeed account for the facts it was constructed to explain.

Let x be the data of Mercury orbiting the sun. These data require a star of the right size, a planet very close to it also of the right size, and a form of life somewhat further from the star capable of making the measurements which form the data. The probability of x is, if it indeed makes sense to talk of probability in this situation, small, very small.

Laurie: It would be unusual for the data x to itself be combined with theoretical and causal requirements for x. In any event, the issue crops up for subjective Bayesians because it’s given that x is known, and coherence requires or appears to require assigning it probability 1.

Indeed the data x is stripped of all context, and if you are trying to develop a formal calculus for evidential support you probably have no option but to denude it in this fashion. This is why it is ‘unusual’ for x to be combined with etc. If x is the orbit of Mercury, an important part of the context is that this orbit is anomalous in the context of Newtonian gravitation. All attempts to explain this within the Newtonian theory, such as the existence of other bodies perturbing the orbit, failed. So the orbit of Mercury poses an important unsolved problem. Einstein’s general relativity predicted this anomaly, and I mean predicted in the sense that it was an unforeseen consequence of Einstein’s theory. The general theory was not related to Mercury’s orbit. It seems to me that this is a very good reason for this explanation of a known fact to be interpreted as supporting evidence for the theory. The context is what makes it convincing; stripped of the context the arguments are sterile. There cannot be a formal calculus of evidential support because such a calculus cannot include the context. There is a connection with the Likelihood Principle. The view from a distance is banned just as context is banned. So considerations of regularization and perturbations are banned, once again making the arguments sterile.

Laurie: I agree with your point:

“There cannot be a formal calculus of evidential support because such a calculus cannot include the context. There is a connection with the Likelihood Principle.”

Formal philosophers try to pack the desired background into the letters in a Bayesian computation but the result is just a reconstruction of intuitions and history rather than being a forward-looking account.

On Mercury, though, I think Einstein did use known data on the perihelion, but it’s still not the only relativistic theory of gravity that accounts for it. Something very telling occurred in the 60s though, counting against the supposition that it corroborated GTR, when Dicke claimed the measurements of the perihelion were actually off because they assumed a round sun when actually it is oblate. I’m just noting what I recall from Will’s *Was Einstein Right?* So for a while the perihelion was in danger of being an anomaly for GTR, if not a flat out refutation, because it has to account for all of it. Dicke’s theory accounted for the new measure assuming solar oblateness. I think it’s fascinating that the whole thing turned on an assumption of the measurement or data model; had Dicke’s charge proved correct (and the way they checked it struck me as a low precision affair, but I’m no expert here), Dicke would have won over Einstein. And it wouldn’t have mattered that Dicke’s theory (Brans-Dicke) had adjustable parameters. (The two theories are indistinguishable with the right adjustments but there’s a metaphysical difference.) Anyway, now they have all kinds of possible theories beyond Einstein, that differ in domains not probed.

I’m not up on the latest with GTR, so others may correct or update me–I’d be very glad if they did.

The Brans-Dicke theory is dead. I remember well when Dicke was attempting to measure the alleged solar oblateness. It came to nothing, there isn’t sufficient solar oblateness by a large factor to rescue Brans-Dicke, now 50+ years later.

Bill: As I said. But it’s of interest in that an experimental assumption about fairly plain Jane measurements could make or break highly theoretical positions of Einstein vs Dicke.


I made some comments at my place. There is no ‘problem’ with old evidence.

http://wmbriggs.com/post/23276/

I thank WMBriggs for his comments on this issue. I had been planning to write a long response making similar comments, but he has saved me a good deal of time.

I approach this problem as a statistician, not as a philosopher. I first learned of Glymour’s argument from a philosopher colleague over 20 years ago. It struck me as wrong then, for precisely the reason that WMBriggs says [namely, that the quantity P(E) is simply meaningless without putting appropriate conditions to the right of the conditioning bar that standard probability notation demands, and the knowledge that probability theory, conditional probability theory, is an extended form of logic that, like logic, depends only on the quantities that the probability statements depend on and is independent of when we may happen to have learned about the truth or falsity of particular pieces of evidence].

Over the years I’ve discussed Glymour’s argument with a number of the best Bayesian statisticians in the world. Not a single one of them buys Glymour’s argument, and all point pretty much to the same problems I outline above.

I presented a paper on the subject over a decade ago at a meeting of—it must be admitted—objectivist Bayesian statisticians, in a series of meetings started by the late physicist and statistician Edwin T. Jaynes. I’m really not clear, in fact, on what philosophers mean by “subjectivist” Bayesians. Bayesian statisticians in general will make use of what they call “subjective” priors as appropriate or necessary, but I doubt that this makes them “subjectivist” in the sense that philosophers mean it, as they would not accept Glymour’s claim that if E is old evidence, then P(E)=1, and they would regard a statement like P(E) as not well-defined, as it is missing the background information B that a meaningful statement like P(E|B), with clearly specified B, would have. My paper can be found here:

[Sorry, I haven’t figured out how to post links that are linked to arbitrary text here.]

“Bayesians Can Learn From Old Data,” by William H. Jefferys. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering, 27th International Workshop. AIP Conference Proceedings Volume 954. Edited by Kevin H. Knuth et al. Melville, New York: American Institute of Physics, 2007, pp. 85-92.

Rereading this paper now ten years later, I note that in it I didn’t explicitly say, as I should have, that P(E) is simply meaningless, the point that WMBriggs makes forcefully. Had I to write it over, I would have spent some time on that particular point.

Bill: The criticism is leveled at subjective Bayesians. Howson moves to a variant of non-subjective Bayesianism at least in part because after 30 or 40 years of as astute defenses of subjective Bayesianism as you will find, it fails to do justice to the old evidence problem. It’s fine to say of course we wouldn’t say a known effect has probability 1, but you can’t at the same time say that probability is measuring uncertainty, and once the data (or effect) are known, there is no more uncertainty about them. You can’t say, with subjectivists, that Pr(known E) < 1 is incoherent, as Kadane does (rightly it seems), and say one is being coherent Bayesianly.

“After it is observed, the event {d(X) ≥ 1.96} either happened or did not happen and hence has probability either one or zero” (2011, p. 439).

I don't think philosophers invented the term "subjective Bayesian". Probabilities are supposed to measure degrees of belief often gleaned by betting on statements or events. When Lindley moved from non-subjective to subjective Bayesianism, he took himself to be making a real change, and the reason he made the change (he claims) is incoherence as shown by Dawid and others regarding marginality paradoxes. Is this not so? Now maybe no one cares to uphold Bayesian coherence any more (certainly default/non-subjective Bayesians admit to violating it). But that's different from explaining how the assignment is actually done. See the link from Glymour:

http://fitelson.org/probability/glymour.pdf pp. 87-91.

I will read your paper later.

I grant that there's a distinction, though it's subtle and so far as I know, I'm the only one who has made it, between statistical foundations in practice, and the uses of statistics by philosophers. The philosopher seeks to use it to illuminate and answer problems about scientific reasoning and inductive inference. The quasi-historical task Glymour talks about, you might aver, is not something a practitioner would ever worry about. This I already conceded in my post. But when you speak of an "extended form of logic that, like logic, depends only on the quantities that the probability statements depend on and is independent of" what we know, you're really sliding into that philosophical program (heralded by logical positivists) of a logic of induction. Despite the longing for such a thing by some, it hasn't been successful. For starters, you can never have the catchall of "all hypotheses that could explain x" in actual science. In science, unlike deductive logic, we want soundness, not mere formal validity. (My own position is that the goal of a logic of induction is based on a philosophical mistake, but never mind that.)

Thanks for your comment; I look forward to reading your paper.

The point is that P(known E), unconditioned on anything, is in itself incoherent. It is meaningless, as WMBriggs said, as it does not state what it is conditioned on. If it is conditioned on B=”E is known to be true”, then fine, P(known E|B=E is known to be true)=1. But P(known E) itself is just a bunch of letters on the page, it doesn’t mean anything at all.

There is, in my view, one, and only one way to introduce evidence (old or new) into a Bayesian calculation, and that is to condition on it. That is why P(known E) is incoherent. It tries to introduce the fact that E is known without conditioning on it. As Ed Jaynes stressed (cited in my paper), all probability is conditional, and failure to use the machinery correctly, as Glymour did, is to invite horrible errors.

So, for example, a key step in Glymour’s argument is that if P(E)=1 (his claim, because E is “old data”), then P(E|H)=1 for any H whatsoever. But that’s wrong. Because if H is a hypothesis that has the consequence that if H is true, then E will never be observed, and P(E|H)=0, not 1, regardless of whether we have actually observed E or not (even if we observed it 70 years ago!). Sure, P(E|H,E)=1, but that’s not what Glymour wrote. If he had meant to condition everything on E, he should have written the conditional probability that way. But he didn’t. In this example, the fact that E was observed (regardless of when) only proves that the hypothesis H is not true.
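The distinction drawn in this comment, between P(E|H) and P(E|H,E), can be made concrete with a toy joint distribution. What follows is a minimal sketch, not anyone's official formalism: the three hypotheses, their equal priors and the likelihoods are all made-up numbers chosen for illustration, and `prob` is a hypothetical helper.

```python
from fractions import Fraction as F

# Hypothetical priors and likelihoods, all implicitly conditional on background B.
prior = {"H0": F(1, 3), "H1": F(1, 3), "H2": F(1, 3)}
p_e_given_h = {"H0": F(0), "H1": F(1, 10), "H2": F(9, 10)}  # H0 forbids E

# Joint distribution over (hypothesis, did E occur?).
joint = {}
for h, p in prior.items():
    joint[(h, True)] = p * p_e_given_h[h]
    joint[(h, False)] = p * (1 - p_e_given_h[h])

def prob(event, given=lambda w: True):
    """P(event | given), read off the joint table; the conditioning is always explicit."""
    denom = sum(p for w, p in joint.items() if given(w))
    if denom == 0:
        raise ZeroDivisionError("conditioning event has probability zero")
    return sum(p for w, p in joint.items() if given(w) and event(w)) / denom

def is_e(w):
    return w[1]

def is_h(name):
    return lambda w: w[0] == name

p_e_h1 = prob(is_e, given=is_h("H1"))                          # P(E | H1)
p_e_h1_and_e = prob(is_e, given=lambda w: w[0] == "H1" and w[1])  # P(E | H1, E)
p_h0_given_e = prob(is_h("H0"), given=is_e)                    # P(H0 | E)
p_h2_given_e = prob(is_h("H2"), given=is_e)                    # P(H2 | E)

print(p_e_h1, p_e_h1_and_e, p_h0_given_e, p_h2_given_e)
```

In this toy model P(E|H1) stays at 1/10 whether or not E has been seen, P(E|H1,E) is 1, observing E rules out H0, and H2 is boosted from 1/3 to 9/10. Note that P(E|H0,E) is undefined here, since the conditioning event has probability zero. Whether this way of writing things answers the subjectivist's worry about Pr(known E) is exactly what the thread is disputing.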

No one who understands conditional probability as an extended form of logic, or who understands Cox’s theorem (cited in my paper), would make such an elementary mistake. It really has nothing to do with “subjectivism” or “objectivism”, it has to do with understanding the machinery of conditional probability and using it correctly.

You comment: “For starters, you can never have the catchall of “all hypotheses that could explain x” in actual science. In science, unlike deductive logic, we want soundness, not mere formal validity.”

I would say that every scientist like myself knows that we never know “all hypotheses that could explain x”. The best we can do at any given time is to consider the hypotheses that we are aware of and figure out which of these best does the job. Prior to the invention of general relativity, the best we could do to explain the anomalous perihelion motion of Mercury was to invent a hard-to-observe planet, or some sort of solar oblateness, or maybe some unknown failure of the inverse square law. All of these had the defect that they involved some sort of “fudge factor” that had to be chosen just right to explain the data observed. And the alleged observations of “Vulcan”, that hard-to-observe planet, couldn’t be replicated.

And then, Einstein invented from some simple assumptions a theory that not only predicted an anomalous perihelion motion, but also predicted a value very close to what was actually observed. So we had a new hypothesis, and that (for a Bayesian) demanded that we figure out a coherent way to introduce that new hypothesis and analyze it relative to the hypotheses that I’ve collectively described as using “fudge factors” above. Jim Berger and I attempted an analysis of this augmented problem in a paper we published in American Scientist (80, 64-72, January 1992). A pdf of our technical report on which this paper was based (the published paper had a different title and was edited) can be found here: http://billandsue.net/papers/ockham.pdf

We do not claim that this is the last word on the subject, and certainly there have been lots of tests of general relativity since 1920 that have been very important in solidifying the evidence in favor of it. So is general relativity the last, true, correct answer? I don’t know any physicist or astronomer that doesn’t think that it could be deposed, tomorrow, by some experimental result that is completely at odds with it. So as a working scientist I would say that we never claim to know “all hypotheses that could explain x”. That’s a fact of life. Does that make us Bayesianly incoherent? Only if you insist that the only way to be coherent as a Bayesian is to know at the outset “all hypotheses that could explain x”, and I don’t know any scientist, Bayesian or not, that thinks that. I do think that it is coherent to do Bayesian calculations with an incomplete set of “hypotheses that could explain x” as long as we recognize the limitations of such calculations and in particular that they are conditioned on a particular set of hypotheses, and not ALL of the hypotheses that could explain x (which includes hypotheses that we haven’t been clever enough to think of yet).
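The point that such calculations are conditioned on a particular set of hypotheses can be illustrated with a small sketch. Everything here is hypothetical: the hypothesis labels, the likelihood values and the equal priors are invented for illustration, and how priors ought to be reassigned when a new hypothesis arrives is itself the hard question, which this sketch does not settle.

```python
def posterior(priors, likelihoods):
    """Posterior over the listed hypotheses only, i.e. conditional on one of them being true."""
    weights = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical likelihoods of the same data under each hypothesis.
likelihoods = {"fudge_A": 0.02, "fudge_B": 0.01}
known = posterior({"fudge_A": 0.5, "fudge_B": 0.5}, likelihoods)

# A newly invented hypothesis joins the set; equal priors are assumed for simplicity.
likelihoods_aug = dict(likelihoods, new_theory=0.20)
augmented = posterior(
    {"fudge_A": 1 / 3, "fudge_B": 1 / 3, "new_theory": 1 / 3}, likelihoods_aug
)

print(known)
print(augmented)
```

Before the new theory is on the table, `fudge_A` holds two thirds of the posterior; afterwards most of the posterior moves to `new_theory`. Both answers are conditional on the hypotheses actually listed.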

Bill: You say: probabilities “like logic, depends only on the quantities that the probability statements depend on and is independent of when we may happen to have learned about the truth or falsity of particular pieces of evidence.”

But we wouldn’t want to say that knowing the data on 2017 stock prices, and erecting a stock prediction algorithm to get most or all correct, is just as strong confirmation that H: your stock predictor is reliable, as it would be if you set out H in advance, back at the start of the year.

So probability alone doesn’t measure confirmation.

Deborah: You are misreading my comment. I’m talking about writing down statements in the language of the probability calculus. A statement like P(E|H) depends only on what E and H are, which is a matter of mathematics, and not on when we may have learned about the truth or falsity of E, because the mathematics of the probability calculus doesn’t care when E was observed.

As for your comment on stock prices, it’s still the case that the probabilities predicted by the same H (whether invented in advance or later) are the same if the H’s are the same. That’s just a matter of mathematics. The reason why we would not regard the two cases as equally strong confirmation is that an H constructed after looking at the data has of course been designed to do a good job of predicting the historical data. This is why statisticians wouldn’t test H by designing H to fit the data and then saying that it fits the data. They’d use historical data to design H, perhaps, and then test this H by looking at entirely new data that was not used in the design. Sometimes this is done by randomly splitting the data into two sets, using one to design and the other, independently, to test. Or sometimes you just wait for new data to be observed.
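The design-then-test idea described here can be sketched in a few lines. This is a deliberately extreme toy, assuming simulated up/down moves with no pattern at all and a "predictor" that simply memorizes the design data; the function names and sample sizes are hypothetical.

```python
import random

random.seed(0)

def coin_flip_history(n_days):
    """Simulated up/down moves with no real pattern (an assumption for illustration)."""
    return [random.choice(["up", "down"]) for _ in range(n_days)]

design_data = coin_flip_history(250)   # data used to construct H
fresh_data = coin_flip_history(2000)   # data H never saw

# "H": a lookup table built to agree with every day in the design data.
lookup = dict(enumerate(design_data))

def predict(day):
    # Outside the design period the table has nothing to say, so guess "up".
    return lookup.get(day, "up")

def accuracy(data, offset=0):
    hits = sum(predict(offset + i) == move for i, move in enumerate(data))
    return hits / len(data)

fit_accuracy = accuracy(design_data)                            # same data used to build H
test_accuracy = accuracy(fresh_data, offset=len(design_data))   # entirely new data

print(fit_accuracy, test_accuracy)
```

The fit to the design data is perfect by construction, while performance on fresh data hovers around chance, which is why agreement with the data used to build H tells us little by itself.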

However, the case of GR is different. The perihelion advance that GR predicts just “fell out” of the theory. Einstein didn’t design GR to agree with the already-observed perihelion advance, his simple assumptions just happened to have that as a consequence, as was discovered after-the-fact when people started doing calculations using GR.

Bill:

A key difference between those who make use of error probabilities of methods in assessing inference (error statisticians for short) and those who follow the Likelihood Principle is that the former take into account how a variety of selection effects and stopping rules alter error probabilities. Under the latter we get freedom from the sampling plan, once the data are in hand. This is discussed a lot on this blog. Just a few links:

https://errorstatistics.com/2014/10/10/breaking-the-royall-law-of-likelihood-c/

“The likelihood principle implies…the irrelevance of predesignation, of whether an hypothesis was thought of beforehand or was introduced to explain known effects.” (Rosenkrantz, p. 122)

https://errorstatistics.com/2016/02/03/philosophy-laden-meta-statistics-is-the-new-technical-activism-free-of-statistical-philosophy/

“Two problems that plague frequentist inference: multiple comparisons and multiple looks, or, as they are more commonly called, data dredging and peeking at the data. The frequentist solution to both problems involves adjusting the P-value…But adjusting the measure of evidence because of considerations that have nothing to do with the data defies scientific sense, belies the claim of ‘objectivity’ that is often made for the P-value”(Goodman 1999, p. 1010).

https://errorstatistics.com/2014/04/05/who-is-allowed-to-cheat-i-j-good-and-that-after-dinner-comedy-hour-2/

In general, suppose that you collect data of any kind whatsoever — not necessarily Bernoullian, nor identically distributed, nor independent of each other . . . — stopping only when the data thus far collected satisfy some criterion of a sort that is sure to be satisfied sooner or later, then the import of the sequence of n data actually observed will be exactly the same as it would be had you planned to take exactly n observations in the first place. (Edwards, Lindman, and Savage 1962, 238-239)

So given your remarks, either you’re appealing to error probabilities of a method or packing the influences of selection effects into priors somehow. There may be other ways, but the effects must show up in the formal account, and as I say, this has long been a point of disagreement between frequentist error statisticians and Bayesians (even non-subjective Bayesians).
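To see how a stopping rule can alter error probabilities even though the likelihood for the data in hand is unchanged, here is a small simulation. It is a sketch under stated assumptions: Normal observations with known unit variance, a true null mean of 0, a two-sided 1.96 cutoff, and a cap of 100 observations; the function name `reaches_cutoff` is hypothetical.

```python
import math
import random

random.seed(1)

def reaches_cutoff(max_n, peek):
    """Draw N(0,1) data under a true null; report whether |z| >= 1.96 gets declared."""
    total = 0.0
    z = 0.0
    for n in range(1, max_n + 1):
        total += random.gauss(0.0, 1.0)
        z = total / math.sqrt(n)
        if peek and abs(z) >= 1.96:
            return True       # optional stopping: quit as soon as the cutoff is crossed
    return abs(z) >= 1.96     # fixed-n analysis: look only once, at the end

trials = 2000
fixed_rate = sum(reaches_cutoff(100, peek=False) for _ in range(trials)) / trials
peeking_rate = sum(reaches_cutoff(100, peek=True) for _ in range(trials)) / trials

print(f"fixed n: {fixed_rate:.3f}   try-and-try-again: {peeking_rate:.3f}")
```

With a single look at n = 100 the rate of erroneous rejections stays near the nominal 5%, whereas looking after every observation pushes it far higher. An account that obeys the Likelihood Principle treats the two final data sets alike; an error statistical account does not.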

Deborah, one question:

You wrote: “You can’t say, with subjectivists, that Pr(known E) < 1 is incoherent, as Kadane does (rightly it seems), and say one is being coherent Bayesianly.”

I have a hard time parsing this comment. Does Kadane agree or disagree with the statement that P(known E)<1? What do you think he means by this? There are too many clauses and negatives, double negatives in here for me to figure out what you mean. Write it out clearly in a few sentences so that I can figure out what you mean.

Sorry for being so thick.

Bill: As I indicated, Kadane says:

“After it is observed, the event {d(X) ≥ 1.96} either happened or did not happen and hence has probability either one or zero” (2011, p. 439).

On the business of assigning higher priors to “simpler” theories, as in your Bayesian Strop paper, insofar as one can articulate simplicity (and we know there are completely opposite ways of doing it), I don’t think the results square with scientific practice. But, then again, I don’t think scientists assign prior probabilities to theories like GTR. I think there’s a slide between theories that are poorly tested on account of certain ad hockeries and being fit by adjusted parameters, on the one hand, and the theories themselves being a priori improbable in some sense. As for the case at hand, John Earman discusses how scientists did not count Brans-Dicke as a priori less plausible on account of adjustable parameters. Some point out that if Brans-Dicke had been articulated before GTR, the latter might be described as having been arrived at via adjustable parameters. However they are arrived at, all that matters is how they’re tested.

Of course, all these reconstructions of GTR, Mercury, deflection effect etc. are vastly oversimple and rather distant from actual appraisals of local hypotheses in probing aspects of different relativistic theories of gravity.

Deborah, thanks for your clarification of what you meant by Jay Kadane’s comment.

I would say then that he is talking about P({d(X) ≥ 1.96} | {d(X) ≥ 1.96} has been observed), not about P({d(X) ≥ 1.96}), which is not a meaningful probability in my view as it does not state what it is conditioned on.

The Bayesian Ockham’s razor idea has been around for a while. It’s inspired by ideas that Harold Jeffreys (no relation) and Dennis Lindley discussed much earlier (https://en.wikipedia.org/wiki/Lindley%27s_paradox).

It’s simply that the existence of an adjustable parameter in essence splits a theory into a multiplicity of theories, which automatically divides that prior probability that one might apply to the theory as a whole into a multiplicity of priors, one for each sub-theory that the theory has been split into. This wastes prior probability on values of the parameter that enough data will eventually reject. By comparison a theory (like General Relativity) that happens to predict a precise value of an effect will have a “leg up”.
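A rough numerical sketch of this “leg up” follows. It assumes a Normal measurement model and a Normal prior on the adjustable parameter; all the numbers (the observed value, its standard deviation, the sharp prediction and the prior spread `tau`) are illustrative choices, not values taken from this discussion.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

observed = 41.6   # hypothetical measured size of the effect
sigma = 2.0       # hypothetical measurement standard deviation

# Sharp theory: predicts one precise value, with no adjustable parameter.
sharp_prediction = 42.9
evidence_sharp = normal_pdf(observed, sharp_prediction, sigma)

# Flexible theory: effect = theta, with prior theta ~ Normal(0, tau).
# Integrating theta out gives observed ~ Normal(0, sqrt(sigma^2 + tau^2)).
tau = 50.0
evidence_flexible = normal_pdf(observed, 0.0, math.sqrt(sigma**2 + tau**2))

bayes_factor = evidence_sharp / evidence_flexible
print(f"Bayes factor (sharp : flexible) = {bayes_factor:.1f}")
```

The flexible theory can match the observation for some value of its parameter, but its prior mass is spread over many values the data rule out, so its marginal likelihood is much smaller. The size of the factor depends heavily on `tau`, which is one reason such calculations are contested.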