A paper of mine on “double-counting” and novel evidence just came out: “Some surprising facts about (the problem of) surprising facts” in Studies in History and Philosophy of Science (2013), http://dx.doi.org/10.1016/j.shpsa.2013.10.005
ABSTRACT: A common intuition about evidence is that if data x have been used to construct a hypothesis H, then x should not be used again in support of H. It is no surprise that x fits H, if H was deliberately constructed to accord with x. The question of when and why we should avoid such “double-counting” continues to be debated in philosophy and statistics. It arises as a prohibition against data mining, hunting for significance, tuning on the signal, and ad hoc hypotheses, and as a preference for predesignated hypotheses and “surprising” predictions. I have argued that it is the severity or probativeness of the test, or lack of it, that should determine whether a double-use of data is admissible. I examine a number of surprising ambiguities and unexpected facts that continue to bedevil this debate.
A reader, Cory J, sent me a question in relation to a talk of mine he once attended:
I have the vague ‘memory’ of an example that was intended to bring out a central difference between broadly Bayesian methodology and broadly classical statistics. I had thought it involved a case in which a Bayesian would say that the data should be conditionalized on, and supports H, whereas a classical statistician effectively says that the data provides no support to H. …We know the data, but we also know of the data that only ‘supporting’ data would be given us. A Bayesian was then supposed to say that we should conditionalize on the data that we have, even if we know that we wouldn’t have been given contrary data had it been available.
That only “supporting” data would be presented need not be problematic in itself; it all depends on how this is interpreted. There might be no negative results to be had (H might be true), and thus none to “be given us”. Your last phrase, however, does describe a pejorative case for a frequentist error statistician, in that, if “we wouldn’t have been given contrary data” to H (in the sense of data in conflict with what H asserts), even “had it been available”, then the procedure had no chance of finding or reporting flaws in H. Thus only data in accordance with H would be presented, even if H is false; so H passes a “test” with minimal stringency or severity. I discuss several examples in papers below (I think the reader had in mind Mayo and Kruse 2001).
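To make the contrast concrete, here is a minimal simulation sketch. It is my own toy illustration, not an example from the papers mentioned. It assumes H asserts a positive effect, that H is in fact false (the true effect is 0), and that a "study" counts as supporting H when its sample mean exceeds a fixed threshold. The function names, sample size, and threshold are all hypothetical choices. It compares how often data in accordance with H get presented under honest reporting of a single study versus a procedure that keeps looking and hands over only supporting data.

```python
import random


def study_supports_h(true_effect, rng, n=30, threshold=0.3):
    """One study: does the mean of n unit-variance observations exceed threshold?

    With n=30 the standard error is about 0.18, so a threshold of 0.3 sits
    roughly 1.64 standard errors above a true effect of 0 (illustrative choice).
    """
    mean = sum(rng.gauss(true_effect, 1.0) for _ in range(n)) / n
    return mean > threshold


def honest_report(true_effect, rng):
    """Run one study and report its outcome, whatever it is."""
    return study_supports_h(true_effect, rng)


def selective_report(true_effect, rng, max_studies=200):
    """Keep running studies; present data only if some study supports H."""
    return any(study_supports_h(true_effect, rng) for _ in range(max_studies))


def rate_of_support(report, true_effect, trials=1000, seed=0):
    """Fraction of trials in which supporting data end up being presented."""
    rng = random.Random(seed)
    return sum(report(true_effect, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # H is false here: the true effect is 0.
    print("honest:   ", rate_of_support(honest_report, 0.0))
    print("selective:", rate_of_support(selective_report, 0.0))
```

Under these assumptions the honest procedure should rarely present supporting data when H is false, while the selective procedure should present it almost always. One minus that rate is a rough proxy for how capable the procedure was of revealing that H is false, which is the sense in which the selective "test" has minimal severity.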