A reader, Cory J, sent me a question in relation to a talk of mine he once attended:
I have the vague ‘memory’ of an example that was intended to bring out a central difference between broadly Bayesian methodology and broadly classical statistics. I had thought it involved a case in which a Bayesian would say that the data should be conditionalized on, and supports H, whereas a classical statistician effectively says that the data provides no support to H. …We know the data, but we also know of the data that only ‘supporting’ data would be given us. A Bayesian was then supposed to say that we should conditionalize on the data that we have, even if we know that we wouldn’t have been given contrary data had it been available.
That only “supporting” data would be presented need not be problematic in itself; it all depends on how this is interpreted. There might be no negative results to be had (H might be true), and thus none to “be given us”. Your last phrase, however, does describe a pejorative case for a frequentist error statistician: if “we wouldn’t have been given contrary data” to H (in the sense of data in conflict with what H asserts), even “had it been available”, then the procedure had no chance of finding or reporting flaws in H. Only data in accordance with H would be presented, even if H is false; so H passes a “test” with minimal stringency or severity. I discuss several examples in the papers below (I think the reader had in mind Mayo and Kruse 2001).
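The pejorative case can be made vivid with a small simulation (a hypothetical Python sketch of my own devising, not from any of the papers cited): a “reporter” runs many noisy studies of an effect and hands over only the results that accord with H. Even when H is false (the true effect is zero), we are almost always handed some apparently supporting data, so the report has essentially no chance of revealing a flaw in H.

```python
import random

def report_only_supporting(true_effect, n_studies, rng):
    """Run n_studies noisy measurements of a true effect and report ONLY
    those that come out positive (i.e., 'accord with' H: effect > 0).
    This mimics a reporter who would never hand us contrary data."""
    results = [rng.gauss(true_effect, 1.0) for _ in range(n_studies)]
    return [r for r in results if r > 0]

rng = random.Random(0)
trials = 1000
# H is FALSE here: the true effect is 0.  Yet in nearly every trial the
# report is non-empty and contains only H-favoring results.
got_support = sum(
    bool(report_only_supporting(0.0, 20, rng)) for _ in range(trials)
)
print(f"P(we are handed 'supporting' data | H false) ≈ {got_support / trials:.3f}")
```

Since the chance that at least one of 20 null results happens to lean positive is about 1 − 0.5²⁰, the procedure “supports” H with near certainty whether H is true or false, which is exactly why the report carries minimal severity.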
The thorniest examples involve using the data x to construct or select hypothesis H in such a way as to ensure it accords with x, and then using x “again” as evidence to warrant H (as supported, well tested, indicated, or the like). I discussed the business of so-called “double-counting” briefly in a November 28, 2011 blogpost, and much more elsewhere. Since x is used to construct H, the resulting hypothesis (I often write it as H(x)) is sometimes said to be “use-constructed”:
Two types of examples, among others, are
(a) accounting for a result that is anomalous for some theory or model H (e.g., by means of invoking an auxiliary explanation);
(b) estimating or measuring a parameter (e.g., deflection effect, weight).
Both types have instances that are non-kosher, but others that are entirely kosher[i] from an error statistical point of view.
Under (a), the “Velikovsky dodge” would be non-kosher (Mayo 2010, 158), but explaining away the eclipse result (as not anomalous for GTR), kosher (156). Under (b), the confidence interval estimation example is kosher (Mayo 2008, 865), while the “optional-stopping” version of the same interval is non-kosher (866). [See also the Opera Error post for an informal example.]
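The contrast under (b) can be checked numerically. In the following sketch (my own illustrative Python code, with parameter values chosen arbitrarily, not taken from Mayo 2008), a fixed-sample 95% interval for a normal mean is “use-constructed” from the data, yet it covers the true value about 95% of the time; the optional-stopping version, which keeps sampling until the very same interval excludes 0, falsely excludes the true value far more often than the nominal 5%.

```python
import math
import random

Z95 = 1.96  # two-sided 95% normal quantile; sigma is known to be 1 throughout

def fixed_n_ci(n, mu, rng):
    """Kosher use-construction: the interval is built from the data x,
    yet its error probability is controlled (~95% coverage of mu)."""
    mean = sum(rng.gauss(mu, 1) for _ in range(n)) / n
    half = Z95 / math.sqrt(n)
    return mean - half, mean + half

def stops_excluding_zero(max_n, mu, rng, start=10):
    """Non-kosher version: keep sampling until the same 95% interval
    excludes 0, then stop and report.  Returns True if that ever happens
    before max_n observations."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(mu, 1)
        if n >= start and abs(total / n) > Z95 / math.sqrt(n):
            return True
    return False

rng = random.Random(2)
trials = 2000

coverage = sum(
    lo <= 0.0 <= hi
    for lo, hi in (fixed_n_ci(100, 0.0, rng) for _ in range(trials))
) / trials
false_exclusion = sum(stops_excluding_zero(500, 0.0, rng) for _ in range(trials)) / trials

print(f"fixed-n 95% CI coverage of true mu=0:       {coverage:.3f}")
print(f"optional stopping falsely excludes mu=0 in: {false_exclusion:.3f}")
```

Both procedures output an interval that fits the observed data; only the stopping rule guarantees (in the limit) an interval excluding the true value, which is what destroys severity.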
I should add, given our recent discussion, a third example:
(c) using data to arrive at as well as test the adequacy of statistical models and model assumptions.
In pondering intuitions against such “double-counting” I discovered that considerable confusion was due to a fallacious slide from (1) to (2) involving two uses of “no matter what” (Mayo 1996):
(1) The procedure is guaranteed to output an H that accords with x, “no matter what the data are.”
(2) The procedure is guaranteed to output an H that accords with x, “no matter if the (use-constructed) H is true or false” (Mayo 1996, 27).
Only (2) would entail a lack of severity (as in cases where it is a “foregone conclusion” that H would find support)[ii]. By contrast, there are kosher cases where we “go wherever the data take us!”
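To see the slide from (1) to (2) concretely, here is a hypothetical Python sketch of my own (the subgroup-hunting setup and all numbers are illustrative assumptions, not drawn from the original discussion). The procedure searches many subgroups and outputs the best-looking one as H(x); that H(x) accords with x “no matter what the data are” (sense (1)), but the point of (2) is that it also passes a nominal 5% test far more often than 5% of the time even under a global null, approaching a foregone conclusion as the search widens.

```python
import random

def best_subgroup(k, n, mu, rng):
    """Search k subgroups of n observations each and use-construct H(x):
    'the effect is real in the best-looking subgroup'.  Returns that
    subgroup's z statistic (sample mean scaled by sqrt(n), sigma = 1)."""
    means = [sum(rng.gauss(mu, 1) for _ in range(n)) / n for _ in range(k)]
    return max(means) * n ** 0.5

rng = random.Random(3)
trials = 2000
# Global null: no effect in any subgroup, yet the selected H(x) 'passes'
# a one-sided nominal 5% test (z > 1.645) about 1 - 0.95**20 ≈ 64% of
# the time.
passes = sum(best_subgroup(20, 30, 0.0, rng) > 1.645 for _ in range(trials)) / trials
print(f"use-constructed H(x) 'supported' under global null: {passes:.3f}")
```

Contrast this with the ruler or fixed-sample estimate: there too the output fits the data by construction (sense (1)), but the error probabilities remain controlled, so fitting the data does not by itself convict a procedure of low severity.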
I will not here surmise what Bayesian responses might be nowadays (see Mayo and Kruse 2001).
Mayo, D. (1996). Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.
Mayo, D. and M. Kruse (2001). “Principles of Inference and Their Consequences,” in D. Corfield and J. Williamson (eds.), Foundations of Bayesianism. Dordrecht: Kluwer Academic Publishers: 381–403.
Mayo, D. (2008). “How to Discount Double-Counting When It Counts: Some Clarifications,” British Journal for the Philosophy of Science, 59: 857–879.
Mayo, D. (2010). “An Ad Hoc Save of a Theory of Adhocness? Exchanges with John Worrall,” in Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science (D. Mayo and A. Spanos, eds.), Cambridge: Cambridge University Press: 155–169.
There’s something I don’t get. Does this mean using the data to search for a statistically adequate model is kosher or not kosher? It would not be known what adequate model would result, but that some would is assured, right? So wouldn’t that be unkosher? I’m confused.
Anon: No, to your last: it would be unkosher only if the procedure would output such hypotheses or models whether or not they are false. It is the same with data-dependent testing, or, more generally, with arriving at a solution to a problem. Since I’m dashing, I’ll just quote: “there are reliable procedures for using data both to identify and test hypotheses: the use of a DNA match to identify a criminal, radiointerferometry data to estimate the deflection of light, and in such homely examples as of using a ruler to measure the length of a kitchen table. Here, although the inferences (about the criminal, the deflection effect, the table length) were constructed to fit the data, they were deliberately constrained to reflect what is correct, at least approximately. It is the severity, stringency, or probativeness of the test—or lack of it—therefore that should determine if a double use of data is admissible—or so I have argued.” (Mayo 2008, 858)