Gelman responds to the comment[i] I made on my 8/31/13 post:
Popper and Jaynes
Posted by Andrew on 3 September 2013
Deborah Mayo quotes me as saying, “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive.” She then follows up with:
Gelman employs significance test-type reasoning to reject a model when the data sufficiently disagree.
Now, strictly speaking, a model falsification, even to inferring something as weak as “the model breaks down,” is not purely deductive, but Gelman is right to see it as about as close as one can get, in statistics, to a deductive falsification of a model. But where does that leave him as a Jaynesian?
I was influenced by reading a toy example from Jaynes’s book where he sets up a model (for the probability of a die landing on each of its six sides) based on first principles, then presents some data that contradict the model, then expands the model.
I’d seen very little of this sort of this reasoning before in statistics! In physics it’s the standard way to go: you set up a model based on physical principles and some simplifications (for example, in a finite-element model you assume the various coefficients aren’t changing over time, and you assume stability within each element), then if the model doesn’t quite work, you figure out what went wrong and you make it more realistic.
But in statistics we weren’t usually seeing this. Instead, model checking typically was placed in the category of “hypothesis testing,” where the rejection was the goal. Models to be tested were straw man, build up only to be rejected. You can see this, for example, in social science papers that list research hypotheses that are not the same as the statistical “hypotheses” being tested. A typical research hypothesis is “Y causes Z,” with the corresponding statistical hypothesis being “Y has no association with Z after controlling for X.” Jaynes’s approach—or, at least, what I took away from Jaynes’s presentation—was more simpatico to my way of doing science. And I put a lot of effort into formalizing this idea, so that the kind of modeling I talk and write about can be the kind of modeling I actually do.
I don’t want to overstate this—as I wrote earlier, Jaynes is no guru—but I do think this combination of model building and checking is important. Indeed, just as a chicken is said to be an egg’s way of making another egg, we can view inference as a way of sharpening the implications of an assumed model so that it can better be checked.
P.S. In response to Larry’s post here, let me give a quick +1 to this comment and also refer to this post, which remains relevant 3 years later.
I still don’t see how one learns about falsification from Jaynes when he alleges that the entailment of x from H disappears once H is rejected. But put that aside. In my quote from Gelman 2011, he was alluding to simple significance tests–without an alternative–for checking consistency of a model; whereas, he’s now saying what he wants is to infer an alternative model, and furthermore suggests one doesn’t see this in statistical hypotheses tests. But of course Neyman-Pearson testing always has an alternative, and even Fisherian simple significance tests generally indicate a direction of departure. However, neither type of statistical test method would automatically license going directly from a rejection of one statistical hypotheses to inferring an alternative model that was constructed to account for the misfit. A parametric discrepancy,δ, from a null may be indicated if the test very probably would not have resulted in so large an observed difference, were such a discrepancy absent (i.e., when the inferred alternative passes severely). But I’m not sure Gelman is limiting himself to such alternatives.
As I wrote in a follow-up comment: “there’s no warrant to infer a particular model that happens to do a better job fitting the data x–at least on x alone. Insofar as there are many alternatives that could patch things up, an inference to one particular alternative fails to pass with severity. I don’t understand how it can be that some of the critics of the (bad) habit of some significance testers to move from rejecting the null to a particular alternative, nevertheless seem prepared to allow this in Bayesian model testing. But maybe they carry out further checks down the road; I don’t claim to really get the methods of correcting Bayesian priors (as part of a model)”
A published discussion of Gelman and Shalizi on this matter is here.
[i] My comment was:
” If followers of Jaynes agree with [one of the commentators] (and Jaynes, apparently) that as soon as H is falsified, the grounds on which the test was based disappear!—a position that is based on a fallacy– then I’m confused as to how Andrew Gelman can claim to follow Jaynes at all. “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive…” (Gelman, 2011, bottom p. 71).Gelman employs significance test-type reasoning to reject a model when the data sufficiently disagree. Now, strictly speaking, a model falsification, even to inferring something as weak as “the model breaks down,” is not purely deductive, but Gelman is right to see it as about as close as one can get, in statistics, to a deductive falsification of a model. But where does that leave him as a Jaynesian? Perhaps he’s not one of the ones in Paul’s Jaynes/Bayesian audience who is laughing, but is rather shaking his head?”
Mayo: When we spoke, you asked me about this, and my lame off-the-cuff response was along the lines of, “One gets a sense for what kinds of model elaborations are safe.” You gave this response all the respect it deserved.
I reflected on the question, and here’s what my reflections produced. A reasonable (continuous) model elaboration “unfixes” some parameter that was implicitly fixed to a particular value in the original inadequate model. Fixing a parameter to a particular value is, in Bayesian terms, equivalent to giving it an arbitrarily concentrated prior distribution; “unfixing” can then be viewed as changing the inadequate prior for that parameter to one that is more dispersed. Upon refitting the model, the data are then permitted to “speak” about the plausible values of the new parameter. This procedure is safe because if the data contain little information about the value of the new parameter, the posterior will remain about as dispersed as the prior, and this can be detected easily.
I’m not using model checks to “infer a model.” I’m using model checks to examine possible problems with a model that I’d already like to use. The idea is that we are working with a model (for example, a four-compartment differential equation model in toxicology) and we want to understand its problems with an eye toward improving it if possible.
I think one of the confusions in statistics regarding model checking is that people often try to cram all statistical procedures into the category of “inference.” In our book, we split Bayesian data analysis into 3 parts:
1. Model building
2. Inference within a model
3. Model checking
And we recommend iterating these stops. It is perfectly reasonable to stop at some point with a model with known flaws, just because it is too difficult to try to construct, fit, and understand a larger model.