Gelman responds to the comment[i] I made on my 8/31/13 post:

*Popper and Jaynes*

Posted by Andrew on 3 September 2013

Deborah Mayo quotes me as saying, “Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive.” She then follows up with:

Gelman employs significance test-type reasoning to reject a model when the data sufficiently disagree.

Now, strictly speaking, a model falsification, even to inferring something as weak as “the model breaks down,” is not purely deductive, but Gelman is right to see it as about as close as one can get, in statistics, to a deductive falsification of a model. But where does that leave him as a Jaynesian?

My reply:

I was influenced by reading a toy example from Jaynes’s book where he sets up a model (for the probability of a die landing on each of its six sides) based on first principles, then presents some data that contradict the model, then expands the model.
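The kind of check described here can be sketched in a few lines. This is not Jaynes's own calculation; the roll counts and the chi-square statistic are illustrative stand-ins for "data that contradict the model":

```python
# A minimal sketch of checking a first-principles model for a six-sided die:
# fit the uniform model, then compute a chi-square goodness-of-fit statistic
# to see whether the data contradict it. Counts are hypothetical.
counts = [18, 22, 17, 21, 19, 43]  # face 6 turns up far too often
n = sum(counts)
expected = n / 6  # uniform model: each face equally likely

chi2 = sum((c - expected) ** 2 / expected for c in counts)
# With 5 degrees of freedom, the 0.001-level critical value is roughly 20.5,
# so this statistic signals that the uniform model breaks down.
print(round(chi2, 2))  # 20.63
```

A rejection here does not by itself say *how* to expand the model; that is the step the rest of this exchange is about.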

I’d seen very little of this sort of this reasoning before in statistics! In physics it’s the standard way to go: you set up a model based on physical principles and some simplifications (for example, in a finite-element model you assume the various coefficients aren’t changing over time, and you assume stability within each element), then if the model doesn’t quite work, you figure out what went wrong and you make it more realistic.

But in statistics we weren’t usually seeing this. Instead, model checking typically was placed in the category of “hypothesis testing,” where the rejection was the goal. Models to be tested were straw men, built up only to be rejected. You can see this, for example, in social science papers that list research hypotheses that are not the same as the statistical “hypotheses” being tested. A typical research hypothesis is “Y causes Z,” with the corresponding statistical hypothesis being “Y has no association with Z after controlling for X.” Jaynes’s approach—or, at least, what I took away from Jaynes’s presentation—was more simpatico to my way of doing science. And I put a lot of effort into formalizing this idea, so that the kind of modeling I talk and write about can be the kind of modeling I actually do.

I don’t want to overstate this—as I wrote earlier, Jaynes is no guru—but I do think this combination of model building and checking is important. Indeed, just as a chicken is said to be an egg’s way of making another egg, we can view inference as a way of sharpening the implications of an assumed model so that it can better be checked.

P.S. In response to Larry’s post here, let me give a quick +1 to this comment and also refer to this post, which remains relevant 3 years later.

I still don’t see how one learns about falsification from Jaynes when he alleges that the entailment of *x* from *H* disappears once *H* is rejected. But put that aside. In my quote from Gelman 2011, he was alluding to simple significance tests–without an alternative–for checking consistency of a model; whereas he’s now saying what he wants is to infer an alternative model, and he furthermore suggests one doesn’t see this in statistical hypothesis tests. But of course Neyman-Pearson testing always has an alternative, and even Fisherian simple significance tests generally indicate a direction of departure. However, neither type of statistical test method would automatically license going directly from a rejection of one statistical hypothesis to inferring an alternative model that was constructed to account for the misfit. A parametric discrepancy, δ, from a null may be indicated if the test very probably would not have resulted in so large an observed difference, were such a discrepancy absent (i.e., when the inferred alternative passes severely). But I’m not sure Gelman is limiting himself to such alternatives.

As I wrote in a follow-up comment: “*There’s no warrant to infer a particular model that happens to do a better job fitting the data x–at least on x alone. Insofar as there are many alternatives that could patch things up, an inference to one particular alternative fails to pass with severity. I don’t understand how it can be that some of the critics of the (bad) habit of some significance testers to move from rejecting the null to a particular alternative nevertheless seem prepared to allow this in Bayesian model testing. But maybe they carry out further checks down the road; I don’t claim to really get the methods of correcting Bayesian priors (as part of a model).*”

A published discussion of Gelman and Shalizi on this matter is here.

[i] My comment was:

“If followers of Jaynes agree with [one of the commentators] (and Jaynes, apparently) that as soon as *H* is falsified, the grounds on which the test was based disappear!—a position that is based on a fallacy—then I’m confused as to how Andrew Gelman can claim to follow Jaynes at all. ‘Popper has argued (convincingly, in my opinion) that scientific inference is not inductive but deductive…’ (Gelman, 2011, bottom p. 71). Gelman employs significance test-type reasoning to reject a model when the data sufficiently disagree. Now, strictly speaking, a model falsification, even to inferring something as weak as ‘the model breaks down,’ is not purely deductive, but Gelman is right to see it as about as close as one can get, in statistics, to a deductive falsification of a model. But where does that leave him as a Jaynesian? Perhaps he’s not one of the ones in Paul’s Jaynes/Bayesian audience who is laughing, but is rather shaking his head?”

Mayo: When we spoke, you asked me about this, and my lame off-the-cuff response was along the lines of, “One gets a sense for what kinds of model elaborations are safe.” You gave this response all the respect it deserved.

I reflected on the question, and here’s what my reflections produced. A reasonable (continuous) model elaboration “unfixes” some parameter that was implicitly fixed to a particular value in the original inadequate model. Fixing a parameter to a particular value is, in Bayesian terms, equivalent to giving it an arbitrarily concentrated prior distribution; “unfixing” can then be viewed as changing the inadequate prior for that parameter to one that is more dispersed. Upon refitting the model, the data are then permitted to “speak” about the plausible values of the new parameter. This procedure is safe because if the data contain little information about the value of the new parameter, the posterior will remain about as dispersed as the prior, and this can be detected easily.
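The diagnostic described in that last sentence can be sketched with the conjugate normal-normal model, where the posterior spread has a closed form. This is an illustrative toy, not the commenter's own code; all numbers are assumptions:

```python
# Hedged sketch of the "unfixing" check: replace a point-mass prior with a
# dispersed normal prior, refit, and compare posterior spread to prior spread.
# Conjugate normal mean with known observation sd, so the posterior is exact.
import math

def posterior_sd(prior_sd, obs_sd, n):
    """Posterior sd for a normal mean: precisions add under conjugacy."""
    precision = 1 / prior_sd**2 + n / obs_sd**2
    return 1 / math.sqrt(precision)

prior_sd = 2.0  # the dispersed prior that "unfixes" the parameter

# Informative data: the posterior concentrates well below the prior spread.
informative = posterior_sd(prior_sd, obs_sd=1.0, n=50)
print(round(informative, 3))  # 0.141

# Uninformative data: the posterior sd stays close to the prior's 2.0,
# which is the easily detected warning sign the comment describes.
uninformative = posterior_sd(prior_sd, obs_sd=50.0, n=5)
print(round(uninformative, 3))  # 1.992
```

Comparing the two printed values against `prior_sd` is the whole check: if the posterior is about as dispersed as the prior, the data said little about the newly freed parameter.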

I’m not using model checks to “infer a model.” I’m using model checks to examine possible problems with a model that I’d already like to use. The idea is that we are working with a model (for example, a four-compartment differential equation model in toxicology) and we want to understand its problems with an eye toward improving it if possible.

I think one of the confusions in statistics regarding model checking is that people often try to cram all statistical procedures into the category of “inference.” In our book, we split Bayesian data analysis into 3 parts:

1. Model building

2. Inference within a model

3. Model checking

And we recommend iterating these steps. It is perfectly reasonable to stop at some point with a model with known flaws, just because it is too difficult to try to construct, fit, and understand a larger model.

Andrew: Thanks for the comment. The key thing would be just what these iterations consist of. There is still an “inference” to a model component even if it’s not inference within a full statistical model. But the main thing just now is that I don’t see the reasoning falling outside the existing panoply of tests of models and misspecification tests; nothing suggests it is better attributed to “a toy example from Jaynes’s book.” There are many ways to expand a model that is contradicted: don’t you worry that there was no chance to discern the others? This came up in the several posts (on this blog) on Aris Spanos and m-s tests.

Mayo:

I’m not saying that Jaynes deserves the historical credit for the idea of model checking. I have no idea. I’m just saying that this passage in his book was influential to me, and I believe that my attitude toward model checking has traditionally been unusual in statistics (but I think that, in part thanks to my efforts, it’s become more mainstream).

Andrew: Then I hope you will emphasize that learning is not building up a subjective probability, and get people to beware of (what I shall now call) Jaynes falsification fallacy.

George Box, who died only a few months ago, was an applied Bayesian well ahead of his time. He advocated a mixture of Bayesian fitting and frequentist model checking. See

http://www.jstor.org/discover/10.2307/2982063?uid=3738488&uid=2&uid=4&sid=21102606609577

Stephen:

Yes, that’s why I find it so strange when some Bayesian critics of frequentist statistics claim that we assume our models are infallibly given, and then declare that they, the Bayesians, will of course be like the frequentists and not check their models (two examples: Bernardo, Greenland).

The following is from a post: https://errorstatistics.com/2013/06/26/why-i-am-not-a-dualist-in-the-sense-of-sander-greenland/#comments

“I noticed in some of Greenland’s work* a presumption that a frequentist error statistician takes the model, and statistical assumptions of a method, as infallibly given. It’s easy to knock down such an absurd view. But where’s the evidence for this reading? Why would an error statistician have developed a battery of methods for testing assumptions if she regarded them as infallible? Why would she have erected methods—quite deliberately—that are quite robust to violated assumptions were she not aware they may be defeated? Indeed, the “ecumenism” of many, e.g., George Box, stems from an acknowledged need to utilize error statistical methods of some sort when it comes to checking the model (I think Gelman concurs, but I will not speak for him).”

Fair points, but there is a technical problem for frequentists (and, in practice, Bayesians). If a battery of tests has been used to examine various incidental features of a model (e.g., Normality, homoscedasticity), when one then proceeds to test the substantive hypothesis using the model, can one then test as if one always knew the model was true?

You don’t know the model is true; but if you’ve done the checking adequately, and found the model acceptable for the purpose, then yes. I’ve discussed this elsewhere. Even more than I, Aris Spanos has. But the main point is the one you agree with.