*“Gathering of philosophers and physicists unaware of modern reconciliation of Bayes and Popper” by Andrew Gelman*

Hiro Minato points us to a news article by physicist Natalie Wolchover entitled “A Fight for the Soul of Science.”

I have no problem with most of the article, which is a report about controversies within physics regarding the purported untestability of physics models such as string theory (as for example discussed by my Columbia colleague Peter Woit). Wolchover writes:

Whether the fault lies with theorists for getting carried away, or with nature, for burying its best secrets, the conclusion is the same: Theory has detached itself from experiment. The objects of theoretical speculation are now too far away, too small, too energetic or too far in the past to reach or rule out with our earthly instruments. . . .

Over three mild winter days, scholars grappled with the meaning of theory, confirmation and truth; how science works; and whether, in this day and age, philosophy should guide research in physics or the other way around. . . .

To social and behavioral scientists, this is all an old, old story. Concepts such as personality, political ideology, and social roles are undeniably important but only indirectly related to any measurements. In social science we’ve forever been in the unavoidable position of theorizing without sharp confirmation or falsification, and, indeed, unfalsifiable theories such as Freudian psychology and rational choice theory have been central to our understanding of much of the social world.

But then somewhere along the way the discussion goes astray:

Falsificationism is no longer the reigning philosophy of science. . . . Nowadays, as several philosophers at the workshop said, Popperian falsificationism has been supplanted by Bayesian confirmation theory . . . Bayesianism allows for the fact that modern scientific theories typically make claims far beyond what can be directly observed — no one has ever seen an atom — and so today’s theories often resist a falsified-unfalsified dichotomy. Instead, trust in a theory often falls somewhere along a continuum, sliding up or down between 0 and 100 percent as new information becomes available.

Noooooooooo! As Cosma Shalizi and I argue, “the most successful forms of Bayesian statistics do not actually support that particular philosophy [of inductive inference] but rather accord much better with sophisticated forms of hypothetico-deductivism.”

As reporting, the article is excellent. But it seems that nobody in the room was aware of the “Bayesian Data Analysis” view in which Bayesian models can and should be checked. We have to do more communicating. Naive Popperianism is bad, but so is naive anti-Popperianism. Lakatos would be spinning in his grave.

**Readers can search this blog for many posts on what I regard as an improvement on Popper. Notably, I provide a more satisfactory notion of severe tests that accomplishes what Popper intended without the oversimplifications. Had Popper been more aware of statistical methods, my guess is that he would have taken the “error statistical turn”[1].**

Share your comments.

[1]In a letter to me (on my notion of severe tests), Popper said he wished he was more familiar with statistical developments.

I wish I had been at this meeting, just to set some of the participants straight on some philstat and philsci issues. (Norm Matloff had sent me items from this gathering last weekend, but I was too busy to respond.) In the article to which Gelman links we hear: “The Bayesian framework is much more flexible” than Popper’s theory, said Stephan Hartmann, a Bayesian philosopher at the gathering. Indeed, flexibility is one of the things Popper was trying to rein in. You can’t make the science better just by adopting a looser theory of evidence and inference, yet that’s what Hartmann is saying. You can make yourself feel better, but that’s different. Either you’ve done something to show that a claim or theory would not be as in sync with the evidence as it is, were it specifiably wrong, or you have not.

There’s an enormous confusion in the discussion about stringent testing and falsification. For one thing, there’s the erroneous and oversimple conception of Popperian falsification (promoted by people like Pigliucci) already discussed a lot on this blog. For another, those wringing their hands about lack of experimental evidence in some theoretical physics overlook the fact that there are theoretical as well as empirical constraints that can substantiate strong arguments from coincidence & severe tests.

I thought of you when I read the article. I think we’re in complete agreement that this use of Bayesian confirmation theory is extremely worrying.

Richard: I’m surprised you agree with me that this “use of Bayesian confirmation theory is extremely worrying”? I don’t know that I’d go that far; what I find worrisome is that some sophisticated philosophers of science are holding “Cliff’s Notes” versions of Popper and etching them further into stone among scientists who consult them. Of course the whole multiverse theory is, in one way, entirely up the (frequentist) Bayesian alley. They try to do something like assess the relative frequencies of observers, galaxies, universes–something like that–and it’s based on some metaphysics about “naturalness”. I’ll link to one of my posts in a later comment.

Just how widely accepted, or even acknowledged, is this “modern reconciliation of Bayes and Popper” anyway? Should a group of philosophers and physicists gathered to discuss Dawid’s book be expected to be familiar with this development?

Anonymous: Maybe not, but they might be expected to have more than a “Cliff’s Notes” understanding of Popperian falsification. And if they are serious about appealing to philosophies of evidence and statistical inference in appraising the criteria they are using in theory assessment, they might be expected to look beyond the current fashions in philosophy. In a recent book (that won the Lakatos prize), the author said he found the frequentist notion of probability greatly illuminated his multiverse theory, but since some philosophers of confirmation said it was unpopular, he forced his account into a (subjective) Bayesian mold. I agreed with the author: frequentist probability greatly illuminated his ideas, and hopefully others will discover this. The point is: philosophers are often creatures of fashion and bias, and if scientists consult us (which I think they should) they may wish to strive for a representative sample.

Popper wouldn’t like this search for the highly probable. Pursuit of highly improbable theories is Popper’s way. I wonder if these physicists study Popper?

e.berk: There does seem to be a supposition that what’s wanted is high confirmation.

Achinstein asked about the meaning of well confirmed (in the article).

“I will continue to work on it,” Gross said.

“That’s pretty low,” Achinstein said.

“Not for science,” Gross said. “That’s the question that matters.”

The idea that acceptance is all about determining if it’s worth working on is very Popperian and not in sync with Achinstein’s probabilism. As you say, it’s a high content, high risk (and so improbable) theory that mattered for Popper because one is likely to learn more from pursuing it, subjecting it to tests, and using results to amend or extend hypotheses. Yet Achinstein is saying it doesn’t count for much, because he demands strong belief in the truth of a theory. This is the same reason he criticized me in many papers (search under Bayesian epistemology).

But this is misleading in the sense that not all philosophers of science agree with Achinstein. (By the way, Achinstein rejects Bayesian confirmation, in the sense of a boost in probability, as necessary or sufficient for evidence.)

Scientists, consulting philosophers of science, need to strive for a random sample of philosophers. Or better, they should seek out unpopular views. Popper was never really popular in the U.S.

It would be interesting if some of the theoretical physicists questioned or illuminated the assumptions underlying their leap to multiverses. I noticed, in the article, mention of an exception:

“Paul Steinhardt, a theoretical physicist at Princeton University and one of the early contributors to the theory of eternal inflation, saw the multiverse as a ‘fatal flaw’ in the reasoning he had helped advance, and he remains stridently anti-multiverse today. ‘Our universe has a simple, natural structure,’ he said in September. ‘The multiverse idea is baroque, unnatural, untestable and, in the end, dangerous to science and society.’

Steinhardt and other critics believe the multiverse hypothesis leads science away from uniquely explaining the properties of nature.”

https://www.quantamagazine.org/20141103-in-a-multiverse-what-are-the-odds/

Wow. And we keep hearing the appeal of multiverses is to attain “naturalness”, even though it’s admitted it doesn’t quite work out. I had a blogpost on this:

https://errorstatistics.com/2013/08/28/is-being-lonely-unnatural-for-slim-particles-a-statistical-argument/

Vapnik et al on Popper

@article{6038,
  title = {Falsificationism and Statistical Learning Theory: Comparing the Popper and Vapnik-Chervonenkis Dimensions},
  author = {Corfield, D. and Sch{\"o}lkopf, B. and Vapnik, V.},
  journal = {Journal for General Philosophy of Science},
  volume = {40},
  number = {1},
  pages = {51--58},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = jul,
  year = {2009}
}

One thing I’ve been surprised by over the last year or so is just how few people seem to understand the foundations/motivations behind Gelman’s philosophy. Perhaps I underestimated how different it is from the ideas in mainstream philosophy/foundations (and/or how good Gelman’s intuition is?).

I would love to come across a philosopher or mathematician who could give even a half reasonable account of his approach. As far as I can tell from various exchanges, Mayo, by your own admission you don’t understand it fully, right? I’ve found that Judea Pearl doesn’t understand it. Laurie I think might understand some but not all of it.

What’s going on here? I’ve certainly found a lot of value in it – why hasn’t it caught on and been developed in foundational circles? It seems to be crying out for some more foundational/formal development and I think there could be interesting formal work there – or at least as interesting as yet more arguments about the likelihood principle.

OM: Well we’ll try (to promote its “development in foundational circles”). I take it that whether or not the account itself is a work in progress, providing it with a philosophy of statistics is. In Gelman and Shalizi, and in some other places, he calls it error statistical, and I would endorse that conception. Whether specific methods achieve intended goals is less important than identifying the overall logic and rationale. That said, I admit that I’d like to better understand, and help to clarify, the account.

The following is based on those few articles of Gelman which I have read, in particular his Bayesian EDA. In Section 3 he discusses three examples. The first one is finite mixture models, which I discussed, if that is the correct word, with Michael Lew in a previous post. The particular model in Gelman’s example is 0.5N(mu_1,sigma_1^2)+0.5N(mu_2,sigma_2^2). I have come to the conclusion that this model poses a big challenge to likelihood. The model is perfectly sensible, the problem is well-posed, and it does not require any regularization. It is not like the comb model, where adjectives such as `silly’ can be used to dismiss it. Moreover, there is a conceptually simple way of analysing the data based on the model.

As an example in the discussion with Michael Lew I used the Old Faithful data available from Larry Wasserman’s webpage; it may even be available in R. For this data set there is no(?) value of the parameter theta=(mu_1,sigma_1,mu_2,sigma_2) for which the model above is consistent with the data. To get a consistent model we replace the 0.5 by p and 1-p to give the model pN(mu_1,sigma_1^2)+(1-p)N(mu_2,sigma_2^2), with parameter theta=(p,mu_1,sigma_1,mu_2,sigma_2), where mu_1<mu_2 to break the symmetry.

There is a conceptually very simple method of analysing the data. Given alpha=0.9, say, one determines all those parameter values theta for which d_{ku}(P_n,P(theta)) < q_{ku}(alpha,n), where d_{ku}(P_n,P(theta)) is the Kuiper distance between the empirical distribution P_n and the model distribution P(theta)=pN(mu_1,sigma_1^2)+(1-p)N(mu_2,sigma_2^2), and q_{ku}(alpha,n) is the alpha quantile of the Kuiper distance for a sample of size n. In particular q_{ku}(0.95,272)=0.106, q_{ku}(0.99,272)=0.121 and q_{ku}(0.99999,272)=0.169. For p=0.5 and alpha=0.99999 there are no values of (mu_1,…,sigma_2) which satisfy the inequality. In other words, p=0.5 is not consistent with the Old Faithful data.

There are many things one can read off from the results. The closest model is with (p,mu_1,sigma_1,mu_2,sigma_2)=(0.34,1.974,0.203,4.297,0.435), with a Kuiper distance of 0.0658. For p=0.34 the upper and lower bounds of acceptable parameter values are 1.921 and 2.118 for mu_1, 0.155 and 0.349 for sigma_1, 4.271 and 4.462 for mu_2, and 0.324 and 0.556 for sigma_2. The range of values for p is [0.25,0.38], and so on and so forth.

Note that all this is based on distribution functions and not on densities. Gelman must be aware that such an analysis is possible, but he insists on likelihood, differentiates to densities, and gets into trouble. I simply do not understand this. If it were possible to gain some insight not available from the method just described there would be some justification, but this is not the case. I realize that this is only one example, but it seems to me to be a fundamental one, and without a convincing explanation I will not be able to understand Gelman’s philosophy.
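The Kuiper-distance check described above can be sketched in a few lines of Python. This is only an illustration, not Davies’s actual programme: scipy supplies the normal CDF, and a synthetic bimodal sample (with made-up component parameters) stands in for the Old Faithful data.

```python
import numpy as np
from scipy.stats import norm

def mixture_cdf(x, p, mu1, s1, mu2, s2):
    # CDF of the two-component mixture p*N(mu1, s1^2) + (1-p)*N(mu2, s2^2)
    return p * norm.cdf(x, mu1, s1) + (1 - p) * norm.cdf(x, mu2, s2)

def kuiper_distance(data, cdf):
    # d_ku(P_n, P) = D+ + D-, the Kuiper distance between the empirical
    # distribution of the data and a continuous model CDF
    x = np.sort(np.asarray(data))
    n = len(x)
    f = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    return d_plus + d_minus

# synthetic stand-in for the Old Faithful eruption durations
rng = np.random.default_rng(1)
comp = rng.random(300) < 0.34
data = np.where(comp, rng.normal(2.0, 0.2, 300), rng.normal(4.3, 0.44, 300))

d_fitted = kuiper_distance(data, lambda x: mixture_cdf(x, 0.34, 2.0, 0.2, 4.3, 0.44))
d_equal = kuiper_distance(data, lambda x: mixture_cdf(x, 0.5, 2.0, 0.2, 4.3, 0.44))
print(d_fitted, d_equal)
```

Scanning a grid of theta and keeping those values with d_ku below the quantile q_ku(alpha, n) yields the set of consistent parameters; here the equal-weights model (p = 0.5) comes out much farther from the data than the fitted weights, mirroring the Old Faithful result.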

On a more general level, Gelman operates with two different modes which have different topologies, that is, concepts of closeness. He is not the only one to do this; many statisticians do it, including frequentists. It is to be deprecated. When in his Bayesian mode he is at the level of densities, and although the topology he uses is not specified I take it to be the L_1 or total variation metric. In his EDA mode he switches to distribution functions, and the topology here can be characterized by the Kuiper metric.

The Kuiper metric is based on intervals. One takes two distributions P and Q and an interval I. The difference in probabilities is |P(I)-Q(I)|. The value of the Kuiper distance is simply the maximum over all intervals I. This is deeper than it may appear and relates to the Popper-Vapnik reference I gave in a previous post. Given n distinct points on the real line you can obtain n(n+1)/2 + 1 different subsets by taking intersections with all intervals. This is a polynomial of degree 2; the degree is known as the Vapnik-Cervonenkis index or dimension. The Kolmogorov metric is based on sets of the form (-infty,x], and these have dimension 1. In general one can use metrics based on V-C classes. In one sense this means that in order to learn something about possible parameter values one must restrict the calculation of probabilities to those of the form P(C) where C lies in a V-C class {\mathcal C}. Finally, the topologies induced by V-C classes are strictly weaker than those based on all subsets, such as total variation. There is therefore in my opinion a fundamental tension and lack of consistency between Gelman's two modes.
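The count of interval-induced subsets is easy to verify directly. A minimal sketch (the brute-force enumeration below is my own illustration, not from the Corfield, Schölkopf and Vapnik paper):

```python
def interval_subsets(points):
    # subsets of a finite point set obtainable by intersecting with a real
    # interval: exactly the contiguous runs of the sorted points, plus the
    # empty set, giving n*(n+1)/2 + 1 subsets for n distinct points
    pts = sorted(points)
    n = len(pts)
    subsets = {frozenset()}
    for i in range(n):
        for j in range(i, n):
            subsets.add(frozenset(pts[i:j + 1]))
    return subsets

pts = [0.3, 1.1, 2.7, 4.0, 5.5]
print(len(interval_subsets(pts)))  # 5*6/2 + 1 = 16
```

Because this count grows polynomially in n rather than as 2^n, the intervals form a V-C class, which is what makes metrics such as Kuiper’s statistically manageable.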

The tension between the two modes can often be eliminated by some form of regularization. Gelman states that this can be accomplished by a choice of prior. It is not clear to me whether he means that it can always be done or only sometimes. The answer is: not always, but sometimes. Here again it would be useful if some indication could be given as to the circumstances under which regularization through the choice of a prior is possible. A few examples would be useful.

One central aspect of Gelman's EDA is the simulation of samples based on the model. These are then compared with the actual data. In order to simulate from a model it must be a model in the sense that it is a single probability measure, not a parametric family of such measures. Gelman treats a prior as part of the model and so has no qualms about changing the prior if it turns out not to be consistent with the data. I think this is perfectly acceptable. Suppose now my model is that the data are i.i.d. N(mu,sigma^2), that mu is uniform over (-K,K), and that sigma is independently uniform over (0,L). This defines a model exactly specified by P(K,L). Thus given data x_n I can specify (K,L) and simulate samples of size n. However it is clear that these samples would have a much greater variability than if I specified (mu,sigma) and simulated from a N(mu,sigma^2) distribution. In fact, as far as I understand it, Gelman would not simulate from the model P(K,L) but from the posterior given the data. Here I am only pointing out the difference.
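The difference in variability between the two simulation schemes can be seen in a toy sketch (the values of K, L, mu, sigma, n are hypothetical; this is my illustration of the distinction, not Gelman's procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, n, reps = 10.0, 5.0, 50, 200

# scheme 1: simulate from P(K, L) -- redraw (mu, sigma) from the uniform
# priors for every replication, then draw a sample of size n from N(mu, sigma^2)
pk_means = []
for _ in range(reps):
    mu = rng.uniform(-K, K)
    sigma = rng.uniform(0.0, L)
    pk_means.append(rng.normal(mu, sigma, n).mean())

# scheme 2: fix (mu, sigma) once and simulate repeatedly from N(mu, sigma^2)
fixed_means = [rng.normal(1.0, 1.0, n).mean() for _ in range(reps)]

print(np.std(pk_means), np.std(fixed_means))
```

The spread of the sample means under P(K, L) is on the order of K, while under the fixed-parameter model it is sigma/sqrt(n); simulating from the posterior, as Gelman would, lands in between the two.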

My approach to parametric families is to specify those parameter values which are consistent with the data. The definition of consistency will depend on the problem and what is of interest, but in a previous post I gave an R programme for the family of normal models. Very often the sigma here is a so-called nuisance parameter and is not explicitly mentioned or estimated. Gelman recommends at times integrating out the nuisance parameters. What are the possible consequences of this for the EDA? Is it a perfectly neutral operation? My suspicion is that it is not: one will in general be integrating over values of the nuisance parameter which are not consistent with the data.
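The suspicion can be made concrete with a sketch under my own assumptions (a simulated normal sample; consistency judged by the Kuiper distance; and 0.178 as an assumed rough stand-in for the 0.95 Kuiper quantile at n = 100 — this is not Davies's R programme):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, 100)

def kuiper(data, mu, sigma):
    # Kuiper distance between the empirical distribution and N(mu, sigma^2)
    x = np.sort(data)
    n = len(x)
    f = norm.cdf(x, mu, sigma)
    return np.max(np.arange(1, n + 1) / n - f) + np.max(f - np.arange(0, n) / n)

# approximate the set of (mu, sigma) consistent with the data on a grid;
# 0.178 is an assumed rough 0.95 quantile of the Kuiper statistic for n = 100
thresh = 0.178
consistent = [(m, s)
              for m in np.linspace(-1.0, 1.0, 41)
              for s in np.linspace(0.3, 3.0, 41)
              if kuiper(data, m, s) < thresh]
s_vals = [s for _, s in consistent]
print(min(s_vals), max(s_vals))
```

The consistent sigma values occupy a fairly narrow band, yet integrating out sigma against a prior on (0, infinity) averages over the many sigma values outside that band; that is the sense in which the operation is not neutral.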

One final comment not directly about Gelman. There are many arguments for and against the Likelihood Principle which are very general. Why not take a specific example, namely the model 0.5N(mu_1,sigma_1^2)+0.5N(mu_2,sigma_2^2) and then give a convincing argument either for or against the likelihood principle in this particular case. Those that give an argument in favour then have the obligation to specify an analysis.

Laurie: Please remind us of the overall upshot of your stance.

Here is an interpretation that will make little sense to anybody I’m sure. Many years ago there was much debate over how to do non-equilibrium thermodynamics (still is, but the particular arguments I’m going to refer to have died away somewhat). One school favoured retaining a small state space and defining general constitutive functionals over the process history. Another favoured expanding the state space but using simpler constitutive functions, typically introducing additional dynamical equations for the additional variables.

The latter is typically considered a pragmatic approximation to the former, which is considered elegant and general but difficult to define particular models for. Other recent approaches have tried to give an independent, more sophisticated foundation for the latter, so that it is not necessarily just an approximation to the former.

The relation between Laurie and Gelman’s approaches is somewhat analogous in my mind.

Thanks Laurie. As I’ve said a few times now I think you raise good, valid points and propose a nice alternative (which is why I bought your book!).

I’ve been slowly gathering my thoughts by looking at a number of approaches and trying to extract the bits I like. I’m certainly planning to try these ideas out on the examples you’ve proposed.

I’m not sure yet whether it’ll come out more Gelman or more Davies or more something else. I think there are ways to keep some of the best parts of Gelman’s approach while meeting your objections, but we’ll see!

Happy hols, and I’ll make sure to say hi to your old friend if I run into him 🙂

Deborah, I have posted some comments on P-values on the last Stephen Senn contribution. I hope it answers some of your questions. It is difficult to give a short description of my stance.

https://www.ceremade.dauphine.fr/~xian/mixo.pdf on Bayesian modelling and inference for mixtures.

(Meant to say: Laurie – you may be interested in this)

Laurie, I think it would be nice if you sent Gelman an email with your thoughts on his ideas and ask whether he’d like to discuss them on his blog. He may well be open to this.

Regarding Gelman’s “philosophy”, at least at times Gelman is very open about this philosophy not being fully elaborated. I tried out my understanding of (parts of) it by writing a draft of the section on “Falsificationist Bayes” in our joint paper http://arxiv.org/abs/1508.05453 and Gelman apparently wasn’t too unhappy about it because he changed very little. This however doesn’t address Laurie’s concerns and involves some alternative possible interpretations of prior and posterior, so not everything is fully elaborated and nailed down.

Wolchover is wrong for another important reason too. According to Popper, scientific theories contain metaphysical statements. But that does not make scientific theories unfalsifiable. A theory is falsifiable (that is Popper’s necessary and sufficient condition) “if there exists *at least one* non-empty class of homotypic basic statements which are forbidden by it” (emphasis on the “at least one”). A scientific theory is not required to contain only empirical statements; that is impossible, as such a theory wouldn’t actually explain anything.