“The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference,” at the 2015 Association for Psychological Science (APS) Annual Convention in NYC, May 23, 2015:
D. Mayo: “Error Statistical Control: Forfeit at your Peril”
S. Senn: “‘Repligate’: reproducibility in statistical studies. What does it mean and in what sense does it matter?”
A. Gelman: “The statistical crisis in science” (this is not his exact presentation, but he focused on some of these slides)
For more details see this post.
Some thoughts: What was the most impressive thing about the session? I have never spoken to a psychology audience before or even been at a psych conference. I was impressed at the level of serious interest and the openness to hearing reflections from people not in psychology. Since I’m used to philosophy forums, I was also struck by the fact that the discussion was not about critically challenging the speakers (which isn’t to say there aren’t points we could have been challenged on).
What was surprising, but in a good way? The fact that the three speakers (me, Senn, Gelman) agreed that both tests and estimation methods are needed.
Anything disappointing? When the panel was asked the great question of how psychology researchers might become more enlightened about philosophical foundations of inference, I had to admit that philosophers of science had fallen down on the job over the last 20 years. Despite the rich recent history of interdisciplinary work between statisticians and philosophers (say in 1960–90), they’ve been missing in action (with few exceptions) in contemporary work. I’ve discussed this before on this blog.
(See: Article_Mayo.pdf)
Thus, contemporary progress will require creating new interdisciplinary research of a meta-scientific/meta-statistical sort. More on this later.
Of course, philosophers of science should be encouraged to supplement their training in logic and probability with serious statistics.
I was present at the session, as a social psychologist. It was well-received on all levels, including the focused discussion at the end. Psychologists are rarely out to prove their excellence in such a forum; all we ask is that the presentation be comprehensible and non-condescending. Success on all points.
But, I will admit, I long for a manuscript that I can go through to understand the arguments better.
@senn I’d take issue with slide 19 as a bit odd – it’s only a “paradox” because of the imprecision of the prose, which makes it sound as if the two scientists start with somehow comparable prior information and assumptions, when in fact their prior models on effect sizes are obviously different in slide 20. The example highlights how subsuming hypothesis testing within effect estimation and type S errors would avoid this sort of pointless obfuscation and confusion.
THE MOST RADICAL MESSAGES IN OUR TALKS? If I were to pick out some of the most radical messages in our 3 talks, a rather strong reaction would be expected. I invite the others to correct me and/or give their version of some of the radical themes. I’m not trying to introduce disagreement into what felt like a lot of (welcome!) concordance, but it would be a shame to put these controversial claims out there and have them appear less serious than they are.
Senn: The arguments for the “replicability crisis”, particularly when it is blamed on significance tests, rest upon an unwarranted computation/assumption:
(a) of expected Bayesian probability of replication (Goodman)
(b) of “science-wise” error rates.
Further, the allegation that P-values exaggerate the evidence, based on comparing a p-value to a Bayesian posterior, is wrong-headed, and only shows that Bayesians can seriously disagree with each other in interpreting results (e.g., according to whether they use a smooth or lump prior).
Senn will correct me if I’m wrong.
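The smooth-vs-lump point can be illustrated with a hypothetical numerical sketch (my own, not from the talks; the specific prior choices – a 50/50 “lump” at the null with a unit-normal slab, versus a flat prior – are illustrative assumptions). Two Bayesians see the same result, a standardized estimate of 1.96 (two-sided p = 0.05), yet reach very different verdicts:

```python
import math

def norm_pdf(x, sd=1.0):
    """Density of N(0, sd^2) at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def norm_cdf(x):
    """CDF of the standard normal at x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# One standardized estimate x ~ N(theta, 1); x = 1.96 gives two-sided p = 0.05.
x = 1.96

# Bayesian A: "lump" (spike-and-slab) prior, P(theta = 0) = 0.5,
# with theta ~ N(0, 1) under the alternative, so marginally x ~ N(0, 2).
bf01 = norm_pdf(x, 1.0) / norm_pdf(x, math.sqrt(2.0))  # Bayes factor for the null
post_null_lump = bf01 / (1 + bf01)   # posterior P(theta = 0 | x), about 0.35

# Bayesian B: smooth (flat) prior on theta; posterior is theta ~ N(x, 1).
post_neg_smooth = norm_cdf(-x)       # posterior P(theta <= 0 | x), about 0.025

print(post_null_lump, post_neg_smooth)
```

Bayesian A concludes the null still has roughly a 1-in-3 posterior probability (“p-values exaggerate the evidence”), while Bayesian B assigns only about 2.5% posterior probability to a non-positive effect. The disagreement is between the two priors, not a defect of the p-value itself – which is, as I read it, Senn’s point.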
Gelman: I’m not sure, but the undercurrent seemed to be that statistical significance tests in general, or p-values in particular, permit unwarranted claims due to biases and researcher flexibility. We should be radically skeptical (unlike Turing and Kahneman), and perhaps stop using tests (as well as Bayes factors and I presume other techniques) except maybe for raising criticisms.
Gelman will correct me if I’m wrong.
Mayo: Current “probabilist” reforms – replacing tests with likelihood ratios, Bayes factors, or HPD intervals, or simply lowering the p-value so that the maximally likely alternative gets a .95 posterior whenever that p-value is attained – will fail so long as they ignore “biasing selection effects” (defined in the slides). Moreover, under such reforms the very basis for the criticism – that selection effects can result in the actual error probabilities differing from, and being much greater than, the reported error probabilities – vanishes.
(Since I’ve taken this right from the slides, I guess I must agree!)
One thing that was also touched on by all speakers, and by myself in my brief opening, was the importance of individuals and their responsibility to think about these issues and be skeptical. Mayo mentioned the need to be skeptical of reforms – implied, though not explicitly stated, was skepticism of Bayesian reforms. Being one of the Bayesians advocating change, I think this is *critical*. One of the reasons I asked the panel about how people can educate themselves on these foundational issues is that I think skepticism is key. Bayesian statistics appears to be moving from “underdog” to mainstream, but it serves no one’s interest for people to simply accept it because someone told them it was the “right” thing to do (or even because they *think* they want to use it; ‘What is the probability of the null hypothesis?’ is not a good question, period, whether you want to ask it or not).
Senn mentioned that he dislikes almost everything, statistically; Gelman stressed that disbelief is always an option, and not to be “brow-beaten” by statisticians.
Interestingly, one of the questions asked from the audience showed the attitude that I think many psychologists have: Maybe psychologists can have statisticians analyze their data for them? I think there is an unhealthy impulse among psychologists to try to push the responsibility for statistical claims to others. “Just tell me how I can say this” — Bayesian, classical, whatever — is what many psychologists want.
There’s probably a nice paper to be written on this topic: how to be statistically skeptical. I don’t mean skepticism about the results presented in a particular paper, but rather a more fundamental skepticism.