“The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference,” at the 2015 Association for Psychological Science (APS) Annual Convention in NYC, May 23, 2015:
D. Mayo: “Error Statistical Control: Forfeit at your Peril”
S. Senn: “‘Repligate’: reproducibility in statistical studies. What does it mean and in what sense does it matter?”
A. Gelman: “The statistical crisis in science” (this is not his exact presentation, but he focused on some of these slides)
For more details see this post.
Some thoughts: What was the most impressive thing about the session? I have never spoken to a psychology audience before, or even been at a psych conference. I was impressed at the level of serious interest and the openness to hearing reflections from people not in psychology. Since I’m used to philosophy forums, I was also struck by the fact that the discussion was not about critically challenging the speakers (which isn’t to say there aren’t points on which we could have been challenged).
What was surprising, but in a good way? The fact that the three speakers (me, Senn, Gelman) agreed that both tests and estimation methods are needed.
Anything disappointing? When the panel was asked the great question of how psychology researchers might become more enlightened about philosophical foundations of inference, I had to admit that philosophers of science had fallen down on the job over the last 20 years. Despite the rich history of interdisciplinary work between statisticians and philosophers (say, in 1960–90), they’ve been missing in action (with few exceptions) in contemporary work. I’ve discussed this before on this blog.
Thus, contemporary progress will require creating new interdisciplinary research of a meta-scientific/meta-statistical sort. More on this later.
Of course, philosophers of science should be encouraged to supplement their training in logic and probability with serious statistics.
I was present at the session, as a social psychologist. It was well-received on all levels, including the focused discussion at the end. Psychologists are rarely out to prove their excellence in such a forum; all we ask is that the presentation be comprehensible and non-condescending. Success on all points.
But, I will admit, I long for a manuscript that I can go through to understand the arguments better.
@senn I’d take issue with slide 19 as a bit odd – it’s only a “paradox” because of the imprecision of the prose, which makes it sound as if the two scientists start with somehow comparable prior information and assumptions, when in fact their prior models on effect sizes are obviously different in slide 20. The example highlights how subsuming hypothesis testing within effect estimation and type S errors would avoid this sort of pointless obfuscation and confusion.
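For readers who haven’t met the term, here is a minimal sketch of the type S error calculation in the sense of Gelman and Carlin – my own illustration, not anything from the slides, and the true effect and standard error plugged in at the end are made-up numbers. Given a hypothesized true effect and a standard error, it returns the power of the conventional two-sided test and the probability that a statistically significant estimate has the wrong sign.

```python
# Minimal sketch of a type S error calculation (Gelman & Carlin sense).
# The inputs at the bottom are illustrative, not taken from any study.
from scipy.stats import norm

def power_and_type_s(true_effect, se, alpha=0.05):
    z_crit = norm.ppf(1 - alpha / 2)
    # Probability of a significant estimate in each direction, given the true effect
    p_pos = norm.sf(z_crit - true_effect / se)    # significant and positive
    p_neg = norm.cdf(-z_crit - true_effect / se)  # significant and negative
    power = p_pos + p_neg
    type_s = p_neg / power  # among significant results, share with the wrong sign
    return power, type_s

power, type_s = power_and_type_s(true_effect=0.1, se=0.5)
print(f"power = {power:.2f}, type S error rate = {type_s:.2f}")
```

With a small true effect relative to the standard error, power is low and a nontrivial fraction of the “significant” estimates point in the wrong direction – which is the risk I have in mind when I say effect estimation plus type S errors is the clearer framing.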
THE MOST RADICAL MESSAGES IN OUR TALKS? If I were to pick out some of the most radical messages in our three talks, a rather strong reaction would be expected. I invite the others to correct me and/or give their version of some of the radical themes. I’m not trying to introduce disagreement into what felt like a lot of (welcome!) concordance, but it would be a shame to put these controversial claims out there and have them appear less serious than they are.
Senn: The arguments for the “replicability crisis”, particularly when it is blamed on significance tests, rest upon an unwarranted computation/assumption:
(a) of expected Bayesian probability of replication (Goodman)
and/or
(b) of “science-wise” error rates.
Further, the allegation that P-values exaggerate the evidence, based on comparing a p-value to a Bayesian posterior, is wrong-headed; it only shows that Bayesians can seriously disagree with each other in interpreting the same results (e.g., according to whether they use a smooth or a lump prior).
Senn will correct me if I’m wrong.
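To give a flavor of the lump vs. smooth point (a numerical sketch of my own, not from Senn’s slides, with illustrative priors): take a standardized statistic z = 1.96, i.e., a two-sided p-value of about 0.05. A Bayesian who puts a point mass (“lump”) of 1/2 on the null, with a unit-variance normal prior on the standardized effect under the alternative, reports a posterior probability of the null of about 0.35; a Bayesian with a smooth prior (flat, no point mass) reports a posterior probability of about 0.025 that the effect has the wrong sign.

```python
# Sketch of how a "lump" (point-mass) prior and a "smooth" prior read the same
# p ~ 0.05 result very differently. The priors are illustrative choices, not Senn's.
from scipy.stats import norm

z = 1.96  # observed standardized statistic; two-sided p-value ~ 0.05

# Lump prior: P(H0) = 1/2; under H1 the standardized effect ~ N(0, v)
v = 1.0
bf01 = norm.pdf(z) / norm.pdf(z, scale=(1 + v) ** 0.5)  # Bayes factor for H0 vs H1
print(f"two-sided p-value:             {2 * norm.sf(z):.3f}")
print(f"lump prior,  posterior P(H0):  {bf01 / (1 + bf01):.3f}")

# Smooth prior: flat prior on the effect, no mass at zero; the posterior is N(z, 1),
# so the posterior probability of a wrong-sign effect equals the one-sided p-value.
print(f"smooth prior, P(effect <= 0):  {norm.cdf(0, loc=z):.3f}")
```

The point, as I take it from Senn, is that 0.35 vs. 0.025 is a disagreement among Bayesians about priors, not a demonstration that the p-value of 0.05 “exaggerates” anything.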
Gelman: I’m not sure, but the undercurrent seemed to be that statistical significance tests in general, or p-values in particular, permit unwarranted claims due to biases and researcher flexibility. We should be radically skeptical (unlike Turing and Kahneman), and perhaps stop using tests (as well as Bayes factors and, I presume, other techniques) except maybe for raising criticisms.
Gelman will correct me if I’m wrong.
Mayo: Current “probabilist” reforms – replacing tests with likelihood ratios, Bayes factors, or HPD intervals, or just lowering the p-value so that the maximally likely alternative gets a .95 posterior given that the p-value is attained – ignore “biasing selection effects” (defined in the slides) and will fail. Moreover, on these accounts the very basis for the criticism – that selection effects can result in the actual error probabilities differing from, and being much greater than, the reported error probabilities – vanishes.
(Since I’ve taken this right from the slides, I guess I must agree!)
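To make the p-value-lowering reform concrete (a back-of-the-envelope version of my own, assuming a one-sided z-test and equal prior weight on the null and on the data-dependent “maximally likely” alternative): the alternative reaches a 0.95 posterior only when the likelihood ratio reaches 19, i.e., exp(z²/2) ≥ 19, which requires z ≈ 2.43 and hence a p-value near 0.008 rather than 0.05. None of this, of course, registers biasing selection effects.

```python
# Back-of-the-envelope version of "lower the p-value until the maximally likely
# alternative gets a .95 posterior"; assumptions (one-sided z-test, equal prior
# weights on null and alternative) are mine, for illustration only.
import math
from scipy.stats import norm

target_posterior = 0.95
lr_needed = target_posterior / (1 - target_posterior)  # likelihood ratio of 19

# For a z statistic, the maximally likely alternative sits at the observed z,
# so the likelihood ratio against the null is exp(z**2 / 2).
z_needed = math.sqrt(2 * math.log(lr_needed))
print(f"z needed:          {z_needed:.2f}")           # ~2.43
print(f"one-sided p-value: {norm.sf(z_needed):.4f}")  # ~0.008, not 0.05
```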
One thing that was also touched on by all speakers, and by myself in my brief opening, was the importance of individuals and their responsibility to think about these issues and be skeptical. Mayo mentioned the need to be skeptical of reforms – implied, but not explicitly stated, seemed to be skepticism of Bayesian reforms. Being one of the Bayesians advocating change, I think this is *critical*. One of the reasons I asked the panel about how people can educate themselves on these foundational issues is that I think skepticism is key. Bayesian statistics appears to be moving from “underdog” to mainstream, but it serves no one’s interest for people to simply accept it because someone told them it was the “right” thing to do (or even because they *think* they want to use it; ‘What is the probability of the null hypothesis?’ is not a good question, period, whether you want to ask it or not).
Senn mentioned that he dislikes almost everything, statistically; Gelman stressed that disbelief is always an option, and not to be “brow-beaten” by statisticians.
Interestingly, one of the questions asked from the audience showed the attitude that I think many psychologists have: Maybe psychologists can have statisticians analyze their data for them? I think there is an unhealthy impulse among psychologists to try to push the responsibility for statistical claims to others. “Just tell me how I can say this” — Bayesian, classical, whatever — is what many psychologists want.
There’s probably a nice paper to be written on this topic: how to be statistically skeptical. I don’t mean skepticism about the results presented in a particular paper, but rather a more fundamental skepticism.
Richard:
The whole process of management of statistical collaboration and consultation would benefit from more study.
Statisticians differ greatly in their training, abilities and motivations – some just learn to make the client happy; others back off from giving good advice, since pressing it can lead to clients finding statisticians who give more enabling but bad advice. One instance I am thinking of here was a young statistician working with clinical researchers at an Ivy League university, who confided in me that they were initially very concerned about addressing multiplicities but stopped when they noticed clinicians preferred to work with statisticians who did not raise these concerns.
Another was at a different medical school, where a senior biostatistics faculty member instructed clinicians to do stepwise selection when analyzing observational studies (even though, in repeated discussions with me, they were well aware that this did not address confounding properly, and they had a good grasp of better methods – which would simply have been too time-consuming for them to take on as a co-author. With stepwise regression they were publishing about 10 studies a year with very little work on their part).
Keith O’Rourke
I was just giving some of my no doubt biased experiences.
But by “benefit from more study” I meant formal surveys, audits and studies.
Here is an example – http://simplystatistics.org/2015/04/29/data-analysis-subcultures/
Keith O’Rourke
Thanks for the comments. I’m for skepticism, but it must be associated with a constructive account of how to properly carry out statistical inference. I don’t buy the idea that a bunch of horror stories constitutes an argument in the least; I regard it as encouraging (what I call) a Dale Carnegie fallacy. There are pretty obvious ways that more self-critical statistics can be done in psychology, assuming people really wanted to. On the other hand, if certain areas border on pseudoscience/questionable science, then statistics can only, and at best, be used to discover the lack of regularity. That requires a willingness to discover one has been standing on quicksand, and it appears that many prefer to weaken the tools to prevent discovering this.
You were here in the city and I didn’t know! I could have shown you my collection of wee p-values.
I’m still here (live here part time)….but I rarely fall for an offer to show me a collection of wee p-values