I haven’t been blogging that much lately, as I’m tethered to the task of finishing revisions on a book (on the philosophy of statistical inference!). But I noticed two interesting blog posts, one by Jeff Leek and another by Andrew Gelman, and even a related petition on Twitter, reflecting a newish front in the statistics wars: When it comes to improving scientific integrity, do we need more carrots or more sticks?
Leek’s post from yesterday, “Statistical Vitriol” (29 Sep 2016), calls for de-escalating the consequences of statistical mistakes:
Over the last few months there has been a lot of vitriol around statistical ideas. First there were data parasites and then there were methodological terrorists. These epithets came from established scientists who have relatively little statistical training. There was the predictable backlash to these folks from their counterparties, typically statisticians or statistically trained folks who care about open source.
I’m a statistician who cares about open source but I also frequently collaborate with scientists from different fields. It makes me sad and frustrated that statistics – which I’m so excited about and have spent my entire professional career working on – is something that is causing so much frustration, anxiety, and anger.
I have been thinking a lot about the cause of this anger and division in the sciences. As a person who interacts with both groups pretty regularly I think that the reasons are some combination of the following.
1. Data is now everywhere, so every single publication involves some level of statistical modeling and analysis. It can’t be escaped.
2. The deluge of scientific papers means that only big claims get your work noticed, get you into fancy journals, and get you attention.
3. Most senior scientists, the ones leading and designing studies, have little or no training in statistics. There is a structural reason for this: data was sparse when they were trained and there wasn’t any reason for them to learn statistics. So statistics and data science wasn’t (and still often isn’t) integrated into medical and scientific curricula.
Even for senior scientists in charge of designing statistical studies?
4. There is an imbalance of power in the scientific process between statisticians/computational scientists and scientific investigators or clinicians. The clinicians/scientific investigators are “in charge” and the statisticians are often relegated to a secondary role. … There are a large number of lonely bioinformaticians out there.
5. Statisticians and computational scientists are also frustrated because there is often no outlet for them to respond to these papers in the formal scientific literature – those outlets are controlled by scientists and rarely have statisticians in positions of influence within the journals.
Since statistics is everywhere (1) and only flashy claims get you into journals (2) and the people leading studies don’t understand statistics very well (3), you get many publications where the paper makes a big claim based on shaky statistics but it gets through. This then frustrates the statisticians because they have little control over the process (4) and can’t get their concerns into the published literature (5).
This used to just result in lots of statisticians and computational scientists complaining behind closed doors. The internet changed all that, everyone is an internet scientist now.
…Sometimes to get attention, statisticians start to have the same problem as scientists; they need their complaints to get attention to have any effect. So they go over the top. They accuse people of fraud, or being statistically dumb, or nefarious, or intentionally doing things with data, or cast a wide net and try to implicate a large number of scientists in poor statistics. The ironic thing is that these things are the same thing that the scientists are doing to get attention that frustrated the statisticians in the first place.
Just to be 100% clear here I am also guilty of this. I have definitely fallen into the hype trap – talking about the “replicability crisis”. I also made the mistake earlier in my blogging career of trashing the statistics of a paper that frustrated me. …
I also understand the feeling of “being under attack”. I’ve had that happen to me too and it doesn’t feel good. So where do we go from here? How do we end statistical vitriol and make statistics a positive force? Here is my six part plan:
- We should create continuing education for senior scientists and physicians in statistical and open data thinking so people who never got that training can understand the unique requirements of a data rich scientific world.
- We should encourage journals and funders to incorporate statisticians and computational scientists at the highest levels of influence so that they can drive policy that makes sense in this new data driven time.
- We should recognize that scientists and data generators have a lot more on the line when they produce a result or a scientific data set. We should give them appropriate credit for doing that even if they don’t get the analysis exactly right.
- We should de-escalate the consequences of statistical mistakes. Right now the consequences are: retractions that hurt careers, blog posts that are aggressive and often too personal, and humiliation by the community. We should make it easy to acknowledge these errors without ruining careers. This will be hard – scientist’s careers often depend on the results they get (recall 2 above). So we need a way to pump up/give credit to/acknowledge scientists who are willing to sacrifice that to get the stats right.
- We need to stop treating retractions/statistical errors/mistakes like a sport where there are winners and losers. Statistical criticism should be easy, allowable, publishable and not angry or personal.
- Any paper where statistical analysis is part of the paper must have both a statistically trained author or a statistically trained reviewer or both. I wouldn’t believe a paper on genomics that was performed entirely by statisticians with no biology training any more than I believe a paper with statistics in it performed entirely by physicians with no statistical training.
I think scientists forget that statisticians feel un-empowered in the scientific process and statisticians forget that a lot is riding on any given study for a scientist. So being a little more sympathetic to the pressures we all face would go a long way to resolving statistical vitriol.
What do you think of his six-part plan? More carrots or more sticks? (You can read his post here.)
There may be a fairly wide disparity between the handling of these issues in medicine and biology as opposed to the social sciences. In psychology at least, it appears my predictions (vague, but clear enough) of the likely untoward consequences of their way of handling their “replication crisis” are proving all too true. (See, for example, this post.)
Compare Leek’s post with Gelman’s recent blog post on Susan Fiske, the person raising accusations of “methodological terrorism.” (I don’t know if Fiske coined the term, but I consider the analogy reprehensible and think she should retract it.) Here’s Gelman:
Who is Susan Fiske and why does she think there are methodological terrorists running around? I can’t be sure about the latter point because she declines to say who these terrorists are or point to any specific acts of terror. Her article provides exactly zero evidence but instead gives some uncheckable half-anecdotes.
I first heard of Susan Fiske because her name was attached as editor to the aforementioned PPNAS articles on himmicanes, etc. So, at least in some cases, she’s a poor judge of social science research….
Fiske’s own published work has some issues too. I make no statement about her research in general, as I haven’t read most of her papers. What I do know is what Nick Brown sent me: [an article] by Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske (Journal of Social Issues, 2005). . . .
This paper was just riddled through with errors. First off, its main claims were supported by t statistics of 5.03 and 11.14 . . . ummmmm, upon recalculation the values were actually 1.8 and 3.3. So one of the claim wasn’t even “statistically significant” (thus, under the rules, was unpublishable).
….The short story is that Cuddy, Norton, and Fiske made a bunch of data errors—which is too bad, but such things happen—and then when the errors were pointed out to them, they refused to reconsider anything. Their substantive theory is so open-ended that it can explain just about any result, any interaction in any direction.
And that’s why the authors’ claim that fixing the errors “does not change the conclusion of the paper” is both ridiculous and all too true….
The other thing that’s sad here is how Fiske seems to have felt the need to compromise her own principles here. She deplores “unfiltered trash talk,” “unmoderated attacks” and “adversarial viciousness” and insists on the importance of “editorial oversight and peer review.” According to Fiske, criticisms should be “most often in private with a chance to improve (peer review), or at least in moderated exchanges (curated comments and rebuttals).” And she writes of “scientific standards, ethical norms, and mutual respect.”
But Fiske expresses these views in an unvetted attack in an unmoderated forum with no peer review or opportunity for comments or rebuttals, meanwhile referring to her unnamed adversaries as “methological terrorists.” Sounds like unfiltered trash talk to me. But, then again, I haven’t seen Fiske on the basketball court so I really have no idea what she sounds like when she’s really trash talkin’. (You can read Gelman’s post, which also includes a useful chronology of events, here.)
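To make the arithmetic in Gelman’s remark about the recalculated t statistics concrete, here is a minimal Python sketch that converts a test statistic into a two-sided p-value. It assumes a standard normal reference distribution (a large-sample approximation; the paper’s actual degrees of freedom are not given here) and the conventional 0.05 threshold; the function name `two_sided_p_normal` is hypothetical. Because a t distribution with finite degrees of freedom has heavier tails than the normal, exact t-based p-values would be somewhat larger, which only strengthens the point about t = 1.8.

```python
import math

ALPHA = 0.05  # conventional two-sided threshold, assumed for illustration


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value for statistic t under a standard normal reference.

    Assumption: normal approximation, i.e. large degrees of freedom.
    erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)).
    """
    return math.erfc(abs(t) / math.sqrt(2))


# Reported vs. recalculated t statistics, as quoted in Gelman's post
pairs = [(5.03, 1.8), (11.14, 3.3)]

for reported, recalculated in pairs:
    p_reported = two_sided_p_normal(reported)
    p_recalc = two_sided_p_normal(recalculated)
    print(
        f"reported t={reported:5.2f} (p={p_reported:.2g}) -> "
        f"recalculated t={recalculated:.1f} (p={p_recalc:.3f}), "
        f"p < {ALPHA}: {p_recalc < ALPHA}"
    )
```

Under this approximation the recalculated t = 1.8 gives p of about 0.07, above 0.05, while t = 3.3 gives p of about 0.001.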
How can Leek’s six-point list of “peaceful engagement” work in cases where authors deny that the errors really matter? What if they view statistics as so much holy water to dribble over their data, mere window-dressing to attain a veneer of science? I have heard some (successful) social scientists say this aloud (privately)! Far from showing that the claims they infer have survived genuine attempts to falsify them (as good Popperians would demand), the entire effort is a self-sealing affair, dressed up with statistical razzmatazz.
So I concur with Gelman, who has no sympathy for those who wish to protect their work from criticism while going merrily on their way using significance tests illicitly. I also have no sympathy for those who think the cure is merely lowering p-values or embracing methods where the assessment and control of error probabilities are absent. For me, by the way, the point of controlling error probabilities is not to secure good long-run error rates but to ensure a severe probing of error in the case at hand.
One group may unfairly call the critics “methodological terrorists.” Another may unfairly demonize statistical methods as the villains to be blamed, banned, and eradicated: it’s all the p-value’s fault there’s bad science (never mind that demonstrating the lack of replication, and fraudbusting, both rely on significance tests). Worse, in some circles, methods that neatly hide the damage from biasing selection effects are championed (in high places)!
Gelman says the paradigm of erroneously moving from an already spurious p-value to a substantive claim (thereby doubling up on the blunders) is dead. Is it? That would be swell, but I have my doubts, especially in the most troubling areas. They didn’t nail Potti and Nevins, whose erroneous cancer trials had life-threatening consequences; we can scarcely feel confident that such finagling isn’t continuing in clinical trials (see this post), though I think there’s some hope for improvements. But how can it be that “senior scientists, the ones leading and designing studies, have little or no training in statistics,” as Leek says? This is exactly why everyone could say “it’s not my job” in the horror story of the Potti and Nevins fraud. At least social psychologists aren’t using their results to make decisions about chemo treatments for breast cancer patients.
In the social sciences, the replication revolution has raised awareness, no doubt, and it’s altogether a plus that they’re stressing preregistration. But it’s been such a windfall that one cannot help asking: why would a field whose own members frequently write about its “perverse incentives” have an incentive to kill the cash cow, especially with all its interesting sidelines? The replication enterprise has a life of its own, and offers a career of its own with grants aplenty, so grist for its mills would need to continue. That’s rather cynical, but unless they’re prepared to call out bad science, including mounting serious critiques of widely held experimental routines and measurements (which could well lead to whole swaths of inquiry falling by the wayside), I don’t see how any other outcome is to be expected.
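The worry about biasing selection effects can be illustrated with a deliberately simple scenario, assumed here for illustration only: a researcher runs k independent tests whose null hypotheses are all true, and reports only the smallest p-value. Each nominal p-value is computed as if it were the only test, yet the probability of reporting at least one result with p < 0.05 is 1 − (1 − 0.05)^k. The sketch below (function names are hypothetical) checks that formula by simulation, using the fact that under a true null a continuous test’s p-value is Uniform(0, 1). It shows how far the actual error probability can drift from the nominal 0.05 once selection enters.

```python
import random


def familywise_error_exact(alpha: float, k: int) -> float:
    """P(at least one of k independent true-null tests has p < alpha)."""
    return 1 - (1 - alpha) ** k


def familywise_error_simulated(alpha: float, k: int,
                               n_studies: int = 50_000, seed: int = 1) -> float:
    """Simulate 'run k tests, report the best one' when every null is true.

    Assumption: independent tests with continuous statistics, so each
    p-value is drawn from Uniform(0, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        best_p = min(rng.random() for _ in range(k))  # only this one is reported
        if best_p < alpha:
            hits += 1
    return hits / n_studies


for k in (1, 5, 20):
    exact = familywise_error_exact(0.05, k)
    simulated = familywise_error_simulated(0.05, k)
    print(f"k={k:2d}: exact={exact:.3f}, simulated={simulated:.3f}")
```

With k = 20 the chance of a nominally significant finding is about 0.64, even though every reported p-value looks like an ordinary 0.05-level test.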
Share your thoughts. I wrote much more, but it got too long. I may continue this…
Some related posts:
- “Don’t Throw Out the Error Control Baby with the Bad Statistics Bathwater”
- “P-Value Madness: A Puzzle About the Latest Test Ban, or ‘Don’t Ask, Don’t Tell’”
- “Repligate Returns (or, the Non Significance of Nonsignificant Results Are the New Significant Results)”
- “The Paradox of Replication and the Vindication of the P-value, but She Can Go Deeper”
Send me related links you find (in the comments) and I’ll post them.
1) “There’s no tone problem in psychology,” Tal Yarkoni
2) “Menschplaining: Three Ideas for Civil Criticism,” Uri Simonsohn on Data Colada
I do not attribute this stance to Gelman, who has made it clear that he cares about what could have happened but didn’t in analyzing tests, and who is sympathetic to the idea of statistical tests as error probes:
“But I do not make these decisions on altering, rejecting, and expanding models based on the posterior probability that a model is true. …In statistical terms, an anomaly is a misfit of model to data (or perhaps an internal incoherence of the model), and it can be identified by a (Fisherian) hypothesis test without reference to any particular alternative (what Cox and Hinkley 1974 call “pure significance testing”). … At the next stage, we see science—and applied statistics—as resolving anomalies via the creation of improved models which often include their predecessors as special cases. This view corresponds closely to the error-statistics idea of Mayo (1996)” (Gelman 2011, p. 70).
With respect to “de-escalating” the consequences of error, maybe it should be “3 strikes and you’re out!”
I spent my entire career as an applied statistician working with applied biologists, ecologists, and chemists.
At the beginning of this post you list Leek’s four reasons for the present situation. I don’t agree with 1 and 2. I agree with only the first sentence of 3, but with the whole of 4. To 3 I would add that, in my experience, a lot of scientists have neither interest in nor aptitude for statistics, and some demonstrate actual contempt. To junior scientists, many applied statisticians are introverts and poor communicators and are perceived as very, very odd. Senior scientists rarely meet a statistician as a member of their peer group, so statisticians are almost always junior and of low status. Even though they don’t understand it, scientists perceive statistics to be easy and routine. But when one of their own shows an aptitude for computing and knows how to enter data into a statistics package, that person is perceived to be very clever.
Sometimes academic statisticians don’t help. I know from personal experience that some also perceive applied statisticians as being of low status and as carrying out routine work. The job of teaching statistics to biologists, ecologists, etc. is often given to the most junior member of staff, i.e., the guy/girl who has just completed his/her PhD and is new to teaching. This results in statistics being taught as a bunch of equations. Teaching statistics well to biologists and others is challenging and requires real skill, backed up by experience.
The leaders of the statistics profession in the academic societies do not help either. They tend to be senior academics or government statisticians who are very remote from the coalface. I tend to envy engineers: if their models were wrong, their bridges (etc.) would fall down, which leads to a professional organisation that works for the ordinary engineer.
Finally, it doesn’t help when statisticians argue about the benefits of different methods: NHST, Bayes, Bayes factors, AIC, BIC, likelihood ratios, etc. For a low-powered study they will all flounder, and for a study with good power they will all give a reasonable answer. Because they all use the likelihood as the means of introducing data into the model, they are probably all doing the same thing.
I am surprised, but then again not (because of the above), that design (and study conduct) never gets a mention when the replication crisis is discussed. I am pretty sure that poor design and poor study conduct are a major cause of, or contributor to, the problem.
And finally, I don’t believe that Leek’s six-point solution will work, for the reasons outlined above.
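The point about low-powered studies can be made concrete with an approximate power calculation. The sketch below assumes a two-sided, two-sample z-test (a normal approximation chosen for simplicity), a standardized effect size d, and n subjects per group; the effect size of 0.3, the sample sizes, and the function name `approx_power` are hypothetical choices for illustration. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test.

    Assumptions: known common variance, standardized effect size d,
    equal group sizes n_per_group.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative
    shift = d * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


for n in (10, 20, 50, 100, 200):
    print(f"n per group = {n:3d}: power = {approx_power(0.3, n):.2f}")
```

Under these assumptions, a study with d = 0.3 and 10 subjects per group has power of roughly 0.1, so most such studies would miss a real effect of that size. This illustrates only frequentist power; it does not by itself establish the broader claim that all the listed methods behave alike.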
I guess this doesn’t pass the constructive test.