ASA Guide to P-values

Palavering about Palavering about P-values


Nathan Schachtman (who was a special invited speaker at our recent Summer Seminar in Phil Stat) put up a post on his law blog the other day (“Palavering About P-values”) on an article by a statistics professor at Stanford, Helena Kraemer. “Palavering” is an interesting word choice of Schachtman’s. Its range of meanings is relevant here [i]; in my title, I intend both, in turn. You can read Schachtman’s full post here; it begins like this:

The American Statistical Association’s most recent confused and confusing communication about statistical significance testing has given rise to great mischief in the world of science and science publishing.[ASA II 2019] Take for instance last week’s opinion piece about “Is It Time to Ban the P Value?” Please.

Admittedly, their recent statement, which I refer to as ASA II, has seemed to open the floodgates to some very zany remarks about P-values, their meaning and role in statistical testing. Continuing with Schachtman’s post:

…Kraemer’s eye-catching title creates the impression that the p-value is unnecessary and inimical to valid inference.

Remarkably, Kraemer’s article commits the very mistake that the ASA set out to correct back in 2016 [ASA I], by conflating the probability of the data under a hypothesis of no association with the probability of a hypothesis given the data:

“If P value is less than .05, that indicates that the study evidence was good enough to support that hypothesis beyond reasonable doubt, in cases in which the P value .05 reflects the current consensus standard for what is reasonable.”

The ASA tried to break the bad habit of scientists’ interpreting p-values as allowing us to assign posterior probabilities, such as beyond a reasonable doubt, to hypotheses, but obviously to no avail.

While I share Schachtman’s puzzlement over a number of remarks in her article, this particular claim, while contorted, need not be regarded as giving a posterior probability to “that hypothesis” (the alternative to a test hypothesis). It is perhaps close to being tautological. If a P-value of .05 “reflects the current consensus standard for what is reasonable” evidence of a discrepancy from a test or null hypothesis, then it is reasonable evidence of such a discrepancy. Of course, she would have needed to say it’s a standard for “beyond a reasonable doubt” (BARD), but there’s no reason to suppose that that standard is best seen as a posterior probability.

I think we should move away from that notion, given how ill-defined and unobtainable it is. That a claim is probable, in any of the manifold senses in which that is meant, is very different from its having been well tested, corroborated, or its truth well-warranted. It might well be that finding statistically significant increased risks 3 or 4 times is sufficient for inferring, beyond a reasonable doubt, that a genuine risk exists, given that the tests pass audits of their assumptions. The 5 sigma Higgs results warranted claiming a discovery insofar as there was a very high probability of getting less statistically significant results, were the bumps due to background alone. In other words, evidence BARD for H can be supplied by H’s having passed a sufficiently severe test (or set of tests). Its denial may then be falsified in the strongest (fallible) manner possible in science. Back to Schachtman:
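The 5 sigma reasoning can be made concrete: the reported P-value is the probability, under the background-alone (null) model, of bumps at least as statistically significant as those observed. A minimal sketch (my own illustration, using only the Python standard library; the normal tail probability is obtained via the complementary error function):

```python
from math import erfc, sqrt

def one_sided_p(z):
    """One-sided P-value: the probability, under the null (background-alone)
    model, of a result at least z standard deviations above the mean."""
    return 0.5 * erfc(z / sqrt(2))

# The "5 sigma" discovery standard used for the Higgs results corresponds
# to a tail probability of roughly 3 in 10 million:
p_five_sigma = one_sided_p(5)      # ~2.9e-7

# By contrast, the conventional one-sided 0.025 / two-sided 0.05 standard:
p_conventional = one_sided_p(1.96)  # ~0.025
```

The point of the severity reading is that the tiny tail probability is an error probability of the method: were the bumps due to background alone, results this significant would almost never be generated.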

Perhaps in her most misleading advice, Kraemer asserts that:

“[w]hether P values are banned matters little. All readers (reviewers, patients, clinicians, policy makers, and researchers) can just ignore P values and focus on the quality of research studies and effect sizes to guide decision-making.”

Really? If a high quality study finds an “effect size” of interest, we can now ignore random error?

I agree her claim here is extremely strange, though one can surmise how it’s instigated by some suggested “reforms” in ASA II. It might also be the result of confusing observed or sample effect size with population or parametric effect size (or magnitude of discrepancy). But the real danger in speaking cavalierly about “banning” P-values is not that there aren’t some cases where genuine and spurious effects may be distinguished by eye-balling alone. It is that we lose an entire critical reasoning tool for determining whether a statistical claim is based on methods with even moderate capability of revealing mistaken interpretations of data. The first thing a statistical consumer needs to ask those who assure them they’re not banning P-values is whether they’ve so stripped them of their error statistical force as to deprive us of an essential tool for holding the statistical “experts” accountable.

The ASA 2016 Statement, with its “six principles,” has provoked some deliberate or ill-informed distortions in American judicial proceedings, but Kraemer’s editorial creates idiosyncratic meanings for p-values. Even the 2019 ASA “post-modernism” does not advocate ignoring random error and p-values, as opposed to proscribing dichotomous characterization of results as “statistically significant,” or not.

You may have an overly sanguine construal of ASA II (2019 ASA) (as merely “proscribing dichotomous characterization of results”). As I read it, although their actual position is quite vague, their recommended P-values appear to be merely descriptive and do not (or need not) have error probabilistic interpretations, even if assumptions pass audits. Granted, the important Principle 4 in ASA I (that data dredging and multiple testing invalidate P-values) implies that error control matters. But I think this is likely to be just another inconsistency between ASA I and II. Neither mentions Type I or II errors or power (except to say that it is not mentioning them). I think it is the onus of the ASA II authors to clarify this and other points I’ve discussed elsewhere on this blog.

[i]

  1. chattering, talking unproductively and at length
  2. persuading by flattery, browbeating or bullying

 

Categories: ASA Guide to P-values, P-values | 10 Comments

The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring? (i)

The New England Journal of Medicine (NEJM) announced new guidelines for authors on statistical reporting yesterday*. The ASA describes the change as “in response to the ASA Statement on P-values and Statistical Significance and subsequent The American Statistician special issue on statistical inference” (ASA I and II, in my abbreviation). If so, it seems to have backfired. I don’t know all the differences in the new guidelines, but those explicitly noted appear to me to move in the reverse direction from where the ASA I and II guidelines were heading.

The most notable point is that the NEJM highlights the need for error control, especially for constraining the Type I error probability, and pays a lot of attention to adjusting P-values for multiple testing and post hoc subgroups. ASA I included an important principle (#4) that P-values are altered and may be invalidated by multiple testing, but it does not call for adjustments for multiplicity, nor do I find a discussion of Type I or II error probabilities in the ASA documents. NEJM gives strict requirements for controlling the family-wise error rate or false discovery rate (understood as the Benjamini and Hochberg frequentist adjustments). Continue reading
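The two kinds of multiplicity control the NEJM distinguishes can be illustrated with a short sketch (a generic textbook implementation of my own, not NEJM's or the ASA's procedure): Bonferroni controls the family-wise error rate, while the Benjamini–Hochberg step-up procedure controls the false discovery rate and is typically less stringent.

```python
def bonferroni(pvals, alpha=0.05):
    """Bonferroni correction: controls the family-wise error rate at alpha
    by testing each of the m hypotheses at level alpha/m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: controls the false discovery
    rate at level q. Returns a list of booleans (True = reject that null)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * q ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= (rank / m) * q:
            k_max = rank
    # ... and reject the k_max smallest P-values.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

For example, with P-values [0.01, 0.02, 0.03, 0.5], Bonferroni (threshold 0.0125) rejects only the first null, while Benjamini–Hochberg at q = 0.05 rejects the first three.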

Categories: ASA Guide to P-values | 17 Comments

B. Haig: The ASA’s 2019 update on P-values and significance (ASA II)(Guest Post)

Brian Haig, Professor Emeritus
Department of Psychology
University of Canterbury
Christchurch, New Zealand

The American Statistical Association’s (ASA) recent effort to advise the statistical and scientific communities on how they should think about statistics in research is ambitious in scope. It is concerned with an initial attempt to depict what empirical research might look like in “a world beyond p<0.05” (The American Statistician, 2019, 73, S1,1-401). Quite surprisingly, the main recommendation of the lead editorial article in the Special Issue of The American Statistician devoted to this topic (Wasserstein, Schirm, & Lazar, 2019; hereafter, ASA II) is that “it is time to stop using the term ‘statistically significant’ entirely”. (p.2) ASA II acknowledges the controversial nature of this directive and anticipates that it will be subject to critical examination. Indeed, in a recent post, Deborah Mayo began her evaluation of ASA II by making constructive amendments to three recommendations that appear early in the document (‘Error Statistics Philosophy’, June 17, 2019). These amendments have received numerous endorsements, and I record mine here. In this short commentary, I briefly state a number of general reservations that I have about ASA II. Continue reading

Categories: ASA Guide to P-values, Brian Haig | 31 Comments

“The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean” (Some Recommendations)(ii)

Some have asked me why I haven’t blogged on the recent follow-up to the ASA Statement on P-Values and Statistical Significance (Wasserstein and Lazar 2016)–hereafter, ASA I. They’re referring to the editorial by Wasserstein, R., Schirm, A. and Lazar, N. (2019)–hereafter, ASA II–opening a special on-line issue of over 40 contributions responding to the call to describe “a world beyond P < 0.05”.[1] Am I falling down on the job? Not really. All of the issues are thoroughly visited in my Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, SIST (2018, CUP). I invite interested readers to join me on the statistical cruise therein.[2] As the ASA II authors observe: “At times in this editorial and the papers you’ll hear deep dissonance, the echoes of ‘statistics wars’ still simmering today (Mayo 2018)”. True, and reluctance to reopen old wounds has only allowed them to fester. However, I will admit that when new attempts at reform are put forward, a philosopher of science who has written on the statistics wars ought to weigh in on the specific prescriptions/proscriptions, especially when a jumble of fuzzy conceptual issues is interwoven through a cacophony of competing reforms. (My published comment on ASA I, “Don’t Throw Out the Error Control Baby With the Bad Statistics Bathwater”, is here.) Continue reading

Categories: ASA Guide to P-values, Statistics | 93 Comments

If you like Neyman’s confidence intervals then you like N-P tests


Neyman, confronted with unfortunate news, would always say “too bad!” At the end of Jerzy Neyman’s birthday week, I cannot help imagining him saying “too bad!” as regards some twists and turns in the statistics wars. First, too bad Neyman-Pearson (N-P) tests aren’t in the ASA Statement (2016) on P-values: “To keep the statement reasonably simple, we did not address alternative hypotheses, error types, or power”. An especially aggrieved “too bad!” would be earned by the fact that those in love with confidence interval estimators don’t appreciate that Neyman developed them (in 1930) as a method with a precise interrelationship with N-P tests. So if you love CI estimators, then you love N-P tests! Continue reading
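The interrelationship Neyman had in mind is the familiar duality: a 95% confidence interval for a normal mean collects exactly those parameter values that a two-sided test at level 0.05 would not reject. A hedged sketch of my own (a generic known-sigma z-interval and z-test, not Neyman's notation):

```python
from math import sqrt

Z_975 = 1.959963984540054  # standard normal 0.975 quantile

def ci_95(xbar, sigma, n):
    """Two-sided 95% confidence interval for a normal mean (sigma known)."""
    half = Z_975 * sigma / sqrt(n)
    return (xbar - half, xbar + half)

def rejects_at_05(xbar, sigma, n, mu0):
    """Two-sided z-test of H0: mu = mu0 at level 0.05 (sigma known)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return abs(z) > Z_975

# Duality: mu0 lies inside the 95% CI exactly when the level-0.05 test
# of H0: mu = mu0 does not reject.
```

With xbar = 1.0, sigma = 2.0, n = 25, the interval is roughly (0.22, 1.78): the test rejects mu0 = 0 (which falls outside the interval) and does not reject mu0 = 1.5 (which falls inside it). Invert the test at every mu0 and you have recovered Neyman's interval estimator.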

Categories: ASA Guide to P-values, CIs and tests, Neyman | Leave a comment

Diary For Statistical War Correspondents on the Latest Ban on Speech

When science writers, especially “statistical war correspondents”, contact you to weigh in on some article, they may talk to you until they get something spicy, and then they may or may not include the background context. So a few writers contacted me this past week regarding this article (“Retire Statistical Significance”)–a teaser, I now suppose, to advertise the ASA collection growing out of that conference “A world beyond P ≤ .05” way back in Oct 2017, where I gave a paper*. I jotted down some points, since Richard Harris from NPR needed them immediately, and I had just gotten off a plane when he emailed. He let me follow up with him, which is rare and greatly appreciated. So I streamlined the first set of points, and dropped any points he deemed technical. I sketched the third set for a couple of other journals who contacted me, who may or may not use them. Here’s Harris’ article, which includes a couple of my remarks. Continue reading

Categories: ASA Guide to P-values, P-values | 40 Comments

A letter in response to the ASA’s Statement on p-Values by Ionides, Giessing, Ritov and Page

I came across an interesting letter in response to the ASA’s Statement on p-values that I hadn’t seen before. It’s by Ionides, Giessing, Ritov and Page, and it’s very much worth reading. I make some comments below. Continue reading

Categories: ASA Guide to P-values, P-values | 7 Comments
