P-values

P-Value Statements and Their Unintended(?) Consequences: The June 2019 ASA President’s Corner (b)


Mayo writing to Kafadar

I never met Karen Kafadar, the 2019 President of the American Statistical Association (ASA), but the other day I wrote to her in response to a call in her extremely interesting June 2019 President’s Corner, “Statistics and Unintended Consequences”:

  • “I welcome your suggestions for how we can communicate the importance of statistical inference and the proper interpretation of p-values to our scientific partners and science journal editors in a way they will understand and appreciate and can use with confidence and comfort—before they change their policies and abandon statistics altogether.”

I only recently came across her call, and I will share my letter below. First, here are some excerpts from her June President’s Corner (her December report is due any day).

Recently, at chapter meetings, conferences, and other events, I’ve had the good fortune to meet many of our members, many of whom feel queasy about the effects of differing views on p-values expressed in the March 2019 supplement of The American Statistician (TAS). The guest editors—Ronald Wasserstein, Allen Schirm, and Nicole Lazar—introduced the ASA Statement on P-Values (2016) by stating the obvious: “Let us be clear. Nothing in the ASA statement is new.” Indeed, the six principles are well-known to statisticians. The guest editors continued, “We hoped that a statement from the world’s largest professional association of statisticians would open a fresh discussion and draw renewed and vigorous attention to changing the practice of science with regards to the use of statistical inference.”…

Wait a minute. I’m confused about who is speaking. The statements “Let us be clear…” and “We hoped that a statement from the world’s largest professional association…” come from the 2016 ASA Statement on P-values, which I abbreviate as ASA I (Wasserstein and Lazar 2016). The March 2019 editorial that Kafadar says is making many members “feel queasy” is the update (Wasserstein, Schirm, and Lazar 2019), which I abbreviate as ASA II [i].

A healthy debate about statistical approaches can lead to better methods. But, just as Wilks and his colleagues discovered, unintended consequences may have arisen: Nonstatisticians (the target of the issue) may be confused about what to do. Worse, “by breaking free from the bonds of statistical significance” as the editors suggest and several authors urge, researchers may read the call to “abandon statistical significance” as “abandon statistical methods altogether.” …

But we may need more. How exactly are researchers supposed to implement this “new concept” of statistical thinking? Without specifics, questions such as “Why is getting rid of p-values so hard?” may lead some of our scientific colleagues to hear the message as, “Abandon p-values”—despite the guest editors’ statement: “We are not recommending that the calculation and use of continuous p-values be discontinued.”

Brad Efron once said, “Those who ignore statistics are condemned to re-invent it.” In his commentary (“It’s not the p-value’s fault”) following the 2016 ASA Statement on P-Values, Yoav Benjamini wrote, “The ASA Board statement about the p-values may be read as discouraging the use of p-values because they can be misused, while the other approaches offered there might be misused in much the same way.” Indeed, p-values (and all statistical methods in general) can be misused. (So may cars and computers and cell phones and alcohol. Even words in the English language get misused!) But banishing them will not prevent misuse; analysts will simply find other ways to document a point—perhaps better ways, but perhaps less reliable ones. And, as Benjamini further writes, p-values have stood the test of time in part because they offer “a first line of defense against being fooled by randomness, separating signal from noise, because the models it requires are simpler than any other statistical tool needs”—especially now that Efron’s bootstrap has become a familiar tool in all branches of science for characterizing uncertainty in statistical estimates. [Benjamini is commenting on ASA I.]

… It is reassuring that “Nature is not seeking to change how it considers statistical evaluation of papers at this time,” but this line is buried in its March 20 editorial, titled “It’s Time to Talk About Ditching Statistical Significance.” Which sentence do you think will be more memorable? We can wait to see if other journals follow BASP’s lead and then respond. But then we’re back to “reactive” versus “proactive” mode (see February’s column), which is how we got here in the first place.

… Indeed, the ASA has a professional responsibility to ensure good science is conducted—and statistical inference is an essential part of good science. Given the confusion in the scientific community (to which the ASA’s peer-reviewed 2019 TAS supplement may have unintentionally contributed), we cannot afford to sit back. After all, that’s what started us down the “abuse of p-values” path. 

Is it unintentional? [ii]

…Tukey wrote years ago about Bayesian methods: “It is relatively clear that discarding Bayesian techniques would be a real mistake; trying to use them everywhere, however, would in my judgment, be a considerably greater mistake.” In the present context, perhaps he might have said: “It is relatively clear that trusting or dismissing results based on a single p-value would be a real mistake; discarding p-values entirely, however, would in my judgment, be a considerably greater mistake.” We should take responsibility for the situation in which we find ourselves today (and during the past decades) to ensure that our well-researched and theoretically sound statistical methodology is neither abused nor dismissed categorically. I welcome your suggestions for how we can communicate the importance of statistical inference and the proper interpretation of p-values to our scientific partners and science journal editors in a way they will understand and appreciate and can use with confidence and comfort—before they change their policies and abandon statistics altogether. Please send me your ideas! 

You can read the full June President’s Corner.

On Fri, Nov 8, 2019 at 2:09 PM Deborah Mayo <mayod@vt.edu> wrote:

Dear Professor Kafadar:

Your article in the President’s Corner of the ASA for June 2019 was sent to me by someone who had read my “P-value Thresholds: Forfeit at your Peril” editorial, invited by John Ioannidis. I find your sentiment welcome and I’m responding to your call for suggestions.

For starters, when representatives of the ASA issue articles criticizing P-values and significance tests, recommending that they be supplemented or replaced by other methods, three very simple principles should be followed:

  • The elements of tests should be presented in an accurate, fair, and at least reasonably generous manner, rather than presenting mainly abuses of the methods;
  • The latest accepted methods should be included, not just crude nil null hypothesis tests, and how these newer methods get around the often-repeated problems should be mentioned;
  • Problems facing the better-known alternatives, recommended as replacements or supplements to significance tests, should be discussed. Such an evaluation should recognize that the role of statistical falsification is distinct from (while complementary to) using probability to quantify degrees of confirmation, support, plausibility, or belief in a statistical hypothesis or model.

Here’s what I recommend the ASA do now to correct the distorted picture that is widespread and growing: run a conference akin to the one Wasserstein ran on “A World Beyond ‘P < 0.05’”, except that it would be on evaluating some competing methods for statistical inference: Comparative Methods of Statistical Inference: Problems and Prospects.

The workshop would consist of serious critical discussions on Bayes Factors, confidence intervals[iii], Likelihoodist methods, other Bayesian approaches (subjective, default non-subjective, empirical), particularly in relation to today’s replication crisis. …

The use of these alternative methods has grown sufficiently widespread to have garnered discussions of well-known problems…. The conference I’m describing would easily attract the leading statisticians in the world. …

Sincerely,
D. Mayo

Please share your comments on this blogpost.

************************************

[i] My reference to ASA II is just to the portion of the editorial containing their general recommendations: don’t say significance or significant, oust P-value thresholds. (It mostly encompasses the first 10 pages.) It begins with a review of 4 of the 6 principles from ASA I, though they are stated in more extreme terms than in ASA I. (As I point out in my blogpost, the result is to give us principles that are in tension with the original 6.) Note my new qualification in [ii]*

[ii]* As soon as I saw the 2019 document, I queried Wasserstein as to the relationship between ASA I and ASA II. It was never clarified. I hope now that it will be, with some kind of disclaimer. That will help, but merely noting that it never came to a Board vote will not quell the confusion now rattling some ASA members. The ASA’s P-value campaign, urging journal editors to revise their author guidelines, asks them to take account of both ASA I and ASA II. In carrying out the P-value campaign, at which he is highly effective, Ron Wasserstein obviously* wears his Executive Director’s hat. See The ASA’s P-value Project: Why It’s Doing More Harm than Good. So, until some kind of clarification is issued by the ASA, I’ve hit upon this solution.

The ASA P-value Project existed before the 2016 ASA I. The only difference in today’s P-value Project, since the March 20, 2019 editorial by Wasserstein et al., is that the ASA Executive Director (in talks, presentations, correspondence) recommends ASA I along with the general stipulations of ASA II, even though the latter is not a policy document. I will now call it the 2019 ASA P-value Project II. It also includes the rather stronger principles in ASA II. Even many who entirely agree with the “don’t say significance” and “don’t use P-value thresholds” recommendations have concurred with my “friendly amendments” to ASA II (including, for example, Greenland, Hurlbert, and others). See my post from June 17, 2019.

You merely have to look at the comments on that post. If Wasserstein would make those slight revisions, the 2019 P-value Project II wouldn’t contain the inconsistencies, or at least “tensions,” that it now does, assuming it retains ASA I. The 2019 ASA P-value Project II sanctions making the recommendations in ASA II, even though ASA II is not an ASA policy statement.

However, I don’t see that those made queasy by ASA II would be any less upset with the reality of the ASA P-value Project II.

[iii] Confidence intervals (CIs) clearly aren’t “alternative measures of evidence” in relation to statistical significance tests. The same man, Neyman, developed tests (with Pearson) and CIs, even earlier, around 1930. They were developed as duals, or inversions, of tests. Yet the advocates of CIs–the CI Crusaders, S. Hurlbert calls them–are some of today’s harshest and most ungenerous critics of tests. For these crusaders, it has to be “CIs only”; supplementing p-values with CIs isn’t good enough. Now look what’s happened to CIs in the latest guidelines of the NEJM. You can readily find them by searching NEJM on this blog. (My own favored measure, severity, improves on CIs, moves away from the fixed confidence level, and provides a different assessment corresponding to each point in the CI.)
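To make that last point concrete, here is a minimal numerical sketch of a severity assessment (the setup and all numbers are hypothetical, chosen by me for illustration, not anyone’s official implementation): for a one-sided claim μ > μ1, severity is the probability of a result no larger than the one observed, were μ equal to μ1.

    import numpy as np
    from scipy.stats import norm

    n, sigma, xbar = 100, 1.0, 0.2   # hypothetical data: observed mean 0.2
    se = sigma / np.sqrt(n)          # standard error = 0.1

    def severity(mu1):
        """SEV(mu > mu1): prob. of a mean no larger than xbar, were mu = mu1."""
        return norm.cdf((xbar - mu1) / se)

    for mu1 in (0.0, 0.1, 0.15, 0.2):
        print(mu1, round(severity(mu1), 3))
    # 0.977, 0.841, 0.691, 0.5: each point mu1 gets its own assessment,
    # rather than the single fixed level a 95% CI reports

Each value of μ1 gets its own well-probed (or poorly probed) verdict, which is what distinguishes this from reporting one interval at one confidence level.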

*Or is it not obvious? I think it is, because he is invited and speaks, writes, and corresponds in that capacity.

 

Wasserstein, R. & Lazar, N. (2016) [ASA I], “The ASA’s Statement on p-Values: Context, Process, and Purpose”, The American Statistician 70(2): 129–133.

Wasserstein, R., Schirm, A. & Lazar, N. (2019) [ASA II], “Moving to a World Beyond ‘p < 0.05’” (Editorial), The American Statistician 73(S1): 1–19.


Related book (excerpts from posts on this blog are collected here)

Mayo, D. (2018). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (SIST). Cambridge: Cambridge University Press.

Categories: ASA Guide to P-values, Bayesian/frequentist, P-values | Leave a comment

On Some Self-Defeating Aspects of the ASA’s (2019) Recommendations on Statistical Significance Tests (ii)


“Before we stood on the edge of the precipice, now we have taken a great step forward”

 

What’s self-defeating about pursuing statistical reforms in the manner taken by the American Statistical Association (ASA) in 2019? In case you’re not up on the latest in the significance testing wars, the 2016 ASA Statement on P-Values and Statistical Significance, ASA I, was arguably a reasonably consensual statement on the need to avoid some well-known abuses of P-values–notably, if you compute P-values while ignoring selective reporting, multiple testing, or stopping when the data look good, the computed P-value will be invalid (Principle 4, ASA I). But then Ron Wasserstein, executive director of the ASA, and his co-editors decided they weren’t happy with their own 2016 statement because it “stopped just short of recommending that declarations of ‘statistical significance’ be abandoned” altogether. In their new statement–ASA II–they announced: “We take that step here…. Statistically significant–don’t say it and don’t use it”.
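To see concretely why Principle 4 matters, here is a minimal simulation of the optional stopping effect (the setup, a z-test with a peek after every 10 observations, is my own illustration, not anything from the ASA documents):

    import numpy as np

    rng = np.random.default_rng(1)
    n_trials, max_n, z_crit = 10_000, 100, 1.96  # 1.96 = two-sided 5% cutoff

    rejections = 0
    for _ in range(n_trials):
        x = rng.normal(0.0, 1.0, max_n)  # data generated under H0: mu = 0
        # "peek" after every 10 observations; stop at the first |z| > 1.96
        for n in range(10, max_n + 1, 10):
            z = x[:n].mean() * np.sqrt(n)
            if abs(z) > z_crit:
                rejections += 1
                break

    print(f"actual type I error with peeking: {rejections / n_trials:.3f}")
    # comes out near 0.19, not the nominal 0.05

Stopping when the data look good nearly quadruples the probability of an erroneous rejection, which is exactly the sense in which the reported P-value becomes invalid.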

Why do I say it is a mis-take to have taken the supposed next “great step forward”? Why do I count it as unsuccessful as a piece of statistical science policy? In what ways does it make the situation worse? Let me count the ways. The first is in this post. Others will come in following posts, until I become too disconsolate to continue.[i] Continue reading

Categories: P-values, stat wars and their casualties, statistical significance tests | 12 Comments

National Academies of Science: Please Correct Your Definitions of P-values

Mayo banging head

If you were on a committee to highlight issues surrounding P-values and replication, what’s the first definition you would check? Yes, exactly. Apparently, when it came to the recently released National Academies of Science “Consensus Study” Reproducibility and Replicability in Science 2019, no one did. Continue reading

Categories: ASA Guide to P-values, Error Statistics, P-values | 19 Comments

Hardwicke and Ioannidis, Gelman, and Mayo: P-values: Petitions, Practice, and Perils (and a question for readers)


The October 2019 issue of the European Journal of Clinical Investigation came out today. It includes the PERSPECTIVE article by Tom Hardwicke and John Ioannidis, an invited editorial by Gelman, and one by me:

Petitions in scientific argumentation: Dissecting the request to retire statistical significance, by Tom Hardwicke and John Ioannidis

When we make recommendations for scientific practice, we are (at best) acting as social scientists, by Andrew Gelman

P-value thresholds: Forfeit at your peril, by Deborah Mayo

I blogged excerpts from my preprint, and some related posts, here.

All agree on the disagreement over the statistical and metastatistical issues: Continue reading

Categories: ASA Guide to P-values, P-values, stat wars and their casualties | 16 Comments

(Excerpts from) ‘P-Value Thresholds: Forfeit at Your Peril’ (free access)


A key recognition among those who write on the statistical crisis in science is that the pressure to publish attention-getting articles can incentivize researchers to produce eye-catching but inadequately scrutinized claims. We may see much the same sensationalism in broadcasting metastatistical research, especially if it takes the form of scapegoating or banning statistical significance. A lot of excitement was generated recently when Ron Wasserstein, Executive Director of the American Statistical Association (ASA), and co-editors A. Schirm and N. Lazar updated the 2016 ASA Statement on P-Values and Statistical Significance (ASA I). In their 2019 interpretation, ASA I “stopped just short of recommending that declarations of ‘statistical significance’ be abandoned,” and in their new statement (ASA II) they announced: “We take that step here…. ‘statistically significant’–don’t say it and don’t use it”. To herald ASA II, and the special issue “Moving to a world beyond ‘p < 0.05’”, the journal Nature requisitioned a commentary from Amrhein, Greenland, and McShane, “Retire Statistical Significance” (AGM). With over 800 signatories, the commentary received the imposing title “Scientists rise up against statistical significance”! Continue reading

Categories: ASA Guide to P-values, P-values, stat wars and their casualties | 6 Comments

Palavering about Palavering about P-values


Nathan Schachtman (who was a special invited speaker at our recent Summer Seminar in Phil Stat) put up a post on his law blog the other day (“Palavering About P-values”) on an article by a statistics professor at Stanford, Helena Kraemer. “Palavering” is an interesting word choice of Schachtman’s. Its range of meanings is relevant here [i]; in my title, I intend both, in turn. You can read Schachtman’s full post here, it begins like this:

The American Statistical Association’s most recent confused and confusing communication about statistical significance testing has given rise to great mischief in the world of science and science publishing.[ASA II 2019] Take for instance last week’s opinion piece about “Is It Time to Ban the P Value?” Please.

Admittedly, their recent statement, which I refer to as ASA II, has seemed to open the floodgates to some very zany remarks about P-values, their meaning and role in statistical testing. Continuing with Schachtman’s post: Continue reading

Categories: ASA Guide to P-values, P-values | Tags: | 12 Comments

Diary For Statistical War Correspondents on the Latest Ban on Speech

When science writers, especially “statistical war correspondents”, contact you to weigh in on some article, they may talk to you until they get something spicy, and then they may or may not include the background context. So a few writers contacted me this past week regarding this article (“Retire Statistical Significance”)–a teaser, I now suppose, to advertise the ASA collection growing out of that conference, “A World Beyond p < 0.05”, way back in Oct 2017, where I gave a paper*. I jotted down some points, since Richard Harris from NPR needed them immediately, and I had just gotten off a plane when he emailed. He let me follow up with him, which is rare and greatly appreciated. So I streamlined the first set of points, and dropped any points he deemed technical. I sketched the third set for a couple of other journals who contacted me, who may or may not use them. Here’s Harris’ article, which includes a couple of my remarks. Continue reading

Categories: ASA Guide to P-values, P-values | 41 Comments

A letter in response to the ASA’s Statement on p-Values by Ionides, Giessing, Ritov and Page

I came across an interesting letter in response to the ASA’s Statement on p-values that I hadn’t seen before. It’s by Ionides, Giessing, Ritov and Page, and it’s very much worth reading. I make some comments below. Continue reading

Categories: ASA Guide to P-values, P-values | 7 Comments

A small amendment to Nuzzo’s tips for communicating p-values


I’ve been asked if I agree with Regina Nuzzo’s recent note on p-values [i]. I don’t want to be nit-picky, but one very small addition to Nuzzo’s helpful tips for communicating statistical significance can make it a great deal more helpful. Here’s my friendly amendment. She writes: Continue reading

Categories: P-values, science communication | 2 Comments

Statistics and the Higgs Discovery: 5-6 yr Memory Lane


I’m reblogging a few of the Higgs posts on the 6th anniversary of the 2012 discovery. (The first was in this post.) The following was originally “Higgs Analysis and Statistical Flukes: part 2” (from March 2013).[1]

Some people say to me: “This kind of [severe testing] reasoning is fine for a ‘sexy science’ like high energy physics (HEP)”–as if their statistical inferences are radically different. But I maintain that this is the mode by which data are used in “uncertain” reasoning across the entire landscape of science and day-to-day learning (at least, when we’re trying to find things out).[2] Even with high level theories, the particular problems of learning from data are tackled piecemeal, in local inferences that afford error control. Granted, this statistical philosophy differs importantly from those that view the task as assigning comparative (or absolute) degrees-of-support/belief/plausibility to propositions, models, or theories. Continue reading

Categories: Higgs, highly probable vs highly probed, P-values | 1 Comment

Why significance testers should reject the argument to “redefine statistical significance”, even if they want to lower the p-value*


An argument that assumes the very thing that was to have been argued for is guilty of begging the question; signing on to an argument whose conclusion you favor even though you cannot defend its premises is to argue unsoundly, and in bad faith. When a whirlpool of “reforms” subliminally alters the nature and goals of a method, falling into these sins can be quite inadvertent. Start with a simple point on defining the power of a statistical test.

I. Redefine Power?

Given that power is one of the most confused concepts from Neyman-Pearson (N-P) frequentist testing, it’s troubling that in “Redefine Statistical Significance”, power gets redefined too. “Power,” we’re told, is a Bayes Factor BF “obtained by defining H1 as putting ½ probability on μ = ± m for the value of m that gives 75% power for the test of size α = 0.05. This H1 represents an effect size typical of that which is implicitly assumed by researchers during experimental design.” (material under Figure 1). Continue reading
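For readers who want the arithmetic behind that construction, here is a rough sketch for a one-sample z-test with known variance (the code and variable names are my reconstruction, using the usual one-tail approximation to power, not anything from the paper itself):

    from scipy.stats import norm

    alpha, power = 0.05, 0.75
    z_a = norm.ppf(1 - alpha / 2)      # 1.96, the two-sided 5% cutoff
    m = z_a + norm.ppf(power)          # effect (in SE units) giving 75% power

    def bf10(z):
        """BF for H1 (probability 1/2 at each of +m and -m) vs H0: mu = 0."""
        return 0.5 * (norm.pdf(z - m) + norm.pdf(z + m)) / norm.pdf(z)

    print(round(m, 2))           # m is about 2.63 standard errors
    print(round(bf10(1.96), 2))  # BF at the p = 0.05 boundary: about 2.7

So the test’s power is being used to manufacture the alternative against which the Bayes Factor is then computed, which is just the redefinition at issue.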

Categories: Bayesian/frequentist, fallacy of rejection, P-values, reforming the reformers, spurious p values | 15 Comments

Erich Lehmann’s 100th Birthday: Neyman-Pearson vs Fisher on P-values

Erich Lehmann 20 November 1917 – 12 September 2009

Erich Lehmann was born 100 years ago today! (20 November 1917 – 12 September 2009). Lehmann was Neyman’s first student at Berkeley (Ph.D. 1942), and his framing of Neyman-Pearson (N-P) methods has had an enormous influence on the way we typically view them.*


I got to know Erich in 1997, shortly after publication of EGEK (1996). One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that).  He began by telling me that he was sitting in a very large room at an ASA (American Statistical Association) meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, wood table sat just one book, all alone, shiny red.

He said, “I wonder if it might be of interest to me!” So he walked up to it…. It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after[0]. (What are the chances?) Some related posts on Lehmann’s letter are here and here.

Continue reading

Categories: Fisher, P-values, phil/history of stat | 3 Comments

Yoav Benjamini, “In the world beyond p < .05: When & How to use P < .0499…”

.

These were Yoav Benjamini’s slides, “In the world beyond p < .05: When & How to use P < .0499…,” from our session at the ASA 2017 Symposium on Statistical Inference (SSI): A World Beyond p < 0.05. (Mine are in an earlier post.) He begins by asking:

However, it’s mandatory to adjust for selection effects, and Benjamini is one of the leaders in developing ways to carry out the adjustments. Even calling out the avenues for cherry-picking and multiple testing, long known to invalidate p-values, would make replication research more effective (and less open to criticism). Continue reading
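For readers who haven’t seen such adjustments, here is a minimal sketch of the step-up procedure for which Benjamini is best known, the Benjamini-Hochberg control of the false discovery rate (the implementation is mine and merely illustrative):

    import numpy as np

    def benjamini_hochberg(pvals, q=0.05):
        """Step-up BH: reject the k smallest p-values, where k is the largest
        i with p_(i) <= (i/m)*q; controls the FDR at q for independent tests."""
        p = np.asarray(pvals, dtype=float)
        m = len(p)
        order = np.argsort(p)
        below = p[order] <= q * np.arange(1, m + 1) / m
        k = np.nonzero(below)[0].max() + 1 if below.any() else 0
        reject = np.zeros(m, dtype=bool)
        reject[order[:k]] = True
        return reject

    # ten hypothetical p-values from a multiple-testing problem
    print(benjamini_hochberg([.001, .008, .039, .041, .042, .06, .074, .21, .24, .36]))

Note that several p-values below .05 survive or fall together depending on the whole collection, which is precisely the sense in which selection effects must be accounted for.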

Categories: Error Statistics, P-values, replication research, selection effects | 22 Comments

Going round and round again: a roundtable on reproducibility & lowering p-values


There will be a roundtable on reproducibility Friday, October 27th (noon Eastern time), hosted by the International Methods Colloquium, on the reproducibility crisis in the social sciences, motivated by the paper “Redefine statistical significance.” Recall, that was the paper written by a megateam of researchers as part of the movement to require p ≤ .005, based on appraising significance tests by a Bayes Factor analysis, with prior probabilities on a point null and a given alternative. It seems to me that if you’re prepared to scrutinize your frequentist (error statistical) method on grounds of Bayes Factors, then you must endorse using Bayes Factors (BFs) for inference to begin with. If you don’t endorse BFs–and, in particular, the BF required to get the disagreement with p-values*–then it doesn’t make sense to appraise your non-Bayesian method on grounds of agreeing or disagreeing with BFs. For suppose you assess the recommended BFs from the perspective of an error statistical account–that is, one that checks how frequently the method would uncover or avoid the relevant mistaken inference.[i] Then, if you reach the stipulated BF level against a null hypothesis, you will find the situation is reversed, and the recommended BF exaggerates the evidence! (In particular, with high probability, it gives an alternative H’ fairly high posterior probability, or comparatively higher probability, even though H’ is false.) Failing to reach the BF cut-off, by contrast, can find no evidence against, and even finds evidence for, a null hypothesis with high probability, even when non-trivial discrepancies exist. They’re measuring very different things, and it’s illicit to expect an agreement on numbers.[ii] We’ve discussed this quite a lot on this blog (2 are linked below [iii]).
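A quick numerical sketch of that last point (entirely my own illustration, assuming a hypothetical N(0, tau²) prior on μ under H1): hold the observed z-statistic fixed at 1.96, so the two-sided p-value stays near 0.05, and let n grow.

    import numpy as np
    from scipy.stats import norm

    z, tau2 = 1.96, 1.0   # fixed z (p ~ 0.05); tau2 is a hypothetical prior variance
    for n in (10, 100, 1_000, 10_000):
        # under H0, z ~ N(0, 1); under H1 with mu ~ N(0, tau2), z ~ N(0, 1 + n*tau2)
        bf01 = norm.pdf(z, 0, 1) / norm.pdf(z, 0, np.sqrt(1 + n * tau2))
        print(n, round(bf01, 1))
    # prints BF01 of roughly 0.6, 1.5, 4.6, 14.6

The very same “p = 0.05” counts ever more strongly in favor of the null as n grows, so numerical agreement between the two measures was never to be expected.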

If the given list of panelists is correct, it looks to be 4 against 1, but I’ve no doubt that Lakens can handle it.

Continue reading

Categories: Announcement, P-values, reforming the reformers, selection effects | 5 Comments

Deconstructing “A World Beyond P-values”

A world beyond p-values?

I was asked to write something explaining the background of my slides (posted here) in relation to the recent ASA “A World Beyond P-values” conference. I took advantage of some long flight delays on my return to jot down some thoughts:

The contrast between the closing session of the conference “A World Beyond P-values,” and the gist of the conference itself, shines a light on a pervasive tension within the “Beyond P-Values” movement. Two very different debates are taking place. First there’s the debate about how to promote better science. This includes welcome reminders of the timeless demands of rigor and integrity required to avoid deceiving ourselves and others–especially crucial in today’s world of high-powered searches and Big Data. That’s what the closing session was about. [1] Continue reading

Categories: P-values, Philosophy of Statistics, reforming the reformers | 9 Comments

Statistical skepticism: How to use significance tests effectively: 7 challenges & how to respond to them

Here are my slides from the ASA Symposium on Statistical Inference: “A World Beyond p < .05” in the session, “What are the best uses for P-values?”. (Aside from me, our session included Yoav Benjamini and David Robinson, with chair Nalini Ravishanker.)

7 QUESTIONS

  • Why use a tool that infers from a single (arbitrary) P-value that pertains to a statistical hypothesis H0 to a research claim H*?
  • Why use an incompatible hybrid (of Fisher and N-P)?
  • Why apply a method that uses error probabilities, the sampling distribution, researcher “intentions” and violates the likelihood principle (LP)? You should condition on the data.
  • Why use methods that overstate evidence against a null hypothesis?
  • Why do you use a method that presupposes the underlying statistical model?
  • Why use a measure that doesn’t report effect sizes?
  • Why do you use a method that doesn’t provide posterior probabilities (in hypotheses)?

 

Categories: P-values, spurious p values, statistical tests, Statistics | Leave a comment

New venues for the statistics wars

I was part of something called “a brains blog roundtable” on the business of p-values earlier this week–I’m glad to see philosophers getting involved.

Next week I’ll be in a session that I think is intended to explain what’s right about P-values at an ASA Symposium on Statistical Inference: “A World Beyond p < .05”. Continue reading

Categories: Announcement, Bayesian/frequentist, P-values | 3 Comments

Thieme on the theme of lowering p-value thresholds (for Slate)


Here’s an article by Nick Thieme on the same theme as my last blogpost. Thieme, who is Slate’s 2017 AAAS Mass Media Fellow, is the first person to interview me on p-values who (a) was prepared to think through the issue for himself (or herself), and (b) included more than a tiny fragment of my side of the exchange.[i] Please share your comments.

Will Lowering P-Value Thresholds Help Fix Science? P-values are already all over the map, and they’re also not exactly the problem.

 

 

Illustration by Slate

Last week a team of 72 scientists released the preprint of an article attempting to address one aspect of the reproducibility crisis, the crisis of conscience in which scientists are increasingly skeptical about the rigor of our current methods of conducting scientific research.

Their suggestion? Change the threshold for what is considered statistically significant. The team, led by Daniel Benjamin, a behavioral economist from the University of Southern California, is advocating that the “probability value” (p-value) threshold for statistical significance be lowered from the current standard of 0.05 to a much stricter threshold of 0.005. Continue reading

Categories: P-values, reforming the reformers, spurious p values | 14 Comments

“A megateam of reproducibility-minded scientists” look to lowering the p-value

.

Having discussed the “p-values overstate the evidence against the null” fallacy many times over the past few years, I leave it to readers to disinter the issues (pro and con), and appraise the assumptions, in the most recent rehearsal of the well-known Bayesian argument. There’s nothing intrinsically wrong with demanding everyone work with a lowered p-value–if you’re so inclined to embrace a single, dichotomous standard without context-dependent interpretations, especially if larger sample sizes are required to compensate for the loss of power. But lowering the p-value won’t solve the problems that vex people (biasing selection effects), and it is very likely to introduce new ones (see my comment). Kelly Servick, a reporter from Science, lays out the ingredients of the main argument given by “a megateam of reproducibility-minded scientists” in an article out today: Continue reading
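On the sample-size point, a back-of-the-envelope calculation makes the cost explicit (a two-sided z-test with a hypothetical effect of 0.3 standard deviations; the numbers are merely illustrative):

    from scipy.stats import norm

    def n_required(alpha, power, effect):
        """Approximate n for a two-sided z-test; effect is in sd units."""
        return ((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / effect) ** 2

    for a in (0.05, 0.005):
        print(a, round(n_required(a, 0.80, 0.3)))
    # alpha = 0.05 needs ~87 observations; alpha = 0.005 needs ~148,
    # roughly 70% more data for the same 80% power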

Categories: Error Statistics, highly probable vs highly probed, P-values, reforming the reformers | 57 Comments

3 YEARS AGO (JULY 2014): MEMORY LANE

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: July 2014. I mark in red 3-4 posts from each month that seem most apt for general background on key issues in this blog, excluding those reblogged recently[1]. Posts that are part of a “unit” or a group count as one. This month there are three such groups: 7/8 and 7/10; 7/14 and 7/23; 7/26 and 7/31.

July 2014

  • (7/7) Winner of June Palindrome Contest: Lori Wike
  • (7/8) Higgs Discovery 2 years on (1: “Is particle physics bad science?”)
  • (7/10) Higgs Discovery 2 years on (2: Higgs analysis and statistical flukes)
  • (7/14) “P-values overstate the evidence against the null”: legit or fallacious? (revised)
  • (7/23) Continued:”P-values overstate the evidence against the null”: legit or fallacious?
  • (7/26) S. Senn: “Responder despondency: myths of personalized medicine” (Guest Post)
  • (7/31) Roger Berger on Stephen Senn’s “Blood Simple” with a response by Senn (Guest Posts)

[1] Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

 


Categories: 3-year memory lane, Higgs, P-values | Leave a comment
