S. Stanley Young, PhD
Assistant Director for Bioinformatics
National Institute of Statistical Sciences
Research Triangle Park, NC
Here are Dr. Stanley Young’s slides from our April 25 seminar. They contain several tips for unearthing deception by fraudulent p-value reports. Since it’s Saturday night, you might wish to perform an experiment with three 10-sided dice*, recording the results of 100 rolls (3 at a time) on the form on slide 13. An entry, e.g., (0,1,3), becomes an imaginary p-value of .013 associated with some hypothesis (type of tumor, male–female, old–young). You report only the hypotheses whose null is rejected at a “p-value” less than .05. Forward your results to me for publication in a peer-reviewed journal. (A small simulation sketch of the exercise follows the footnote below.)
*Sets of 10-sided dice will be offered as a palindrome prize beginning in May.
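For anyone who would rather not wait for the dice, here is a minimal simulation sketch (my own, in Python; not part of Stan’s slides, and the seed is arbitrary): each roll of three 10-sided dice gives digits (a, b, c), read as the imaginary p-value 0.abc, and only the “significant” ones get reported.

```python
import random

random.seed(2014)  # any seed will do; fixed only for reproducibility

# 100 rolls of three 10-sided dice (faces 0-9)
rolls = [tuple(random.randint(0, 9) for _ in range(3)) for _ in range(100)]

# (a, b, c) becomes the imaginary p-value 0.abc, e.g. (0, 1, 3) -> 0.013
p_values = [a / 10 + b / 100 + c / 1000 for (a, b, c) in rolls]

# "report" only the hypotheses rejected at the .05 level
reported = sorted(p for p in p_values if p < 0.05)

print(f"{len(reported)} of 100 imaginary hypotheses were 'significant': {reported}")
```

Since the imaginary p-values are uniform over 0.000–0.999, about 5 of the 100 purely random “hypotheses” will come out significant on a typical run, which is exactly the point of the exercise.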
The key statement is ‘if you report just the significant ones’. The P-value per test is controlled provided that each test is performed correctly. Of course, the P-values within a given trial are unlikely to be independent, and dependence may cause some difficulties for interpretation. Nevertheless, if the null is true, the expected proportion of P-values less than 0.05 (say) will not be more than 1/20, however many tests you do.
The main sin of multiplicity is doing lots of stuff and only reporting what appears interesting. This is actually a problem whether or not you do significance tests. (A quick simulation of the 1/20 point follows the reference below.)
See
1. Senn S, Bretz F. Power and sample size when multiple endpoints are considered. Pharmaceutical Statistics 2007; 6: 161–170.
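To see the 1/20 point numerically, here is a small sketch (my own, not Senn’s; the sample sizes, seed, and use of a two-sample t-test are my choices): every null is true and every test is performed correctly, yet roughly 5% of the P-values still fall below 0.05, so reporting only those manufactures a steady supply of spurious “findings”.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, n_per_group = 1000, 30

significant = 0
for _ in range(n_tests):
    a = rng.normal(size=n_per_group)  # both samples from the same distribution,
    b = rng.normal(size=n_per_group)  # so the null hypothesis is true for every test
    if stats.ttest_ind(a, b).pvalue < 0.05:
        significant += 1

print(f"{significant} of {n_tests} true-null tests were 'significant' at 0.05 "
      f"(expected about {n_tests // 20})")
```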
Stephen: Discussions with Stan brought us back to your posts on Dawid’s selection paradox and the question about the connection between a frequentist adjustment and a Bayesian formulation via hyperparameters. I’m thinking another round at that query might be useful. https://errorstatistics.com/2013/12/03/stephen-senn-dawids-selection-paradox-guest-post/
Stan Young led a terrific seminar that really let participants see the applications of the foundational issues we’d been discussing from various philosophical perspectives all semester. To mention just one thing: I hadn’t understood before how resampling can be used to adjust for multiple testing. The entire discussion was extremely illuminating. Thanks so much Stan!
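On the resampling point, here is a minimal sketch of the idea (my own, in the spirit of Westfall and Young’s single-step max-T adjustment; it is not code from Stan’s slides, and the statistic, sample sizes, and function name are my assumptions): reshuffle the group labels many times and compare each endpoint’s statistic with the permutation distribution of the maximum statistic across all endpoints.

```python
import numpy as np

def max_t_adjusted_pvalues(x, y, n_perm=5000, seed=0):
    """x, y: (n_x, k) and (n_y, k) arrays of k endpoints per subject.
    Returns resampling-adjusted p-values (single-step max-T) that control
    the family-wise error rate across the k endpoints."""
    rng = np.random.default_rng(seed)
    data = np.vstack([x, y])
    n_x = len(x)
    observed = np.abs(x.mean(axis=0) - y.mean(axis=0))   # |mean difference| per endpoint
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        perm = rng.permutation(len(data))                 # reshuffle the group labels
        px, py = data[perm[:n_x]], data[perm[n_x:]]
        stat = np.abs(px.mean(axis=0) - py.mean(axis=0))
        exceed += stat.max() >= observed                  # compare to the permutation MAXIMUM
    return (exceed + 1) / (n_perm + 1)

# Example: 20 subjects per group, 5 endpoints, no real effect anywhere
rng = np.random.default_rng(42)
x = rng.normal(size=(20, 5))
y = rng.normal(size=(20, 5))
print(max_t_adjusted_pvalues(x, y))
```

Because the adjustment uses the joint permutation distribution of the endpoints, correlation among them is handled automatically, which is the feature a simple Bonferroni correction ignores.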
“Papers following good manufacturing procedures and addressing important questions should be accepted without regard to statistical significance” (slide 28).
Couldn’t agree more!
Carlos: I think there is some ambiguity in that point, so I’m glad you raised it so we can see what Stan says. I take his point to be, not that the actual (non-fraudulent, adjusted as needed) p-value wouldn’t be reported, but that there is much to learn from non-significant results as well. Provided, of course, they follow “good manufacturing procedures”; the question is what that will require aside from critical scrutiny as to whether the error probabilities reported are legitimate or illegitimate. If the concern isn’t to control and assess the legitimacy of the error probabilities reported (be they p-values, confidence levels, or others), then it’s hard to see the basis for Stan’s emphasis on needed adjustments and avoidance of biased assessments.
Taking his analogy from Deming (who, by the way, is one of the applied statisticians I admire the most!), I guess he’s trying to emphasize the incentives!
Today we “reward” significant results. So we give “employees” the wrong incentive – they hunt for significance. And so we end up with unreliable “significant” results. Certainly not what we wanted.
If, instead, we “reward” good methods, we will get what we want – good data from which we can draw reliable inferences, even if the inference is that we can’t settle the question yet (which is better than false precision)!
Carlos: Right, but this is different from ignoring or throwing out p-values or other error probabilities. I was noticing that his statement was ambiguous.
Mayo: Apologies for going off-topic, but an interesting small workshop was held this weekend in SC that heavily discussed the problem of multiple testing in particle physics. I figured, from your Higgs coverage, that you would be interested in perusing the slides, particularly the ones from Robert Cousins. They are publicly available here – http://indico.cern.ch/event/314840
West: Very interested, where was this again? SC?
University of South Carolina at Columbia
I can’t believe no one told me, not Staley nor Cousins, and I’m doing a session on the Higgs next fall with both of them. Thanks for letting me know.