Monthly Archives: November 2015

Return to the Comedy Hour: P-values vs posterior probabilities (1)

Comedy Hour

Did you hear the one about the frequentist significance tester when he was shown the nonfrequentist nature of p-values?

JB [Jim Berger]: I just simulated a long series of tests on a pool of null hypotheses, and I found that among tests with p-values of .05, at least 22%—and typically over 50%—of the null hypotheses are true!(1)

Frequentist Significance Tester: Scratches head: But rejecting the null with a p-value of .05 ensures erroneous rejection no more than 5% of the time!

Raucous laughter ensues!

(Hah, hah,…. I feel I’m back in high school: “So funny, I forgot to laugh!”)

The frequentist tester should retort:

Frequentist Significance Tester: But you assumed 50% of the null hypotheses are true, and computed P(H0|x) (imagining P(H0) = .5)—and then assumed my p-value should agree with the number you get, if it is not to be misleading!

Yet, our significance tester is not heard from as they move on to the next joke…. Continue reading
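The tester's retort can be made concrete with a small simulation. What follows is only a sketch under stated assumptions, not Berger's actual computation: the test is a two-sided z-test, half of the nulls are true (the assumed P(H0) = .5), and when the null is false the true mean is drawn from a normal distribution with standard deviation alt_sd. The function name, the p-value window of .04 to .05, and alt_sd = 2.0 are all illustrative choices of mine.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def fraction_true_nulls(n_tests=200_000, prior_null=0.5, alt_sd=2.0,
                        p_window=(0.04, 0.05), seed=0):
    """Among tests whose p-value lands in p_window, return the fraction
    whose null hypothesis is actually true, plus the number of such tests.

    Assumptions (illustrative, not taken from the post):
      - a proportion prior_null of the nulls is true (mean 0);
      - otherwise the true mean is drawn from N(0, alt_sd^2);
      - each test observes a single z ~ N(mean, 1).
    """
    rng = random.Random(seed)
    hits = 0
    true_null_hits = 0
    for _ in range(n_tests):
        null_is_true = rng.random() < prior_null
        mu = 0.0 if null_is_true else rng.gauss(0.0, alt_sd)
        z = rng.gauss(mu, 1.0)
        p = two_sided_p(z)
        if p_window[0] <= p <= p_window[1]:
            hits += 1
            if null_is_true:
                true_null_hits += 1
    return true_null_hits / hits, hits


frac, hits = fraction_true_nulls()
print(f"{hits} tests had p in [.04, .05]; fraction with a true null: {frac:.2f}")
```

Under these particular assumptions the fraction comes out at roughly 0.3, well above .05. Changing prior_null or alt_sd moves the number, which is the tester's point: the figure depends on the assumed pool of nulls and alternatives, and it is a posterior-style quantity, not the error rate that the p-value is meant to control.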

Categories: Bayesian/frequentist, Comedy, PBP, significance tests, Statistics


3 years ago...

MONTHLY MEMORY LANE: 3 years ago: November 2012. I mark in red three posts that seem most apt for general background on key issues in this blog.[1] Please check out others that didn’t make the “bright red cut”. If you’re interested in the Likelihood Principle, check “Blogging Birnbaum” and “Likelihood Links”. If you think P-values are hard to explain, see how the “Bad News Bears” struggle to decipher Bayesian probability. (Some of the posts allude to seminars I was giving at the London School of Economics 3 years ago.)

November 2012

[1] I exclude those reblogged fairly recently. Posts that are part of a “unit” or a group of “U-Phils” count as one. Monthly memory lanes began at the blog’s 3-year anniversary in Sept, 2014.

Categories: 3-year memory lane, Statistics

Erich Lehmann: Neyman-Pearson & Fisher on P-values


Today is Erich Lehmann’s birthday (20 November 1917 – 12 September 2009). Lehmann was Neyman’s first student at Berkeley (Ph.D. 1942), and his framing of Neyman-Pearson (NP) methods has had an enormous influence on the way we typically view them.

I got to know Erich in 1997, shortly after publication of EGEK (1996). One day, I received a bulging, six-page, handwritten letter from him in tiny, extremely neat scrawl (and many more after that).  He began by telling me that he was sitting in a very large room at an ASA (American Statistical Association) meeting where they were shutting down the conference book display (or maybe they were setting it up), and on a very long, wood table sat just one book, all alone, shiny red.  He said he wondered if it might be of interest to him!  So he walked up to it….  It turned out to be my Error and the Growth of Experimental Knowledge (1996, Chicago), which he reviewed soon after[0]. (What are the chances?) Some related posts on Lehmann’s letter are here and here.

One of Lehmann’s more philosophical papers is Lehmann (1993), “The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two?” We haven’t discussed it before on this blog. Here are some excerpts (blue) and remarks (black).

Erich Lehmann 20 November 1917 – 12 September 2009

…A distinction frequently made between the approaches of Fisher and Neyman-Pearson is that in the latter the test is carried out at a fixed level, whereas the principal outcome of the former is the statement of a p value that may or may not be followed by a pronouncement concerning significance of the result [p.1243].

The history of this distinction is curious. Throughout the 19th century, testing was carried out rather informally. It was roughly equivalent to calculating an (approximate) p value and rejecting the hypothesis if this value appeared to be sufficiently small. … Fisher, in his 1925 book and later, greatly reduced the needed tabulations by providing tables not of the distributions themselves but of selected quantiles. … These tables allow the calculation only of ranges for the p values; however, they are exactly suited for determining the critical values at which the statistic under consideration becomes significant at a given level. As Fisher wrote in explaining the use of his [chi square] table (1946, p. 80):

In preparing this table we have borne in mind that in practice we do not want to know the exact value of P for any observed [chi square], but, in the first place, whether or not the observed value is open to suspicion. If P is between .1 and .9, there is certainly no reason to suspect the hypothesis tested. If it is below .02, it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 and consider that higher values of [chi square] indicate a real discrepancy.

Similarly, he also wrote (1935, p. 13) that “it is usual and convenient for experimenters to take 5 percent as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard .. .” …. Continue reading

Categories: Neyman, P-values, phil/history of stat, Statistics

“What does it say about our national commitment to research integrity?”



There’s an important guest editorial by Keith Baggerly and C.K. Gunsalus in today’s issue of the Cancer Letter: “Penalty Too Light” on the Duke U. (Potti/Nevins) cancer trial fraud*. Here are some excerpts.

publication date: Nov 13, 2015

Penalty Too Light

What does it say about our national commitment to research integrity that the Department of Health and Human Services’ Office of Research Integrity has concluded that a five-year ban on federal research funding for one individual researcher is a sufficient response to a case involving millions of taxpayer dollars, completely fabricated data, and hundreds to thousands of patients in invasive clinical trials?

This week, ORI released a notice of “final action” in the case of Anil Potti, M.D. The ORI found that Dr. Potti engaged in several instances of research misconduct and banned him from receiving federal funding for five years.

(See my previous post.)

The principles involved are important and the facts complicated. This was not just a matter of research integrity. This was also a case involving direct patient care and millions of dollars in federal and other funding. The duration and extent of deception were extreme. The case catalyzed an Institute of Medicine review of genomics in clinical trials and attracted national media attention.

If there are no further conclusions coming from ORI and if there are no other investigations under way—despite the importance of the issues involved and the five years that have elapsed since research misconduct investigation began, we do not know—a strong argument can be made that neither justice nor the research community have been served by this outcome. Continue reading

Categories: Anil Potti, fraud, science communication, Statistics

Findings of the Office of Research Misconduct on the Duke U (Potti/Nevins) cancer trial fraud: No one is punished but the patients

Findings of Research Misconduct
A Notice by the Health and Human Services Dept
on 11/09/2015
AGENCY: Office of the Secretary, HHS.
ACTION: Notice.


SUMMARY: Notice is hereby given that the Office of Research Integrity 
(ORI) has taken final action in the following case:
    Anil Potti, M.D., Duke University School of Medicine: Based on the 
reports of investigations conducted by Duke University School of 
Medicine (Duke) and additional analysis conducted by ORI in its 
oversight review, ORI found that Dr. Anil Potti, former Associate 
Professor of Medicine, Duke, engaged in research misconduct in research 
supported by National Heart, Lung, and Blood Institute (NHLBI), 
National Institutes of Health (NIH), grant R01 HL072208 and National 
Cancer Institute (NCI), NIH, grants R01 CA136530, R01 CA131049, K12 
CA100639, R01 CA106520, and U54 CA112952.
    ORI found that Respondent engaged in research misconduct by 
including false research data in the following published papers, 
submitted manuscript, grant application, and the research record as 
specified in 1-3 below. Specifically, ORI found that: Continue reading

Categories: Anil Potti, reproducibility, Statistical fraudbusting, Statistics

S. McKinney: On Efron’s “Frequentist Accuracy of Bayesian Estimates” (Guest Post)



Steven McKinney, Ph.D.
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre


On Bradley Efron’s: “Frequentist Accuracy of Bayesian Estimates”

Bradley Efron has produced another fine set of results, yielding a valuable estimate of variability for a Bayesian estimate derived from a Markov Chain Monte Carlo algorithm, in his latest paper “Frequentist accuracy of Bayesian estimates” (J. R. Statist. Soc. B (2015) 77, Part 3, pp. 617–646). I give a general overview of Efron’s brilliance via his Introduction discussion (his words “in double quotes”).

“1. Introduction

The past two decades have witnessed a greatly increased use of Bayesian techniques in statistical applications. Objective Bayes methods, based on neutral or uninformative priors of the type pioneered by Jeffreys, dominate these applications, carried forward on a wave of popularity for Markov chain Monte Carlo (MCMC) algorithms. Good references include Ghosh (2011), Berger (2006) and Kass and Wasserman (1996).”

A nice concise summary, one that should bring joy to anyone interested in Bayesian methods after all the Bayesian-bashing of the middle 20th century. Efron himself has crafted many beautiful results in the Empirical Bayes arena. He has reviewed important differences between Bayesian and frequentist outcomes that point to some as-yet unsettled issues in statistical theory and philosophy, such as his scales of evidence work. Continue reading

Categories: Bayesian/frequentist, objective Bayesians, Statistics
