Some statistical dirty laundry

Objectivity 1: Will the Real Junk Science Please Stand Up?


It’s an apt time to reblog the “statistical dirty laundry” post from 2013 here. I hope we can take up the recommendations from Simmons, Nelson and Simonsohn at the end (Note [5]), which we didn’t last time around.


I finally had a chance to fully read the 2012 Tilberg Report* on “Flawed Science” last night. Here are some stray thoughts…

1. Slipping into pseudoscience.
The authors of the Report say they never anticipated giving a laundry list of “undesirable conduct” by which researchers can flout pretty obvious requirements for the responsible practice of science. It was an accidental byproduct of the investigation of one case (Diederik Stapel, social psychology) that they walked into a culture of “verification bias”[1]. Maybe that’s why I find it so telling. It’s as if they could scarcely believe their ears when people they interviewed “defended the serious and less serious violations of proper scientific method with the words: that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences” (Report 48). So they trot out some obvious rules, and it seems to me that they do a rather good job:

One of the most fundamental rules of scientific research is that an investigation must be designed in such a way that facts that might refute the research hypotheses are given at least an equal chance of emerging as do facts that confirm the research hypotheses. Violations of this fundamental rule, such as continuing an experiment until it works as desired, or excluding unwelcome experimental subjects or results, inevitably tends to confirm the researcher’s research hypotheses, and essentially render the hypotheses immune to the facts…. [T]he use of research procedures in such a way as to ‘repress’ negative results by some means” may be called verification bias. [my emphasis] (Report, 48).

I would place techniques for ‘verification bias’ under the general umbrella of techniques for squelching stringent criticism and repressing severe tests. These gambits make it so easy to find apparent support for one’s pet theory or hypotheses, as to count as no evidence at all (see some from their list ). Any field that regularly proceeds this way I would call a pseudoscience, or non-science, following Popper. “Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory” (Popper 1994, p. 89). [2] It is unclear at what point a field slips into the pseudoscience realm.

2. A role for philosophy of science?
I am intrigued that one of the final recommendations in the Report is this: Continue reading

Categories: junk science, reproducibility, spurious p values, Statistics | 27 Comments

Power Analysis and Non-Replicability: If bad statistics is prevalent in your field, does it follow you can’t be guilty of scientific fraud?



If questionable research practices (QRPs) are prevalent in your field, then apparently you can’t be guilty of scientific misconduct or fraud (by mere QRP finagling), or so some suggest. Isn’t that an incentive for making QRPs the norm? 

The following is a recent blog discussion (by  Ulrich Schimmack) on the Jens Förster scandal: I thank Richard Gill for alerting me. I haven’t fully analyzed Schimmack’s arguments, so please share your reactions. I agree with him on the importance of power analysis, but I’m not sure that the way he’s using it (via his “R index”) shows what he claims. Nor do I see how any of this invalidates, or spares Förster from, the fraud allegations along the lines of Simonsohn[i]. Most importantly, I don’t see that cheating one way vs another changes the scientific status of Forster’s flawed inference. Forster already admitted that faced with unfavorable results, he’d always find ways to fix things until he got results in sync with his theory (on the social psychology of creativity priming). Fraud by any other name.



The official report, “Suspicion of scientific misconduct by Dr. Jens Förster,” is anonymous and dated September 2012. An earlier post on this blog, “Who ya gonna call for statistical fraud busting” featured a discussion by Neuroskeptic that I found illuminating, from Discover Magazine: On the “Suspicion of Scientific Misconduct by Jens Förster. Also see Retraction Watch.

Does anyone know the official status of the Forster case?

How Power Analysis Could Have Prevented the Sad Story of Dr. Förster”

From Ulrich Schimmack’s “Replicability Index” blog January 2, 2015. A January 14, 2015 update is here. (occasional emphasis in bright red is mine) Continue reading

Categories: junk science, reproducibility, Statistical fraudbusting, Statistical power, Statistics | Tags: | 22 Comments

Winners of the December 2014 Palindrome Contest: TWO!

I am pleased to announce that there were two (returning) winners for the December Palindrome contest.
The requirement was: In addition to Elba, one word: Math

(or maths; mathematics, for anyone brave enough).

The winners in alphabetical order are:




Karthik Durvasula
Visiting Assistant Professor in Phonology & Phonetics at Michigan State University

Palindrome: Ha! Am I at natal bash? tame lives, ol’ able-stats Elba. “Lose vile maths!” a blatant aim, aah!

(This was in honor of my birthday–thanks Karthik!)

Bio: I’m a Visiting Assistant Professor in Phonology & Phonetics at Michigan State University. My work primarily deals with probing people’s subconscious knowledge of (abstract) sound patterns. Recently, I have been working on auditory illusions that stem from the bias that such subconscious knowledge introduces. Continue reading

Categories: Palindrome | 2 Comments

“Only those samples which fit the model best in cross validation were included” (whistleblower) “I suspect that we likely disagree with what constitutes validation” (Potti and Nevins)


more Potti training/validation fireworks

So it turns out there was an internal whistleblower in the Potti scandal at Duke after all (despite denials by the Duke researchers involved ). It was a medical student Brad Perez. It’s in the Jan. 9, 2015 Cancer Letter*. Ever since my first post on Potti last May (part 1), I’ve received various e-mails and phone calls from people wishing to confide their inside scoops and first-hand experiences working with Potti (in a statistical capacity) but I was waiting for some published item. I believe there’s a court case still pending (anyone know?)

Now here we have a great example of something I am increasingly seeing: Challenges to the scientific credentials of data analysis are dismissed as mere differences in statistical philosophies or as understandable disagreements about stringency of data validation.[i] This is further enabled by conceptual fuzziness as to what counts as meaningful replication, validation, legitimate cross-validation.

If so, then statistical philosophy is of crucial practical importance.[ii]

Here’s the bulk of Perez’s memo (my emphasis in bold), followed by an even more remarkable reply from Potti and Nevins. Continue reading

Categories: evidence-based policy, junk science, PhilStat/Med, Statistics | Tags: | 28 Comments

On the Brittleness of Bayesian Inference–An Update: Owhadi and Scovel (guest post)




Houman Owhadi

Professor of Applied and Computational Mathematics and Control and Dynamical Systems,
Computing + Mathematical Sciences
California Institute of Technology, USA




Clint Scovel
Senior Scientist,
Computing + Mathematical Sciences
California Institute of Technology, USA


 “On the Brittleness of Bayesian Inference: An Update”

Dear Readers,

This is an update on the results discussed in (“On the Brittleness of Bayesian Inference”) and a high level presentation of the more  recent paper “Qualitative Robustness in Bayesian Inference” available at

In we looked at the robustness of Bayesian Inference in the classical framework of Bayesian Sensitivity Analysis. In that (classical) framework, the data is fixed, and one computes optimal bounds on (i.e. the sensitivity of) posterior values with respect to variations of the prior in a given class of priors. Now it is already well established that when the class of priors is finite-dimensional then one obtains robustness.  What we observe is that, under general conditions, when the class of priors is finite codimensional, then the optimal bounds on posterior are as large as possible, no matter the number of data points.

Our motivation for specifying a finite co-dimensional  class of priors is to look at what classical Bayesian sensitivity  analysis would conclude under finite  information and the best way to understand this notion of “brittleness under finite information”  is through the simple example already given in and recalled in Example 1. The mechanism causing this “brittleness” has its origin in the fact that, in classical Bayesian Sensitivity Analysis, optimal bounds on posterior values are computed after the observation of the specific value of the data, and that the probability of observing the data under some feasible prior may be arbitrarily small (see Example 2 for an illustration of this phenomenon). This data dependence of worst priors is inherent to this classical framework and the resulting brittleness under finite-information can be seen as an extreme occurrence of the dilation phenomenon (the fact that optimal bounds on prior values may become less precise after conditioning) observed in classical robust Bayesian inference [6]. Continue reading

Categories: Bayesian/frequentist, Statistics | 13 Comments

“When Bayesian Inference Shatters” Owhadi, Scovel, and Sullivan (reblog)

images-9I’m about to post an update of this, most viewed, blogpost, so I reblog it here as a refresher. If interested, you might check the original discussion.


I am grateful to Drs. Owhadi, Scovel and Sullivan for replying to my request for “a plain Jane” explication of their interesting paper, “When Bayesian Inference Shatters”, and especially for permission to post it. 


owhadiHouman Owhadi
Professor of Applied and Computational Mathematics and Control and Dynamical Systems, Computing + Mathematical Sciences,
California Institute of Technology, USA
 Clint Scovel
ClintpicSenior Scientist,
Computing + Mathematical Sciences,
California Institute of Technology, USA
TimSullivanTim Sullivan
Warwick Zeeman Lecturer,
Assistant Professor,
Mathematics Institute,
University of Warwick, UK

“When Bayesian Inference Shatters: A plain Jane explanation”

This is an attempt at a “plain Jane” presentation of the results discussed in the recent arxiv paper “When Bayesian Inference Shatters” located at with the following abstract:

“With the advent of high-performance computing, Bayesian methods are increasingly popular tools for the quantification of uncertainty throughout science and industry. Since these methods impact the making of sometimes critical decisions in increasingly complicated contexts, the sensitivity of their posterior conclusions with respect to the underlying models and prior beliefs is becoming a pressing question. We report new results suggesting that, although Bayesian methods are robust when the number of possible outcomes is finite or when only a finite number of marginals of the data-generating distribution are unknown, they are generically brittle when applied to continuous systems with finite information on the data-generating distribution. This brittleness persists beyond the discretization of continuous systems and suggests that Bayesian inference is generically ill-posed in the sense of Hadamard when applied to such systems: if closeness is defined in terms of the total variation metric or the matching of a finite system of moments, then (1) two practitioners who use arbitrarily close models and observe the same (possibly arbitrarily large amount of) data may reach diametrically opposite conclusions; and (2) any given prior and model can be slightly perturbed to achieve any desired posterior conclusions.”

Now, it is already known from classical Robust Bayesian Inference that Bayesian Inference has some robustness if the random outcomes live in a finite space or if the class of priors considered is finite-dimensional (i.e. what you know is infinite and what you do not know is finite). What we have shown is that if the random outcomes live in an approximation of a continuous space (for instance, when they are decimal numbers given to finite precision) and your class of priors is finite co-dimensional (i.e. what you know is finite and what you do not know may be infinite) then, if the data is observed at a fine enough resolution, the range of posterior values is the deterministic range of the quantity of interest, irrespective of the size of the data. Continue reading

Categories: 3-year memory lane, Bayesian/frequentist, Statistics | 1 Comment

Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)


too strict/not strict enough

Given the daily thrashing significance tests receive because of how preposterously easy it is claimed to satisfy the .05 significance level requirement, it’s surprising[i] to hear Naomi Oreskes blaming the .05 standard as demanding too high a burden of proof for accepting climate change. “Playing Dumb on Climate Change,” N.Y. Times Sunday Revat 2 (Jan. 4, 2015). Is there anything for which significance levels do not serve as convenient whipping boys?  Thanks to lawyer Nathan Schachtman for alerting me to her opinion piece today (congratulations to Oreskes!),and to his current blogpost. I haven’t carefully read her article, but one claim jumped out: scientists, she says, “practice a form of self-denial, denying themselves the right to believe anything that has not passed very high intellectual hurdles.” If only! *I add a few remarks at the end.  Anyhow here’s Schachtman’s post:




Playing Dumb on Statistical Significance”
by Nathan Schachtman

Naomi Oreskes is a professor of the history of science in Harvard University. Her writings on the history of geology are well respected; her writings on climate change tend to be more adversarial, rhetorical, and ad hominem. See, e.g., Naomi Oreskes,Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming(N.Y. 2010). Oreskes’ abuse of the meaning of significance probability for her own rhetorical ends is on display in today’s New York Times. Naomi Oreskes, “Playing Dumb on Climate Change,” N.Y. Times Sunday Revat 2 (Jan. 4, 2015).

Oreskes wants her readers to believe that those who are resisting her conclusions about climate change are hiding behind an unreasonably high burden of proof, which follows from the conventional standard of significance in significance probability. In presenting her argument, Oreskes consistently misrepresents the meaning of statistical significance and confidence intervals to be about the overall burden of proof for a scientific claim:

“Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20. But it also means that if there’s more than even a scant 5 percent possibility that an event occurred by chance, scientists will reject the causal claim. It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.”

Although the confidence interval is related to the pre-specified Type I error rate, alpha, and so a conventional alpha of 5% does lead to a coefficient of confidence of 95%, Oreskes has misstated the confidence interval to be a burden of proof consisting of a 95% posterior probability. The “relationship” is either true or not; the p-value or confidence interval provides a probability for the sample statistic, or one more extreme, on the assumption that the null hypothesis is correct. The 95% probability of confidence intervals derives from the long-term frequency that 95% of all confidence intervals, based upon samples of the same size, will contain the true parameter of interest.

Oreskes is an historian, but her history of statistical significance appears equally ill considered. Here is how she describes the “severe” standard of the 95% confidence interval: Continue reading

Categories: evidence-based policy, science communication, Statistics | 59 Comments

No headache power (for Deirdre)



Deirdre McCloskey’s comment leads me to try to give a “no headache” treatment of some key points about the power of a statistical test. (Trigger warning: formal stat people may dislike the informality of my exercise.)

We all know that for a given test, as the probability of a type 1 error goes down the probability of a type 2 error goes up (and power goes down).

And as the probability of a type 2 error goes down (and power goes up), the probability of a type 1 error goes up. Leaving everything else the same. There’s a trade-off between the two error probabilities.(No free lunch.) No headache powder called for.

So if someone said, as the power increases, the probability of a type 1 error decreases, they’d be saying: As the type 2 error decreases, the probability of a type 1 error decreases! That’s the opposite of a trade-off. So you’d know automatically they’d made a mistake or were defining things in a way that differs from standard NP statistical tests.

Before turning to my little exercise, I note that power is defined in terms of a test’s cut-off for rejecting the null, whereas a severity assessment always considers the actual value observed (attained power). Here I’m just trying to clarify regular old power, as defined in a N-P test.


Let’s use a familiar oversimple example to fix the trade-off in our minds so that it cannot be dislodged. Our old friend, test T+ : We’re testing the mean of a Normal distribution with n iid samples, and (for simplicity) known, fixed σ:

H0: µ ≤  0 against H1: µ >  0

Let σ = 2n = 25, so (σ/ √n) = .4. To avoid those annoying X-bars, I will use M for the sample mean. I will abbreviate (σ/ √n) as σx .

  • Test T+ is a rule: reject Hiff M > m*
  • Power of a test T+ is computed in relation to values of µ >  0.
  • The power of T+ against alternative µ =µ1) = Pr(T+ rejects H0 ;µ = µ1) = Pr(M > m*; µ = µ1)

We may abbreviate this as : POW(T+,α, µ = µ1) Continue reading

Categories: power, statistical tests, Statistics | 6 Comments

Blog Contents: Oct.- Dec. 2014

metablog old fashion typewriterBLOG CONTENTS: OCT – DEC 2014*


  • 10/01 Oy Faye! What are the odds of not conflating simple conditional probability and likelihood with Bayesian success stories?
  • 10/05 Diederik Stapel hired to teach “social philosophy” because students got tired of success stories… or something (rejected post)
  • 10/07 A (Jan 14, 2014) interview with Sir David Cox by “Statistics Views”
  • 10/10 BREAKING THE (Royall) LAW! (of likelihood) (C)
  • 10/14 Gelman recognizes his error-statistical (Bayesian) foundations
  • 10/18 PhilStat/Law: Nathan Schachtman: Acknowledging Multiple Comparisons in Statistical Analysis: Courts Can and Must
  • 10/22 September 2014: Blog Contents
  • 10/26 To Quarantine or not to Quarantine?: Science & Policy in the time of Ebola
  • 10/31 Oxford Gaol: Statistical Bogeymen


  • 11/01 Philosophy of Science Assoc. (PSA) symposium on Philosophy of Statistics in the Higgs Experiments “How Many Sigmas to Discovery?”
  • 11/09 “Statistical Flukes, the Higgs Discovery, and 5 Sigma” at the PSA
  • 11/11 The Amazing Randi’s Million Dollar Challenge
  • 11/12 A biased report of the probability of a statistical fluke: Is it cheating?
  • 11/15 Why the Law of Likelihood is bankrupt–as an account of evidence
  • 11/18 Lucien Le Cam: “The Bayesians Hold the Magic”
  • 11/20 Erich Lehmann: Statistician and Poet
  • 11/22 Msc Kvetch: “You are a Medical Statistic”, or “How Medical Care Is Being Corrupted”
  • 11/25 How likelihoodists exaggerate evidence from statistical tests



  • 12/02 My Rutgers Seminar: tomorrow, December 3, on philosophy of statistics
  • 12/04 “Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance” (Dec 3 Seminar slides)
  • 12/06 How power morcellators inadvertently spread uterine cancer
  • 12/11 Msc. Kvetch: What does it mean for a battle to be “lost by the media”?
  • 12/13 S. Stanley Young: Are there mortality co-benefits to the Clean Power Plan? It depends. (Guest Post)
  • 12/17 Announcing Kent Staley’s new book, An Introduction to the Philosophy of Science (CUP)
  • 12/21 Derailment: Faking Science: A true story of academic fraud, by Diederik Stapel (translated into English)
  • 12/23 All I want for Chrismukkah is that critics & “reformers” quit howlers of testing (after 3 yrs of blogging)! So here’s Aris Spanos “Talking Back!”
  • 12/29 To raise the power of a test is to lower (not raise) the “hurdle” for rejecting the null (Ziliac and McCloskey 3 years on)
  • 12/31 Midnight With Birnbaum (Happy New Year)

* Compiled by Jean A. Miller

Categories: blog contents, Statistics | Leave a comment

Midnight With Birnbaum (Happy New Year)

 Just as in the past 3 years since I’ve been blogging, I revisit that spot in the road at 11p.m.*,just outside the Elbar Room, get into a strange-looking taxi, and head to “Midnight With Birnbaum”. I wonder if they’ll come for me this year, given that my Birnbaum article is out… This is what the place I am taken to looks like. [It’s 6 hrs later here, so I’m about to leave…]

You know how in that (not-so) recent movie, “Midnight in Paris,” the main character (I forget who plays it, I saw it on a plane) is a writer finishing a novel, and he steps into a cab that mysteriously picks him up at midnight and transports him back in time where he gets to run his work by such famous authors as Hemingway and Virginia Wolf?  He is impressed when his work earns their approval and he comes back each night in the same mysterious cab…Well, imagine an error statistical philosopher is picked up in a mysterious taxi at midnight (New Year’s Eve 2011 2012, 2013, 2014) and is taken back fifty years and, lo and behold, finds herself in the company of Allan Birnbaum.[i] There are a couple of brief (12/31/14) updates at the end.  



ERROR STATISTICIAN: It’s wonderful to meet you Professor Birnbaum; I’ve always been extremely impressed with the important impact your work has had on philosophical foundations of statistics.  I happen to be writing on your famous argument about the likelihood principle (LP).  (whispers: I can’t believe this!)

BIRNBAUM: Ultimately you know I rejected the LP as failing to control the error probabilities needed for my Confidence concept.

ERROR STATISTICIAN: Yes, but I actually don’t think your argument shows that the LP follows from such frequentist concepts as sufficiency S and the weak conditionality principle WLP.[ii]  Sorry,…I know it’s famous…

BIRNBAUM:  Well, I shall happily invite you to take any case that violates the LP and allow me to demonstrate that the frequentist is led to inconsistency, provided she also wishes to adhere to the WLP and sufficiency (although less than S is needed).

ERROR STATISTICIAN: Well I happen to be a frequentist (error statistical) philosopher; I have recently (2006) found a hole in your proof,…well I hope we can discuss it.

BIRNBAUM: Well, well, well: I’ll bet you a bottle of Elba Grease champagne that I can demonstrate it! Continue reading

Categories: Birnbaum Brakes, Statistics, strong likelihood principle | Tags: , , , | 2 Comments

To raise the power of a test is to lower (not raise) the “hurdle” for rejecting the null (Ziliac and McCloskey 3 years on)

Part 2 Prionvac: The Will to Understand PowerI said I’d reblog one of the 3-year “memory lane” posts marked in red, with a few new comments (in burgundy), from time to time. So let me comment on one referring to Ziliac and McCloskey on power. (from Oct.2011). I would think they’d want to correct some wrong statements, or explain their shifts in meaning. My hope is that, 3 years on, they’ll be ready to do so. By mixing some correct definitions with erroneous ones, they introduce more confusion into the discussion.

From my post 3 years ago: “The Will to Understand Power”: In this post, I will adhere precisely to the text, and offer no new interpretation of tests. Type 1 and 2 errors and power are just formal notions with formal definitions.  But we need to get them right (especially if we are giving expert advice).  You can hate the concepts; just define them correctly please.  They write:

“The error of the second kind is the error of accepting the null hypothesis of (say) zero effect when the null is in face false, that is, then (say) such and such a positive effect is true.”

So far so good (keeping in mind that “positive effect” refers to a parameter discrepancy, say δ, not an observed difference.

And the power of a test to detect that such and such a positive effect δ is true is equal to the probability of rejecting the null hypothesis of (say) zero effect when the null is in fact false, and a positive effect as large as δ is present.


Let this alternative be abbreviated H’(δ):

H’(δ): there is a positive effect as large as δ.

Suppose the test rejects the null when it reaches a significance level of .01.

(1) The power of the test to detect H’(δ) =

P(test rejects null at .01 level; H’(δ) is true).

Say it is 0.85.

“If the power of a test is high, say, 0.85 or higher, then the scientist can be reasonably confident that at minimum the null hypothesis (of, again, zero effect if that is the null chosen) is false and that therefore his rejection of it is highly probably correct”. (Z & M, 132-3).

But this is not so.  Perhaps they are slipping into the cardinal error of mistaking (1) as a posterior probability:

(1’) P(H’(δ) is true| test rejects null at .01 level)! Continue reading

Categories: 3-year memory lane, power, Statistics | Tags: , , | 6 Comments


3 years ago...

3 years ago…

MONTHLY MEMORY LANE: 3 years ago: December 2011. I mark in red 3 posts that seem most apt for general background on key issues in this blog.*

*I announced this new, once-a-month feature at the blog’s 3-year anniversary. I will repost and comment on one of the 3-year old posts from time to time. [I’ve yet to repost and comment on the one from Oct. 2011, but will very shortly.] For newcomers, here’s your chance to catch-up; for old timers,this is philosophy: rereading is essential!


Nov. 2011

Oct. 2011

Sept. 2011 (Within “All She Wrote (so far))

Categories: 3-year memory lane, blog contents, Statistics | Leave a comment

All I want for Chrismukkah is that critics & “reformers” quit howlers of testing (after 3 yrs of blogging)! So here’s Aris Spanos “Tallking Back!”

spanos 2014



This was initially posted as slides from our joint Spring 2014 seminar: “Talking Back to the Critics Using Error Statistics”. (You can enlarge them.) Related reading is Mayo and Spanos (2011)


Categories: Error Statistics, fallacy of rejection, Phil6334, reforming the reformers, Statistics | 27 Comments

Derailment: Faking Science: A true story of academic fraud, by Diederik Stapel (translated into English)

images-16Diederik Stapel’s book, “Ontsporing” has been translated into English, with some modifications. From what I’ve read, it’s interesting in a bizarre, fraudster-porn sort of way.

Faking Science: A true story of academic fraud

Diederik Stapel
Translated by Nicholas J.L. Brown

Nicholas J. L. Brown (
Strasbourg, France
December 14, 2014



Foreword to the Dutch edition

I’ve spun off, lost my way, crashed and burned; whatever you want to call it. It’s not much fun. I was doing fine, but then I became impatient, overambitious, reckless. I wanted to go faster and better and higher and smarter, all the time. I thought it would help if I just took this one tiny little shortcut, but then I found myself more and more often in completely the wrong lane, and in the end I wasn’t even on the road at all. I left the road where I should have gone straight on, and made my own, spectacular, destructive, fatal accident. I’ve ruined my life, but that’s not the worst of it. My recklessness left a multiple pile-up in its wake, which caught up almost everyone important to me: my wife and children, my parents and siblings, colleagues, students, my doctoral candidates, the university, psychology, science, all involved, all hurt or damaged to some degree or other. That’s the worst part, and it’s something I’m going to have to learn to live with for the rest of my life, along with the shame and guilt. I’ve got more regrets than hairs on my head, and an infinite amount of time to think about them. Continue reading

Categories: Statistical fraudbusting, Statistics | Tags: | 4 Comments

Announcing Kent Staley’s new book, An Introduction to the Philosophy of Science (CUP)


Kent Staley has written a clear and engaging introduction to PhilSci that manages to blend the central key topics of philosophy of science with current philosophy of statistics. Quite possibly, Staley explains Error Statistics more clearly in many ways than I do in his 10 page section, 9.4. CONGRATULATIONS STALEY*

You can get this book for free by merely writing one of the simpler palindrome’s in the December contest.

Here’s an excerpt from that section:



9.4 Error-statistical philosophy of science and severe testing

Deborah Mayo has developed an alternative approach to the interpretation of frequentist statistical inference (Mayo 1996). But the idea at the heart of Mayo’s approach is one that can be stated without invoking probability at all. ….

Mayo takes the following “minimal scientific principle for evidence” to be uncontroversial:

Principle 3 (Minimal principle for evidence) Data xo provide poor evidence for H if they result from a method or procedure that has little or no ability of finding flaws in H, even if H is false.(Mayo and Spanos, 2009, 3) Continue reading

Categories: Announcement, Palindrome, Statistics, StatSci meets PhilSci | Tags: | 10 Comments

S. Stanley Young: Are there mortality co-benefits to the Clean Power Plan? It depends. (Guest Post)




S. Stanley Young, PhD
Assistant Director
Bioinformatics National Institute of Statistical Sciences Research Triangle Park, NC

Are there mortality co-benefits to the Clean Power Plan? It depends.

Some years ago, I listened to a series of lectures on finance. The professor would ask a rhetorical question, pause to give you some time to think, and then, more often than not, answer his question with, “It depends.” Are there mortality co-benefits to the Clean Power Plan? Is mercury coming from power plants leading to deaths? Well, it depends.

So, rhetorically, is an increase in CO2 a bad thing? There is good and bad in everything. Well, for plants an increase in CO2 is a good thing. They grow faster. They convert CO2 into more food and fiber. They give off more oxygen, which is good for humans. Plants appear to be CO2 starved.

It is argued that CO2 is a greenhouse gas and an increase in CO2 will raise temperatures, ice will melt, sea levels will rise, and coastal area will flood, etc. It depends. In theory yes, in reality, maybe. But a lot of other events must be orchestrated simultaneously. Obviously, that scenario depends on other things as, for the last 18 years, CO2 has continued to go up and temperatures have not. So it depends on other factors, solar radiance, water vapor, El Nino, sun spots, cosmic rays, earth presession, etc., just what the professor said.

young pic 1

So suppose ambient temperatures do go up a few degrees. On balance, is that bad for humans? The evidence is overwhelming that warmer is better for humans. One or two examples are instructive. First, Cox et al., (2013) with the title, “Warmer is healthier: Effects on mortality rates of changes in average fine particulate matter (PM2.5) concentrations and temperatures in 100 U.S. cities.” To quote from the abstract of that paper, “Increases in average daily temperatures appear to significantly reduce average daily mortality rates, as expected from previous research.” Here is their plot of daily mortality rate versus Max temperature. It is clear that as the maximum temperature in a city goes up, mortality goes down. So if the net effect of increasing CO2 is increasing temperature, there should be a reduction in deaths. Continue reading

Categories: evidence-based policy, junk science, Statistics | Tags: | 35 Comments

Msc. Kvetch: What does it mean for a battle to be “lost by the media”?

IMG_17801.  What does it mean for a debate to be “media driven” or a battle to be “lost by the media”? In my last post, I noted that until a few weeks ago, I’d never heard of a “power morcellator.” Nor had I heard of the AAGL–The American Association of Gynecologic Laparoscopists. In an article Battle over morcellation lost ‘in the media’”(Nov 26, 2014) Susan London reports on a recent meeting of the AAGL[i]

The media played a major role in determining the fate of uterine morcellation, suggested a study reported at a meeting sponsored by AAGL.

“How did we lose this battle of uterine morcellation? We lost it in the media,” asserted lead investigator Dr. Adrian C. Balica, director of the minimally invasive gynecologic surgery program at the Robert Wood Johnson Medical School in New Brunswick, N.J.

The “investigation” Balica led consisted of collecting Internet search data using something called the Google Adwords Keyword Planner:

Results showed that the average monthly number of Google searches for the term ‘morcellation’ held steady throughout most of 2013 at about 250 per month, reported Dr. Balica. There was, however, a sharp uptick in December 2013 to more than 2,000 per month, and the number continued to rise to a peak of about 18,000 per month in July 2014. A similar pattern was seen for the terms ‘morcellator,’ ‘fibroids in uterus,’ and ‘morcellation of uterine fibroid.’

The “vitals” of the study are summarized at the start of the article:

Key clinical point: Relevant Google searches rose sharply as the debate unfolded.

Major finding: The mean monthly number of searches for “morcellation” rose from about 250 in July 2013 to 18,000 in July 2014.

Data source: An analysis of Google searches for terms related to the power morcellator debate.

Disclosures: Dr. Balica disclosed that he had no relevant conflicts of interest.

2. Here’s my question: Does a high correlation between Google searches and debate-related terms signify that the debate is “media driven”? I suppose you could call it that, but Dr. Balica is clearly suggesting that something not quite kosher, or not fully factual was responsible for losing “this battle of uterine morcellation”, downplaying the substantial data and real events that drove people (like me) to search the terms upon hearing the FDA announcement in November. Continue reading

Categories: msc kvetch, PhilStat Law, science communication, Statistics | 11 Comments

How power morcellators inadvertently spread uterine cancer

imagesUntil a few weeks ago, I’d never even heard of a “power morcellator.” Nor was I aware of the controversy that has pitted defenders of a woman’s right to choose a minimally invasive laparoscopic procedure in removing fibroids—enabled by the power morcellator–and those who decry the danger it poses in spreading an undetected uterine cancer throughout a woman’s abdomen. The most outspoken member of the anti-morcellation group is surgeon Hooman Noorchashm. His wife, Dr. Amy Reed, had a laparoscopic hysterectomy that resulted in morcellating a hidden cancer, progressing it to Stage IV sarcoma. Below is their video (link is here), followed by a recent FDA warning. I may write this in stages or parts. (I will withhold my view for now, I’d like to know what you think.)

Morcellation: (The full Article is here.)


FDA Safety Communication:images-1

UPDATED Laparoscopic Uterine Power Morcellation in Hysterectomy and Myomectomy: FDA Safety Communication

The following information updates our April 17, 2014 communication.

Date Issued: Nov. 24, 2014

Laparoscopic power morcellators are medical devices used during different types of laparoscopic (minimally invasive) surgeries. These can include certain procedures to treat uterine fibroids, such as removing the uterus (hysterectomy) or removing the uterine fibroids (myomectomy). Morcellation refers to the division of tissue into smaller pieces or fragments and is often used during laparoscopic surgeries to facilitate the removal of tissue through small incision sites.

When used for hysterectomy or myomectomy in women with uterine fibroids, laparoscopic power morcellation poses a risk of spreading unsuspected cancerous tissue, notably uterine sarcomas, beyond the uterus. The FDA is warning against using laparoscopic power morcellators in the majority of women undergoing hysterectomy or myomectomy for uterine fibroids. Health care providers and patients should carefully consider available alternative treatment options for the removal of symptomatic uterine fibroids.

Summary of Problem and Scope: 
Uterine fibroids are noncancerous growths that develop from the muscular tissue of the uterus. Most women will develop uterine fibroids (also called leiomyomas) at some point in their lives, although most cause no symptoms1. In some cases, however, fibroids can cause symptoms, including heavy or prolonged menstrual bleeding, pelvic pressure or pain, and/or frequent urination, requiring medical or surgical therapy.

Many women choose to undergo laparoscopic hysterectomy or myomectomy because these procedures are associated with benefits such as a shorter post-operative recovery time and a reduced risk of infection compared to abdominal hysterectomy and myomectomy2. Many of these laparoscopic procedures are performed using a power morcellator.

Based on an FDA analysis of currently available data, we estimate that approximately 1 in 350 women undergoing hysterectomy or myomectomy for the treatment of fibroids is found to have an unsuspected uterine sarcoma, a type of uterine cancer that includes leiomyosarcoma. At this time, there is no reliable method for predicting or testing whether a woman with fibroids may have a uterine sarcoma.

If laparoscopic power morcellation is performed in women with unsuspected uterine sarcoma, there is a risk that the procedure will spread the cancerous tissue within the abdomen and pelvis, significantly worsening the patient’s long-term survival. While the specific estimate of this risk may not be known with certainty, the FDA believes that the risk is higher than previously understood. Continue reading

Categories: morcellation: FDA warning, Statistics | Tags: | 7 Comments

“Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance” (Dec 3 Seminar slides)

(May 4) 7 Deborah Mayo  “Ontology & Methodology in Statistical Modeling”Below are the slides from my Rutgers seminar for the Department of Statistics and Biostatistics yesterday, since some people have been asking me for them. The abstract is here. I don’t know how explanatory a bare outline like this can be, but I’d be glad to try and answer questions[i]. I am impressed at how interested in foundational matters I found the statisticians (both faculty and students) to be. (There were even a few philosophers in attendance.) It was especially interesting to explore, prior to the seminar, possible connections between severity assessments and confidence distributions, where the latter are along the lines of Min-ge Xie (some recent papers of his may be found here.)

“Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance”

[i]They had requested a general overview of some issues in philosophical foundations of statistics. Much of this will be familiar to readers of this blog.



Categories: Bayesian/frequentist, Error Statistics, Statistics | 11 Comments

My Rutgers Seminar: tomorrow, December 3, on philosophy of statistics

picture-216-1I’ll be talking about philosophy of statistics tomorrow afternoon at Rutgers University, in the Statistics and Biostatistics Department, if you happen to be in the vicinity and are interested.


Seminar Speaker:     Professor Deborah Mayo, Virginia Tech

Title:           Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance

Time:          3:20 – 4:20pm, Wednesday, December 3, 2014 Place:         552 Hill Center


Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance Getting beyond today’s most pressing controversies revolving around statistical methods, I argue, requires scrutinizing their underlying statistical philosophies.Two main philosophies about the roles of probability in statistical inference are probabilism and performance (in the long-run). The first assumes that we need a method of assigning probabilities to hypotheses; the second assumes that the main function of statistical method is to control long-run performance. I offer a third goal: controlling and evaluating the probativeness of methods. An inductive inference, in this conception, takes the form of inferring hypotheses to the extent that they have been well or severely tested. A report of poorly tested claims must also be part of an adequate inference. I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. I then show how the “severe testing” philosophy clarifies and avoids familiar criticisms and abuses of significance tests and cognate methods (e.g., confidence intervals). Severity may be threatened in three main ways: fallacies of statistical tests, unwarranted links between statistical and substantive claims, and violations of model assumptions.

Categories: Announcement, Statistics | 4 Comments

Blog at The Adventure Journal Theme.


Get every new post delivered to your Inbox.

Join 700 other followers