Statistical fraudbusting

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Understanding Reproducibility & Error Correction in Science

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2016–2017
57th Annual Program

Download the 57th Annual Program

The Alfred I. Taub forum:

UNDERSTANDING REPRODUCIBILITY & ERROR CORRECTION IN SCIENCE

Cosponsored by GMS and BU’s BEST at Boston University.
Friday, March 17, 2017
1:00 p.m. – 5:00 p.m.
The Terrace Lounge, George Sherman Union
775 Commonwealth Avenue

  • Reputation, Variation, & Control: Historical Perspectives
    Jutta Schickore History and Philosophy of Science & Medicine, Indiana University, Bloomington.
  • Crisis in Science: Time for Reform?
    Arturo Casadevall Molecular Microbiology & Immunology, Johns Hopkins
  • Severe Testing: The Key to Error Correction
    Deborah Mayo Philosophy, Virginia Tech
  • Replicate That…. Maintaining a Healthy Failure Rate in Science
    Stuart Firestein Biological Sciences, Columbia

 


Categories: Announcement, philosophy of science, Philosophy of Statistics, Statistical fraudbusting, Statistics | Leave a comment

Richard Gill: “Integrity or fraud… or just questionable research practices?” (Is Gill too easy on them?)

Professor Richard Gill
Statistics Group
Mathematical Institute
Leiden University

It was statistician Richard Gill who first told me about Diederik Stapel (see an earlier post on Diederik). We were at a workshop on Error in the Sciences at Leiden in 2011. I was very lucky to have Gill assigned as my commentator/presenter—he was excellent! As I was explaining some data problems to him, he suddenly said, “Some people don’t bother to collect data at all!” That’s when I learned about Stapel.

Committees often turn to Gill when someone’s work is under scrutiny for bad statistics, fraud, or anything in between. Do you think he’s being too easy on researchers when he says, of a given case:

“data has been obtained by some combination of the usual ‘questionable research practices’ [QRPs] which are prevalent in the field in question. Everyone does it this way, in fact, if you don’t, you’d never get anything published. …People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them.”

Isn’t that the danger in relying on deeply felt background beliefs? Have our attitudes toward QRPs grown harsher or more lenient over the past 3 years? Here’s a talk of his I blogged 3 years ago (followed by a letter he allowed me to post). I reflect on the pseudoscientific nature of the ‘recovered memories’ program in one of the Geraerts et al. papers in a later post. Continue reading

Categories: 3-year memory lane, junk science, Statistical fraudbusting, Statistics | 4 Comments

Preregistration Challenge: My email exchange


David Mellor, from the Center for Open Science, emailed me asking if I’d announce his Preregistration Challenge on my blog, and I’m glad to do so. You win $1,000 if your properly preregistered paper is published. The recent replication effort in psychology showed, despite the common refrain – “it’s too easy to get low P-values” – that in preregistered replication attempts it’s actually very difficult to get small P-values. (I call this the “paradox of replication”[1].) Here’s our e-mail exchange from this morning:

          Dear Deborah Mayo,

I’m reaching out to individuals who I think may be interested in our recently launched competition, the Preregistration Challenge (https://cos.io/prereg). Based on your blogging, I thought it could be of interest to you and to your readers.

In case you are unfamiliar with it, preregistration specifies in advance the precise study protocols and analytical decisions before data collection, in order to separate the hypothesis-generating exploratory work from the hypothesis testing confirmatory work. 

Though required by law in clinical trials, it is virtually unknown within the basic sciences. We are trying to encourage this new behavior by offering 1,000 researchers $1000 prizes for publishing the results of their preregistered work. 

Please let me know if this is something you would consider blogging about or sharing in other ways. I am happy to discuss further. 

Best,

David
David Mellor, PhD

Project Manager, Preregistration Challenge, Center for Open Science

 

Deborah Mayo to David, 10:33 AM:

David: Yes I’m familiar with it, and I hope that it encourages people to avoid data-dependent determinations that bias results. It shows the importance of statistical accounts that can pick up on such biasing selection effects. On the other hand, coupling prereg with some of the flexible inference accounts now in use won’t really help. Moreover, there may, in some fields, be a tendency to research a non-novel, fairly trivial result.

And if they’re going to preregister, why not go blind as well?  Will they?

Best,

Mayo Continue reading

Categories: Announcement, preregistration, Statistical fraudbusting, Statistics | 11 Comments

Findings of the Office of Research Integrity on the Duke U (Potti/Nevins) cancer trial fraud: No one is punished but the patients

Findings of Research Misconduct
A Notice by the Health and Human Services Dept
on 11/09/2015
AGENCY: Office of the Secretary, HHS.
ACTION: Notice.

-----------------------------------------------------------------------

SUMMARY: Notice is hereby given that the Office of Research Integrity (ORI) has taken final action in the following case:

Anil Potti, M.D., Duke University School of Medicine: Based on the reports of investigations conducted by Duke University School of Medicine (Duke) and additional analysis conducted by ORI in its oversight review, ORI found that Dr. Anil Potti, former Associate Professor of Medicine, Duke, engaged in research misconduct in research supported by National Heart, Lung, and Blood Institute (NHLBI), National Institutes of Health (NIH), grant R01 HL072208 and National Cancer Institute (NCI), NIH, grants R01 CA136530, R01 CA131049, K12 CA100639, R01 CA106520, and U54 CA112952.

ORI found that Respondent engaged in research misconduct by including false research data in the following published papers, submitted manuscript, grant application, and the research record as specified in 1-3 below. Specifically, ORI found that: Continue reading
Categories: Anil Potti, reproducibility, Statistical fraudbusting, Statistics | 12 Comments

“Probabilism as an Obstacle to Statistical Fraud-Busting”


“Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?” was my presentation at the 2014 Boston Colloquium for the Philosophy of Science: “Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge.”

As often happens, I never put these slides into a stand-alone paper. But I have incorporated them into my book (in progress*), “How to Tell What’s True About Statistical Inference”. Background and slides were posted last year.

Slides (draft from Feb 21, 2014) 

Download the 54th Annual Program

Cosponsored by the Department of Mathematics & Statistics at Boston University.

Friday, February 21, 2014
10 a.m. – 5:30 p.m.
Photonics Center, 9th Floor Colloquium Room (Rm 906)
8 St. Mary’s Street

*Seeing a light at the end of the tunnel, finally.
Categories: P-values, significance tests, Statistical fraudbusting, Statistics | 7 Comments

2015 Saturday Night Brainstorming and Task Forces: (4th draft)

TFSI workgroup

Saturday Night Brainstorming: The TFSI on NHST–part reblog from here and here, with a substantial 2015 update!

Each year, leaders of the movement to “reform” statistical methodology in psychology, social science, and other areas of applied statistics get together around this time for a brainstorming session. They review the latest from the Task Force on Statistical Inference (TFSI) and propose new regulations they would like to see adopted, not just by the APA publication manual any more, but by all science journals! Since it’s Saturday night, let’s listen in on part of an (imaginary) brainstorming session of the New Reformers.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Frustrated that the TFSI has still not banned null hypothesis significance testing (NHST)–a fallacious version of statistical significance tests that dares to violate Fisher’s first rule: It’s illicit to move directly from statistical to substantive effects–the New Reformers have created, and very successfully published in, new meta-level research paradigms designed expressly to study (statistically!) a central question: have the carrots and sticks of reward and punishment been successful in decreasing the use of NHST, and promoting instead use of confidence intervals, power calculations, and meta-analysis of effect sizes? Or not?  

Most recently, the group has helped successfully launch a variety of “replication and reproducibility projects”. Having discovered how much the reward structure encourages bad statistics and gaming the system, they have cleverly pushed to change the reward structure: failed replications (from a group chosen by a crowd-sourced band of replicationistas) would not be hidden in those dusty old file drawers, but would be guaranteed to be published without that long, drawn-out process of peer review. Do these failed replications indicate the original study was a false positive? Or that the replication attempt is a false negative? It’s hard to say.

This year, as is typical, there is a new member who is pitching in to contribute what he hopes are novel ideas for reforming statistical practice. In addition, for the first time, there is a science reporter blogging the meeting for her next freelance “bad statistics” piece for a high-impact science journal. Notice that this committee only grows; no one has dropped off in the 3 years I’ve followed them.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pawl: This meeting will come to order. I am pleased to welcome our new member, Dr. Ian Nydes, adding to the medical strength we have recently built with epidemiologist S.C. In addition, we have a science writer with us today, Jenina Oozo. To familiarize everyone, we begin with a review of old business, and gradually turn to new business.

Franz: It’s so darn frustrating after all these years to see researchers still using NHST methods; some of the newer modeling techniques routinely build on numerous applications of those pesky tests.

Jake: And the premier publication outlets in the social sciences still haven’t mandated the severe reforms sorely needed. Hopefully the new blood, Dr. Ian Nydes, can help us go beyond resurrecting the failed attempts of the past. Continue reading

Categories: Comedy, reforming the reformers, science communication, Statistical fraudbusting, statistical tests, Statistics | 19 Comments

What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri

For entertainment only

Here’s the follow-up to my last (reblogged) post, initially posted here. My take hasn’t changed much from 2013. Should we be labeling some pursuits “for entertainment only”? Why not? (See also a later post on the replication crisis in psych.)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I had said I would label as pseudoscience or questionable science any enterprise that regularly permits the kinds of ‘verification biases’ on the statistical dirty laundry list. How regularly? (I’ve been asked.)

Well, surely if it’s as regular as, say, much of social psychology, it goes over the line. But it’s not mere regularity, it’s the nature of the data, the type of inferences being drawn, and the extent of self-scrutiny and recognition of errors shown (or not shown). The regularity is just a consequence of the methodological holes. My standards may be considerably more stringent than most, but quite aside from statistical issues, I simply do not find hypotheses well-tested if they are based on “experiments” that consist of giving questionnaires. At least not without a lot more self-scrutiny and discussion of flaws than I ever see. (There may be counterexamples.)

Attempts to recreate phenomena of interest in typical social science “labs” leave me with the same doubts. Huge gaps often exist between elicited and inferred results. One might locate the problem under “external validity” but to me it is just the general problem of relating statistical data to substantive claims.

Experimental economists (expereconomists) take lab results plus statistics to warrant sometimes ingenious inferences about substantive hypotheses. Vernon Smith (of the Nobel Prize in Econ) is rare in subjecting his own results to “stress tests”. I’m not withdrawing the optimistic assertions he cites from EGEK (Mayo 1996) on Duhem-Quine (e.g., from “Rhetoric and Reality” 2001, p. 29). I’d still maintain, “Literal control is not needed to attribute experimental results correctly (whether to affirm or deny a hypothesis). Enough experimental knowledge will do”. But that requires piecemeal strategies that accumulate, and at least a little bit of “theory” and/or a decent amount of causal understanding.[1]

I think the generalizations extracted from questionnaires allow for an enormous amount of “reading into” the data. Suddenly one finds the “best” explanation. Questionnaires should be deconstructed for how they may be misinterpreted, not to mention how responders tend to guess what the experimenter is looking for. (I’m reminded of the current hoopla over questionnaires on breadwinners, housework and divorce rates!) I respond with the same eye-rolling to just-so story telling along the lines of evolutionary psychology.

I apply the “Stapel test”: Even if Stapel had bothered to actually carry out the data-collection plans that he so carefully crafted, I would not find the inferences especially telling in the least. Take for example the planned-but-not-implemented study discussed in the recent New York Times article on Stapel: Continue reading

Categories: junk science, Statistical fraudbusting, Statistics | 3 Comments

Power Analysis and Non-Replicability: If bad statistics is prevalent in your field, does it follow you can’t be guilty of scientific fraud?


If questionable research practices (QRPs) are prevalent in your field, then apparently you can’t be guilty of scientific misconduct or fraud (by mere QRP finagling), or so some suggest. Isn’t that an incentive for making QRPs the norm? 

The following is a recent blog discussion (by Ulrich Schimmack) on the Jens Förster scandal: I thank Richard Gill for alerting me. I haven’t fully analyzed Schimmack’s arguments, so please share your reactions. I agree with him on the importance of power analysis, but I’m not sure that the way he’s using it (via his “R index”) shows what he claims. Nor do I see how any of this invalidates, or spares Förster from, the fraud allegations along the lines of Simonsohn[i]. Most importantly, I don’t see that cheating one way vs. another changes the scientific status of Förster’s flawed inference. Förster already admitted that, faced with unfavorable results, he’d always find ways to fix things until he got results in sync with his theory (on the social psychology of creativity priming). Fraud by any other name.
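To give a rough sense of the power-based reasoning at stake (a minimal sketch of the basic arithmetic such arguments rest on, not Schimmack’s R-index itself, and with made-up numbers that say nothing about the Förster case), consider how improbable an unbroken run of significant results is when each study has only modest power:

```python
# If k independent studies each have true power w, the probability that all k
# reach statistical significance is w**k. A long unbroken run of "hits" from
# modestly powered studies is therefore itself statistically surprising.
power = 0.5      # assumed power of each study (illustrative value)
k_studies = 10   # number of reported significant results (illustrative value)

prob_all_significant = power ** k_studies
print(f"P(all {k_studies} studies significant | power = {power}) = {prob_all_significant:.4f}")
# 0.5 ** 10 is about 0.001, roughly a one-in-a-thousand run of luck.
```

The lower the typical power and the longer the unbroken string of successes, the harder it is to believe the published record tells the whole story.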

The official report, “Suspicion of scientific misconduct by Dr. Jens Förster,” is anonymous and dated September 2012. An earlier post on this blog, “Who ya gonna call for statistical Fraudbusting?”, featured a discussion by Neuroskeptic from Discover Magazine that I found illuminating: “On the Suspicion of Scientific Misconduct by Jens Förster.” Also see Retraction Watch.

Does anyone know the official status of the Förster case?

“How Power Analysis Could Have Prevented the Sad Story of Dr. Förster”

From Ulrich Schimmack’s “Replicability Index” blog, January 2, 2015. A January 14, 2015 update is here. (Occasional emphasis in bright red is mine.) Continue reading

Categories: junk science, reproducibility, Statistical fraudbusting, Statistical power, Statistics | 22 Comments

Derailment: Faking Science: A true story of academic fraud, by Diederik Stapel (translated into English)

Diederik Stapel’s book, “Ontsporing” has been translated into English, with some modifications. From what I’ve read, it’s interesting in a bizarre, fraudster-porn sort of way.

Faking Science: A true story of academic fraud

Diederik Stapel
Translated by Nicholas J.L. Brown

Nicholas J. L. Brown (nick.brown@free.fr)
Strasbourg, France
December 14, 2014


Foreword to the Dutch edition

I’ve spun off, lost my way, crashed and burned; whatever you want to call it. It’s not much fun. I was doing fine, but then I became impatient, overambitious, reckless. I wanted to go faster and better and higher and smarter, all the time. I thought it would help if I just took this one tiny little shortcut, but then I found myself more and more often in completely the wrong lane, and in the end I wasn’t even on the road at all. I left the road where I should have gone straight on, and made my own, spectacular, destructive, fatal accident. I’ve ruined my life, but that’s not the worst of it. My recklessness left a multiple pile-up in its wake, which caught up almost everyone important to me: my wife and children, my parents and siblings, colleagues, students, my doctoral candidates, the university, psychology, science, all involved, all hurt or damaged to some degree or other. That’s the worst part, and it’s something I’m going to have to learn to live with for the rest of my life, along with the shame and guilt. I’ve got more regrets than hairs on my head, and an infinite amount of time to think about them. Continue reading

Categories: Statistical fraudbusting, Statistics | 4 Comments

Some ironies in the ‘replication crisis’ in social psychology (4th and final installment)

There are some ironic twists in the way social psychology is dealing with its “replication crisis”, and they may well threaten even the most sincere efforts to put the field on firmer scientific footing–precisely in those areas that evoked the call for a “daisy chain” of replications. Two articles, one from the Guardian (June 14), and a second from The Chronicle of Higher Education (June 23) lay out the sources of what some are calling “Repligate”. The Guardian article is “Physics Envy: Do ‘hard’ sciences hold the solution to the replication crisis in psychology?”

The article in the Chronicle of Higher Education also gets credit for its title: “Replication Crisis in Psychology Research Turns Ugly and Odd”. I’ll likely write this in installments… (2nd, 3rd, 4th)

^^^^^^^^^^^^^^^

The Guardian article answers yes to the question “Do ‘hard’ sciences hold the solution“:

Psychology is evolving faster than ever. For decades now, many areas in psychology have relied on what academics call “questionable research practices” – a comfortable euphemism for types of malpractice that distort science but which fall short of the blackest of frauds, fabricating data.
Continue reading

Categories: junk science, science communication, Statistical fraudbusting, Statistics | 60 Comments

What have we learned from the Anil Potti training and test data fireworks? Part 1 (draft 2)


Over 100 patients signed up for the chance to participate in the clinical trials at Duke (2007-10) that promised a custom-tailored cancer treatment spewed out by a cutting-edge prediction model developed by Anil Potti, Joseph Nevins and their team at Duke. Their model purported to predict your probable response to one or another chemotherapy based on microarray analyses of various tumors. While they are now described as “false pioneers” of personalized cancer treatments, it’s not clear what has been learned from the fireworks surrounding the Potti episode overall. Most of the popular focus has been on glaring typographical and data processing errors—at least that’s what I mainly heard about until recently. Although they were quite crucial to the science in this case (surely more so than Potti’s CV padding), what interests me now are the general methodological and logical concerns that rarely make it into the popular press. Continue reading

Categories: science communication, selection effects, Statistical fraudbusting | 39 Comments

Scientism and Statisticism: a conference* (i)

A lot of philosophers and scientists seem to be talking about scientism these days–either championing it or worrying about it. What is it? It’s usually a pejorative term describing an unwarranted deference to the so-called scientific method over and above other methods of inquiry. Some push it as a way to combat postmodernism (is that even still around?). Steven Pinker gives scientism a positive spin (and even offers it as a cure for the malaise of the humanities!)[1]. Anyway, I’m to talk at a conference on Scientism (*not statisticism, that’s my word) taking place in NYC May 16-17. It is organized by Massimo Pigliucci (chair of philosophy at CUNY-Lehman), who has written quite a lot on the topic in the past few years. Information can be found here. In thinking about scientism for this conference, however, I was immediately struck by this puzzle: Continue reading

Categories: Announcement, PhilStatLaw, science communication, Statistical fraudbusting, StatSci meets PhilSci | 15 Comments

Who ya gonna call for statistical Fraudbusting? R.A. Fisher, P-values, and error statistics (again)

If there’s somethin’ strange in your neighborhood. Who ya gonna call? (Fisherian Fraudbusters!)*

*[adapted from R. Parker’s “Ghostbusters”]

When you need to warrant serious accusations of bad statistics, if not fraud, where do scientists turn? Answer: to frequentist error statistical reasoning and to p-value scrutiny, first articulated by R.A. Fisher.[i] The latest accusations of big-time fraud in social psychology concern the case of Jens Förster. As Richard Gill notes:

The methodology here is not new. It goes back to Fisher (founder of modern statistics) in the 30’s. Many statistics textbooks give as an illustration Fisher’s re-analysis (one could even say: meta-analysis) of Mendel’s data on peas. The tests of goodness of fit were, again and again, too good. There are two ingredients here: (1) the use of the left-tail probability as p-value instead of the right-tail probability. (2) combination of results from a number of independent experiments using a trick invented by Fisher for the purpose, and well known to all statisticians. (Richard D. Gill)
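To make Gill’s two ingredients concrete, here is a minimal sketch in Python (mine, not Fisher’s actual re-analysis of Mendel, and with invented counts) of how left-tail p-values from goodness-of-fit tests can be combined by Fisher’s method to ask whether a series of fits is collectively too good to be true:

```python
import numpy as np
from scipy import stats

# Invented counts from three fictional 3:1 Mendelian crosses (illustration only).
experiments = [
    {"observed": [752, 248], "ratio": [0.75, 0.25]},
    {"observed": [301, 99],  "ratio": [0.75, 0.25]},
    {"observed": [449, 151], "ratio": [0.75, 0.25]},
]

left_tail_pvalues = []
for exp in experiments:
    observed = np.array(exp["observed"], dtype=float)
    expected = observed.sum() * np.array(exp["ratio"])
    chi2 = ((observed - expected) ** 2 / expected).sum()
    # Ingredient (1): the LEFT-tail probability -- how often chance alone gives
    # a fit at least this good (a chi-square statistic at least this small).
    left_tail_pvalues.append(stats.chi2.cdf(chi2, df=len(observed) - 1))

# Ingredient (2): Fisher's method for combining independent p-values.
# Under the null, -2 * sum(log p_i) has a chi-square distribution with 2k df.
combined_stat = -2 * np.sum(np.log(left_tail_pvalues))
combined_p = stats.chi2.sf(combined_stat, df=2 * len(left_tail_pvalues))

print("left-tail p-values:", np.round(left_tail_pvalues, 3))
print("combined p-value (fits collectively too good?):", round(combined_p, 4))
```

A small combined p-value says the whole series of fits is better than honest sampling variability should allow, which is the pattern Fisher famously flagged in Mendel’s peas.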

Continue reading

Categories: Error Statistics, Fisher, significance tests, Statistical fraudbusting, Statistics | 42 Comments

Reliability and Reproducibility: Fraudulent p-values through multiple testing (and other biases): S. Stanley Young (Phil 6334: Day#13)

S. Stanley Young, PhD
Assistant Director for Bioinformatics
National Institute of Statistical Sciences
Research Triangle Park, NC

Here are Dr. Stanley Young’s slides from our April 25 seminar. They contain several tips for unearthing deception by fraudulent p-value reports. Since it’s Saturday night, you might wish to perform an experiment with three 10-sided dice*, recording the results of 100 rolls (3 at a time) on the form on slide 13. An entry, e.g., (0,1,3), becomes an imaginary p-value of .013 associated with the type of tumor, male-female, old-young. You report only hypotheses whose null is rejected at a “p-value” less than .05. Forward your results to me for publication in a peer-reviewed journal.

*Sets of 10-sided dice will be offered as a palindrome prize beginning in May.
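For readers without three 10-sided dice at hand, here is a rough software stand-in for the exercise (my sketch, not taken from Young’s slides): generate 100 random three-digit “p-values”, report only those below .05, and see how many publishable “findings” pure noise supplies.

```python
import random

random.seed(13)  # any seed will do; the count varies a bit from run to run

# Each "roll" of three 10-sided dice gives digits (d1, d2, d3), read as the
# imaginary p-value 0.d1d2d3 -- e.g., (0, 1, 3) becomes .013.
rolls = [[random.randint(0, 9) for _ in range(3)] for _ in range(100)]
p_values = [d[0] / 10 + d[1] / 100 + d[2] / 1000 for d in rolls]

# "Report only hypotheses whose null is rejected at a p-value less than .05."
reported = sorted(p for p in p_values if p < 0.05)

print(f"100 null 'hypotheses' tested; {len(reported)} reported as significant:")
print([round(p, 3) for p in reported])
# On average about 5 of the 100 pure-noise results clear the .05 bar --
# a tidy batch of spurious p-values ready for the peer-reviewed write-up.
```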

Categories: Phil6334, science communication, spurious p values, Statistical fraudbusting, Statistics | 12 Comments

Phil 6334 Visitor: S. Stanley Young, “Statistics and Scientific Integrity”

We are pleased to announce our guest speaker at Thursday’s seminar (April 24, 2014): “Statistics and Scientific Integrity”:

S. Stanley Young, PhD
Assistant Director for Bioinformatics
National Institute of Statistical Sciences
Research Triangle Park, NC

Author of Resampling-Based Multiple Testing, Westfall and Young (1993) Wiley.



The main readings for the discussion are:

 

Categories: Announcement, evidence-based policy, Phil6334, science communication, selection effects, Statistical fraudbusting, Statistics | 4 Comments

capitalizing on chance (ii)

DGM playing the slots

I may have been exaggerating one year ago when I started this post with “Hardly a day goes by”, but now it is literally the case*. (This also pertains to reading for Phil6334 for Thurs. March 6):

Hardly a day goes by where I do not come across an article on the problems for statistical inference based on fallaciously capitalizing on chance: high-powered computer searches and “big” data trolling offer rich hunting grounds out of which apparently impressive results may be “cherry-picked”:

When the hypotheses are tested on the same data that suggested them and when tests of significance are based on such data, then a spurious impression of validity may result. The computed level of significance may have almost no relation to the true level. . . . Suppose that twenty sets of differences have been examined, that one difference seems large enough to test and that this difference turns out to be “significant at the 5 percent level.” Does this mean that differences as large as the one tested would occur by chance only 5 percent of the time when the true difference is zero? The answer is no, because the difference tested has been selected from the twenty differences that were examined. The actual level of significance is not 5 percent, but 64 percent! (Selvin 1970, 104)[1]

…Oh wait, this is from a contributor to Morrison and Henkel way back in 1970! But there is one big contrast, I find, that makes current-day reports so much more worrisome: critics of the Morrison and Henkel ilk clearly report that ignoring a variety of “selection effects” results in a fallacious computation of the actual significance level associated with a given inference; clear terminology is used to distinguish the “computed” or “nominal” significance level on the one hand, and the actual or warranted significance level on the other. Continue reading
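Selvin’s 64 percent is just the arithmetic of multiple testing: if each of twenty independent comparisons has a 5 percent chance of appearing “significant” when every true difference is zero, the chance that at least one does is far higher. A quick check of the figure (my illustration, not from Morrison and Henkel):

```python
# Chance that at least one of twenty independent comparisons reaches the
# nominal .05 level when every true difference is zero.
nominal_alpha = 0.05
n_comparisons = 20

actual_alpha = 1 - (1 - nominal_alpha) ** n_comparisons
print(f"nominal level: {nominal_alpha:.0%}, actual level: {actual_alpha:.0%}")
# prints: nominal level: 5%, actual level: 64%
```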

Categories: junk science, selection effects, spurious p values, Statistical fraudbusting, Statistics | 4 Comments

“Probabilism as an Obstacle to Statistical Fraud-Busting” (draft iii)

Update: Feb. 21, 2014 (slides at end). Ever find, when you begin to “type” a paper to which you gave an off-the-cuff title months and months ago, that you scarcely know just what you meant or feel up to writing a paper with that (provocative) title? But then, pecking away at the outline of a possible paper crafted to fit the title, you discover it’s just the paper you’re up to writing right now? That’s how I feel about “Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?” (the impromptu title I gave for my paper for the Boston Colloquium for the Philosophy of Science):

The conference is called: “Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge.”  

 Here are some initial chicken-scratchings (draft (i)). Share comments, queries. (I still have 2 weeks to come up with something*.) Continue reading

Categories: P-values, significance tests, Statistical fraudbusting, Statistics | Leave a comment

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE: Revisiting the Foundations of Statistics

BOSTON COLLOQUIUM FOR PHILOSOPHY OF SCIENCE

2013–2014
54th Annual Program

Download the 54th Annual Program

REVISITING THE FOUNDATIONS OF STATISTICS IN THE ERA OF BIG DATA: SCALING UP TO MEET THE CHALLENGE

Cosponsored by the Department of Mathematics & Statistics at Boston University.
Friday, February 21, 2014
10 a.m. – 5:30 p.m.
Photonics Center, 9th Floor Colloquium Room (Rm 906)
8 St. Mary’s Street

10 a.m.–noon

  • Computational Challenges in Genomic Medicine
    Jill Mesirov Computational Biology and Bioinformatics, Broad Institute
  • Selection, Significance, and Signification: Issues in High Energy Physics
    Kent Staley Philosophy, Saint Louis University

1:30–5:30 p.m.

  • Multi-Resolution Inference: An Engineering (Engineered?) Foundation of Statistical Inference
    Xiao-Li Meng Statistics, Harvard University
  • Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
    Deborah Mayo Philosophy, Virginia Tech
  • Targeted Learning from Big Data
    Mark van der Laan Biostatistics and Statistics, UC Berkeley

Panel Discussion


Categories: Announcement, philosophy of science, Philosophy of Statistics, Statistical fraudbusting, Statistics | Leave a comment

S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)

Stanley Young’s guest post arose in connection with Kepler’s Nov. 13, my November 9 post, and associated comments.

S. Stanley Young, PhD
Assistant Director for Bioinformatics
National Institute of Statistical Sciences
Research Triangle Park, NC

Much is made by some of the experimental biologists that their art is oh so sophisticated that mere mortals do not have a chance [to successfully replicate]. Bunk. Agriculture replicates all the time. That is why food is so cheap. The world is growing much more on fewer acres now than it did 10 years ago. Materials science is doing remarkable things using mixtures of materials. Take a look at just about any sports equipment. These two areas and many more use statistical methods: design of experiments, randomization, blind reading of results, etc. and these methods replicate, quite well, thank you. Read about Edwards Deming. Experimental biology experiments are typically run by small teams in what is in effect a cottage industry. Herr professor is usually not in the lab. He/she is busy writing grants. A “hands” guy is in the lab. A computer guy does the numbers. No one is checking other workers’ work. It is a cottage industry to produce papers.

There is a famous failure to replicate that appeared in Science. A pair of non-estrogens was reported to have a strong estrogenic effect. Six labs wrote into Science saying they could not replicate the effect. I think the back story is as follows. The hands guy tested a very large number of pairs of chemicals. The most extreme pair looked unusual. Lab boss said, write it up. Every assay has some variability, so they reported extreme variability as real. Failure to replicate in six labs. Science editors say, what gives. Lab boss goes to hands guy and says run the pair again. No effect. Lab boss accuses hands guy of data fabrication. They did not replicate their own finding before rushing to publish. I asked the lab for the full data set, but they refused to provide the data. The EPA is still chasing this will-o’-the-wisp, environmental estrogens. False positive results with compelling stories can live a very long time. See [i].
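Statistically, the back story Young describes is the familiar selection effect of reporting the most extreme of many noisy assays: the “winner” looks striking, then vanishes on an independent re-run. A small simulation of that effect (my sketch, with invented numbers, not the actual assay data):

```python
import numpy as np

rng = np.random.default_rng(6)

n_pairs = 1000       # number of chemical pairs screened (illustrative)
assay_sd = 1.0       # assay noise, in arbitrary effect units (illustrative)
true_effect = 0.0    # assume no pair has any real estrogenic synergy

# First screen: every pair's apparent effect is pure assay noise.
first_screen = rng.normal(true_effect, assay_sd, n_pairs)
winner = int(np.argmax(first_screen))
print(f"most extreme pair on the first screen: apparent effect {first_screen[winner]:.2f}")

# Independent re-run of just that 'winning' pair: the striking effect is gone.
rerun = rng.normal(true_effect, assay_sd)
print(f"same pair re-tested on its own: apparent effect {rerun:.2f}")
```

Selecting the extreme result and then treating its size as real is exactly how an assay’s ordinary variability gets published as a discovery.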

Begley and Ellis visited labs. They saw how the work was done. There are instances where something was tried over and over and, when it worked “as expected”, it was a wrap. Write the paper and move on. I listened to a young researcher say that she tried for 6 months to replicate the results of a paper. Informal conversations with scientists support very poor replication.

One can say that the jury is out, as there have been few serious attempts to replicate systematically. Systematic replication efforts are only now starting. I say less than 50% of experimental biology claims will replicate.

[i] Hormone Hysterics. Tulane University researchers published a 1996 study claiming that combinations of manmade chemicals (pesticides and PCBs) disrupted normal hormonal processes, causing everything from cancer to infertility to attention deficit disorder.

Media, regulators and environmentalists hailed the study as “astonishing.” Indeed it was, as it turned out to be fraud, according to an October 2001 report by federal investigators. Though the study was retracted from publication, the law it spawned wasn’t, and it continues to be enforced by the EPA. Read more…

Categories: evidence-based policy, junk science, Statistical fraudbusting, Statistics | 20 Comments
