junk science

S. Senn: “Painful dichotomies” (Guest Post)

.

Stephen Senn
Head of  Competence Center for Methodology and Statistics (CCMS)
Luxembourg Institute of Health
Twitter @stephensenn

Painful dichotomies

The tweet read “Featured review: Only 10% people with tension-type headaches get a benefit from paracetamol” and immediately I thought, ‘how would they know?’ and almost as quickly decided, ‘of course they don’t know, they just think they know’. Sure enough, on following up the link to the Cochrane Review in the tweet it turned out that, yet again, the deadly mix of dichotomies and numbers needed to treat had infected the brains of researchers to the extent that they imagined that they had identified personal response. (See Responder Despondency for a previous post on this subject.)

The bare facts they established are the following:

The International Headache Society recommends the outcome of being pain free two hours after taking a medicine. The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo.

and the false conclusion they immediately asserted is the following

This means that only 10 in 100 or 10% of people benefited because of paracetamol 1000 mg.

To understand the fallacy, look at the accompanying graph. Continue reading

Categories: junk science, PhilStat/Med, Statistics, Stephen Senn | 27 Comments

Richard Gill: “Integrity or fraud… or just questionable research practices?” (Is Gill too easy on them?)

Professor Gill

Professor Gill

Professor Richard Gill
Statistics Group
Mathematical Institute
Leiden University

It was statistician Richard Gill who first told me about Diederik Stapel (see an earlier post on Diederik). We were at a workshop on Error in the Sciences at Leiden in 2011. I was very lucky to have Gill be assigned as my commentator/presenter—he was excellent! As I was explaining some data problems to him, he suddenly said, “Some people don’t bother to collect data at all!” That’s when I learned about Stapel.

Committees often turn to Gill when someone’s work is up for scrutiny of bad statistics or fraud, or anything in between. Do you think he’s being too easy on researchers when he says, about a given case:

“data has been obtained by some combination of the usual ‘questionable research practices’ [QRPs] which are prevalent in the field in question. Everyone does it this way, in fact, if you don’t, you’d never get anything published. …People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them.”

Isn’t that the danger in relying on deeply felt background beliefs?  Have our attitudes changed (toward QRPs) over the past 3 years (harsher or less harsh)? Here’s a talk of his I blogged 3 years ago (followed by a letter he allowed me to post). I reflect on the pseudoscientific nature of the ‘recovered memories’ program in one of the Geraerts et al. papers in a later post. Continue reading

Categories: 3-year memory lane, junk science, Statistical fraudbusting, Statistics | 4 Comments

What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Are we lowering the bar?

images-2

For entertainment only

In a post 3 years ago (“What do these share in common: m&m’s, limbo stick, ovulation, Dale Carnegie? Sat night potpourri”), I expressed doubts about expending serious effort to debunk the statistical credentials of studies that most readers without any statistical training would regard as “for entertainment only,” dubious, or pseudoscientific quackery. It needn’t even be that the claim is implausible, what’s implausible is that it has been well probed in the experiment at hand. Given the attention being paid to such examples by some leading statisticians, and scores of replication researchers over the past 3 years–attention that has been mostly worthwhile–maybe the bar has been lowered. What do you think? Anyway, this is what I blogged 3 years ago. (Oh, I decided to put in a home-made cartoon!) Continue reading

Categories: junk science, replication research, Statistics | 2 Comments

Some statistical dirty laundry: have the stains become permanent?

images

.

Right after our session at the SPSP meeting last Friday, I chaired a symposium on replication that included Brian Earp–an active player in replication research in psychology (Replication and Evidence: A tenuous relationship p. 80). One of the first things he said, according to my notes, is that gambits such as cherry picking, p-hacking, hunting for significance, selective reporting, and other QRPs, had been taught as acceptable become standard practice in psychology, without any special need to adjust p-values or alert the reader to their spuriousness [i]. (He will correct me if I’m wrong[2].) It shocked me to hear it, even though it shouldn’t have, given what I’ve learned about statistical practice in social science. It was the Report on Stapel that really pulled back the curtain on this attitude toward QRPs in social psychology–as discussed in this blogpost 3 years ago. (If you haven’t read Section 5 of the report on flawed science, you should.) Many of us assumed that QRPs, even if still committed, were at least recognized to be bad statistical practices since the time of Morrison and Henkel’s (1970) Significance Test Controversy. A question now is this: have all the confessions of dirty laundry, the fraudbusting of prominent researchers, the pledges to straighten up and fly right, the years of replication research, done anything to remove the stains? I leave the question open for now. Here’s my “statistical dirty laundry” post from 2013: Continue reading

Categories: junk science, reproducibility, spurious p values, Statistics | 4 Comments

Beware of questionable front page articles warning you to beware of questionable front page articles (2)

RR

.

Such articles have continued apace since this blogpost from 2013. During that time, meta-research, replication studies, statistical forensics and fraudbusting have become popular academic fields in their own right. Since I regard the ‘programme’ (to use a Lakatosian term) as essentially a part of the philosophy and methodology of science, I’m all in favor of it—I employed the term “metastatistics” eons ago–but, as a philosopher, I claim there’s a pressing need for meta-meta-research, i.e., a conceptual, logical, and methodological scrutiny of presuppositions and gaps in meta-level work itself.  There was an issue I raised in the section “But what about the statistics?” below that hasn’t been addressed. I question the way size and power (from statistical hypothesis testing) are employed in a “diagnostics and screening” computation that underlies most “most findings are false” articles. (This is (2) in my new “Let PBP” series, and follows upon my last post, comments in burgandy are added, 12/5/15.)

In this time of government cut-backs and sequester, scientists are under increased pressure to dream up ever new strategies to publish attention-getting articles with eye-catching, but inadequately scrutinized, conjectures. Science writers are under similar pressures, and to this end they have found a way to deliver up at least one fire-breathing, front page article a month. How? By writing minor variations on an article about how in this time of government cut-backs and sequester, scientists are under increased pressure to dream up ever new strategies to publish attention-getting articles with eye-catching, but inadequately scrutinized, conjectures. (I’m prepared to admit that meta-research consciousness raising, like “self help books,” warrant frequent revisiting. Lessons are forgotten, and there are always new users of statistics.)

Thus every month or so we see retreads on why most scientific claims are unreliable, biased, wrong, and not even wrong. Maybe that’s the reason the authors of a recent article in The Economist (“Trouble at the Lab“) remain anonymous. (I realize that is their general policy.)  Continue reading

Categories: junk science, Let PBP, P-values, science-wise screening, Statistics | 23 Comments

Will the Real Junk Science Please Stand Up?

Junk Science (as first coined).* Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. (Yes, this was the first popular use of “junk science”, to my knowledge.) For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied.

Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases? Continue reading

Categories: 4 years ago!, junk science, Objectivity, Statistics | Tags: , , , , | 29 Comments

Stapel’s Fix for Science? Admit the story you want to tell and how you “fixed” the statistics to support it!

images-16

.

Stapel’s “fix” for science is to admit it’s all “fixed!”

That recent case of the guy suspected of using faked data for a study on how to promote support for gay marriage in a (retracted) paper, Michael LaCour, is directing a bit of limelight on our star fraudster Diederik Stapel (50+ retractions).

The Chronicle of Higher Education just published an article by Tom Bartlett:Can a Longtime Fraud Help Fix Science? You can read his full interview of Stapel here. A snippet:

You write that “every psychologist has a toolbox of statistical and methodological procedures for those days when the numbers don’t turn out quite right.” Do you think every psychologist uses that toolbox? In other words, is everyone at least a little bit dirty?

Stapel: In essence, yes. The universe doesn’t give answers. There are no data matrices out there. We have to select from reality, and we have to interpret. There’s always dirt, and there’s always selection, and there’s always interpretation. That doesn’t mean it’s all untruthful. We’re dirty because we can only live with models of reality rather than reality itself. It doesn’t mean it’s all a bag of tricks and lies. But that’s where the inconvenience starts. Continue reading

Categories: junk science, Statistics | 11 Comments

Some statistical dirty laundry: The Tilberg (Stapel) Report on “Flawed Science”

Objectivity 1: Will the Real Junk Science Please Stand Up?

.

I had a chance to reread the 2012 Tilberg Report* on “Flawed Science” last night. The full report is now here. The discussion of the statistics is around pp. 17-21 (of course there was so little actual data in this case!) You might find it interesting. Here are some stray thoughts reblogged from 2 years ago…

1. Slipping into pseudoscience.
The authors of the Report say they never anticipated giving a laundry list of “undesirable conduct” by which researchers can flout pretty obvious requirements for the responsible practice of science. It was an accidental byproduct of the investigation of one case (Diederik Stapel, social psychology) that they walked into a culture of “verification bias”[1]. Maybe that’s why I find it so telling. It’s as if they could scarcely believe their ears when people they interviewed “defended the serious and less serious violations of proper scientific method with the words: that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences” (Report 48). So they trot out some obvious rules, and it seems to me that they do a rather good job.

One of the most fundamental rules of scientific research is that an investigation must be designed in such a way that facts that might refute the research hypotheses are given at least an equal chance of emerging as do facts that confirm the research hypotheses. Violations of this fundamental rule, such as continuing an experiment until it works as desired, or excluding unwelcome experimental subjects or results, inevitably tends to confirm the researcher’s research hypotheses, and essentially render the hypotheses immune to the facts…. [T]he use of research procedures in such a way as to ‘repress’ negative results by some means” may be called verification bias. [my emphasis] (Report, 48).

I would place techniques for ‘verification bias’ under the general umbrella of techniques for squelching stringent criticism and repressing severe tests. These gambits make it so easy to find apparent support for one’s pet theory or hypotheses, as to count as no evidence at all (see some from their list ). Any field that regularly proceeds this way I would call a pseudoscience, or non-science, following Popper. “Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory” (Popper 1994, p. 89). [2] It is unclear at what point a field slips into the pseudoscience realm.

2. A role for philosophy of science?
I am intrigued that one of the final recommendations in the Report is this: Continue reading

Categories: junk science, spurious p values | 14 Comments

Evidence can only strengthen a prior belief in low data veracity, N. Liberman & M. Denzler: “Response”

Förster

Förster

I thought the criticisms of social psychologist Jens Förster were already quite damning (despite some attempts to explain them as mere QRPs), but there’s recently been some pushback from two of his co-authors Liberman and Denzler. Their objections are directed to the application of a distinct method, touted as “Bayesian forensics”, to their joint work with Förster. I discussed it very briefly in a recent “rejected post“. Perhaps the earlier method of criticism was inapplicable to these additional papers, and there’s an interest in seeing those papers retracted as well as the one that was. I don’t claim to know. A distinct “policy” issue is whether there should be uniform standards for retraction calls. At the very least, one would think new methods should be well-vetted before subjecting authors to their indictment (particularly methods which are incapable of issuing in exculpatory evidence, like this one). Here’s a portion of their response. I don’t claim to be up on this case, but I’d be very glad to have reader feedback.

Nira Liberman, School of Psychological Sciences, Tel Aviv University, Israel

Markus Denzler, Federal University of Applied Administrative Sciences, Germany

June 7, 2015

Response to a Report Published by the University of Amsterdam

The University of Amsterdam (UvA) has recently announced the completion of a report that summarizes an examination of all the empirical articles by Jens Förster (JF) during the years of his affiliation with UvA, including those co-authored by us. The report is available online. The report relies solely on statistical evaluation, using the method originally employed in the anonymous complaint against JF, as well as a new version of a method for detecting “low scientific veracity” of data, developed by Prof. Klaassen (2015). The report concludes that some of the examined publications show “strong statistical evidence for low scientific veracity”, some show “inconclusive evidence for low scientific veracity”, and some show “no evidence for low veracity”. UvA announced that on the basis of that report, it would send letters to the Journals, asking them to retract articles from the first category, and to consider retraction of articles in the second category.

After examining the report, we have reached the conclusion that it is misleading, biased and is based on erroneous statistical procedures. In view of that we surmise that it does not present reliable evidence for “low scientific veracity”.

We ask you to consider our criticism of the methods used in UvA’s report and the procedures leading to their recommendations in your decision.

Let us emphasize that we never fabricated or manipulated data, nor have we ever witnessed such behavior on the part of Jens Förster or other co-authors.

Here are our major points of criticism. Please note that, due to time considerations, our examination and criticism focus on papers co-authored by us. Below, we provide some background information and then elaborate on these points. Continue reading

Categories: junk science, reproducibility | Tags: | 9 Comments

“Fraudulent until proved innocent: Is this really the new “Bayesian Forensics”? (rejected post)

Objectivity 1: Will the Real Junk Science Please Stand Up?Fraudulent until proved innocent: Is this really the new “Bayesian Forensics”? (rejected post)

 

 

 

Categories: evidence-based policy, frequentist/Bayesian, junk science, Rejected Posts | 2 Comments

96% Error in “Expert” Testimony Based on Probability of Hair Matches: It’s all Junk!

Objectivity 1: Will the Real Junk Science Please Stand Up?Imagine. The New York Times reported a few days ago that the FBI erroneously identified criminals 96% of the time based on probability assessments using forensic hair samples (up until 2000). Sometimes the hair wasn’t even human, it might have come from a dog, a cat or a fur coat!  I posted on  the unreliability of hair forensics a few years ago.  The forensics of bite marks aren’t much better.[i] John Byrd, forensic analyst and reader of this blog had commented at the time that: “At the root of it is the tradition of hiring non-scientists into the technical positions in the labs. They tended to be agents. That explains a lot about misinterpretation of the weight of evidence and the inability to explain the import of lab findings in court.” DNA is supposed to cure all that. So is it? I don’t know, but apparently the FBI “has agreed to provide free DNA testing where there is either a court order or a request for testing by the prosecution.”[ii] See the FBI report.

Here’s the op-ed from the New York Times from April 27, 2015:

Junk Science at the FBI”

The odds were 10-million-to-one, the prosecution said, against hair strands found at the scene of a 1978 murder of a Washington, D.C., taxi driver belonging to anyone but Santae Tribble. Based largely on this compelling statistic, drawn from the testimony of an analyst with the Federal Bureau of Investigation, Mr. Tribble, 17 at the time, was convicted of the crime and sentenced to 20 years to life.

But the hair did not belong to Mr. Tribble. Some of it wasn’t even human. In 2012, a judge vacated Mr. Tribble’s conviction and dismissed the charges against him when DNA testing showed there was no match between the hair samples, and that one strand had come from a dog.

Mr. Tribble’s case — along with the exoneration of two other men who served decades in prison based on faulty hair-sample analysis — spurred the F.B.I. to conduct a sweeping post-conviction review of 2,500 cases in which its hair-sample lab reported a match.

The preliminary results of that review, which Spencer Hsu of The Washington Post reported last week, are breathtaking: out of 268 criminal cases nationwide between 1985 and 1999, the bureau’s “elite” forensic hair-sample analysts testified wrongly in favor of the prosecution, in 257, or 96 percent of the time. Thirty-two defendants in those cases were sentenced to death; 14 have since been executed or died in prison.Forensic Hair red

The agency is continuing to review the rest of the cases from the pre-DNA era. The Justice Department is working with the Innocence Project and the National Association of Criminal Defense Lawyers to notify the defendants in those cases that they may have grounds for an appeal. It cannot, however, address the thousands of additional cases where potentially flawed testimony came from one of the 500 to 1,000 state or local analysts trained by the F.B.I. Peter Neufeld, co-founder of the Innocence Project, rightly called this a “complete disaster.”

Law enforcement agencies have long known of the dubious value of hair-sample analysis. A 2009 report by the National Research Council found “no scientific support” and “no uniform standards” for the method’s use in positively identifying a suspect. At best, hair-sample analysis can rule out a suspect, or identify a wide class of people with similar characteristics.

Yet until DNA testing became commonplace in the late 1990s, forensic analysts testified confidently to the near-certainty of matches between hair found at crime scenes and samples taken from defendants. The F.B.I. did not even have written standards on how analysts should testify about their findings until 2012.

Continue reading

Categories: evidence-based policy, junk science, PhilStat Law, Statistics | 3 Comments

Are scientists really ready for ‘retraction offsets’ to advance ‘aggregate reproducibility’? (let alone ‘precautionary withdrawals’)

images

.

Given recent evidence of the irreproducibility of a surprising number of published scientific findings, the White House’s Office of Science and Technology Policy (OSTP) sought ideas for “leveraging its role as a significant funder of scientific research to most effectively address the problem”, and announced funding for projects to “reset the self-corrective process of scientific inquiry”. (first noted in this post.)ostp

I was sent some information this morning with a rather long description of the project that received the top government award thus far (and it’s in the millions). I haven’t had time to read the proposal*, which I’ll link to shortly, but for a clear and quick description, you can read the excerpt of an interview of the OSTP representative by the editor of the Newsletter for Innovation in Science Journals (Working Group), Jim Stein, who took the lead in writing the author check list for Nature.

Stein’s queries are in burgundy, OSTP’s are in blue. Occasional comments from me are in black, which I’ll update once I study the fine print of the proposal itself. Continue reading

Categories: junk science, reproducibility, science communication, Statistics | 9 Comments

Trial on Anil Potti’s (clinical) Trial Scandal Postponed Because Lawyers Get the Sniffles (updated)

images

.

Trial in Medical Research Scandal Postponed
By Jay Price

DURHAM, N.C. — A judge in Durham County Superior Court has postponed the first civil trial against Duke University by the estate of a patient who had enrolled in one of a trio of clinical cancer studies that were based on bogus science.

The case is part of what the investigative TV news show “60 Minutes” said could go down in history as one of the biggest medical research frauds ever.

The trial had been scheduled to start Monday, but several attorneys involved contracted flu. Judge Robert C. Ervin hasn’t settled on a new start date, but after a conference call with him Monday night, attorneys in the case said it could be as late as this fall.

Flu? Don’t these lawyers get flu shots? Wasn’t Duke working on a flu vaccine? Delaying til Fall 2015?

The postponement delayed resolution in the long-running case for the two patients still alive among the eight who filed suit. It also prolonged a lengthy public relations headache for Duke Medicine that has included retraction of research papers in major scientific journals, the embarrassing segment on “60 Minutes” and the revelation that the lead scientist had falsely claimed to be a Rhodes Scholar in grant applications and credentials.

Because it’s not considered a class action, the eight cases may be tried individually. The one designated to come first was brought by Walter Jacobs, whose wife, Julie, had enrolled in an advanced stage lung cancer study based on the bad research. She died in 2010.

“We regret that our trial couldn’t go forward on the scheduled date,” said Raleigh attorney Thomas Henson, who is representing Jacobs. “As our filed complaint shows, this case goes straight to the basic rights of human research subjects in clinical trials, and we look forward to having those issues at the forefront of the discussion when we are able to have our trial rescheduled.”

It all began in 2006 with research led by a young Duke researcher named Anil Potti. He claimed to have found genetic markers in tumors that could predict which cancer patients might respond well to what form of cancer therapy. The discovery, which one senior Duke administrator later said would have been a sort of Holy Grail of cancer research if it had been accurate, electrified other scientists in the field.

Then, starting in 2007, came the three clinical trials aimed at testing the approach. These enrolled more than 100 lung and breast cancer patients, and were eventually expected to enroll hundreds more.

Duke shut them down permanently in 2010 after finding serious problems with Potti’s science.

Now some of the patients – or their estates, since many have died from their illnesses – are suing Duke, Potti, his mentor and research collaborator Dr. Joseph Nevins, and various Duke administrators. The suit alleges, among other things, that they had engaged in a systematic plan to commercially develop cancer tests worth billions of dollars while using science that they knew or should have known to be fraudulent.

The latest revelation in the case, based on documents that emerged from the lawsuit and first reported in the Cancer Letter, a newsletter that covers cancer research issues, is that a young researcher working with Potti had alerted university officials to problems with the research data two years before the experiments on the cancer patients were stopped. Continue reading

Categories: junk science, rejected post, Statistics | Tags: | 6 Comments

What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri

images-2

For entertainment only

Here’s the follow-up to my last (reblogged) post. initially here. My take hasn’t changed much from 2013. Should we be labeling some pursuits “for entertainment only”? Why not? (See also a later post on the replication crisis in psych.)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I had said I would label as pseudoscience or questionable science any enterprise that regularly permits the kind of ‘verification biases’ in the statistical dirty laundry list.  How regularly? (I’ve been asked)

Well, surely if it’s as regular as, say, much of social psychology, it goes over the line. But it’s not mere regularity, it’s the nature of the data, the type of inferences being drawn, and the extent of self-scrutiny and recognition of errors shown (or not shown). The regularity is just a consequence of the methodological holes. My standards may be considerably more stringent than most, but quite aside from statistical issues, I simply do not find hypotheses well-tested if they are based on “experiments” that consist of giving questionnaires. At least not without a lot more self-scrutiny and discussion of flaws than I ever see. (There may be counterexamples.)

Attempts to recreate phenomena of interest in typical social science “labs” leave me with the same doubts. Huge gaps often exist between elicited and inferred results. One might locate the problem under “external validity” but to me it is just the general problem of relating statistical data to substantive claims.

Experimental economists (expereconomists) take lab results plus statistics to warrant sometimes ingenious inferences about substantive hypotheses.  Vernon Smith (of the Nobel Prize in Econ) is rare in subjecting his own results to “stress tests”.  I’m not withdrawing the optimistic assertions he cites from EGEK (Mayo 1996) on Duhem-Quine (e.g., from “Rhetoric and Reality” 2001, p. 29). I’d still maintain, “Literal control is not needed to attribute experimental results correctly (whether to affirm or deny a hypothesis). Enough experimental knowledge will do”.  But that requires piece-meal strategies that accumulate, and at least a little bit of “theory” and/or a decent amount of causal understanding.[1]

I think the generalizations extracted from questionnaires allow for an enormous amount of “reading into” the data. Suddenly one finds the “best” explanation. Questionnaires should be deconstructed for how they may be misinterpreted, not to mention how responders tend to guess what the experimenter is looking for. (I’m reminded of the current hoopla over questionnaires on breadwinners, housework and divorce rates!) I respond with the same eye-rolling to just-so story telling along the lines of evolutionary psychology.

I apply the “Stapel test”: Even if Stapel had bothered to actually carry out the data-collection plans that he so carefully crafted, I would not find the inferences especially telling in the least. Take for example the planned-but-not-implemented study discussed in the recent New York Times article on Stapel: Continue reading

Categories: junk science, Statistical fraudbusting, Statistics | 3 Comments

Some statistical dirty laundry

Objectivity 1: Will the Real Junk Science Please Stand Up?

.

It’s an apt time to reblog the “statistical dirty laundry” post from 2013 here. I hope we can take up the recommendations from Simmons, Nelson and Simonsohn at the end (Note [5]), which we didn’t last time around.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I finally had a chance to fully read the 2012 Tilberg Report* on “Flawed Science” last night. Here are some stray thoughts…

1. Slipping into pseudoscience.
The authors of the Report say they never anticipated giving a laundry list of “undesirable conduct” by which researchers can flout pretty obvious requirements for the responsible practice of science. It was an accidental byproduct of the investigation of one case (Diederik Stapel, social psychology) that they walked into a culture of “verification bias”[1]. Maybe that’s why I find it so telling. It’s as if they could scarcely believe their ears when people they interviewed “defended the serious and less serious violations of proper scientific method with the words: that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences” (Report 48). So they trot out some obvious rules, and it seems to me that they do a rather good job:

One of the most fundamental rules of scientific research is that an investigation must be designed in such a way that facts that might refute the research hypotheses are given at least an equal chance of emerging as do facts that confirm the research hypotheses. Violations of this fundamental rule, such as continuing an experiment until it works as desired, or excluding unwelcome experimental subjects or results, inevitably tends to confirm the researcher’s research hypotheses, and essentially render the hypotheses immune to the facts…. [T]he use of research procedures in such a way as to ‘repress’ negative results by some means” may be called verification bias. [my emphasis] (Report, 48).

I would place techniques for ‘verification bias’ under the general umbrella of techniques for squelching stringent criticism and repressing severe tests. These gambits make it so easy to find apparent support for one’s pet theory or hypotheses, as to count as no evidence at all (see some from their list ). Any field that regularly proceeds this way I would call a pseudoscience, or non-science, following Popper. “Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory” (Popper 1994, p. 89). [2] It is unclear at what point a field slips into the pseudoscience realm.

2. A role for philosophy of science?
I am intrigued that one of the final recommendations in the Report is this: Continue reading

Categories: junk science, reproducibility, spurious p values, Statistics | 27 Comments

Power Analysis and Non-Replicability: If bad statistics is prevalent in your field, does it follow you can’t be guilty of scientific fraud?

.

fraudbusters

If questionable research practices (QRPs) are prevalent in your field, then apparently you can’t be guilty of scientific misconduct or fraud (by mere QRP finagling), or so some suggest. Isn’t that an incentive for making QRPs the norm? 

The following is a recent blog discussion (by  Ulrich Schimmack) on the Jens Förster scandal: I thank Richard Gill for alerting me. I haven’t fully analyzed Schimmack’s arguments, so please share your reactions. I agree with him on the importance of power analysis, but I’m not sure that the way he’s using it (via his “R index”) shows what he claims. Nor do I see how any of this invalidates, or spares Förster from, the fraud allegations along the lines of Simonsohn[i]. Most importantly, I don’t see that cheating one way vs another changes the scientific status of Forster’s flawed inference. Forster already admitted that faced with unfavorable results, he’d always find ways to fix things until he got results in sync with his theory (on the social psychology of creativity priming). Fraud by any other name.

Förster

Förster

The official report, “Suspicion of scientific misconduct by Dr. Jens Förster,” is anonymous and dated September 2012. An earlier post on this blog, “Who ya gonna call for statistical fraud busting” featured a discussion by Neuroskeptic that I found illuminating, from Discover Magazine: On the “Suspicion of Scientific Misconduct by Jens Förster. Also see Retraction Watch.

Does anyone know the official status of the Forster case?

How Power Analysis Could Have Prevented the Sad Story of Dr. Förster”

From Ulrich Schimmack’s “Replicability Index” blog January 2, 2015. A January 14, 2015 update is here. (occasional emphasis in bright red is mine) Continue reading

Categories: junk science, reproducibility, Statistical fraudbusting, Statistical power, Statistics | Tags: | 22 Comments

“Only those samples which fit the model best in cross validation were included” (whistleblower) “I suspect that we likely disagree with what constitutes validation” (Potti and Nevins)

toilet-fireworks-by-stephenthruvegas-on-flickr

more Potti training/validation fireworks

So it turns out there was an internal whistleblower in the Potti scandal at Duke after all (despite denials by the Duke researchers involved ). It was a medical student Brad Perez. It’s in the Jan. 9, 2015 Cancer Letter*. Ever since my first post on Potti last May (part 1), I’ve received various e-mails and phone calls from people wishing to confide their inside scoops and first-hand experiences working with Potti (in a statistical capacity) but I was waiting for some published item. I believe there’s a court case still pending (anyone know?)

Now here we have a great example of something I am increasingly seeing: Challenges to the scientific credentials of data analysis are dismissed as mere differences in statistical philosophies or as understandable disagreements about stringency of data validation.[i] This is further enabled by conceptual fuzziness as to what counts as meaningful replication, validation, legitimate cross-validation.

If so, then statistical philosophy is of crucial practical importance.[ii]

Here’s the bulk of Perez’s memo (my emphasis in bold), followed by an even more remarkable reply from Potti and Nevins. Continue reading

Categories: evidence-based policy, junk science, PhilStat/Med, Statistics | Tags: | 28 Comments

S. Stanley Young: Are there mortality co-benefits to the Clean Power Plan? It depends. (Guest Post)

YoungPhoto2008

.

 

S. Stanley Young, PhD
Assistant Director
Bioinformatics National Institute of Statistical Sciences Research Triangle Park, NC

Are there mortality co-benefits to the Clean Power Plan? It depends.

Some years ago, I listened to a series of lectures on finance. The professor would ask a rhetorical question, pause to give you some time to think, and then, more often than not, answer his question with, “It depends.” Are there mortality co-benefits to the Clean Power Plan? Is mercury coming from power plants leading to deaths? Well, it depends.

So, rhetorically, is an increase in CO2 a bad thing? There is good and bad in everything. Well, for plants an increase in CO2 is a good thing. They grow faster. They convert CO2 into more food and fiber. They give off more oxygen, which is good for humans. Plants appear to be CO2 starved.

It is argued that CO2 is a greenhouse gas and an increase in CO2 will raise temperatures, ice will melt, sea levels will rise, and coastal area will flood, etc. It depends. In theory yes, in reality, maybe. But a lot of other events must be orchestrated simultaneously. Obviously, that scenario depends on other things as, for the last 18 years, CO2 has continued to go up and temperatures have not. So it depends on other factors, solar radiance, water vapor, El Nino, sun spots, cosmic rays, earth presession, etc., just what the professor said.

young pic 1

So suppose ambient temperatures do go up a few degrees. On balance, is that bad for humans? The evidence is overwhelming that warmer is better for humans. One or two examples are instructive. First, Cox et al., (2013) with the title, “Warmer is healthier: Effects on mortality rates of changes in average fine particulate matter (PM2.5) concentrations and temperatures in 100 U.S. cities.” To quote from the abstract of that paper, “Increases in average daily temperatures appear to significantly reduce average daily mortality rates, as expected from previous research.” Here is their plot of daily mortality rate versus Max temperature. It is clear that as the maximum temperature in a city goes up, mortality goes down. So if the net effect of increasing CO2 is increasing temperature, there should be a reduction in deaths. Continue reading

Categories: evidence-based policy, junk science, Statistics | Tags: | 35 Comments

Should a “Fictionfactory” peepshow be barred from a festival on “Truth and Reality”? Diederik Stapel says no (rejected post)

photo-on-9-17-14-at-9-49-pm1So I hear that Diederik Stapel is the co-author of a book Fictionfactory (in Dutch,with a novelist, Dautzenberg)[i], and of what they call their “Fictionfactory peepshow”, only it’s been disinvited at the last minute from a Dutch festival on“truth and reality” (due to have run 9/26/14), and all because of Stapel’s involvement. Here’s an excerpt from an article in last week’s Retraction Watch (article is here):*

Here’s a case of art imitating science.

The organizers of a Dutch drama festival have put a halt to a play about the disgraced social psychologist Diederik Stapel, prompting protests from the authors of the skit — one of whom is Stapel himself.

According to an article in NRC Handelsblad:

The Amsterdam Discovery Festival on science and art has canceled at the last minute, the play written by Anton Dautzenberg and former professor Diederik Stapel. Co-sponsor, The Royal Netherlands Academy of Arts and Sciences (KNAW), doesn’t want Stapel, who committed science fraud, to perform at a festival that’s associated with the KNAW.

FICTION FACTORY

The management of the festival, planned for September 26th at the Tolhuistuin in Amsterdam, contacted Stapel and Dautzenberg 4 months ago with the request to organize a performance of their book and lecture project ‘The Fictionfactory”. Especially for this festival they [Stapel and Dautzenberg] created a ‘Fictionfactory-peepshow’.

“Last Friday I received a call [from the management of the festival] that our performance has been canceled at the last minute because the KNAW will withdraw their subsidy if Stapel is on the festival program”, says Dautzenberg. “This looks like censorship, and by an institution that also wants to represents arts and experiments”.

Well this is curious, as things with Stapel always are. What’s the “Fichtionfactory Peepshow”? If you go to Stapel’s homepage, it’s all in Dutch, but Google translation isn’t too bad, and I have a pretty good description of the basic idea. So since it’s Saturday night,let’s take a peek, or peep (at what it might have been)…

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we are at the “Truth and Reality” Festival: first stop (after some cotton candy): the Stapel Fictionfactory Peepshow! It’s all dark, I can’t see a thing. What? It says I have to put some coins in a slot if I want to turn turn him it on (but that they also take credit cards). So I’m to look down in this tiny window. The curtains are opening!…I see a stage with two funky looking guys– one of them is Stapel. They’re reading, or reciting from some magazine with big letters: “Fact, Fiction and the Frictions we Hide”.

Stapel and Dautzenberg

Stapel and Dautzenberg

STAPEL: Welkom.You can ask us any questions! In response, you will always be given an option: ‘Do you want to know the truth or do you want to be comforted with fictions and feel-good fantasy?’

“Well I’ve brought some data with me from a study in social psychology. My question is this: “Is there a statistically significant effect here?”

STAPEL:Do you want to know the truth or do you want to be comforted with fictions and feel-good fantasy?

“Fiction please”.

STAPEL: I can massage your data, manipulate your numbers, reveal the taboos normally kept under wraps. For a few more coins I will let you see the secrets behind unreplicable results, and for a large bill will manufacture for you a sexy statistical story to turn on the editors.

(Then after the dirty business is all done [ii].)

STAPEL: Do you have more questions for me?

“Will it be published (fiction please)?”

STAPEL: “yes”

“will anyone find out about this (fiction please)?”

STAPEL: “No, I mean yes, I mean no.”

 

“I’d like to change to hearing the truth now. I have three questions”.

STAPEL: No problem, we take credit cards. Dank u. What are your questions?’

“Will Uri Simonsohn be able to fraudbust my results using the kind of tests he used on others? and if so, how long will it take him? (truth, please)?

STAPEL: “Yes.But not for at least 6 months to one year.”

“Here’s my final question. Are these data really statistically significant and at what level?” (truth please)

Nothing. Blank screen suddenly! With an acrid smelling puff of smoke, ew. But I’d already given the credit card! (Tricked by the master trickster).

 

What if he either always lies or always tells the truth? Then what would you ask him if you want to know the truth about your data? (Liar’s paradox variant)

Feel free to share your queries/comments.

* I thank Caitlin Parker for sending me the article

[i]Diederik Stapel was found guilty of science fraud in psychology in 2011, made up data out of whole cloth, retracted over 50 papers.. http://www.nytimes.com/2013/04/28/magazine/diederik-stapels-audacious-academic-fraud.html?pagewanted=all&_r=0

Bookjacket:

defictiefabriek_clip_image002

[ii] Perhaps they then ask you how much you’ll pay for a bar of soap (because you’d sullied yourself). Why let potential priming data go to waste?  Oh wait, he doesn’t use real data…. Perhaps the peepshow was supposed to be a kind of novel introduction to research ethics.

 

Some previous posts on Stapel:

 

Categories: Comedy, junk science, rejected post, Statistics | 5 Comments

Some ironies in the ‘replication crisis’ in social psychology (4th and final installment)

freud mirror espThere are some ironic twists in the way social psychology is dealing with its “replication crisis”, and they may well threaten even the most sincere efforts to put the field on firmer scientific footing–precisely in those areas that evoked the call for a “daisy chain” of replications. Two articles, one from the Guardian (June 14), and a second from The Chronicle of Higher Education (June 23) lay out the sources of what some are calling “Repligate”. The Guardian article is “Physics Envy: Do ‘hard’ sciences hold the solution to the replication crisis in psychology?”

The article in the Chronicle of Higher Education also gets credit for its title: “Replication Crisis in Psychology Research Turns Ugly and Odd”. I’ll likely write this in installments…(2nd, 3rd , 4th)

^^^^^^^^^^^^^^^

The Guardian article answers yes to the question “Do ‘hard’ sciences hold the solution“:

Psychology is evolving faster than ever. For decades now, many areas in psychology have relied on what academics call “questionable research practices” – a comfortable euphemism for types of malpractice that distort science but which fall short of the blackest of frauds, fabricating data.
Continue reading

Categories: junk science, science communication, Statistical fraudbusting, Statistics | 60 Comments

Blog at WordPress.com.