junk science

“Murder or Coincidence?” Statistical Error in Court: Richard Gill (TEDx video)

“There was a vain and ambitious hospital director. A bad statistician. ..There were good medics and bad medics, good nurses and bad nurses, good cops and bad cops … Apparently, even some people in the Public Prosecution service found the witch hunt deeply disturbing.”

This is how Richard Gill, statistician at Leiden University, describes a feature film (Lucia de B.) just released about the case of Lucia de Berk, a nurse found guilty of several murders based largely on statistics. Gill is widely known (among other things) for exposing the flawed statistical analysis used to convict her, which ultimately led (after Gill’s tireless efforts) to her conviction being overturned. (I hope they translate the film into English.) In a recent e-mail Gill writes:

“The Dutch are going into an orgy of feel-good tear-jerking sentimentality as a movie comes out (the premiere is tonight) about the case. It will be a good movie, actually, but it only tells one side of the story. …When a jumbo jet goes down we find out what went wrong and prevent it from happening again. The Lucia case was a similar disaster. But no one even *knows* what went wrong. It can happen again tomorrow.

I spoke about it a couple of days ago at a TEDx event (Flanders).

You can find some p-values in my slides [“Murder by Numbers”, pasted below the video]. They were important – first in convicting Lucia, later in getting her a fair re-trial.”

Since it’s Saturday night, let’s watch Gill’s TEDx talk, “Statistical Error in Court”.

Slides from the Talk: “Murder by Numbers”:


Categories: junk science, P-values, PhilStatLaw, science communication, Statistics

“Out Damned Pseudoscience: Non-significant results are the new ‘Significant’ results!” (update)

Sell me that antiseptic!

We were reading “Out, Damned Spot: Can the ‘Macbeth effect’ be replicated?” (Earp, B., Everett, J., Madva, E., and Hamlin, J. 2014, Basic and Applied Social Psychology 36: 91–98) in an informal gathering of our 6334 seminar yesterday afternoon at Thebes. Some of the graduate students are interested in so-called “experimental” philosophy, and I asked for an example that used statistics for purposes of analysis. The example (and it’s a great one; thanks Rory M!) revolves around priming research in social psychology. Yes, the field that has come in for so much criticism of late, especially after Diederik Stapel was found to have been fabricating data altogether (search this blog, e.g., here).[1] Continue reading

Categories: fallacy of non-significance, junk science, reformers, Statistics

Skeptical and enthusiastic Bayesian priors for beliefs about insane asylum renovations at Dept of Homeland Security: I’m skeptical and unenthusiastic

Danvers State Hospital

I had heard of medical trial designs that employ individuals who supply Bayesian subjective priors that are deemed either “enthusiastic” or “skeptical” as regards the probable value of medical treatments.[i] From what I gather, these priors are combined with data from trials in order to help decide whether to stop trials early or continue. But I’d never heard of these Bayesian designs in relation to decisions about building security or renovations! Listen to this… Continue reading
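How an “enthusiastic” versus a “skeptical” prior can pull the same interim data in different directions may be easier to see in a toy calculation. The following is a minimal sketch, assuming a normal prior on the treatment effect and a normal approximation to the interim estimate, so that the standard conjugate update applies. The function names and all numbers are hypothetical illustrations, not taken from any actual trial or from the Homeland Security design.

```python
import math

def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Conjugate normal update: a precision-weighted average of prior and data."""
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / se ** 2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, math.sqrt(post_var)

def prob_benefit(mean, sd):
    """P(effect > 0) when the effect is Normal(mean, sd)."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))

# Hypothetical interim result: estimated benefit 0.30 with standard error 0.15.
estimate, se = 0.30, 0.15

# Hypothetical priors: the skeptic centers on "no effect",
# the enthusiast centers on the hoped-for benefit.
priors = {"skeptical": (0.0, 0.15), "enthusiastic": (0.30, 0.15)}

for label, (m0, s0) in priors.items():
    mean, sd = normal_posterior(m0, s0, estimate, se)
    print(f"{label:>12}: posterior mean {mean:.2f}, P(benefit > 0) = {prob_benefit(mean, sd):.3f}")
```

A design of this kind would presumably compare such posterior probabilities against preset thresholds when deciding whether to stop; the thresholds, like the priors, are matters of choice.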

Categories: junk science, Statistics

capitalizing on chance (ii)

Mayo playing the slots

DGM playing the slots

I may have been exaggerating one year ago when I started this post with “Hardly a day goes by”, but now it is literally the case*. (This also pertains to reading for Phil6334 for Thurs. March 6):

Hardly a day goes by where I do not come across an article on the problems for statistical inference based on fallaciously capitalizing on chance: high-powered computer searches and “big” data trolling offer rich hunting grounds out of which apparently impressive results may be “cherry-picked”:

When the hypotheses are tested on the same data that suggested them and when tests of significance are based on such data, then a spurious impression of validity may result. The computed level of significance may have almost no relation to the true level. . . . Suppose that twenty sets of differences have been examined, that one difference seems large enough to test and that this difference turns out to be “significant at the 5 percent level.” Does this mean that differences as large as the one tested would occur by chance only 5 percent of the time when the true difference is zero? The answer is no, because the difference tested has been selected from the twenty differences that were examined. The actual level of significance is not 5 percent, but 64 percent! (Selvin 1970, 104)[1]

…Oh wait, this is from a contributor to Morrison and Henkel way back in 1970! But there is one big contrast, I find, that makes current-day reports so much more worrisome: critics of the Morrison and Henkel ilk clearly report that ignoring a variety of “selection effects” results in a fallacious computation of the actual significance level associated with a given inference; clear terminology is used to distinguish the “computed” or “nominal” significance level on the one hand, and the actual or warranted significance level on the other. Continue reading
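Selvin’s 64 percent is the chance that at least one of twenty comparisons reaches the 5 percent level when every true difference is zero: 1 − 0.95^20 ≈ 0.64. A minimal Python sketch, assuming the twenty tests are independent (comparisons drawn from one data set often are not), computes this directly and checks it by simulation; the function names are illustrative only.

```python
import random

def familywise_error(k: int, alpha: float) -> float:
    """P(at least one of k independent tests rejects) when every null is true."""
    return 1.0 - (1.0 - alpha) ** k

def simulated_familywise_error(k: int, alpha: float, n_sims: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: under a true null each p-value is uniform on [0, 1],
    so picking the most significant of k comparisons means taking the minimum."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_sims)
        if min(rng.random() for _ in range(k)) < alpha
    )
    return hits / n_sims

print(f"exact:     {familywise_error(20, 0.05):.2f}")
print(f"simulated: {simulated_familywise_error(20, 0.05):.2f}")
```

The selected test still carries a nominal level of 0.05; the procedure “examine twenty, then test the largest” has an actual level of about 0.64, which is exactly the computed-versus-actual distinction Selvin draws.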

Categories: junk science, selection effects, spurious p values, Statistical fraudbusting, Statistics

S. Stanley Young: More Trouble with ‘Trouble in the Lab’ (Guest post)

Stanley Young’s guest post arose in connection with Kepler’s Nov. 13 post, my November 9 post, and the associated comments.

S. Stanley Young, PhD
Assistant Director for Bioinformatics
National Institute of Statistical Sciences
Research Triangle Park, NC

Much is made by some of the experimental biologists of the claim that their art is oh so sophisticated that mere mortals do not have a chance [to successfully replicate]. Bunk. Agriculture replicates all the time. That is why food is so cheap. The world is growing much more on fewer acres now than it did 10 years ago. Materials science is doing remarkable things using mixtures of materials. Take a look at just about any sports equipment. These two areas and many more use statistical methods: design of experiments, randomization, blind reading of results, etc., and these methods replicate quite well, thank you. Read about W. Edwards Deming. Experimental biology experiments are typically run by small teams in what is in effect a cottage industry. Herr Professor is usually not in the lab. He/she is busy writing grants. A “hands” guy is in the lab. A computer guy does the numbers. No one is checking other workers’ work. It is a cottage industry to produce papers.

There is a famous failure to replicate that appeared in Science. A pair of non-estrogens was reported to have a strong estrogenic effect. Six labs wrote to Science saying they could not replicate the effect. I think the back story is as follows. The hands guy tested a very large number of pairs of chemicals. The most extreme pair looked unusual. Lab boss said, write it up. Every assay has some variability, so what they reported as a real effect was just an extreme draw from that variability. Failure to replicate in six labs. The Science editors ask, what gives? Lab boss goes to hands guy and says run the pair again. No effect. Lab boss accuses hands guy of data fabrication. They did not replicate their own finding before rushing to publish. I asked the lab for the full data set, but they refused to provide the data. The EPA is still chasing this will-o’-the-wisp, environmental estrogens. False positive results with compelling stories can live a very long time. See [i].
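The back story Young conjectures (screen many pairs, write up the most extreme one) can be mimicked in a few lines. This is a hypothetical sketch, assuming every pair has zero true effect and each assay reading is pure standard-normal noise; the pair and rerun counts are made up, not the lab’s.

```python
import random
import statistics

def screen_then_rerun(n_pairs=1000, n_reruns=100, seed=1):
    rng = random.Random(seed)
    true_effect = [0.0] * n_pairs  # assumption: no pair of chemicals does anything
    # One noisy assay reading per pair (noise SD = 1).
    first_run = [mu + rng.gauss(0.0, 1.0) for mu in true_effect]
    # "The most extreme pair looked unusual. Lab boss said, write it up."
    best = max(range(n_pairs), key=lambda i: first_run[i])
    # Re-running the chosen pair draws fresh noise around its true effect.
    reruns = [true_effect[best] + rng.gauss(0.0, 1.0) for _ in range(n_reruns)]
    return first_run[best], statistics.mean(reruns)

winner, rerun_mean = screen_then_rerun()
print(f"selected pair, first run:   {winner:.2f} noise SDs")
print(f"same pair, mean of re-runs: {rerun_mean:.2f} noise SDs")
```

The selected reading typically sits around three noise SDs out purely because it is the largest of a thousand draws; the re-runs fall back toward zero, the pattern Young describes.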

Begley and Ellis visited labs. They saw how the work was done. There are instances where something was tried over and over and, when it worked “as expected”, it was a wrap: write the paper and move on. I listened to a young researcher say that she tried for 6 months to replicate the results of a paper. Informal conversations with scientists suggest that replication is very poor.

One can say that the jury is out, as there have been few serious attempts to replicate systematically. Systematic replication is now starting. I predict that fewer than 50% of experimental biology claims will replicate.

[i] Hormone Hysterics. Tulane University researchers published a 1996 study claiming that combinations of manmade chemicals (pesticides and PCBs) disrupted normal hormonal processes, causing everything from cancer to infertility to attention deficit disorder.

Media, regulators and environmentalists hailed the study as “astonishing.” Indeed it was, as it turned out to be fraud, according to an October 2001 report by federal investigators. Though the study was retracted, the law it spawned wasn’t, and it continues to be enforced by the EPA. Read more…

Categories: evidence-based policy, junk science, Statistical fraudbusting, Statistics

T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)

Tom Kepler’s guest post arose in connection with my November 9 post & comments.


Professor Thomas B. Kepler
Department of Microbiology
Department of Mathematics & Statistics
Boston University School of Medicine

There is much to say about the article in the Economist, but the first thing to note is that it is far more balanced than its sensational headline promises. Promising to throw open the curtain on “Unreliable research” is mere click-bait for the science-averse readers who have recently found validation for their intellectual insecurities in the populist uprising against the shadowy world of the scientist. What with the East Anglia conspiracy, and so on, there’s no such thing as “too skeptical” when it comes to science.

There is some remarkably casual reporting in an article that purports to be concerned with mechanisms to assure that inaccuracies not be perpetuated.

For example, the authors cite the comment in Nature by Begley and Ellis and summarize it thus: “…scientists at Amgen, an American drug company, tried to replicate 53 studies that they considered landmarks in the basic science of cancer, often co-operating closely with the original researchers to ensure that their experimental technique matched the one used first time round.” Stan Young, in his comments on Mayo’s blog, adds, “These claims can not be replicated – even by the original investigators! Stop and think of that.” But in fact the role of the original investigators is described as follows in Begley and Ellis: “…when findings could not be reproduced, an attempt was made to contact the original authors, discuss the discrepant findings, exchange reagents and repeat experiments under the authors’ direction, occasionally even in the laboratory of the original investigator.” (Emphasis added.) Now, please stop and think about what agenda is served by eliding the tempered language of the original.

Both the Begley and Ellis comment and the brief correspondence by Prinz et al. also cited in this discussion are about laboratories in commercial pharmaceutical companies failing to reproduce experimental results. While deciding how to interpret their findings, it would be prudent to bear in mind the insight from Harry Collins, the sociologist of science paraphrased in the Economist piece as indicating that “performing an experiment always entails what sociologists call “tacit knowledge”—craft skills and extemporisations that their possessors take for granted but can pass on only through example. Thus if a replication fails, it could be because the repeaters didn’t quite get these je-ne-sais-quoi bits of the protocol right.” Indeed, I would go further and conjecture that few experimental biologists would hold out hope that any one laboratory could claim the expertise necessary to reproduce the results of 53 ground-breaking papers in diverse specialties, even within cancer drug discovery. And to those who are unhappy that authors often do not comply with the journals’ clear policy of data-sharing, how do you suppose you would fare getting such data from the pharmaceutical companies that wrote these damning papers? Or the authors of the papers themselves? Nature had to clarify, writing two months after the publication of Begley and Ellis, “Nature, like most journals, requires authors of research papers to make their data available on request. In this less formal Comment, we chose not to enforce this requirement so that Begley and Ellis could abide by the legal agreements [they made with the original authors].” Continue reading

Categories: junk science, reforming the reformers, science communication, Statistics

Beware of questionable front page articles warning you to beware of questionable front page articles (iii)

In this time of government cut-backs and sequester, scientists are under increased pressure to dream up ever new strategies to publish attention-getting articles with eye-catching, but inadequately scrutinized, conjectures. Science writers are under similar pressures, and to this end they have found a way to deliver up at least one fire-breathing, front page article a month. How? By writing minor variations on an article about how in this time of government cut-backs and sequester, scientists are under increased pressure to dream up ever new strategies to publish attention-getting articles with eye-catching, but inadequately scrutinized, conjectures.

Thus every month or so we see retreads on why most scientific claims are unreliable, biased, wrong, and not even wrong. Maybe that’s the reason the authors of a recent article in The Economist (“Trouble at the Lab”) remain anonymous.

I don’t disagree with everything in the article; on the contrary, part of their strategy is to include such well-known problems as publication bias, problems with priming studies in psychology, and failed statistical assumptions. But the “big news” (the one that sells) is that “to an alarming degree” science (as a whole) is not reliable and not self-correcting. The main evidence offered comes from factory-like (thumbs up/thumbs down) applications of statistics in exploratory, hypothesis-generating contexts, wherein the goal is merely to screen reams of associations to identify a smaller batch for further analysis. But do even those screening efforts claim to have evidence of a genuine relationship when a given H is spewed out of their industrial complexes? Do they go straight to press after one statistically significant result? I don’t know, maybe some do. What I do know is that the generalizations we are seeing in these “gotcha” articles are every bit as guilty of sensationalizing without substance as the bad statistics they purport to be impugning. As they see it, scientists, upon finding a single statistically significant result at the 5% level, declare an effect real or a hypothesis true, and then move on to the next hypothesis. No real follow-up scrutiny, no building on discrepancies found, no triangulation, self-scrutiny, etc.

But even so, the argument, which purports to follow from “statistical logic” but is actually a jumble of “up-down” significance testing, Bayesian calculations, and computations that might at best hold for crude screening exercises (e.g., for associations between genes and disease), commits blunders about statistical power, and founders. Never mind that if the highest rate of true outputs were wanted, scientists would dabble in trivialities…. Never mind that, I guarantee, if you asked Nobel Prize-winning scientists about the rate of correct attempts vs. blind alleys they went through before their prize-winning results, they’d report far more than 50% errors (Perrin and Brownian motion, Prusiner and prions, experimental general relativity, just to name some I know).

But what about the statistics? Continue reading
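The sort of screening arithmetic these articles lean on can be reproduced in a few lines. This is a sketch only: the proportion of true hypotheses, the significance level and the power below are illustrative inputs chosen for round numbers, not figures quoted from The Economist piece. The result describes the long-run make-up of a batch of screened “hits”, which is the crude-screening setting in which, as argued above, such computations might at best hold.

```python
def share_of_false_hits(n_hypotheses: int, prop_true: float, alpha: float, power: float) -> float:
    """Long-run fraction of 'significant' results that are false positives
    in a screening batch, given a base rate of true hypotheses."""
    n_true = n_hypotheses * prop_true
    n_null = n_hypotheses - n_true
    true_hits = power * n_true    # real effects that reach significance
    false_hits = alpha * n_null   # null effects that reach significance by chance
    return false_hits / (true_hits + false_hits)

# Illustrative inputs: 1000 screened hypotheses, 10% true, alpha 0.05, power 0.8.
print(f"{share_of_false_hits(1000, 0.10, 0.05, 0.80):.2f}")  # prints 0.36
```

Changing `prop_true` or `power` moves the answer a great deal, which is one reason a figure of this kind is hard to transfer from a screening batch to science as a whole.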

Categories: junk science, P-values, Statistics

Will the Real Junk Science Please Stand Up? (critical thinking)

Equivocations about “junk science” came up in today’s “critical thinking” class; if anything, the current situation is worse than 2 years ago when I posted this.

Have you ever noticed in wranglings over evidence-based policy that it’s always one side that’s politicizing the evidence—the side whose policy one doesn’t like? The evidence on the near side, or your side, however, is solid science. Let’s call those who first coined the term “junk science” Group 1. For Group 1, junk science is bad science that is used to defend pro-regulatory stances, whereas sound science would identify errors in reports of potential risk. For the challengers—let’s call them Group 2—junk science is bad science that is used to defend the anti-regulatory stance, whereas sound science would identify potential risks, advocate precautionary stances, and recognize errors where risk is denied. Both groups agree that politicizing science is very, very bad—but it’s only the other group that does it!

A given print exposé exploring the distortions of fact on one side or the other routinely showers wild praise on their side’s—their science’s and their policy’s—objectivity, their adherence to the facts, just the facts. How impressed might we be with the text or the group that admitted to its own biases?

Take, say, global warming, genetically modified crops, electric-power lines, medical diagnostic testing. Group 1 alleges that those who point up the risks (actual or potential) have a vested interest in construing the evidence that exists (and the gaps in the evidence) accordingly, which may bias the relevant science and pressure scientists to be politically correct. Group 2 alleges the reverse, pointing to industry biases in the analysis or reanalysis of data and pressures on scientists doing industry-funded work to go along to get along.

When the battle between the two groups is joined, issues of evidence—what counts as bad/good evidence for a given claim—and issues of regulation and policy—what are “acceptable” standards of risk/benefit—may become so entangled that no one recognizes how much of the disagreement stems from divergent assumptions about how models are produced and used, as well as from contrary stands on the foundations of uncertain knowledge and statistical inference. The core disagreement is mistakenly attributed to divergent policy values, at least for the most part. Continue reading

Categories: critical thinking, junk science, Objectivity

What do these share in common: m&ms, limbo stick, ovulation, Dale Carnegie? Sat night potpourri


For entertainment only

I had said I would label as pseudoscience or questionable science any enterprise that regularly permits the kind of ‘verification biases’ in the laundry list of my June 1 post. How regularly? (I’ve been asked.)

Well, surely if it’s as regular as, say, much of social psychology, it goes over the line. But it’s not mere regularity, it’s the nature of the data, the type of inferences being drawn, and the extent of self-scrutiny and recognition of errors shown (or not shown). The regularity is just a consequence of the methodological holes. My standards may be considerably more stringent than most, but quite aside from statistical issues, I simply do not find hypotheses well-tested if they are based on “experiments” that consist of giving questionnaires. At least not without a lot more self-scrutiny and discussion of flaws than I ever see. (There may be counterexamples.)

Attempts to recreate phenomena of interest in typical social science “labs” leave me with the same doubts. Huge gaps often exist between elicited and inferred results. One might locate the problem under “external validity” but to me it is just the general problem of relating statistical data to substantive claims.

Experimental economists (expereconomists) take lab results plus statistics to warrant sometimes ingenious inferences about substantive hypotheses. Vernon Smith (the Nobel laureate in economics) is rare in subjecting his own results to “stress tests”. I’m not withdrawing the optimistic assertions he cites from EGEK (Mayo 1996) on Duhem-Quine (e.g., from “Rhetoric and Reality” 2001, p. 29). I’d still maintain, “Literal control is not needed to attribute experimental results correctly (whether to affirm or deny a hypothesis). Enough experimental knowledge will do”. But that requires piecemeal strategies that accumulate, and at least a little bit of “theory” and/or a decent amount of causal understanding.[1]

I think the generalizations extracted from questionnaires allow for an enormous amount of “reading into” the data. Suddenly one finds the “best” explanation. Questionnaires should be deconstructed for how they may be misinterpreted, not to mention how responders tend to guess what the experimenter is looking for. (I’m reminded of the current hoopla over questionnaires on breadwinners, housework and divorce rates!) I respond with the same eye-rolling to just-so story telling along the lines of evolutionary psychology.

I apply the “Stapel test”: even if Stapel had bothered to actually carry out the data-collection plans that he so carefully crafted, I would not find the inferences telling in the least. Take for example the planned-but-not-implemented study discussed in the recent New York Times article on Stapel:

 Stapel designed one such study to test whether individuals are inclined to consume more when primed with the idea of capitalism. He and his research partner developed a questionnaire that subjects would have to fill out under two subtly different conditions. In one, an M&M-filled mug with the word “kapitalisme” printed on it would sit on the table in front of the subject; in the other, the mug’s word would be different, a jumble of the letters in “kapitalisme.” Although the questionnaire included questions relating to capitalism and consumption, like whether big cars are preferable to small ones, the study’s key measure was the amount of M&Ms eaten by the subject while answering these questions….Stapel and his colleague hypothesized that subjects facing a mug printed with “kapitalisme” would end up eating more M&Ms.

Stapel had a student arrange to get the mugs and M&Ms and later load them into his car along with a box of questionnaires. He then drove off, saying he was going to run the study at a high school in Rotterdam where a friend worked as a teacher.

Stapel dumped most of the questionnaires into a trash bin outside campus. At home, using his own scale, he weighed a mug filled with M&Ms and sat down to simulate the experiment. While filling out the questionnaire, he ate the M&Ms at what he believed was a reasonable rate and then weighed the mug again to estimate the amount a subject could be expected to eat. He built the rest of the data set around that number. He told me he gave away some of the M&M stash and ate a lot of it himself. “I was the only subject in these studies,” he said.

He didn’t even know what a plausible number of M&Ms consumed would be! But never mind that: observing a genuine “effect” in this silly study would not have probed the hypothesis. Would it? Continue reading

Categories: junk science, Statistics

Mayo: comment on the repressed memory research

Here are some reflections on the repressed memory articles from Richard Gill’s post, focusing on Geraerts et al. (2008).

1. Richard Gill reported that “Everyone does it this way, in fact, if you don’t, you’d never get anything published: …People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them and are just doing their best to make this as clear as possible to everyone.”

This remark is very telling. I recommend we simply regard those cases as illustrating a theory one believes, rather than as providing evidence for that theory. If we could mark them as such, we could stop blaming significance tests for playing a role in what are really only attempts to illustrate a theory, or to strengthen someone’s beliefs about it.

2. I was surprised the examples had to do with recovered memories. Wasn’t that entire area dubbed a pseudoscience way back (at least 15-25 years ago?) when “therapy induced” memories of childhood sexual abuse (CSA) were discovered to be just that—therapy induced and manufactured? After the witch hunts that ensued (the very accusation sufficing for evidence), I thought the field of “research” had been put out of its and our misery. So, aside from having used the example in a course on critical thinking, I’m not up on this current work at all. But, as these are just blog comments, let me venture some off-the-cuff skeptical thoughts. They will have almost nothing to do with the statistical data analysis, by the way…

3. Geraerts et al. (2008, 22) admit at the start of the article that therapy-recovered CSA memories are unreliable, and that the idea of automatically repressing a traumatic event like CSA is implausible. Then mightn’t it seem the entire research program should be dropped? Not to its adherents! As with all theories that enjoy the capacity of being sufficiently flexible to survive anomaly (Popper’s pseudosciences), there’s some life left here too. Maybe, its adherents reason, those who report “spontaneously recovered” CSA memories need not be repressors; instead they may merely be “suppressors” who are good at blocking out negative events. If so, they didn’t automatically repress but rather deliberately suppressed: “Our findings may partly explain why people with spontaneous CSA memories have the subjective impression that they have ‘repressed’ their CSA memories for many years.” (ibid., 22).

4. Shouldn’t we stop there? I would. We have a research program growing out of an exemplar of pseudoscience being kept alive by ever-new “monster-barring” strategies (as Lakatos called them). (I realize they’re not planning to go out to the McMartin school, but still…) If a theory T is flexible enough that any observations can be interpreted through it, and thereby regarded as confirming T, then it is no surprise that this is still true when the instances are dressed up with statistics. It isn’t that theories of repressed memories are implausible or improbable (in whatever sense one takes those terms). It is the limitless flexibility of these theories that renders the research program pseudoscience (along with, in this case, a history of self-sealing data interpretations). Continue reading

Categories: junk science, Statistical fraudbusting, Statistics

Richard Gill: “Integrity or fraud… or just questionable research practices?”

Professor Gill

Professor Richard Gill
Statistics Group
Mathematical Institute
Leiden University

I am very grateful to Richard Gill for permission to post an e-mail from him (after my “dirty laundry” post) along with slides from his talk, “Integrity or fraud… or just questionable research practices?” and associated papers. I record my own reflections on the pseudoscientific nature of the program in one of the Geraerts et al. papers in a later post.

I certainly have been thinking about these issues a lot in recent months. I got entangled in intensive scientific and media discussions – mainly confined to the Netherlands – concerning the cases of social psychologist Dirk Smeesters and of psychologist Elke Geraerts. See: http://www.math.leidenuniv.nl/~gill/Integrity.pdf

And I recently got asked to look at the statistics in some papers of another … [researcher] ..but this one is still confidential ….

The verdict on Smeesters was that he, like Stapel, actually faked data (though he still denies this).

The Geraerts case is very much open, very much unclear. The senior co-authors of the attached paper (published in the journal “Memory”), Merckelbach and McNally, have asked the journal editors for it to be withdrawn because they suspect the lead author, Elke Geraerts, of improper conduct. She denies any impropriety. It turns out that none of the co-authors have the data. Legally speaking it belongs to the University of Maastricht, where the research was carried out and where Geraerts was a promising postdoc in Merckelbach’s group. She later got a chair at Erasmus University Rotterdam and presumably has the data herself, but refuses to share it with her old co-authors or any other interested scientists. Just looking at the summary statistics in the paper, one sees evidence of “too good to be true”. Average scores in groups supposed in theory to be similar are much closer to one another than one would expect on the basis of the within-group variation (the paper reports averages and standard deviations for each group, so it is easy to compute the F statistic for equality of the three similar groups and use its left tail probability as test statistic). Continue reading
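Gill’s “too good to be true” check can be carried out from published summaries alone. The sketch below assumes exactly three groups and independent normal samples with a common variance (the usual one-way ANOVA assumptions). It relies on the fact that with 2 numerator degrees of freedom the F distribution function has the closed form P(F ≤ x) = 1 − (d2/(d2 + 2x))^(d2/2), so no statistics library is needed. The group sizes, means and standard deviations are invented for illustration; they are not the figures from the Geraerts et al. paper, and the function names are hypothetical.

```python
def f_from_summaries(ns, means, sds):
    """One-way ANOVA F statistic computed from group sizes, means and SDs."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def left_tail_p_three_groups(f_stat, df_within):
    """P(F <= f_stat) for an F(2, df_within) variable (closed form; three groups only)."""
    return 1.0 - (df_within / (df_within + 2.0 * f_stat)) ** (df_within / 2.0)

# Hypothetical summaries: three groups of 30 whose means are suspiciously close
# given a within-group SD of 2.
ns, means, sds = [30, 30, 30], [10.00, 10.05, 10.02], [2.0, 2.0, 2.0]
f_stat, df_b, df_w = f_from_summaries(ns, means, sds)
assert df_b == 2, "the closed-form tail above only holds for three groups"
p_left = left_tail_p_three_groups(f_stat, df_w)
print(f"F({df_b}, {df_w}) = {f_stat:.4f}, left-tail p = {p_left:.4f}")
```

A small left-tail probability says the group means agree more closely than the within-group scatter would ordinarily allow; on its own it flags the reported summaries for closer scrutiny rather than proving misconduct.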

Categories: junk science, Statistical fraudbusting, Statistics

Some statistical dirty laundry

I finally had a chance to fully read the 2012 Tilburg Report* on “Flawed Science” last night. The full report is now here. Here are some stray thoughts…

1. Slipping into pseudoscience.
The authors of the Report say they never anticipated giving a laundry list of “undesirable conduct” by which researchers can flout pretty obvious requirements for the responsible practice of science. It was an accidental byproduct of the investigation of one case (Diederik Stapel, social psychology) that they walked into a culture of “verification bias”[1]. Maybe that’s why I find it so telling. It’s as if they could scarcely believe their ears when people they interviewed “defended the serious and less serious violations of proper scientific method with the words: that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences” (Report 48). So they trot out some obvious rules, and it seems to me that they do a rather good job.

One of the most fundamental rules of scientific research is that an investigation must be designed in such a way that facts that might refute the research hypotheses are given at least an equal chance of emerging as do facts that confirm the research hypotheses. Violations of this fundamental rule, such as continuing an experiment until it works as desired, or excluding unwelcome experimental subjects or results, inevitably tends to confirm the researcher’s research hypotheses, and essentially render the hypotheses immune to the facts…. [T]he use of research procedures in such a way as to ‘repress’ negative results by some means may be called verification bias. [my emphasis] (Report, 48).

I would place techniques for ‘verification bias’ under the general umbrella of techniques for squelching stringent criticism and repressing severe tests. These gambits make it so easy to find apparent support for one’s pet theory or hypotheses as to count as no evidence at all (see some from their list). Any field that regularly proceeds this way I would call a pseudoscience, or non-science, following Popper. “Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory” (Popper 1994, p. 89). [2] It is unclear at what point a field slips into the pseudoscience realm.
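One item in the Report’s definition, “continuing an experiment until it works as desired”, has a cost that is easy to quantify. The following is a minimal simulation sketch, assuming normally distributed data with known standard deviation 1, a two-sided z test at the 5% level, and a researcher who checks after every batch of 10 observations (up to 200) and stops at the first significant look. Every setting, and the function name, is a hypothetical choice for illustration.

```python
import math
import random

def rate_of_significant_stops(max_n=200, batch=10, n_sims=2000, seed=7):
    """Fraction of experiments on a TRUE null that end with 'p < 0.05'
    when the data are tested after every batch and collection stops at
    the first significant result."""
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% critical value, sigma known = 1
    significant = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        while n < max_n:
            total += sum(rng.gauss(0.0, 1.0) for _ in range(batch))  # true mean is 0
            n += batch
            if abs(total / math.sqrt(n)) > z_crit:
                significant += 1
                break  # "it worked": stop and write it up
    return significant / n_sims

print(f"one look at n=200:    {rate_of_significant_stops(batch=200):.3f}")
print(f"a look every 10 obs.: {rate_of_significant_stops(batch=10):.3f}")
```

Each individual look has a nominal level of 5%, but the procedure as a whole reports a “significant” result far more often than that when nothing is there, which is how such a rule tends to confirm whatever hypothesis the researcher brought to the lab.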

2. A role for philosophy of science?
I am intrigued that one of the final recommendations in the Report is this:

In the training program for PhD students, the relevant basic principles of philosophy of science, methodology, ethics and statistics that enable the responsible practice of science must be covered. Based on these insights, research Master’s students and PhD students must receive practical training from their supervisors in the application of the rules governing proper and honest scientific research, which should include examples of such undesirable conduct as data massage. The Graduate School must explicitly ensure that this is implemented.

A philosophy department could well create an entire core specialization that revolved around “the relevant basic principles of philosophy of science, methodology, ethics and statistics that enable the responsible practice of science” (ideally linked with one or more other departments). That would be innovative and would fill an important gap, it seems to me. Is anyone doing this?

3. Hanging out some statistical dirty laundry.
Items in their laundry list include:

  • An experiment fails to yield the expected statistically significant results. The experiment is repeated, often with minor changes in the manipulation or other conditions, and the only experiment subsequently reported is the one that did yield the expected results. The article makes no mention of this exploratory method… It should be clear, certainly with the usually modest numbers of experimental subjects, that using experiments in this way can easily lead to an accumulation of chance findings…. Continue reading
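The arithmetic behind this "accumulation of chance findings" is simple if one assumes, as an idealization not stated in the Report, that each rerun is an independent test of a true null hypothesis at level α. The chance that at least one of k attempts comes out "significant" is then 1 − (1 − α)^k. The sketch computes that formula and checks it by simulation, using the fact that a valid p-value from a continuous test is uniformly distributed under the null. Function names are hypothetical, and real reruns "with minor changes" will not be exactly independent.

```python
import random


def chance_of_a_hit(k: int, alpha: float = 0.05) -> float:
    """P(at least one of k independent null experiments gives p < alpha)."""
    return 1 - (1 - alpha) ** k


def simulate(k: int, alpha: float = 0.05, trials: int = 100_000, seed: int = 2) -> float:
    """Monte Carlo check: under the null each p-value is drawn as Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Run up to k attempts; any() stops at the first p < alpha,
        # which is the only attempt that would be written up.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


for k in (1, 3, 5, 10):
    print(k, round(chance_of_a_hit(k), 3), round(simulate(k), 3))
```

For five attempts at α = 0.05 the analytic value is about 0.23, so a published "p < 0.05" from the one successful run hides a procedure that yields a spurious success in roughly one case out of four or five.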
Categories: junk science, spurious p values, Statistics | 6 Comments

If it’s called “The High Quality Research Act,” then ….

Among the (less technical) items sent my way over the past few days are discussions of the so-called High Quality Research Act. I’d not heard of it, but it’s apparently an outgrowth of the recent hand-wringing over junk science, flawed statistics, non-replicable studies, and fraud (discussed at times on this blog). And it’s clearly a hot topic. Let me just run this by you and invite your comments (before giving my impression). Following the Bill, below, is a list of five NSF projects about which the HQRA’s sponsor has requested further information, and then part of an article from today’s New Yorker on this “divisive new bill”: “Not Safe for Funding: The N.S.F. and the Economics of Science”.



April 18, 2013


Be it enacted by the Senate and House of Representatives of the United States of America in Congress assembled,


This act may be cited as the “High Quality Research Act”.


(a) CERTIFICATION.—Prior to making an award of any contract or grant funding for a scientific research project, the Director of the NSF shall publish a statement on the public website of the Foundation that certifies that the research project—

(1) is in the interests of the U.S. to advance the national health, prosperity, or welfare, and to secure the national defense by promoting the progress of science;

(2) is the finest quality, is ground breaking, and answers questions or solves problems that are of utmost importance to society at large; and

(3) is not duplicative of other research projects being funded by the Foundation or other Federal Science agencies.

(b) TRANSFER OF FUNDS.—Any unobligated funds for projects not meeting the requirements of subsection (a) may be awarded to other scientific research projects that do meet such requirements.

(c) INITIAL IMPLEMENTATION REPORT.—Not later than 60 days after the date of enactment of this Act, the Director shall report to the Committee on Commerce, Science, and Transportation of the Senate and the Committee on Science, Space, and Technology of the House of Representatives on how the requirements set forth in subsection (a) are being implemented.

(d) NATIONAL SCIENCE BOARD IMPLEMENTATION REPORT.—Not later than 1 year after the date of enactment of this Act, the National Science Board shall report to the Committee on Commerce, Science, and Transportation of the Senate and the Committee on Science, Space, and Technology of the House of Representatives its findings and recommendations on how the requirements of subsection (a) are being implemented.

etc. etc.

Link to the Bill

Rep. Lamar Smith, author of the Bill, listed five NSF projects about which he has requested further information.

1. Award Abstract #1247824: “Picturing Animals in National Geographic, 1888-2008,” March 15, 2013 ($227,437);

2. Award Abstract #1230911: “Comparative Histories of Scientific Conservation: Nature, Science, and Society in Patagonian and Amazonian South America,” September 1, 2012 ($195,761);

3. Award Abstract #1230365: “The International Criminal Court and the Pursuit of Justice,” August 15, 2012 ($260,001);

4. Award Abstract #1226483: “Comparative Network Analysis: Mapping Global Social Interactions,” August 15, 2012 ($435,000); and

5. Award Abstract #1157551: “Regulating Accountability and Transparency in China’s Dairy Industry,” June 1, 2012 ($152,464).


MAY 9, 2013


Categories: junk science, science communication, Statistics | 14 Comments

Flawed Science and Stapel: Priming for a Backlash?

Diederik Stapel is back in the news, given the availability of the English translation of the Tilburg (Levelt and Noort Committees) Report as well as his book, Ontsporing (Dutch for “Off the Rails”), in which he tries to explain his fraud. An earlier post on him is here. While the disgraced social psychologist was shown to have fabricated the data for something like 50 papers, it seems that some people think he deserves a second chance. A childhood friend, Simon Kuper, in an article “The Sin of Bad Science,” describes a phone conversation with Stapel:

“I’ve lost everything,” the disgraced former psychology professor tells me over the phone from the Netherlands. He is almost bankrupt. … He has tarnished his own discipline of social psychology. And he has become a national pariah. …

Very few social psychologists make stuff up, but he was working in a discipline where cavalier use of data was common. This is perhaps the main finding of the three Dutch academic committees which investigated his fraud. The committees found many bad practices: researchers who keep rerunning an experiment until they get the right result, who omit inconvenient data, misunderstand statistics, don’t share their data, and so on….

Chapter 5 of the Report (pp. 47-54) is extremely illuminating about the general practices the committees discovered in examining Stapel’s papers; I recommend it.

Social psychology might recover. However, Stapel might not. A country’s way of dealing with sinners is often shaped by its religious heritage. In Catholicism, sinners can get absolution in the secrecy of confession. … …In many American versions of Protestantism, the sinner can be “born again”. …Stapel’s misfortune is to be Dutch. The dominant Dutch tradition is Calvinist, and Calvinism believes in eternal sin. …But the downside to not forgiving sinners is that there are almost no second acts in Dutch lives.


But it isn’t just old acquaintances who think Stapel might be ready for a comeback. A few researchers are beginning to defend the field from the broader accusations the Report levels against the scientific integrity of social psychology. They do not deny the “cavalier” practices, but regard them as acceptable and even necessary! This might even pave the way for Stapel’s rehabilitation. An article by a delegate to the 3rd World Conference on Research Integrity (wcri2013.org) in Montreal, Canada, in May reports on members of a new group critical of the Report, including some who were interviewed by the Tilburg Committees: Continue reading

Categories: junk science, Statistics | 21 Comments
