Flawed Science and Stapel: Priming for a Backlash?

Diederik Stapel is back in the news, given the availability of the English translation of the Tilburg (Levelt and Noort Committees) Report as well as his book, Ontsporing (Dutch for “Off the Rails”), where he tries to explain his fraud. An earlier post on him is here. While the disgraced social psychologist was shown to have fabricated the data for something like 50 papers, it seems that some people think he deserves a second chance. A childhood friend, Simon Kuper, in an article “The Sin of Bad Science,” describes a phone conversation with Stapel:

“I’ve lost everything,” the disgraced former psychology professor tells me over the phone from the Netherlands. He is almost bankrupt. … He has tarnished his own discipline of social psychology. And he has become a national pariah. …

Very few social psychologists make stuff up, but he was working in a discipline where cavalier use of data was common. This is perhaps the main finding of the three Dutch academic committees which investigated his fraud. The committees found many bad practices: researchers who keep rerunning an experiment until they get the right result, who omit inconvenient data, misunderstand statistics, don’t share their data, and so on….

Chapter 5 of the Report, pp. 47-54, is extremely illuminating about the general practices the Committees discovered in examining Stapel’s papers; I recommend it.

Social psychology might recover. However, Stapel might not. A country’s way of dealing with sinners is often shaped by its religious heritage. In Catholicism, sinners can get absolution in the secrecy of confession. … …In many American versions of Protestantism, the sinner can be “born again”. …Stapel’s misfortune is to be Dutch. The dominant Dutch tradition is Calvinist, and Calvinism believes in eternal sin. …But the downside to not forgiving sinners is that there are almost no second acts in Dutch lives.


But it isn’t just old acquaintances who think Stapel might be ready for a comeback. A few researchers are beginning to defend the field from the broader accusations the Report levels against the scientific integrity of social psychology. They do not deny the “cavalier” practices, but regard them as acceptable and even necessary! This might even pave the way for Stapel’s rehabilitation. An article by a delegate for the 3rd World Conference on Research Integrity (wcri2013.org) in Montreal, Canada, in May reports on members of a new group critical of the Report, including some who were interviewed by the Tilburg Committees:

 “Flawed Science and Diederik Stapel: Priming for a Backlash?” 

That Stapel’s “too good to be true” findings went undetected for so long despite being peer reviewed in nearly all the respected international journals in his field speaks to a wider culture of flawed science in social psychology and related fields, according to a Report issued by several Committees investigating the fraud (Report, p. 48). Those accusations are now being challenged by a group of researchers in social psychology, calling themselves the Integrity Group (IG). The Report found “when interviewed, several co-authors…defended the serious and less serious violations of proper scientific method with the words: that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences” (Report, p. 48). The Integrity Group is building an even stronger defense of their practices against the Report’s charges and is expected to take the case to court.

A number of professors were willing to speak to us anonymously, insofar as their case is a developing one, and will not be presented to legal bodies until June or July 2013….

We asked an IG member who had been a co-author for a comment on Stapel’s interview with Tilburg’s Noort Committee: “Mr Stapel observed in this connection in his interview with the Noort Committee that he was personally convinced that he was helping his PhD students. By his own account the collection of the data was the greatest chore in research, which PhD students must be helped through as quickly as possible. The paradox involved in ‘helping’ through falsification became apparent to him only later” (Report, p. 41).

Asked whether he excused the behavior of Stapel, the co-author shook his head: “Not collecting data was bad, that was his mistake. The data collection, he is right, is a chore, and he saved all of us a lot of time, especially his Ph.D students. He was reacting to the pressure to expedite the research process, allowing very fast production and speedy publications. That’s why he was hired.”

The senior social psychologist leading the Integrity Group spoke harshly against the Committee’s criticisms of statistical practices, alleging that for the most part they are “perfectly acceptable” and used throughout the social sciences. We asked him to comment on the Committee report condemning, as “verification bias,” the practice of handling a non-significant difference between treated and control groups by comparing the treated group instead with a control group from a different experiment, where the desired statistical significance is obtained (Report, p. 49).

“This practice is perfectly valid because they are all equivalent random groups after all. The Committee members are not social psychologists and they overlooked this simple statistical fact.”

Another researcher protested the Committee’s criticism of the common practice of omitting groups of respondents if the findings did not confirm the initial hypotheses:

“I told the Levelt Committee why we omitted the subjects (students) we did, they were just answering whatever came into their heads, and were not taking the study seriously. The judgment we bring to decide which subjects to leave out is not ad hoc, because we state our hypothesis in advance.”

This was echoed by one of Stapel’s junior co-authors:

“When I was interviewed by the Committee, I explained that we omitted subjects who do not behave as expected. The Committee members asked if we put this in the publication. Why would we? The Journals do not want that. The committee misunderstands the basic philosophy of research here: we know or have a very good idea of which theories are true, which are believable. We consider interesting hypotheses to test; data are collected to see how these theories predict. We omit items that do not behave as expected because they represent a falsification of our assumption about the design of experiment. When we find something wrong, we change it and replace it with data that can serve the purpose of showing how the theories hold true”.

Another senior social psychologist defended the practice of going along with supervisors who “urged that the data be sold as effectively as possible.” All students, the co-author explained, are discouraged from including reports that could “undermine the data, which might make editors and reviewers suspicious. If they asked any questions the missing data would be provided later.”

“The goal is to show where the hypothesis holds, so it is of no use to include data or subjects where the agreement is not found, for any reason. Would they want a physicist to use data that went against their theories or that showed errors? No of course not, yet when we omit such data, we are accused of sloppy science. This is unfair.”

Only one member of the IG said he had been in touch with Stapel since he resigned. “I was planning a research study that he happened to have data on from many years ago, but they were never used. It’s beautiful data, and clean, not part of the fraud. I will publish the paper myself, while normally I’d make him a co-author.”

We asked what he thought of the description Stapel gives in “Off the Rails” of the time he first altered his data:

I was alone in my fancy office at University of Groningen.… I opened the file that contained research data I had entered and changed an unexpected 2 into a 4.… I looked at the door. It was closed.… I looked at the matrix with data and clicked my mouse to execute the relevant statistical analyses. When I saw the new results, the world had returned to being logical. (Stapel, p. 145)

“I have much sympathy with the desire to fix the data, to make the world rational again” he replied. “It is natural for a scientist. The journals want a clear and compelling story, and they tell us to omit anything that gets in the way.”

One of the PhD students was philosophical: “If the experiment does not confirm the hypothesis, it is our fault, and we do it over til it works right. We change the subjects or the questionnaire, we find which responses are too small and must be fixed. It is not the fault of the hypothesis. I read this in Kuhn, that’s what we all believe.”*

To read the full article, go here. For a less extreme criticism of the Tilburg report see here. Also timely.

*Kuhn, T (1962), Structure of Scientific Revolutions.

Categories: junk science, Statistics

21 thoughts on “Flawed Science and Stapel: Priming for a Backlash?”

  1. Kent Staley

    The Kuhn reference is a nice touch! But I thought “I was alone in my fancy office” strained credulity. Shouldn’t it be something like, “I was toiling against the forces of oppression seeking to deprive the world of my amazing findings…”?

  2. I added a link to the reaction to the Tilburg report issued by the Executive Committee of the European Association of Social Psychology.

  3. anonymous

    To be fair, the report by the EASP brings up two important points in defense of their charge that the Tilburg report is defamatory, based on a recent article in Perspectives on Psychological Science (2012, vol 7) (Stroebe). First, “other sciences have a higher incidence of fraud cases than (social) psychology, and, second that across science as a whole it is very rare for fraud to be detected as a result of peer review. … Thus, importantly, not detecting fraud through the reviewing process is not a peculiarity of social psychology.” (From the 8 December 2012 report of the Executive Committee of the EASP.)

    • Anonymous: OK. But the Report may still be correct to allege a more cavalier attitude in social psychology, making fraud easier to commit. They said the reported effect sizes in Stapel’s work were ridiculously high (for the field), so it’s surprising a reviewer did not notice. I am only aware of these points through the report.

      • Jeff Sherman

        The Report may be correct or may be incorrect about those cavalier attitudes. Because the committees did not seek evidence on the matter, we do not know. Nevertheless, the allegation was made, and you repeat it here as if it were a known fact. In fact, the only data on the matter of which I am aware appears in John et al’s paper on Questionable Research Practices, which shows that cognitive psychologists and neuroscientists have the same views on QRPs as do social psychologists.

        Is this an April Fool’s joke?

        • Jeff Sherman

          Let me amend my comment:

          Because the committees did not seek evidence on the matter, they should have refrained from making the claim.

        • e.berk

          Jeff: Reading the pages of the Report that are cited, it looks like they spent quite a bit of time interviewing researchers and had a team of statisticians as well. They clearly sought evidence on the matter. They themselves express surprise at what they found. There were three different committees investigating. Perhaps they obtained a very unrepresentative group, but they consulted with those involved, including Stapel.

          • Jeff Sherman

            That is not evidence on the matter. Those are anecdotes. Evidence on the matter is provided by John et al.

            • Jeff: Thank you so much for alerting the blog to the paper on questionable research practices (QRPs).
              I think it is true that the Tilburg Committee did not set out to study QRPs across some population, but rather to meticulously discover the extent of the problem in the case at hand, and try if possible to exonerate parties and explain the problems being overlooked by peer review. Having discovered, in the course of carrying out that task, what they did, they could not very well hide it. They do call for more study, which I take it has been done. That said, let me just be clear that the first time I read the report was 2 days ago (growing out of a long-standing interest in the philosophy of statistics and science).
              The QRP paper looks fascinating, on a quick scan. I will study it when I can. Whether their procedures are best for unearthing QRPs, given the reluctance people would have to admit unwarranted practices, is something I hope others will study. Perhaps their method can be combined, if it hasn’t already been, with formal statistical procedures that I heard were used to identify another problem case in social psychology.

              Last point: if the John et al. QRP paper had unearthed no or few QRPs, would it be warranted evidence against the indications of the Report? I don’t think so.

    • anonymous

      Click to access The-Psychologist.pdf

      “True, the report does not compare the observed disquieting facts in the domain of social psychology with the situation in neighbouring or further afield sciences, either with respect to the incidence of fraud, or generally with respect to the occurrence of bad or sloppy science. It is, given the existing literature on this topic, more than likely that such a comparison would have led the Committees to the conclusion that social psychology is not unique in these respects. However, such a comparative investigation was not part of the Committees’ commission. The terms of reference, specified in the opening section of the report, limit the investigation to determining which publications (co-)authored by Stapel are fraudulent and to offering a view on the methods and the research culture that may have facilitated this misdemeanour.”


      “We were pleased to notice that, in the various responses our report elicited, the comparison to other sciences was not used as an excuse for the observed shortcomings in social psychology. Whatever the outcome of these comparisons do show, we believe it is critical that the responsible organisations and practitioners in social psychology continue to focus attention on fostering research integrity and monitoring proper research practices at all levels. If the revelation of Stapel’s fraud, the report’s analysis of the research culture in which it took place, and the report’s recommendations to guard against such misconduct have sharpened that attention, the Committees’ major efforts have not been in vain.”

  4. Corey

    “When we find something wrong, we change it and replace it with data that can serve the purpose of showing how the theories hold true”.

    Please tell me this is an April Fools’ Day joke.

    • anonymous

      Last year there was
      which had many fooled. But here all the citations from the Report, Stapel, the European Association (a link to which was only added after Kent Staley’s post) check out. I didn’t read the entire report.

  5. Christian Hennig

    Without having any detailed knowledge, the whole story reminds me of a former colleague who went on to a position in the Netherlands. He told me that the funding system there is very directly driven by publications to the extent that the universities get a fixed and known amount of money for every journal, proceedings etc. publication. He was rather positive about it at the time because he said that it would encourage people to write a lot and get their results out quickly.

    No idea whether it is still like that and how much it has to do with this case though.

    • Christian: I hadn’t heard this before. But while this could help explain their rush to publish, it does not explain overlooking questionable data, as the statisticians on the committee point out. Unless the suggestion is that the “peers” are in a rather close circle.

  6. Here’s a very interesting article on fraud detection and the problem of false positives.
    The supposition that “negative results are uninteresting” is related, it seems to me, to the mistaken idea that they are uninformative. If there is a high probability of discerning a discrepancy or effect d, then a failure to detect it is evidence of its absence. One could thereby rule out, with severity, certain discrepancies, whether in critiquing someone else’s work or simply for setting upper bounds. This involves a severity analysis, or at least a post-data power analysis (not to be confused with “shpower”). Search this blog if interested.
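
    To make the reasoning above concrete, here is a minimal sketch, assuming a one-sided Normal test of H0: mu <= 0 against mu > 0 with known sigma. The function names, the 0.05 level, and the numbers in the usage lines are illustrative assumptions, not anything taken from the Report or from the linked article. The first function gives the probability of detecting a discrepancy d (power); the second gives the severity with which a non-significant result warrants the upper bound mu <= d.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal, used for the rejection cutoff


def power(d, sigma, n, alpha=0.05):
    """P(test rejects H0: mu <= 0 | mu = d) for a one-sided z-test."""
    se = sigma / sqrt(n)
    # The test rejects when the sample mean exceeds this cutoff.
    cutoff = STD_NORMAL.inv_cdf(1 - alpha) * se
    return 1 - NormalDist(mu=d, sigma=se).cdf(cutoff)


def severity_upper_bound(d, xbar, sigma, n):
    """Severity for the claim mu <= d after observing sample mean xbar.

    Computed as P(sample mean > xbar | mu = d): how probable a larger
    observed mean would have been, were the discrepancy as big as d.
    """
    se = sigma / sqrt(n)
    return 1 - NormalDist(mu=d, sigma=se).cdf(xbar)


# Hypothetical usage: sigma = 1, n = 100, a non-significant mean of 0.05.
print(power(d=0.3, sigma=1.0, n=100))
print(severity_upper_bound(d=0.3, xbar=0.05, sigma=1.0, n=100))
```

    Power depends only on the cutoff, whereas the severity calculation uses the mean actually observed, which is why the two can differ for the same d.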

  7. See the rejoinder (by the Tilburg committee reps) to the commentary on the Tilburg Report:

    Click to access The-Psychologist.pdf

    Does anyone know if those guilty of fraud of this sort are ever fined or made to do jail time? If insider traders are found legally culpable, why not fraudsters who do a lot more damage?
