Professor Richard Gill
I am very grateful to Richard Gill for permission to post an e-mail from him (after my “dirty laundry” post), along with slides from his talk, “Integrity or fraud… or just questionable research practices?”, and associated papers. I record my own reflections on the pseudoscientific nature of the program in one of the Geraerts et al. papers in a later post.
I certainly have been thinking about these issues a lot in recent months. I got entangled in intensive scientific and media discussions, mainly confined to the Netherlands, concerning the cases of social psychologist Dirk Smeesters and of psychologist Elke Geraerts. See: http://www.math.leidenuniv.nl/~gill/Integrity.pdf. And I was recently asked to look at the statistics in some papers of another … [researcher] .. but this one is still confidential ….
The verdict on Smeesters was that he, like Stapel, actually faked data (though he still denies this).
The Geraerts case is very much open, very much unclear. The senior co-authors of the attached paper published in the journal “Memory”, Merckelbach and McNally, have asked the journal editors to withdraw it because they suspect the lead author, Elke Geraerts, of improper conduct. She denies any impropriety. It turns out that none of the co-authors have the data. Legally speaking, it belongs to the University of Maastricht, where the research was carried out and where Geraerts was a promising postdoc in Merckelbach’s group. She later got a chair at Erasmus University Rotterdam and presumably has the data herself, but refuses to share it with her old co-authors or any other interested scientists. Just looking at the summary statistics in the paper, one sees evidence of “too good to be true”: average scores in groups that are supposed in theory to be similar are much closer to one another than one would expect on the basis of the within-group variation. (The paper reports averages and standard deviations for each group, so it is easy to compute the F statistic for equality of the three similar groups and use its left-tail probability as the p-value of the test.)
The same phenomenon turns up in another unpublished paper by the same authors, and moreover in one of the papers contained in Geraerts’s (Maastricht) thesis. I attach the two papers published in Geraerts’s thesis which present results in very much the same pattern as the disputed “Memory” paper: four groups of subjects, three supposed in theory to be rather similar, one expected to be strikingly different. In one of the two, just as in the Memory paper, the average scores of the three similar groups are much closer to one another than one would expect on the basis of the within-group variation.
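The summary-statistics test described above can be sketched in Python. This is a minimal, standard-library-only sketch, not Gill’s own code: it computes the one-way ANOVA F statistic from each group’s size, mean and standard deviation (assumed here to be sample SDs with an n − 1 denominator) and returns the left-tail probability P(F ≤ F_obs) under the usual normal-theory null. The function names and the example numbers are hypothetical illustrations, not values from the Memory paper; the F distribution function is evaluated through the regularized incomplete beta function with a continued fraction, which is one of several ways to do it.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Two continued-fraction coefficients per iteration: even term, then odd term.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f: float, df1: int, df2: int) -> float:
    """P(F <= f) for an F(df1, df2) random variable."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(df1 / 2.0, df2 / 2.0, df1 * f / (df1 * f + df2))


def too_similar_test(groups):
    """groups: list of (n, mean, sd) tuples, one per group.

    Returns (F, df1, df2, left-tail p). A tiny left-tail p means the group
    means are closer together than the within-group spread would suggest.
    """
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df1, df2 = k - 1, n_total - k
    f = (ss_between / df1) / (ss_within / df2)
    return f, df1, df2, f_cdf(f, df1, df2)


# Hypothetical summary statistics (n, mean, sd) for three "similar" groups.
example = [(30, 10.00, 3.0), (30, 10.05, 3.1), (30, 9.98, 2.9)]
f_obs, df1, df2, p_left = too_similar_test(example)
print(f"F({df1}, {df2}) = {f_obs:.4f}, left-tail p = {p_left:.4f}")
```

With three groups the numerator has two degrees of freedom, so the left-tail probability also has the closed form 1 − (df2 / (df2 + 2F))^(df2/2), which is a handy cross-check on the general routine.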
I got involved in the quarrel between Merckelbach and Geraerts, which was being fought out in the media, so various science journalists also consulted me about the statistical issues. I asked Geraerts if I could have the data of the Memory paper so that I could carry out distribution-free versions of the “too good to be true” tests, which in their standard form are easy to perform if you just have the summary statistics. She claimed that I had to get permission from the University of Maastricht. At some point the presidents of both Maastricht University and Erasmus University were involved, and presumably their legal departments too. Finally I got permission and arranged a meeting with Geraerts, where she was going to tell me “her side of the story” and give me the data, and we would look at my analyses together. Merckelbach and his other co-authors all enthusiastically supported this too, by the way. However, at the last moment the chair of her department at Erasmus University got worried and stepped in; now an internal Rotterdam (=Erasmus) committee is investigating the allegations, and Geraerts is not allowed to give anyone the data or talk to anyone about the problem.
I think this is totally crazy. First of all, the data set should have been made public years ago. Secondly, the fact that the co-authors of the paper never even saw the data themselves is a sign of poor research practices. Thirdly, getting university lawyers and high-level university ethics committees involved does not further science. Science is furthered by open discussion. Publish the data, publish the criticism, and let the scientific community come to its own conclusion. Hold a workshop where different points of view are presented about what is going on in these papers, where statisticians and psychologists communicate with one another.
Probably, Geraerts’s data were obtained by some combination of the usual “questionable research practices” that are prevalent in the field in question. Everyone does it this way; in fact, if you don’t, you’d never get anything published: sample sizes are too small, effects are too small, noise is too large. People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them and are just doing the best to make this as clear as possible to everyone.
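Gill does not spell out which distribution-free test he had in mind. One natural candidate, sketched here purely as an assumption, is a permutation test: with the raw scores in hand, shuffle the observations among the three “similar” groups many times and ask how often the shuffled F statistic is as small as the observed one. The function names, the number of permutations and the toy scores are all hypothetical.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic computed from raw per-group observations."""
    k = len(groups)
    all_obs = [y for g in groups for y in g]
    n_total = len(all_obs)
    grand = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def permutation_left_tail(groups, n_perm=10_000, seed=0):
    """Estimated P(F_perm <= F_obs) when group labels are randomly reassigned."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Cut the shuffled pool back into groups of the original sizes.
        start, shuffled = 0, []
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) <= observed:
            count += 1
    # Add-one correction: the observed labelling counts as one of the permutations.
    return (count + 1) / (n_perm + 1)


# Hypothetical raw scores for three groups (not real data).
scores = [[12, 9, 11, 14, 10], [11, 13, 10, 9, 12], [10, 12, 14, 9, 11]]
print(permutation_left_tail(scores, n_perm=2000))
```

The appeal of this design is that it only assumes the scores are exchangeable across the similar groups under the null, rather than normally distributed; that is also why it needs the individual scores, not just the published means and standard deviations.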
Richard

PS: Summary of my investigation of the papers contained in Geraerts’s PhD thesis:
ch 8: Geraerts et al. (2006b), BRAT. Long-term consequences of suppression of intrusive anxious thoughts and repressive coping.
ch 9: Geraerts et al. (2006), AJP. Suppression of intrusive thoughts and working memory capacity in repressive coping.

These two chapters show the pattern of four groups of subjects, three of which are very similar, while the fourth is strikingly different with respect to certain (but not all) responses.

In the case of chapter 8, the groups which are expected to be similar are (just as in the already disputed Memory and JAb papers) actually much too similar! The average scores are closer to one another than one can expect on the basis of the observed within-group variation (the “1 over square root of N” law).

In the case of chapter 9, nothing odd seems to be going on. The variation between the average scores of similar groups of subjects is just as big as it ought to be, relative to the variation within the groups.
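The “1 over square root of N” law can be written out explicitly. A sketch, assuming k groups of equal size n drawn from populations with a common mean and a common standard deviation σ (the usual normal-theory assumptions, not stated in the e-mail): each group mean has standard deviation σ/√n, and the F statistic compares the observed spread of the group means, scaled up by n, with the pooled within-group variance.

```latex
\operatorname{SD}(\bar{y}_i) = \frac{\sigma}{\sqrt{n}},
\qquad
F = \frac{n\, s_{\bar{y}}^{2}}{s_p^{2}},
\qquad
s_{\bar{y}}^{2} = \frac{1}{k-1}\sum_{i=1}^{k}\left(\bar{y}_i - \bar{y}\right)^{2},
\qquad
s_p^{2} = \frac{1}{k}\sum_{i=1}^{k} s_i^{2}.
```

Under the null, both n s²_ȳ and s²_p estimate σ², so F should typically land near 1 (its expectation is (N − k)/(N − k − 2) with N = kn). An F far below 1, meaning group means much closer together than σ/√n allows, is the “too similar” pattern; an F in the usual range is what one expects when nothing odd is going on.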
Geraerts et al. (2008, Memory pdf). “Recovered memories of childhood sexual abuse: Current findings and their legal implications.” Legal and Criminological Psychology 13, 165–176.
It was Richard Gill who first told me about Diederik Stapel shortly after I started blogging; see an earlier post on Diederik. We were at a workshop on Error in the Sciences at Leiden in 2011. I was very lucky to have had Gill as the commentator/presenter of my paper (he was excellent!), and I thank him for these intriguing items. My puzzlements and reactions will follow in a separate post….
Richard: Thanks so much for the guest post. I have to admit some surprise that some examples had to do with recovered memories. I thought that whole area was dubbed pseudoscientific way back when “therapy-induced” memories of childhood sexual abuse were discovered to be just that: manufactured, with a lot of people wrongly accused. But I realize this is an updated research program, acknowledging those earlier problems, but still…. You write: “People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them and are just doing the best to make this as clear as possible to everyone.” I think this is quite right, and thus these cases might best be described as interpreting the data in the light of one’s theory, rather than testing the theory.
What is a post hoc least significant difference test?
The Elbians have figured out how to put slides directly on the blog! Looks nice, doesn’t it?
Post-hoc least significant difference test: seems to go back to Fisher. Some kind of Bonferroni correction when you only report the nominally significant results while actually doing many tests? http://en.wikipedia.org/wiki/Post-hoc_analysis
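For reference, a hedged note on the textbook version of Fisher’s least significant difference (a summary of the standard formulation, not something stated in the thread): after a significant overall F test, each pair of group means i, j is compared against an ordinary t threshold built from the pooled within-group mean square MS_W = SS_W/(N − k), where N is the total sample size and k the number of groups.

```latex
\mathrm{LSD}_{ij} = t_{1-\alpha/2,\,N-k}\,
\sqrt{\mathrm{MS}_W\left(\frac{1}{n_i} + \frac{1}{n_j}\right)},
\qquad
\text{declare } \mu_i \neq \mu_j \ \text{if}\ \left|\bar{y}_i - \bar{y}_j\right| > \mathrm{LSD}_{ij}.
```

In this form no Bonferroni-type adjustment of α is applied to the pairwise comparisons; the only protection against multiple testing is the requirement that the preliminary F test be significant.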
About recovered memories: I do believe that Merckelbach, McNally and Geraerts belong to a second wave in that field, honestly doing their best to do decent science.
About my remarks “people are not deliberately cheating”: these are directed at the general field of social psychology (another field, actually: the field of Stapel, Smeesters, …). Real effects are probably quite small, natural variation is large, sample sizes are small. The chance of getting significant p-values for all the hoped-for non-null effects and for none of the expected null effects, if the experimenter actually did exactly what they appear to have done, is negligible. Yet people build up successful research careers doing this time and time again, and probably not by only publishing the results of, say, one in ten of their experiments. So what do they actually do? Hard to know without pre-published, strictly described experimental protocols and carefully kept log-books of the whole procedure of data collection, *selection*, *cleaning*, *stopping rule*, …
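The “negligible chance” argument is simple arithmetic. A minimal sketch, assuming (hypothetically) that the tests in a paper are independent, that each hoped-for effect is detected with the same power, and that each expected null comes out non-significant with probability 1 − α: the chance of the clean pattern is the product of these probabilities, and its reciprocal is the expected number of honest experiments per clean result. The power, counts and α below are illustrative values, not estimates from any study.

```python
def prob_clean_pattern(power: float, n_effects: int, n_nulls: int, alpha: float = 0.05) -> float:
    """P(every hoped-for effect is significant and every expected null is not),
    assuming independent tests (a simplifying, hypothetical assumption)."""
    return power ** n_effects * (1.0 - alpha) ** n_nulls


def expected_experiments_per_clean_result(power, n_effects, n_nulls, alpha=0.05):
    """Mean number of independent experiments until the first clean pattern (geometric)."""
    return 1.0 / prob_clean_pattern(power, n_effects, n_nulls, alpha)


# Illustrative values: power 0.5 for each of 4 hoped-for effects, 3 expected nulls.
p_clean = prob_clean_pattern(0.5, 4, 3)
print(f"P(clean pattern) = {p_clean:.4f}")
print(f"Experiments per clean result = {expected_experiments_per_clean_result(0.5, 4, 3):.1f}")
```

With these made-up numbers an honest researcher would need roughly nineteen experiments per clean result, which is the scale of the “one in ten” file-drawer scenario mentioned above.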