Mayo: comment on the repressed memory research

Here are some reflections on the repressed memory articles from Richard Gill’s post, focusing on Geraerts et al. (2008).

1. Richard Gill reported that “Everyone does it this way, in fact, if you don’t, you’d never get anything published: …People are not deliberately cheating: they honestly believe in their theories and believe the data is supporting them and are just doing their best to make this as clear as possible to everyone.”

This remark is very telling. I recommend we regard such cases as illustrating a theory one believes, rather than as providing evidence for that theory. If we could mark them as such, we could stop blaming significance tests for playing a role in what are actually only attempts to illustrate, or to strengthen someone’s beliefs about, a theory.

2. I was surprised the examples had to do with recovered memories. Wasn’t that entire area dubbed a pseudoscience way back (at least 15-25 years ago?) when “therapy induced” memories of childhood sexual abuse (CSA) were discovered to be just that—therapy induced and manufactured? After the witch hunts that ensued (the very accusation sufficing for evidence), I thought the field of “research” had been put out of its and our misery. So, aside from having used the example in a course on critical thinking, I’m not up on this current work at all. But, as these are just blog comments, let me venture some off-the-cuff skeptical thoughts. They will have almost nothing to do with the statistical data analysis, by the way…

3. Geraerts et al. (2008, 22) admit at the start of the article that therapy-recovered CSA memories are unreliable, and the idea of automatically repressing a traumatic event like CSA implausible. Then mightn’t it seem the entire research program should be dropped? Not to its adherents! As with all theories that enjoy the capacity of being sufficiently flexible to survive anomaly (Popper’s pseudosciences), there’s some life left here too. Maybe, its adherents reason, it’s not necessary for those who report “spontaneously recovered” CSA memories to be repressors; instead they may merely be “suppressors” who are good at blocking out negative events. If so, they didn’t automatically repress but rather deliberately suppressed: “Our findings may partly explain why people with spontaneous CSA memories have the subjective impression that they have ‘repressed’ their CSA memories for many years” (ibid., 22).

4. Shouldn’t we stop there? I would. We have a research program growing out of an exemplar of pseudoscience being kept alive by ever-new “monster-barring” strategies (as Lakatos called them). (I realize they’re not planning to go out to the McMartin school, but still…) If a theory T is flexible enough so that any observations can be interpreted through it, and thereby regarded as confirming T, then it is no surprise that this is still true when the instances are dressed up with statistics. It isn’t that theories of repressed memories are implausible or improbable (in whatever sense one takes those terms). It is the ever-flexibility of these theories that renders the research program pseudoscience (along with, in this case, a history of self-sealing data interpretations).

5. Let’s give the researchers a bit more leeway. Let’s consider how they propose to “test” their hypothesized explanation. We still won’t need to look at the data for this… In Geraerts’s own research (as reported ‘in press’), “we found that the memories of CSA emerging during recovered memory therapy could not be corroborated, whereas those emerging outside therapy were corroborated just as often as memories of CSA that had never been forgotten” (ibid., 23).

First of all, they could never have literally “found” that information, but let us grant for the sake of argument that they found the memories recovered in psychotherapy so unreliable that those spontaneously discovered/remembered are quite reliable in comparison. (They did not, by the way, check on the reliability of the CSA memories of their research subjects, so far as I can tell.) Doesn’t this admission show that recovered memory therapy was/is a highly unreliable practice? If repressed memory therapists managed to “uncover” CSA “memories” by essentially manufacturing them, then isn’t there a danger that they are capable of implanting yet more false impressions in their subjects? I just wonder about the self-criticism here…

6. The gist of what they claim to show is that participants “recruited through ads in papers” (ibid., 24) who reported spontaneously recovered CSA are actually just very good at deliberately forgetting unpleasant things (as compared to a control group who report no abuse). Two other groups are recruited: one with therapy-discovered CSA, and a second with people who never forgot CSA. So four groups in all.

In the main part of the experiment, all the participants write down positive and negative (anxious) events from the past few years, then are asked to suppress thinking about them during a two-minute “suppression period.” The negative events are not the long-ago CSA events, by the way. If one of the “target thoughts” pops into their minds during the suppression period, they are to trigger a joystick. (Various stages of imagining, expressing and suppressing thoughts ensue. They take home a 7-day diary to keep up the reports.)

I take it the researchers didn’t register in advance what would count as a failed result. I mean, let’s say the therapy-discovered CSA group reported statistically significantly fewer occurrences of the negative target during those two minutes (instead of the spontaneous group). That might be interpreted as indicating they tend to obey therapists’ wishes (they suppress when they’re told to suppress). That too could have been a publishable result, helping to explain the rampant false memories in this general group.

What they claim they hoped to show is that those who report spontaneously recovered memories are not repressors even though they think they are. That is, they hope to show the spontaneous recoverers do not “automatically” blank out negative events. Instead they are “suppressors” (those who deliberately don’t think about negative events). Let’s grant that was the pre-data goal. But is there really a difference here? Those who report spontaneously recovered memories claim they really never thought about the CSA until the day it was spontaneously brought to mind, but Geraerts claims they actually had remembered it but they forgot they remembered it. So, we know in advance that self-described repressors are easily redescribed by the researchers as suppressors.

All of these points, and many more besides, would arise in a critique before even looking at any results. It is based on logic and some information about the flaws of this and related research programs. We do not say the theories are implausible, only that the onus is on the researchers to show how they will conduct a stringent test of their theories; we do not see that here.

Note that the above criticisms are quite separate from the statistical questions Professor Gill was called in to consider. We don’t need shrewd statistics to criticize this research–although maybe we do for fraud. Yet as fraudbuster-buster* Gill seems to be saying, there is a fine line between fraud and bad practices.

7. So what about the statistical analysis? “LSD tests indicated that people with spontaneous recovered memories reported significantly fewer occurrences of the anxious target thought than did the other groups” (ibid., 25). This is by means of post-hoc Least-Significant-Difference (LSD) tests. Putting the best spin on the statistics, what is the upshot?

People reporting recovered memories are not repressors, but rather suppressors, as evidenced by the fact that they successfully block out negative events (when told not to think about them in an experiment), at least statistically significantly more often than do the other groups.
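Even granting that spin, the choice of post-hoc LSD tests deserves a caveat: Fisher’s LSD amounts to running uncorrected pairwise t-tests after a significant one-way ANOVA, and its familywise error protection is only guaranteed with three groups. A minimal sketch of the multiplicity issue with four groups (the group labels below are my shorthand for the study’s groups, not the authors’ terms):

```python
# Sketch: Fisher's LSD ("Least Significant Difference") post-hoc procedure is
# a set of uncorrected pairwise t-tests run after a significant ANOVA. With
# four groups there are 6 pairwise comparisons; if each is tested at
# alpha = .05 with no correction, the chance of at least one false positive
# can be well above .05.
from itertools import combinations

# Shorthand labels for the four groups described in the study:
groups = ["spontaneous", "therapy", "never_forgot", "control"]
pairs = list(combinations(groups, 2))
alpha = 0.05

# Upper bound on the familywise error rate, assuming independent tests:
fwer_bound = 1 - (1 - alpha) ** len(pairs)

print(len(pairs))            # 6
print(round(fwer_bound, 3))  # 0.265
```

The bound is only illustrative (the six comparisons share data and are not independent), but it conveys why uncorrected post-hoc comparisons invite skepticism.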

But notice that these people answered the ad, so they haven’t suppressed the memory of the CSA event. To Geraerts, further evidence that they are suppressors is the fact that they don’t think too much about the negative (target) event in the week after the experiment. But this seems irrelevant, since we know they remembered the CSA event.

But others are apparently giving the research greater mileage than I would. As Gill observes, “they honestly believe in their theories and believe the data is [are] supporting them”. I am prepared to be corrected by suppressors…

*This term seems more apt, now that I better understand Gill’s work in this arena.


Geraerts, E., McNally, R. J., Jelicic, M., Merckelbach, H., & Raymaekers, L. (2008). Linking thought suppression and recovered memories of childhood sexual abuse. Memory, 16, 22-28.

Gill, R. (2013).

Categories: junk science, Statistical fraudbusting, Statistics


7 thoughts on “Mayo: comment on the repressed memory research”

  1. Dear Deborah

    Please do not refer to me as a fraudbuster! Fraud implies intention to deceive, and I do not suggest there is fraud involved in these articles about recovered memories. Possibly the case arose because an ambitious and brilliant young psychologist was not given enough mentoring and works in an environment without enough statistical expertise for people to really understand what they are doing. It’s possible to deceive in good faith and with complete integrity, especially when you are a dedicated scientist who firmly believes in their theories and who sees their job as letting their colleagues see the light.

    I got interested and involved when journalists following the conflict between the authors asked me for advice on the statistics. In the meantime, the senior authors have asked the journal “Memory” to retract the paper, because they themselves doubt the data presented there. They and their opponent, lead author Geraerts, who does not want to retract the paper, have been asked to submit notes to the journal motivating their positions. So we have the beginnings of a public scientific discussion, and that is what I think is absolutely necessary.

    Later I heard that already at the time of Geraerts’ thesis several psychologists had had their doubts about the results (“too good to be true”), but from a content point of view, not a statistical one. This led me to look at some of the papers in the thesis, while I wait to be allowed to see the data of the Memory paper. There were two papers with the same basic experimental design and the same theoretical expectations; one of them strikingly showed the same statistical “too good to be true” pattern, the other didn’t at all. The mystery remains unresolved. None of the co-authors have the data, and Geraerts is not allowed to talk to me.

    What we don’t need, I think, are embargoes on the free exchange of scientific information and secret investigations by university ethical standards committees, leading perhaps to disciplinary action by the university against a scientist. This is not science. Instead, organize a workshop! Let Geraerts post the Memory article data on the internet (where it should have been posted long, long ago)!

    About fraud-busters: Uri Simonsohn is a famous fraud-buster. He saw indications of “too good to be true” in publications of the social psychologist Dirk Smeesters. He corresponded with Smeesters on his findings, got some of the raw data, and got further evidence of “too good to be true”. Smeesters refused to admit anything was wrong, and both Simonsohn and Smeesters reported the situation to Smeesters’ university authorities. There is quite strong statistical evidence of actual fabrication of data in this case, but Smeesters has always denied that. There was a secret internal investigation. Smeesters resigned to avoid being fired, and many of his papers have been withdrawn. I think there are statistical flaws in the statistical methodology used by the investigative committee, but no-one on the committee is allowed to discuss what they did. (They used a “false discovery rate” approach to control for false positives in multiple testing, but this comes down to assuming a priori a certain rate of true positives, i.e. an a priori assumption of guilt.)

    This is not science. I think Simonsohn should have written up his findings for publication and submitted them for peer review, not reported Smeesters for suspected fraud to his university authorities. (That might have happened later, but that is another matter.)

    There is a witch-hunt going on at the moment, which is not healthy and will certainly lead to miscarriages of justice. University disciplinary committees, administrators, and lawyers don’t understand subtle statistical issues and don’t see a difference between so-called “questionable research practices” – which are widespread, even accepted in some fields, and indeed necessary in order to be a successful researcher – and fraud.

  2. Gill: Thank you for this. I think I picked up the term from you, though I think I made it clear that you were very generous in your interpretation here; I might have surmised that in other cases you would not be. But I will cross out that term. You might even be a fraud-buster-buster. And that is the most interesting and really new thing coming out of this discussion: your critique of both the existence of, and the methods used in, these searches for possible questionable practices. I think you have a very good point! I will have more to say later today.

  3. Richard:
    First, I deliberately restricted my critique to logical/methodological issues that would arise before even looking at the data. (I don’t know if that’s the same as what you call finding it too good to be true on the basis of “content”.) Second, and most interesting to me, is your remark: “I think there are statistical flaws in the statistical methodology used by the investigative committee but no-one on the committee is allowed to discuss what they did. (They used a “false discovery rate” approach to control for false positives in multiple testing, but this comes down to assuming a priori a certain rate of true positives, i.e. an a priori assumption of guilt.)” I share your doubts here, but I want to understand the problem better. Can you please explain?
    Third, I think you raise a very good point about the proper scientific way to go about criticizing other people’s work. Thanks for bringing clarity into this matter.

  4. Anonymous

    This analysis brings out a fairly devastating problem with the whole experiment: the subjects in the spontaneous memory group can’t be representative of people with an aptitude for putting CSA out of their minds (because those are unpleasant memories to think about). Why not? Because people keen to block out such memories would not be answering an advertisement to voluntarily be subjects in an investigation of CSA memories.

    • The thing is, we don’t know what Geraerts asked the subjects when she interviewed them. Did she ask whether they thought about the CSA now that they had spontaneously remembered it? True, you’d think that if they were keen to forget those events, the last thing they’d do is join a research study on this. But also, with respect to what might have been communicated during the interview that took place to classify subjects: did she hint that they were wondering about their skill in blocking things out? Who knows? My point was to emphasize that it’s the flexibility in theory and data interpretation, and the lack of evidence that the new protocol will avoid known problems in this area, that opens the research to criticism.

      But Gill has just raised a very important potential criticism of the use of statistics in statistical fraud-busting that I hadn’t recognized…growing out of the Smeesters case…..

  5. Through some e-mails with Gill, I now have some clarification of his comment above regarding the Smeesters case: “(They used a ‘false discovery rate’ approach to control for false positives in multiple testing, but this comes down to assuming a priori a certain rate of true positives, i.e. an a priori assumption of guilt.)”

    I will just take a passage from his webpage (where he discusses Smeesters) on some “provisional” criticisms:
    “Erasmus-CWI uses the pFDR method (positive – False Discovery Rate) in some kind of attempt to control for multiple testing. In my opinion, adjustment of p-values by pFDR methodology is absolutely inappropriate in this case. It includes a guess or an estimate of the a priori ‘proportion of null hypotheses to be tested which are actually false’. Thus it includes a ‘presumption of guilt’! … This methodology was invented for massive cherry-picking experiments in genome wide association studies. It was not invented to correct for multiple testing in the traditional sense, when the simultaneous null hypothesis should be taken seriously. Innocent till proven guilty. Not proven guilty by an assumption that you are guilty some significant proportion of the times. In order that Smeesters himself is protected from cherry picking by Erasmus-CWI he should have insisted on a Bonferroni correction of the p-values which they report; a much stronger requirement than the pFDR correction. … There are hundreds of different pFDR methods, which one was used?” (Richard Gill)
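Gill’s contrast can be illustrated with a toy comparison. The Benjamini-Hochberg procedure below is the standard FDR method; Storey’s pFDR, which Gill names, goes further by plugging in an estimate of the proportion of false null hypotheses, which is the “presumption of guilt” he objects to. The p-values are invented for illustration only, not drawn from the Smeesters investigation:

```python
# Toy comparison: Bonferroni (controls the familywise error rate) versus the
# Benjamini-Hochberg step-up procedure (controls the false discovery rate)
# on the same invented p-values. BH is the less protective requirement.

def bonferroni(pvals, alpha=0.05):
    """Reject H0_i iff p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH: reject the k smallest p-values, where k is the largest
    rank i such that the i-th smallest p-value <= (i / m) * alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected

pvals = [0.001, 0.008, 0.012, 0.04, 0.20, 0.60]  # invented for illustration
print(sum(bonferroni(pvals)))          # 2 rejections (only p <= 0.05/6 survive)
print(sum(benjamini_hochberg(pvals)))  # 3 rejections
```

On identical data BH licenses more rejections than Bonferroni, which is why Gill argues that Bonferroni, the stronger correction, is the appropriate protection when the accused scientist should be presumed innocent.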

  6. e.berk

    Some more reasons to question the study. Being told not to think about negative events has no relation to choosing to block them out. And two minutes?

I welcome constructive comments for 14-21 days. If you wish to have a comment of yours removed during that time, send me an e-mail.
