I had said I would label as pseudoscience or questionable science any enterprise that regularly permits the kind of ‘verification biases’ in the laundry list of my June 1 post. How regularly? (I’ve been asked.)
Well, surely if it’s as regular as, say, much of social psychology, it goes over the line. But it’s not mere regularity; it’s the nature of the data, the type of inferences being drawn, and the extent of self-scrutiny and recognition of errors shown (or not shown). The regularity is just a consequence of the methodological holes. My standards may be considerably more stringent than most, but quite aside from statistical issues, I simply do not find hypotheses well tested if they are based on “experiments” that consist of giving questionnaires. At least not without a lot more self-scrutiny and discussion of flaws than I ever see. (There may be counterexamples.)
Attempts to recreate phenomena of interest in typical social science “labs” leave me with the same doubts. Huge gaps often exist between elicited and inferred results. One might locate the problem under “external validity” but to me it is just the general problem of relating statistical data to substantive claims.
Experimental economists (expereconomists) take lab results plus statistics to warrant sometimes ingenious inferences about substantive hypotheses. Vernon Smith (winner of the Nobel Prize in Economics) is rare in subjecting his own results to “stress tests”. I’m not withdrawing the optimistic assertions he cites from EGEK (Mayo 1996) on Duhem–Quine (e.g., from “Rhetoric and Reality” 2001, p. 29). I’d still maintain, “Literal control is not needed to attribute experimental results correctly (whether to affirm or deny a hypothesis). Enough experimental knowledge will do”. But that requires piecemeal strategies that accumulate, and at least a little bit of “theory” and/or a decent amount of causal understanding.
I think the generalizations extracted from questionnaires allow for an enormous amount of “reading into” the data. Suddenly one finds the “best” explanation. Questionnaires should be deconstructed for how they may be misinterpreted, not to mention how responders tend to guess what the experimenter is looking for. (I’m reminded of the current hoopla over questionnaires on breadwinners, housework, and divorce rates!) I respond with the same eye-rolling to just-so storytelling along the lines of evolutionary psychology.
I apply the “Stapel test”: Even if Stapel had bothered to actually carry out the data-collection plans that he so carefully crafted, I would not find the inferences telling in the least. Take, for example, the planned-but-not-implemented study discussed in the recent New York Times article on Stapel:
Stapel designed one such study to test whether individuals are inclined to consume more when primed with the idea of capitalism. He and his research partner developed a questionnaire that subjects would have to fill out under two subtly different conditions. In one, an M&M-filled mug with the word “kapitalisme” printed on it would sit on the table in front of the subject; in the other, the mug’s word would be different, a jumble of the letters in “kapitalisme.” Although the questionnaire included questions relating to capitalism and consumption, like whether big cars are preferable to small ones, the study’s key measure was the amount of M&Ms eaten by the subject while answering these questions…. Stapel and his colleague hypothesized that subjects facing a mug printed with “kapitalisme” would end up eating more M&Ms.
Stapel had a student arrange to get the mugs and M&Ms and later load them into his car along with a box of questionnaires. He then drove off, saying he was going to run the study at a high school in Rotterdam where a friend worked as a teacher.
Stapel dumped most of the questionnaires into a trash bin outside campus. At home, using his own scale, he weighed a mug filled with M&Ms and sat down to simulate the experiment. While filling out the questionnaire, he ate the M&Ms at what he believed was a reasonable rate and then weighed the mug again to estimate the amount a subject could be expected to eat. He built the rest of the data set around that number. He told me he gave away some of the M&M stash and ate a lot of it himself. “I was the only subject in these studies,” he said.
He didn’t even know what a plausible number of M&Ms consumed would be! But never mind that, observing a genuine “effect” in this silly study would not have probed the hypothesis. Would it?
II. Dancing the pseudoscience limbo: How low should we go?
Should those of us serious about improving the understanding of statistics be expending ammunition on studies sufficiently crackpot to lead CNN to withdraw reporting on a resulting (published) paper?
“Last week CNN pulled a story about a study purporting to demonstrate a link between a woman’s ovulation and how she votes, explaining that it failed to meet the cable network’s editorial standards. The story was savaged online as “silly,” “stupid,” “sexist,” and “offensive.” Others were less nice.”
That’s too low down for me … (though it’s good for it to be in Retraction Watch). Even stooping to the level of “The Journal of Psychological Pseudoscience” strikes me as largely a waste of time–for meta-methodological efforts, at least.
I was hastily making these same points in an e-mail to A. Gelman just yesterday:
E-mail to Gelman: Yes, the idea that X should be published iff it reaches p < .05 on an interesting topic is obviously crazy.
I keep emphasizing that the problems of design, and of linking statistics to substantive claims, are the places to launch a critique, and the onus is on the researcher to show how violations are avoided. … I haven’t looked at the ovulation study (but this kind of thing has been done a zillion times), and there are a zillion confounding factors and other sources of distortion that I know were not ruled out. I’m prepared to abide such studies as akin to Zoltar at the fair [Zoltar the fortune teller]. Or, view it as a human interest story—let’s see what amusing data they collected […oh, so they didn’t even know if the women they questioned were ovulating]. You talk of top psych journals, but I see utter travesties in the ones you call top. I admit I have little tolerance for this stuff, but I fail to see how adopting a better statistical methodology could help them. …
Look, there aren’t real regularities in many, many areas–better statistics could only reveal this to an honest researcher. If Stapel actually collected data on M&M’s and having a mug with “Kapitalism” in front of subjects, it would still be B.S.! There are a lot of things in the world I consider crackpot. They may use some measuring devices, and I don’t blame those measuring devices simply because they occupy a place in a pseudoscience or “pre-science” or “a science-wannabe”. Do I think we should get rid of pseudoscience? Yes! [At least if they have pretensions to science, and are not described as “for entertainment purposes only”.] But I’m afraid this would shut down [or radically redescribe] a lot more fields than you and most others would agree to. So it’s live and let live, and does anyone really think it’s hurting honest science very much?
There are fields like (at least parts of) experimental psychology that have been trying to get scientific by relying on formal statistical methods, rather than doing science. We get pretensions to science, and then when things don’t work out, they blame the tools. First, significance tests, then confidence intervals, then meta-analysis,…do you think these same people are going to get the cumulative understanding they seek when they move to Bayesian methods? Recall [Frank] Schmidt in one of my Saturday night comedies, rhapsodizing about meta-analysis:
“It means that the behavioral and social sciences can attain the status of true sciences: they are not doomed forever to the status of quasi-sciences or pseudoscience. … [T]he gloom, cynicism, and nihilism that have enveloped many in the behavioral and social sciences is lifting. Young people starting out in the behavioral and social sciences today can hope for a much brighter future.” (Schmidt 1996)
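The worry in the e-mail above, that formal methods can manufacture “findings” out of pure noise, can be shown with a quick simulation (a sketch only; the two-group setup is hypothetical and assumes no true effect whatsoever):

```python
import math
import random

random.seed(0)

N_STUDIES = 5_000   # independent "studies", each with NO true effect
N = 100             # subjects per group

def two_sided_p(a, b):
    """Two-sided p-value from a z-test on the difference of means.
    The variance is known to be 1 here, since both groups are pure
    standard-normal noise, so the test is exact in this toy setting."""
    z = (sum(a) / N - sum(b) / N) / math.sqrt(2.0 / N)
    return math.erfc(abs(z) / math.sqrt(2))

hits = 0
for _ in range(N_STUDIES):
    treatment = [random.gauss(0, 1) for _ in range(N)]
    control = [random.gauss(0, 1) for _ in range(N)]
    if two_sided_p(treatment, control) < 0.05:
        hits += 1

print(f"share of 'significant' null studies: {hits / N_STUDIES:.3f}")

# If each study quietly tests k outcomes, the chance of at least
# one p < .05 "finding" by chance alone is 1 - 0.95**k:
for k in (1, 5, 20):
    print(f"k = {k:2d} outcomes -> P(at least one hit) = {1 - 0.95**k:.3f}")
```

Roughly five percent of these pure-noise studies clear the p < .05 bar, and with twenty unreported outcomes per study a “discovery” is more likely than not; a publication filter set at exactly that bar does nothing to distinguish such hits from genuine effects.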
III. Dale Carnegie salesman fallacy:
It’s not just that bending over backwards to criticize the most blatant abuses of statistics is a waste of time. I also think dancing the pseudoscientific limbo too low has a tendency to promote its very own fallacy! I don’t know if it has a name, so I made one up. Carnegie didn’t mean this to be used fallaciously, but merely as a means to a positive sales pitch for an idea, call it H. You want to convince a person of H? Get them to say yes to a series of claims first, then throw in H and let them make the leap to accept H too. “You agree that the p-values in the ovulation study show nothing?” “Yes” “You agree that study on bicep diameter is bunk?” “Yes, yes”, and “That study on ESP—pseudoscientific, yes?” “Yes, yes, yes!” Then announce, “I happen to favor operational probalogist statistics (H)”. Nothing has been said to advance H, no reasons have been given that it avoids the problems raised. But all those yeses may well lead the person to say yes to H, and to even imagine an argument has been given. Dale Carnegie was a shrewd man.
 Vernon Smith ends his paper:
My personal experience as an experimental economist since 1956 resonates, well with Mayo’s critique of Lakatos: “Lakatos, recall, gives up on justifying control; at best we decide—by appeal to convention—that the experiment is controlled. … I reject Lakatos and others’ apprehension about experimental control. Happily, the image of experimental testing that gives these philosophers cold feet bears little resemblance to actual experimental learning. Literal control is not needed to correctly attribute experimental results (whether to affirm or deny a hypothesis). Enough experimental knowledge will do. Nor need it be assured that the various factors in the experimental context have no influence on the result in question—far from it. A more typical strategy is to learn enough about the type and extent of their influences and then estimate their likely effects in the given experiment”. [Mayo EGEK 1996, 240]. V. Smith, “Method in Experiment: Rhetoric and Reality” 2001, 29.
My example in this chapter was linking statistical models in experiments on Brownian motion (by Brown).
 I actually like Zoltar (or Zoltan) fortune telling machines, and just the other day was delighted to find one in a costume store on 21st St.
1. You write, “bending over backwards to criticize the most blatant abuses of statistics is a waste of time.” I don’t know what is meant by “bending over backwards.” Is the writing of blog posts and scientific articles a form of “bending over backwards”? “Bending over backwards” suggests to me a form of contortion, but when I write about these studies I am being as direct as possible. I don’t seek out these papers; people send them to me.
2. These flawed papers are published in the top journals. So: (a) some of these results do get believed, they get taken seriously in the news media and by other scientists, (b) the people who get these publications in top journals get recognition, promotion, etc., and do more of the same, (c) this reward structure encourages other researchers to do more of such studies and discourages them from doing more serious work which might require more effort with less expected payoff.
3. By thinking more seriously about how to analyze such data better, I can make methodological progress. I got a lot of insight into the notorious example a few years ago about beauty and sex ratios. The published studies that I was criticizing were bad for many reasons. But I didn’t just shoot them down using the easiest method possible. I took the studies seriously and thought hard about how better to analyze such data.
4. I don’t know what you mean by “operational probalogist statistics.” But if the topic is Bayesian statistics, I’ve already written two books full of applications of Bayesian methods. I certainly don’t need to talk about flawed ESP studies etc etc to make the point that Bayesian methods can be useful. I write about these examples for reasons 1,2,3 above, not out of any sales pitch.
Andrew: Thank you for your very thoughtful comments.
Point #1. Yes, I’m afraid this was a bit of stretching the limbo metaphor. In a sense it’s far too easy to knock them down, like using nukes to kill fleas. My sense of irony kicks in, a kind of stop-loss as regards papers with titles like ‘Ovulation leads women to perceive sexy cads as good dads’. http://psycnet.apa.org/psycinfo/2012-12669-001/
Nearly everything is marketing, commercial, entertainment, and this is no different. It’s as if they’re making fun of themselves (like the Enquirer rag); it’s all “wink, wink”, just a little joke. If the journal editors and peer reviewers are not aghast, then the healthiest reaction seems to be: live and let live.
(Better than having one’s head explode, as in the cartoon from the link in Stan Young’s last post.)
Point #2. I agree with your (a)-(c), and there are a lot of areas where the reward structure strikes me as perverse. Maybe I’m applying “just war theory” to pick my battles: it stipulates “that there is a reasonable possibility of success”. I’d like to know: what do you think a plausible solution is? You doubtless know these journals better than I do; do you think there’s a prayer of their doing an about-face? I don’t see it.
#3. Your point 3 is surely the most constructive, but after the tenth time, serious diminishing returns set in.
#4. On point 4, again, I’d like to know if the presumption is that more or better statistics can rectify these studies. I can think of some methods making things worse, so I think it’s important to explain if there’s a presumed remedy, as opposed to, say, “just stop”. (Although “just stop” would suit me, I don’t see it happening.) I don’t think you presuppose, without argument, that you have a better way, but many, many people fall into the slippery slide (the Dale Carnegie fallacy). I’m calling for a constructive upshot. Or do you think the present state of play in the “statistics wars” is fruitful?
I’m surprised that “Stapel dumped most of the questionnaires into a trash bin outside campus” rather than, say, shredding and/or burning them. Wouldn’t it have been a riot if students had followed him to the dumpster, pulled out the questionnaires after he left, and filled them out, placing them in his mailbox?
If we take the broad ‘error statistical critique,’ covering, as it surely does, these informal and formal critiques from bad statistics to pseudoscience, then it is just what critical rationalists from Popper to Musgrave have ordered: a self-correcting/error-correcting tool such that applications are criticized by further applications of the selfsame method. False p-values, misspecifications, poor controls—not only exposed, but revealed so simply that we have all learned how to get good at it, even CNN watchers. The statistical abusers are left holding the reins: to choose methodological progress, driven by the criticism (as found in Andrew Gelman’s third point), or to remain in the swamp of pseudoscience, aware that their status is fully recognized, thereby entitling them to wear with honesty the sign on Zoltar: “for entertainment only”. I know of no other statistical methodology that has shown its mettle so aptly for honest self-disclosure and for putting the kibosh on wishful thinking.
Good to have discovered this website! The same concerns have troubled me, but many scientists dismiss such discussions as demagogy. It is difficult to fit these debates anywhere in the “mainstream” social sciences.
It might just be that we lack certainty in our own perceptions…?
I have gotten myself into a heated debate on the subject in a thread: https://www.linkedin.com/groupItem?view=&gid=4292855item=233366048&type=member&commentID=144853857&trk=hb_ntf_COMMENTED_ON_GROUP_DISCUSSION_YOU_CREATED#commentID_144853857 As you can see, some were downright rude.
It would also be great to have your opinion on the articles I have written thus far: http://www.kon.org/urc/urc_research_journal12.html
Back to reading!