David Mellor, from the Center for Open Science, emailed me asking if I’d announce his Preregistration Challenge on my blog, and I’m glad to do so. You win $1,000 if your properly preregistered paper is published. The recent replication effort in psychology showed, despite the common refrain – “it’s too easy to get low P-values” – that in preregistered replication attempts it’s actually very difficult to get small P-values. (I call this the “paradox of replication”.) Here’s our e-mail exchange from this morning:
Dear Deborah Mayo,
I’m reaching out to individuals who I think may be interested in our recently launched competition, the Preregistration Challenge (https://cos.io/prereg). Based on your blogging, I thought it could be of interest to you and to your readers.
In case you are unfamiliar with it, preregistration specifies the precise study protocols and analytical decisions in advance of data collection, in order to separate hypothesis-generating exploratory work from hypothesis-testing confirmatory work.
Though required by law in clinical trials, it is virtually unknown within the basic sciences. We are trying to encourage this new behavior by offering $1,000 prizes to 1,000 researchers who publish the results of their preregistered work.
Please let me know if this is something you would consider blogging about or sharing in other ways. I am happy to discuss further.
David Mellor, PhD
Project Manager, Preregistration Challenge, Center for Open Science
David: Yes, I’m familiar with it, and I hope that it encourages people to avoid the data-dependent determinations that bias results. It shows the importance of statistical accounts that can pick up on such biasing selection effects. On the other hand, coupling prereg with some of the flexible inference accounts now in use won’t really help. Moreover, there may, in some fields, be a tendency to register only non-novel, fairly trivial results.
And if they’re going to preregister, why not go blind as well? Will they?
David Mellor 10:45AM
We’re working now on our evaluation of the effect of preregistration to try to answer those two questions. The question of whether people register only trivial questions will be answerable through content-expert evaluation: asking experts to assess a series of research questions drawn from registered publications and from similar, unregistered ones. So far we have seen some replications, but also several novel research ideas come through the submission process.
I believe that a similar method could be used to evaluate the degree to which flexibility in inference is affected by preregistration. We are requiring that analytical decisions be specified as fully as possible in the preregistrations, so even if it’s not bulletproof, I think this gets at a big part of the problem.
We’re not requiring blind data analysis at this point, mostly because we already have a lot of requirements for the competition as it stands. We’re hoping to nudge individuals toward a greater number of best practices, including blind data analysis and data sharing, but we want to meet people where they are as much as possible and encourage better practices from there.
I think David’s reply is interesting, and maybe a little surprising (in a good way) – one reason I’m posting this. Here are some scattered remarks:
The fact that you have a hard time replicating a significant finding attained thanks to hunting and flexible methods is actually a point in favor of significance testing: biasing selection effects show up (in an invalid P-value) and thus are detectable. This can be demonstrated. It is an open question whether methods like Bayes ratios (as currently used) have any similar built-in alarm mechanism.
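To make the detectability point concrete, here is a minimal simulation sketch (my illustration, not anything from the exchange; the outcome count, sample size, and seed are arbitrary choices). With the null hypothesis true, an analyst who hunts across 20 outcomes and reports only the smallest P-value declares “significance” far more often than the nominal 5%; the single preregistered test stays at its advertised error rate, which is exactly why the hunted result tends to fail on replication.

```python
# Sketch: hunting for significance invalidates the nominal P-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_trials, n_outcomes, n = 10_000, 20, 30

hunted_hits = 0   # report whichever of the 20 outcomes looks best
prereg_hits = 0   # preregistered: one outcome fixed in advance
for _ in range(n_trials):
    data = rng.normal(0.0, 1.0, size=(n_outcomes, n))  # H0 true for every outcome
    pvals = stats.ttest_1samp(data, popmean=0.0, axis=1).pvalue
    hunted_hits += pvals.min() < 0.05
    prereg_hits += pvals[0] < 0.05

print(f"hunting over {n_outcomes} outcomes: {hunted_hits / n_trials:.1%}")  # ~64%
print(f"single preregistered test: {prereg_hits / n_trials:.1%}")           # ~5%
```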
In a curmudgeonly mood, let me kvetch some more. A crucial problem with most studies goes beyond the formal statistics to the question of erroneously taking statistical effects as giving information about research hypotheses. I don’t see sufficient attention being paid to this. It’s a well-known fallacy to take statistical significance as substantive significance (when the substantive claim hasn’t been well probed by the statistical effort), so it’s blocked by a non-fallacious use of statistical tests. By contrast, a popular way to block this is to give a high prior to the “no effect” null. The trouble is, this enables (rather than showing what’s fallacious about) the move from statistical to substantive. Rather than a methodological criticism, as it should be, it becomes a disagreement about the plausibility or truth of the substantive claim.
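To illustrate, here is a small numerical sketch of my own (the z-value, sample size, and unit-information prior are assumed for illustration): the very same result, z = 1.96 with a two-sided P-value of about 0.05 from n = 100 observations, either favors or undermines a point null depending entirely on the prior mass the null is given – a disagreement about plausibility, not a diagnosis of the fallacy.

```python
# Sketch: the same P ~ 0.05 result under different priors on the null.
import numpy as np
from scipy.stats import norm

z, n = 1.96, 100  # two-sided P-value of about 0.05
# Marginal density of z: N(0, 1) under the point null H0;
# N(0, 1 + n) under H1 with a unit-information normal prior on the effect.
bf01 = norm.pdf(z, loc=0, scale=1) / norm.pdf(z, loc=0, scale=np.sqrt(1 + n))

for prior_h0 in (0.5, 0.1):
    post_h0 = prior_h0 * bf01 / (prior_h0 * bf01 + (1 - prior_h0))
    print(f"prior P(H0) = {prior_h0:.1f} -> posterior P(H0) = {post_h0:.2f}")
# prior 0.5 -> posterior 0.60: the "significant" result now favors the null
# prior 0.1 -> posterior 0.14: the same data count against it
```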
I’d like to start seeing rewards given to papers that critically examine, and perhaps falsify, some of the standard experimental and measurement assumptions underlying questionable research in the social sciences. For some discussion, see earlier posts on this blog.
What do readers think?
The paradox of replication
Critic 1: It’s much too easy to get small P-values.
Critic 2: We find it very difficult to get small P-values; only 36 of 100 psychology experiments were found to yield small P-values in the recent Open Science Collaboration replication project in psychology.
For a resolution to the paradox, see this post.