Solving the replication crisis (FTX proposal)
post by Michael_Wiebe · 2022-04-25T21:04:06.674Z · EA · GW · 19 comments
Here's my rejected FTX proposal (with Abel Brodeur) to solve the replication crisis by hiring full-time replicators. (I left out the budget details.) [Added some edits in brackets for more context.]
Please describe your project in under 100 words.
We will actually solve the replication crisis in social science by hiring a “red team” of quantitative researchers to systematically replicate new research. Currently, there are few penalties for academics and journals that publish unreliable research, because few replications are attempted. We will fundamentally change academic incentives by ensuring that researchers know their work will be scrutinized, motivating them to improve their research design or else face a loss of reputation. By fixing scientific institutions now, we can reap the compounding benefits of reliable knowledge over the long-term future.
If the project has a website, what’s the URL?
Please describe what you are doing very concretely—not just goals and long-term vision, but specifically what you are doing in the next few months.
Currently, the Institute for Replication is using volunteers to systematically reproduce and replicate new studies from leading journals in economics and political science. With funding from FTX, we can hire a Project Scientist (Michael Wiebe), post-docs, and research assistants to massively scale up reproductions and replications. [Definitions: "reproduce" = being able to run the code and obtain the results that are in the paper; "replicate" = re-analyzing the paper using different methods and/or different data. Note that most social science papers use observational data and are not lab experiments.]
We can also launch a cash prize for completed replications, to incentivize even more replications. This can be implemented in several ways; for example, giving a prize of ~$1000 for high-quality replications completed using the Social Science Reproduction Platform, as judged by a panel of experts. [We're open to revising this number, e.g. to $5k; most grad students replicate papers for coursework, so it might not take much to incentivize them to submit.]
What’s the case for your project?
Social science is facing a replication crisis. Researchers produce unreliable findings that often do not replicate, and the root problem is the lack of replications.
Academics have essentially no incentive to perform replications: they rarely yield original findings and are not valued by journals. Because replications do not lead to publications, they do not help academics get tenure, and hence few are attempted. The replications that are done are conducted by volunteers in their spare time, and can even carry negative career effects if they upset powerful academics.
The rarity of replications makes peer review an inadequate form of quality control. Knowing that research won’t be closely scrutinized, journals and referees have little incentive to check for data quality issues, coding errors, or robustness. If a paper with unreliable findings gets published, the journal suffers no loss in reputation, because no one will replicate the paper to expose its flaws. Hence, referees take empirical results at face value, and focus instead on the framing of the research question and appropriate citation of the literature.
Knowing that their work will not be reproduced or replicated, most researchers don’t invest time in preparing replication packages, and don’t check for data or coding errors. The result is entire fields with serious reproducibility problems.
We can fix these incentives by investing heavily in reproduction and replication, and making a big push to systematically replicate new research. With a team of full-time replicators and cash prizes for completed replications, researchers will now expect their work to be immediately scrutinized as a regular practice. This scrutiny will put researchers’ reputations on the line: if their findings are not robust, their work will not be cited (or worse, be retracted), ultimately affecting their promotion and tenure outcomes. At the same time, high-quality work will be rewarded. A big push will attract widespread attention to amplify these reputation effects. Hence, researchers will put more effort into better research design and fixing errors before submitting for publication. To avoid a reputation for publishing unreliable findings, journals will improve their peer review standards. The end result is a scientific literature containing reliable knowledge, to help guide our species through the long-term future.
How long have you been working on this project, and how much has been spent on it?
The Institute for Replication was launched in January 2022, with no funding raised yet.
What has been achieved so far?
We are collaborating with a large team of researchers to reproduce and replicate studies in economics and political science. We have already reproduced over 200 studies, and are currently working with about 50 independent researchers to replicate 30 studies. (See here for precise definitions of ‘reproduce’ and ‘replicate’.) We have built a large network of researchers interested in reproductions and replications. Our collaborators include journal editors, data editors, and reproducibility analysts at selected journals. We have already put together many special journal issues dedicated to replications. We have also conducted a survey of editors of leading outlets in economics, finance, and political science to help replicators identify journals interested in publishing replications.
Do you have any reservations about your project? Is there any way it could cause major harm? If so, what are you going to do to prevent that?
We expect failure to look like a null effect: no one pays attention. We would publish negative replications, but researchers and editors would not change their practices, and departments would not change their tenure and promotion decisions. One possible harm is executing the project badly and giving replication a bad reputation. We can prevent this by giving prizes only to top-quality replications, and by requiring transparency: original authors will be able to publicly respond to replications of their work. I4R already has a conflict of interest policy.
Another possible harm is negative replications being taken as evidence of cheating by researchers, as opposed to honest mistakes; this could lead to a backlash against replication. We can prevent this by encouraging a culture of charitability, with replicators giving authors the benefit of the doubt when discussing problems and errors.
What will it look like if your project has gone poorly / just OK / well at that time?
Poorly: we are unable to hire some of the post-docs; replications are low quality; fewer than 100 replications completed. (We currently have 30 ongoing replications accepted since mid-January, so we naively expect about 100 per year.)
Just OK: we successfully hire 5 post-docs and RAs; replications are high quality, adding important robustness checks (with positive or negative results); 250 replications completed; some media coverage; some replications and retractions published in original journals.
Well: we successfully hire 10+ post-docs and RAs; high-quality replications; 500 replications completed; widespread media coverage; replications published in original journals; negative replications lead to retractions from journals; journals implement new peer review standards; departments account for replications/retractions in tenure/promotion decisions.