Science isn’t facing a reproducibility catastrophe, but the scientific community could take important steps to improve reporting and replicability, according to a new report from the US National Academies of Sciences, Engineering, and Medicine.
“There is no crisis but also no time for complacency,” says Harvey V. Fineberg, president of the Gordon and Betty Moore Foundation and chair of the National Academies committee that examined the issue.
▸ Reproducibility: Obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis. Also called computational reproducibility.
▸ Replicability: Obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
Source: Reproducibility and Replicability in Science, National Academies Press, 2019.
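The report's definition of computational reproducibility—same data, same code, same conditions, same results—can be illustrated with a minimal sketch (not from the report; the function and seed value are hypothetical). The key practice is pinning every source of randomness so reruns are deterministic:

```python
import random

def analysis(data, seed=42):
    # Fix the random seed so identical inputs, code, and
    # conditions of analysis yield identical results on rerun.
    rng = random.Random(seed)
    sample = [x for x in data if rng.random() < 0.5]
    return sum(sample) / len(sample) if sample else 0.0

data = list(range(100))
run1 = analysis(data)
run2 = analysis(data)
assert run1 == run2  # computationally reproducible: reruns agree
```

If the seed were left unset, each run would draw a different sample and the two results would generally differ, even though the study design is unchanged—that divergence is a reproducibility failure under the report's definition, not a replication failure.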
Scientists’ inability to replicate or confirm research studies—and few incentives for them to try—has been an increasing concern in the research community. That spurred the US Congress to ask for this study, which was funded primarily by the National Science Foundation. The National Academies committee focused on computational reproducibility because of its role as a tool across all sciences.
The report is relevant to chemistry, especially in computational chemistry and machine learning, says computational chemist Joshua Schrier, a professor at Fordham University. “The challenges in chemistry really mirror the challenges in other fields.”
The meanings of reproducibility and replicability currently vary widely among research disciplines, so the report lays out specific definitions of the terms. It also makes a wide variety of recommendations for actions that scientists, federal research funders, scientific societies, journal publishers, and others can take to improve reproducibility and replicability.
Among its recommendations, the committee suggests scientists should strive to include complete information on their data, study methods, and computational environment when publishing their work. They should also present the possible sources of uncertainty in their studies.
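One way to act on that recommendation is to capture a machine-readable record of the computational environment alongside the results. The sketch below (illustrative only; the manifest fields are my own choice, not the committee's) uses only the Python standard library:

```python
import json
import platform
import sys

def environment_manifest():
    # Record the details another scientist would need to
    # recreate the computational environment of an analysis.
    return {
        "python_version": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
    }

# Serialize the manifest so it can be archived with the results.
print(json.dumps(environment_manifest(), indent=2))
```

In practice, researchers would extend such a manifest with the exact versions of every library used in the analysis.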
Universities, companies, and scientific societies should provide training on the proper use of statistics for analyzing data. Schrier points to several initiatives, such as the Molecular Sciences Software Institute, that are working on chemistry training modules to teach people best practices.
US federal funding agencies should invest in the development of open-source tools to support reproducibility. The NSF in particular should fund research into what level of reproducibility is reasonable, as well as work to develop data repositories and other related reproducibility tools. A centralized repository system would be especially helpful, Schrier says, because currently “a lot of these extra materials for computational reproducibility are scattered among a lot of different services.”
Journals should identify ways to make sure claims made in their publications are reproducible. Currently, methods sections often do not provide enough information to allow other scientists to replicate results, and the report suggests journals should accept more file formats for supplemental data. Schrier says he’s encouraged to see more chemistry articles already linking to repositories or finding other ways of disseminating the data or programs used in the research.
The committee also noted that just because research isn’t reproducible doesn’t mean it is wrong. Differences in hardware or software are among the reasons an attempt to reproduce a study might not reach the same conclusion. “Full reproducibility is not always possible,” says committee member Juliana Freire, a computer science professor at New York University.