Mycoplasma contamination (blue fuzz, bottom) can be a source of irreproducibility. Healthy control cells (top) are shown for comparison. Each stained nucleus (blue circles) is ~6–10 µm.
Reproducibility is a hallmark of science, or at least it’s supposed to be. Over the past several years, it’s become increasingly apparent that many lab studies, especially in the life sciences, are not reproducible. As a result, many putative drug targets or diagnostic biomarkers can’t be validated. Some estimates put the share of published life sciences research that is irreproducible above 50%, and others put it higher still.
The problem flew below the radar for years until a 2012 comment in Nature by C. Glenn Begley, a former vice president at Amgen, and Lee M. Ellis, an oncologist at the University of Texas M. D. Anderson Cancer Center, drew attention to it. They described Amgen scientists’ attempts to replicate the key findings in 53 “landmark” fundamental cancer studies that claimed to identify potential new drug targets. The Amgen scientists were able to replicate the findings in only 11% of the cases.
“When we first published that paper, we got hate mail,” says Begley, who is now chief scientific officer at TetraLogic Pharmaceuticals. “People would accost me at meetings and say, ‘How dare you publish something like that?’ ”
Fortunately for Begley, over time, researchers have come around. “People have generally recognized there is a problem,” he says.
If the first step is admitting there’s a problem, the research community may be on the road to recovery. Nonprofit organizations, funders, and journals have launched initiatives addressing cell line authentication, characterization and handling of tissue samples, public assay databases, and transparency of research. One organization is even attempting to replicate 50 cancer biology studies. Individually, each program tackles a small piece of the problem, but together they could add up to real change.
One source of irreproducibility in biological experiments is misidentified cells. When cultured cells that are supposed to be one thing turn out to be another, data are misinterpreted and claims don’t hold up in replication attempts.
For example, a cell line can be contaminated with bacteria called mycoplasma or with another cell line. Or a cell line might have started out as the right one but genetically drifted during repeated cell divisions.
To authenticate their cells, researchers don’t need a new standard. Instead, scientists need to follow the one that already exists: using polymerase chain reaction analysis to measure characteristic DNA regions known as short tandem repeats. The number of repeating units in each region is distinctive for a particular cell line.
That assay is pretty cheap, says Leonard P. Freedman, president of the Global Biological Standards Institute (GBSI), an organization that advocates the development and adoption of standards and best practices to improve the overall quality of biological research. But compliance is low, estimated to be only 15–30%, a figure Freedman finds “shocking.”
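To make the short-tandem-repeat comparison concrete, here is a minimal Python sketch of how a lab’s profile might be scored against a reference profile. The allele values and the 80% cutoff are illustrative assumptions, not data from any real cell line. The score is a simple Dice-style shared-allele percentage (twice the number of shared alleles divided by the total alleles in both profiles), one of several scoring schemes in use.

```python
from typing import Dict, Set

# Hypothetical STR profiles: locus name -> set of repeat counts (alleles).
# All allele values below are made up for illustration.
Profile = Dict[str, Set[float]]

def str_match_percent(query: Profile, reference: Profile) -> float:
    """Dice-style score over loci present in both profiles:
    2 * shared alleles / (alleles in query + alleles in reference), in percent."""
    shared = 0
    total = 0
    for locus in query.keys() & reference.keys():
        q, r = query[locus], reference[locus]
        shared += len(q & r)
        total += len(q) + len(r)
    if total == 0:
        raise ValueError("profiles have no loci in common")
    return 100.0 * 2 * shared / total

reference = {"TH01": {7, 9.3}, "D5S818": {11, 12}, "TPOX": {8, 12}, "vWA": {16, 18}}
drifted = {"TH01": {7, 9.3}, "D5S818": {11, 12}, "TPOX": {8}, "vWA": {16, 18}}
other = {"TH01": {6, 9}, "D5S818": {10, 13}, "TPOX": {8, 11}, "vWA": {17, 19}}

MATCH_THRESHOLD = 80.0  # assumed cutoff for this sketch

for name, profile in [("drifted", drifted), ("other", other)]:
    score = str_match_percent(profile, reference)
    verdict = ("consistent with reference" if score >= MATCH_THRESHOLD
               else "possible misidentification")
    print(f"{name}: {score:.1f}% -> {verdict}")
```

In this toy example, the culture that has lost one allele still scores well above the assumed cutoff, while the unrelated profile falls far below it.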
At GBSI’s BioPolicy Summit, held last month in Washington, D.C., Freedman announced that the organization will launch a social media awareness and advocacy campaign in January to draw attention to the need for cell line authentication. That campaign, with the Twitter hashtag #authenticate, is part of a larger initiative called Reproducibility2020 that GBSI will roll out in the new year.
One way to get researchers on board is for funders to require cell line authentication. Jon R. Lorsch, director of the National Institute of General Medical Sciences, said at the GBSI summit that NIGMS is making “targeted investments in better ways to authenticate.” NIGMS is considering requiring that grant applicants include a statement about authentication in their proposals. But there are concerns about whether grants will include money for such testing or whether it will become an unfunded mandate.
At least one nonprofit research foundation is requiring its grant recipients to perform such analysis. For the past year, the Prostate Cancer Foundation has requested that grantees report mycoplasma analyses and cell authentication results in their annual progress reports. “If they were smart, they did an authentication assay up front,” Howard R. Soule, chief scientific officer at the Prostate Cancer Foundation, said at the GBSI summit. Otherwise, to satisfy the reporting requirements, researchers have to run an authentication assay after the research is complete, an unhelpful exercise.
Another area that has been plagued by irreproducibility is biomarker discovery and development. Many putative biomarkers have failed in validation studies because the original findings could not be reproduced. The National Biomarker Development Alliance (NBDA), a nonprofit organization based at Arizona State University, is working with the research community to establish a process for biomarker development that proceeds seamlessly from early discovery to validation and approved diagnostics.
“Most people are developing biomarkers because they want to have a surrogate end point for a clinical finding,” says Anna Barker, director and president of NBDA. But “most people don’t start with the right clinical question.” Without the right clinical question, inappropriate samples may be used for discovery studies. A major problem is that researchers often use tissue samples that were acquired for patient care rather than for basic research. Samples collected for patient care are good enough for morphological and histological analyses but often not good enough for molecular-level analyses.
“Most researchers are very happy to get their hands on any samples that they can,” says Carolyn C. Compton, chief medical officer at NBDA. “You can have a perfect analytical test—high performing, accurate, reproducible—and still get the wrong answer if you bugger up the starting material.”
Last week, NBDA held two back-to-back “convergence conferences” that brought together a range of stakeholders, including analytical experts, instrument manufacturers, pathologists, and other researchers, to identify the best ways to acquire and handle biospecimens for genomic and proteomic biomarker discovery analysis.
“We’re trying to define the top three to five quality-compromising steps in specimen handling that we can address on a practical level,” Compton says. “Since there are no standards in place right now, the results certainly have to be better than what we’ve got.” If NBDA succeeds in identifying appropriate actions, the College of American Pathologists has said that it will enforce them through its laboratory accreditation system. “That would be transformational for translational research,” Compton says.
“I think we’ll have standards that actually impact discovery within three years,” Barker says. “We have to have those. Otherwise, this lack of reproducibility is going to continue.”
GBSI is sponsoring another project, one that aims to improve transparency about how biospecimens have been handled for biomarker discovery. For that initiative, Joshua LaBaer and coworkers at Arizona State University’s Biodesign Institute are developing a registry into which everybody who prepares clinical blood or tissue samples will be encouraged to deposit their protocols.
“There are really only two requirements,” LaBaer says. The first is to have a standard operating procedure (SOP). “Right there, we’re going to have a huge success, because I think a lot of places don’t do that,” he adds. The second is that users must submit SOPs in a standardized format that everyone can read.
To encourage researchers to participate, the protocols won’t be peer-reviewed. But LaBaer thinks that as the database grows, an informal peer review process will develop. “Over time, peer pressure will get people to start improving their sample methods,” he says. “Once we start discovering which SOPs work well, I think everybody will coalesce around those.”
The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) has been ahead of the curve in trying to develop reproducible methods for biomarker discovery and development. From 2006 to 2011, CPTAC ran an initiative to address analytical variability issues. The consortium sent identical samples to multiple labs to identify and quantify proteins. It took multiple rounds of testing, but eventually the results were reproducible from lab to lab (C&EN, Aug. 10, 2009, page 36).
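One common way to put a number on lab-to-lab agreement is the coefficient of variation (standard deviation divided by mean) of the values that different labs report for the same sample. The Python sketch below uses made-up measurements and hypothetical lab names. It is not CPTAC’s analysis, only an illustration of how such agreement might be tracked across rounds of testing.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Coefficient of variation in percent: sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)

# Made-up concentrations (arbitrary units) reported by four hypothetical labs
# for the same protein in an identical sample, in an early and a later round.
round_1 = {"lab_A": 8.0, "lab_B": 12.0, "lab_C": 10.0, "lab_D": 14.0}
round_2 = {"lab_A": 10.0, "lab_B": 11.0, "lab_C": 10.5, "lab_D": 10.5}

for label, results in [("round 1", round_1), ("round 2", round_2)]:
    cv = percent_cv(list(results.values()))
    print(f"{label}: inter-lab CV = {cv:.1f}%")
```

A falling inter-lab CV from one round to the next is the kind of signal that would indicate methods are converging.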
As a next step in improving biomarker discovery, CPTAC announced its Assay Portal in July and took it public in October (Nat. Meth. 2014, DOI:10.1038/nmeth.3002). The portal is a repository of targeted proteomic assays that use selected-reaction monitoring, in which pairs of precursor and fragment ions are used to identify disease-related proteins with mass spectrometry.
The portal will collect protocols for validated MS-based assays and is expected to have about 1,000 assays by late next year. For now, the assays are coming from CPTAC’s five centers. “We’re all mandated to contribute at least 200 assays,” says Daniel C. Liebler, a CPTAC member and director of the Jim Ayers Institute for Precancer Detection & Diagnosis at Vanderbilt University.
Journals are also making changes to improve reproducibility of life sciences research. Last year, the Nature journals adopted an 18-point checklist for life sciences authors to promote transparency. “There is a risk in mandates,” Véronique Kiermer, executive editor at Nature Publishing Group, said at the GBSI summit. If there’s no way to verify that a researcher actually completed a task, the checklist becomes a box-ticking exercise, she added.
The Nature journals are among more than 70 journals publishing preclinical research that agreed last month to a set of reproducibility standards developed by the National Institutes of Health. The standards encourage transparency in publications.
Science magazine has taken an extra step to improve the quality and reproducibility of the research it publishes. In July, the journal established a Statistics Board of Reviewing Editors. Members of the board assess papers flagged by editors as needing extra scrutiny of the statistical methods used to analyze the data. More than 70 papers were sent to the statistical review board between July and October, according to Marcia McNutt, editor-in-chief of Science. The process is going well, she says. Before making any changes, she would “want to have a review of the process, its successes, and its shortcomings sometime next year.”
The American Chemical Society, which publishes journals across chemistry and the allied sciences, provides guidance in its author instructions on reporting reproducible research for the communities its journals serve.
But journals can’t be the sole gatekeepers of reproducibility. “The role of journals is to insist on transparency in research, to remind authors of what those best practices in reproducibility are, and to ask authors to declare whether those processes were followed,” McNutt says. “To ensure reproducibility—for example, by reproducing the research—would be a prohibitive burden on journals and reviewers.”
At the GBSI summit, Kiermer likewise cautioned that journals can’t bear the brunt of ensuring reproducibility. “Journals are at the end of the process,” she said. “The study is already done when it arrives.”
One organization is taking on the challenge of reproducing published studies. Science Exchange is a marketplace for scientific collaboration. Through its online site, scientists can order experimental services—including validations—from member labs on a fee-for-service basis. Such validation studies could be requested by the original researchers who want to see their work validated or, more often, by others who want to validate findings before proceeding to build on them.
And now Science Exchange is doing some of the validation studies on its own behalf. Last year, the organization received a $1.3 million grant from the Laura & John Arnold Foundation to fund a reproducibility project in cancer biology. For that project, the organization identified 50 widely read and cited papers in cancer biology published between 2010 and 2012 and set out to replicate the key findings, according to Elizabeth Iorns, cofounder and chief executive officer of Science Exchange.
The replication experiments are being done in a “registered report” format. Science Exchange submits the data acquisition and statistical analysis plans to the online journal eLife, which peer-reviews and publishes the protocols. Once data collection begins, researchers add data and analyses to the registered reports.
“Data collection is under way for those that have come back from peer review,” Iorns says. “We haven’t finished any of the experiments yet.” Her goal is to finish the cancer biology project in the next year.
Much of the reproducibility problem could be eliminated by making simple changes to the way experiments are run. For TetraLogic’s Begley, the single biggest problem is that most experiments are not run in a blinded fashion, in which one person runs the experiments and another person interprets the data.
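In practice, blinding can be as simple as having a third party replace sample labels with random codes and hold the key until the analysis is finished. The Python sketch below shows one way that might look. The sample names, the code format, and the function names are all hypothetical.

```python
import random

def blind_samples(sample_ids, seed=None):
    """Assign each sample a random code.

    Returns (codes for the analyst, key mapping code -> original ID).
    The key is meant to be held by someone other than the analyst.
    """
    rng = random.Random(seed)
    codes = [f"S{n:03d}" for n in range(1, len(sample_ids) + 1)]
    rng.shuffle(codes)
    key = dict(zip(codes, sample_ids))
    coded = sorted(key)  # analyst sees codes only, ordered independently of treatment
    return coded, key

def unblind(results_by_code, key):
    """After the analysis is locked, map coded results back to original IDs."""
    return {key[code]: value for code, value in results_by_code.items()}

samples = ["control_1", "control_2", "treated_1", "treated_2"]
coded, key = blind_samples(samples, seed=42)
print(coded)  # codes only; nothing here reveals which samples were treated

# The analyst records a (made-up) measurement per code, then the key holder unblinds.
results = {code: 1.0 for code in coded}
print(unblind(results, key))
```

The point of the design is the separation of roles: whoever interprets the data works only from the coded list, and the key is consulted only after the interpretation is fixed.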
Begley worries that a whole generation of scientists was never taught how to properly conduct research in a reproducible manner and that they in turn don’t know how to train their students. He places the responsibility for such training squarely on the universities and research institutes.
“I’m proposing that principal investigators should be retrained every year,” Begley says. “They should be forced to sit down and go through their own papers or their colleagues’ papers” to identify unblinded or uncontrolled experiments. “They should be forced to confront what I’ve been confronting for the past 10 years.”
But Begley acknowledges that a 100% reproducibility rate is probably not desirable either. “If you do the experiment properly and you get the wrong answer, that’s okay,” he says. “The thing that frustrates me most is that 80% of the time experiments are not done properly.”
Initial steps are being taken to provide materials for education. NIH has developed training modules on reproducibility and transparency to be used as part of its training on responsible research conduct for intramural postdoctoral scientists. Those materials will be made available to other organizations to use in their training. In addition, in August, NIGMS announced a program to fund the development of training modules in the area of reproducibility that can be completed in a day or less.
“Everything we’re proposing will come with a cost,” Begley says. “The cost will either be financial or decreased research productivity. I’m completely happy for it to be the latter. We’ll have decreased research productivity, but what we get will be of higher quality.”