Volume 91 Issue 26 | p. 18
Issue Date: July 1, 2013

Crowdsourcing Toxicity Prediction

Computational contest aims to improve models for predicting chemical effects
Department: Government & Policy
Keywords: data, toxicogenetics, computational challenge
Large data sets are generated with the help of robots and high-throughput assays. Their analysis is being crowdsourced to improve models for predicting chemical toxicity.
Credit: Maggie Bartlett, NHGRI
A robot arm transfers compounds to high-throughput assay plates at the National Institutes of Health, National Human Genome Research Institute, Chemical Genomics Center in Rockville, Md.

A unique competition is under way to encourage computationally minded people to develop new models for predicting the toxicity of chemicals. The goal is to crowdsource the analysis of large sets of in vitro toxicity data to determine whether chemical toxicity can be inferred from genomic and chemical structure information.

The idea is to put large data sets in the public domain “to see if multiple groups working on the same problem can do a more effective job of answering a question than one individual group could do on its own,” says Lara Mangravite, a principal scientist at Seattle-based Sage Bionetworks who is managing the competition.

Efforts to develop models for predicting toxicity have progressed slowly, and the models produced so far haven’t been very effective. Contest organizers plan to place all of the data and the models developed during the competition into an open commons so investigators can exchange ideas and improve their models along the way.

The three-month competition, which opened on June 10, is being led by Sage, the University of North Carolina (UNC), the National Institutes of Health, and the Dialogue for Reverse Engineering Assessments & Methods (DREAM). Sage and DREAM are nonprofit groups dedicated to data sharing and exchange. Both have managed computational challenges in the past.

For this particular challenge, participants are given access to data from one of the largest-ever high-throughput human-cell-based toxicity studies. They are also given access to genetics data for each cell line and chemical structure information.

Participants can try to solve one or both of the competition’s challenges. In the first challenge, they are asked to develop a model to predict how an individual’s genetics affects response to chemical exposures. In the second challenge, they are asked to develop a model that can predict toxicity to cells on the basis of chemical structure information.

The only prize involved is recognition by peers and a trip to Toronto this fall to present the winning model at a DREAM conference. But those incentives appear to be enough to attract a steady flow of interest.

Within one week of its launch date, the challenge had attracted about 50 entrants. “This feels pretty reasonable relative to other challenges we have run. We would like there to be several hundred people, and we think we are well on track to that,” Mangravite says.

Although the competition is open to anyone, Mangravite doesn’t expect people from outside the scientific community to participate. “This is a pretty complicated computational question,” she says. It is likely, however, that scientists from fields other than toxicology will participate, she notes. In previous challenges, “we’ve gotten electrical engineers, physicists, and computer scientists,” she says.

For this competition, data were made available from a large population-based toxicity study conducted by researchers at UNC, the National Institute of Environmental Health Sciences (NIEHS), and the National Center for Advancing Translational Sciences (NCATS). Both NIEHS and NCATS are part of NIH.

The study involved treating 1,086 human cell lines, representing nine geographic regions of the world, with 179 chemicals, including pharmaceuticals and environmental contaminants. The cell lines were made available through the 1000 Genomes Project—a publicly funded effort to map genomic variation across 1,000 individuals.

With the help of robotic equipment at the NCATS Chemical Genomics Center in Rockville, Md., the team conducted high-throughput assays, measuring the amount of adenosine triphosphate (ATP) within cells at various doses of each chemical. ATP is essential to cell function, so measuring its levels is a way to monitor cell viability, explains Ivan Rusyn, a toxicologist and professor of environmental sciences at UNC who led the team that grew and screened the cells. Low ATP levels mean a dying cell.

Last year, Sage challenged the computational community to build the best predictive model of breast cancer survival based on genomic information. That competition attracted 354 participants from 35 countries. More than 1,700 models were submitted.

Anyone interested in answering a complicated computational question can participate in the challenges. One of the top 10 winners in one of last year’s challenges was a high school student, points out Stephen H. Friend, president and cofounder of Sage.

Organizers hope that the models developed during the toxicogenetics competition, which ends on Sept. 15, will ultimately help inform government agencies as to which chemicals present the greatest risk to human health. But even if such models don’t make their way into regulatory decision making, one of the goals of the challenge is to build a community of people who are interested in working together to solve computational problems related to chemical toxicity. And that part is already starting to happen.

Chemical & Engineering News
ISSN 0009-2347
Copyright © American Chemical Society