Issue Date: May 25, 2009
Next-Generation DNA Sequencing Raises The Bar In Laboratory Data Management
The decoding of the human genome opened the floodgates to a huge reservoir of raw data. Armed with microarray technology, drug researchers have attempted to corral and organize these data. Despite a decade of work, they have gained only a very limited view of the mechanisms of the human body and its diseases.
Over the past two years, however, new DNA sequencing technology has emerged that allows a much more comprehensive view of biological systems at the genetic level. Next-generation sequencing, as it is called, has also increased the already daunting volume of data generated in laboratories by orders of magnitude. The result is an immense information technology (IT) challenge.
Machines such as Applied Biosystems' Solid 3 and Illumina's Genome Analyzer IIx can generate terabytes of data daily in a laboratory. These sequencers are equipped with software that can analyze and vet the raw output, reducing what must be kept to the gigabyte range. Software is being developed for secondary and tertiary analyses that will reduce data volumes further. Still, researchers are facing a new world of data management and storage that is likely to make IT an integral facet of bench-level genomics.
"The main issues have to do with generating and storing information," says Martin Leach, executive director of basic research and biomarker IT at Merck & Co. "Now, with the algorithms and processing sets coming with these new technologies, you can rapidly take the raw image data and raw textual data and reduce them and manipulate them. But try backing those data up. It's a nightmare."
Leach says the movement of processed and compressed data from laboratories to corporate computer networks is a key focus in research IT, and advances in laboratory computing power have helped. Cloud computing is among the new storage options under consideration as researchers get a better handle on what to do with the data. "With the changes in technology, we are getting better at identifying what we really need to keep," he says.
But the pool of data continues to grow rapidly. "We recently reached a milestone of scanning a billion molecules at one time," says Francisco de la Vega, vice president for Solid system applications and bioinformatics at Applied Biosystems, a division of Life Technologies. De la Vega compares that astronomical number with 96, the number of samples per run that was state of the art when the genome was first sequenced.
The Solid 3 system works by taking a series of digital scans of DNA molecules arrayed on glass beads. Data throughput is much higher than with the older microarray technology, but each scan covers a shorter length of the DNA molecule, so the sequencer must reconstruct the molecule from smaller segments than earlier protocols used. From there, subsequent data analysis can determine genetic variations and RNA sequences. According to de la Vega, these levels of analysis are feasible only with next-generation sequencing.
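The article does not describe how Applied Biosystems' software actually rebuilds molecules from short segments. Purely as a toy illustration of the general idea, the sketch below merges short overlapping reads with a naive greedy suffix/prefix overlap strategy. The function names `overlap` and `greedy_assemble`, the 3-base minimum overlap, and the example reads are all hypothetical; real next-generation pipelines often align reads to a reference genome or use far more sophisticated assemblers.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` bases long; otherwise 0."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembler (hypothetical, for illustration only):
    repeatedly merge the pair of reads with the longest suffix/prefix
    overlap until no overlap of at least min_len bases remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining reads cannot be joined; return them as contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Hypothetical 6-base reads drawn from the sequence ATGCGTACCTAG.
    print(greedy_assemble(["ACCTAG", "ATGCGT", "CGTACC"]))
    # prints ['ATGCGTACCTAG']
```

Greedy merging is chosen here only because it is short and easy to follow. It can mis-assemble repetitive DNA, which is one reason production tools rely on other strategies.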
Adam Lowe, director of life sciences marketing at Illumina, says his company recently enhanced the data analysis capability of its sequencing products with an upgraded version of its Genome Analyzer II. Software enhancements have boosted the system's data generation capability while decreasing its storage requirements, Lowe explains. "We are working actively with computational biologists and computer hardware developers on systems for small labs," he says.
Vendors and equipment users are also working on developing industry standards for storing genomic information, Lowe says. The goal is to make data more portable between data banks and to facilitate collaborative research.
Suppliers of specialized systems and software are developing products to minimize data generation. For example, the German firm Febit has introduced HybSelect, a microfluidics-based system for hybridization-based DNA capture that prepares DNA segments for targeted sequencing on next-generation equipment. "It increases throughput, drives down the cost, and minimizes the data to the level of what is required," says Chief Executive Cord F. Staehler.
These technological advances will also require researchers to sharpen their computer skills. "There is a blurring of the line between what is research and what is IT," Merck's Leach says. "There will be new approaches to visualizing data and data reductions in order to get to the crux of what is being investigated. The skill sets and capabilities of researchers are going to have to be more IT-enabled. Researchers will have to be more IT savvy."
- Chemical & Engineering News
- ISSN 0009-2347
- Copyright © American Chemical Society