
Biological Chemistry

The Next Generation In Genome Sequencing

Advances in technology create new challenges in data analysis

by Rick Mullin
May 9, 2011 | A version of this story appeared in Volume 89, Issue 19

Input
Credit: Shutterstock
The amount of information entering the system outstrips the analytical capacity of drug researchers.

Every now and then, a headline says it all. An article last fall in Genome Medicine, for example, carried this one: “The $1,000 genome, the $100,000 analysis?” (DOI: 10.1186/gm205). Suppliers and users of genome sequencing and informatics systems agree that the author, Elaine Mardis, perfectly captured the dilemma in genomics-based drug research. In fact, some say Mardis, director of technology development at the Genome Institute at Washington University in St. Louis, is lowballing the cost of analyzing genomics data.

In the decade since the human genome was decoded, advances in sequencing technology have brought the cost of a single genome sequence down from nearly $1 million in 2007 to close to the $1,000 mark.

The original cloning-based Sanger technique, which afforded accurate but high-cost sequencing of a full genome, gave way to next-generation sequencing (NGS), a high-throughput, imaging-based approach that vastly increased both the speed of sequencing and the output of data. Last year, Ion Torrent, a company founded by NGS pioneer Jonathan M. Rothberg, introduced yet another generation of genomics technology—a semiconductor-based system that dramatically lowers the cost of sequencing but also limits its scope, focusing on shorter segments of the genome.

The needle has hardly moved, however, on the analysis of genomic data. As Mardis points out, genomics analysis is a multidisciplinary function. In assessing one genome from a specific patient, data analysis must be performed by molecular and computational biologists, geneticists, pathologists, and physicians, all with different skill sets and different demands on the data. According to Mardis, the key to navigating the full river of genomics data is to gear the collection and processing of data to clinical applications.

Several trends in bioinformatics are moving in exactly this direction. Most important is the incorporation of genomics-oriented informatics in commercial drug discovery and development. And as sequencing technologies have evolved, so have means of collecting, analyzing, and sharing the data. Drug companies are signing on with contract research organizations and joining consortia that share informatics infrastructures.

The bioinformatics challenge to genomics stretches well beyond the analytical capabilities of sequencing technology, Mardis tells C&EN. NGS and other genomics techniques allow for generation and storage of data, launching a pipeline of preliminary analyses that generally match base calls, or the arrangement of DNA base pairs in a sample, to the reference human genome.
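
In code terms, that first step amounts to placing each short read at the position of the reference genome it matches best. The Python sketch below is a deliberately naive illustration of the idea, not the Genome Institute's pipeline; production aligners index the genome so they can place billions of reads. All sequences here are made up.

# Toy illustration of matching base calls to a reference: slide each
# read along the reference and keep the placement with the fewest
# mismatches. Real aligners index the genome first; this brute-force
# scan only shows the idea. All sequences below are made up.

def align_read(read, reference, max_mismatches=2):
    """Return (position, mismatches) for the best placement of `read`,
    or None if every placement exceeds `max_mismatches`."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(r != g for r, g in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best

reference = "ACGTTAGCCGATCGGATTACAGGCATCG"
for read in ["AGCCGATC", "GATTACAG", "AGCCTATC"]:  # third read carries a variant
    print(read, "->", align_read(read, reference))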

“Once you have the alignment in place, you fire off a number of different algorithms,” she says. “You look for structural variants. This involves large blocks of DNA that have either gone missing, been amplified to the extent that there are more than two copies, or have changed position.” Computational challenges quickly increase.
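
One of the simplest computational signals for the structural variants Mardis describes is read depth: a deleted block of DNA attracts far fewer reads than expected, and an amplified block attracts far more. The sketch below, with invented coverage numbers, shows that idea in miniature.

# Simplified read-depth scan for copy-number changes. Windows covered
# by far fewer reads than the expected diploid coverage suggest a
# deletion; far more suggest amplification. Depth values are invented.

def flag_copy_number(depths, expected=30, window=4):
    """Yield (start_bin, mean_depth, call) for windows whose mean read
    depth departs strongly from the expected coverage."""
    for i in range(0, len(depths) - window + 1, window):
        mean = sum(depths[i:i + window]) / window
        if mean < 0.5 * expected:
            yield i, mean, "possible deletion"
        elif mean > 1.5 * expected:
            yield i, mean, "possible amplification"

# Per-bin read depths across a stretch of one chromosome (made up):
depths = [29, 31, 30, 28, 3, 2, 4, 3, 30, 29, 31, 30, 61, 64, 58, 60]
for start, mean, call in flag_copy_number(depths):
    print(f"bins {start}-{start + 3}: mean depth {mean:.1f} -> {call}")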

One way to control the process is to design data collection with clinical applications in mind. “You need to engineer data analysis so that you don’t generate a genome in eight to 10 days and then take four to five months to analyze it,” Mardis says.

A group of 300 researchers at her Genome Institute, which is funded by the National Institutes of Health through the National Human Genome Research Institute, focuses on streamlining genome analysis. The institute, Mardis says, values clinical relevance. She credits the evolution of sequencing technology with providing data that can be interpreted in the context of the clinic and serve the needs of doctors treating specific patients.

Genomics data are generated from three main sequencing technologies that have evolved over the past 10 years, each with a distinct take on automating the process of determining the nucleotide order of a given DNA fragment. The earliest method to be widely deployed, Sanger sequencing, also called chain-termination sequencing, is based on a cloning process capable of sequencing long strands of DNA. Though accurate, it is laborious.

Various “next generation” techniques emerged following the decoding of the human genome, including the 454 Sequencer developed by Rothberg, then chief executive officer of CuraGen. The 454 uses a high-throughput imaging technique that generates, and must store, terabytes of image data in the process.

Rothberg formed a company called 454, which was acquired by Roche in 2007. Meanwhile, Illumina, Life Technologies, and other firms developed similar systems. Illumina is currently the market leader in NGS.

According to Divyaa Ravishankar, a consultant with Frost & Sullivan, the NGS business, currently worth $746 million globally, is expected to grow to nearly $3 billion by 2017. The market is growing faster in North America than in Europe, but growth is surging in China. BGI (previously known as Beijing Genomics Institute), she says, is developing what will be the largest NGS center in the world. In the U.S., she notes, much of the growth is due to the increased interest in genomics research at drug companies.

“NGS has mostly been used in the research community,” Ravishankar says. “Now it is moving into the diagnostic community. It will have a massive impact on personalized medicine, where you need to study protein translation.” Its main limiting factor, she says, is the challenge posed by analyzing large volumes of data.

Rothberg acknowledges some data overload, but he claims the problem has been overstated. “Early on, there was a fear that we were going to be overloaded with data when the first NGS machines came out,” he says. “Part of the blame I will take.”

The 454 machine was the first NGS system to use massively parallel sequencing. “Those machines used light and images. They basically took photographs of the sequencing process,” he explains. As such, the data overload was not entirely from genetic sequencing information. “There was an overload of image files,” he says. “In fact, some machines were creating terabytes of image files.”
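
A back-of-envelope calculation shows how quickly those image files pile up. Every run parameter below is hypothetical, since the article quotes no specific figures, but together they illustrate how an imaging-based run reaches the terabyte scale.

# Back-of-envelope estimate of why image-based sequencers piled up
# terabytes. Every run parameter here is hypothetical; the article
# quotes no specific figures.
tiles_per_lane = 100    # imaged positions per flow-cell lane (assumed)
lanes = 8               # lanes per flow cell (assumed)
images_per_cycle = 4    # one image per nucleotide channel (assumed)
cycles = 100            # one imaging cycle per sequenced base (assumed)
mb_per_image = 3        # raw image size in megabytes (assumed)

total_images = tiles_per_lane * lanes * images_per_cycle * cycles
total_tb = total_images * mb_per_image / 1_000_000
print(f"{total_images:,} images, roughly {total_tb:.1f} TB per run")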

Realizing that imaging is not an integral aspect of sequencing, Ion Torrent researchers began designing a sequencer from scratch. “We said, sequencing is chemical,” Rothberg notes. “Why use an imager? Why not go back to the beginning and make a solid-state device and integrated circuit analogous to a CMOS sensor, which is what a cell phone camera uses to capture photons and turn them into voltages? We developed ion chips that, instead of seeing photons, see chemistry.”
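
In rough terms, a chip that “sees chemistry” reports a signal each time a nucleotide is flowed across its wells, and the signal amplitude indicates how many bases were incorporated. The simplified base-calling sketch below uses an invented flow order and invented signal values; the real instrument's signal processing is far more involved.

# Highly simplified model of semiconductor base calling: nucleotides
# are flowed across the chip in a fixed, repeating order, and the
# sensed signal (already normalized here) is roughly proportional to
# how many bases of that nucleotide were incorporated. No images are
# taken. Flow order and signal values are invented for illustration.

FLOW_ORDER = "TACG"  # assumed repeating cycle of nucleotide flows

def call_bases(signals):
    """Convert per-flow signal amplitudes into a base sequence."""
    sequence = []
    for i, signal in enumerate(signals):
        count = round(signal)  # 0, 1, 2... bases incorporated this flow
        sequence.append(FLOW_ORDER[i % len(FLOW_ORDER)] * count)
    return "".join(sequence)

# Signals for eight flows: T, A, C, G, T, A, C, G
print(call_bases([1.05, 0.02, 2.10, 0.97, 0.03, 1.01, 0.98, 0.05]))  # -> TCCGAC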

Rather than storing images, the Ion Personal Genome Machine follows the incorporation of bases during the sequencing reaction, Rothberg says. “It cuts the data load a hundredfold right from the start by not having to take pictures,” he says. What’s more, the hardware costs $49,000, and the disposable sequencing chip—one is used per run—costs $250. Traditional NGS systems run into the hundreds of thousands of dollars, with each experiment costing as much as $35,000. The trade-off is that Rothberg’s machine sequences short segments of DNA of up to 100 base pairs, whereas NGS technology can sequence more than 700 base pairs.
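
Those figures imply a stark cost curve. The comparison below uses the prices quoted here, except that the traditional instrument price is an assumed $500,000 placeholder for “hundreds of thousands of dollars”; it also ignores the read-length trade-off.

# Rough break-even comparison using the costs quoted in the article.
# The traditional instrument price is given only as "hundreds of
# thousands of dollars," so $500,000 is an assumed placeholder, and
# the comparison ignores the difference in read length.
ion_instrument, ion_per_run = 49_000, 250
ngs_instrument, ngs_per_run = 500_000, 35_000  # instrument price assumed

for runs in (1, 10, 100):
    ion = ion_instrument + runs * ion_per_run
    ngs = ngs_instrument + runs * ngs_per_run
    print(f"{runs:>3} runs: Ion Torrent ${ion:,} vs. traditional NGS ${ngs:,}")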

Rothberg notes, however, that the longer reads on NGS systems are limited by speed as well as parallelism, an area in which semiconductor technology is making big strides. Ion Torrent’s commercially available chip includes 1.2 million sensors, and the company has developed chips with up to 11 million sensors, Rothberg says. He adds that the information technology development strength of Ion Torrent’s new parent company—Life Technologies, which acquired Ion Torrent last year—will lead to higher-capacity semiconductor sequencers.

Although the semiconductor system has garnered a lot of attention—Rothberg graced the cover of Forbes magazine in December—it will enter a market in which several generations of sequencing technology operate side by side. This is certainly the case in laboratories operated by contract research firms, which are beginning to cater to the rising need for genomic sequencing and analysis in the drug industry.

Covance, a leading contract research organization, recently formed a discovery and translational services (DTS) division, incorporating its discovery services unit, Biomarker Center of Excellence, genomics lab, and immunology services business. “The DTS platform aligns us more with our customers’ drug discovery models,” says Thomas G. Turi, vice president of science and technology for the new unit.

DTS is partly composed of assets acquired from drug companies. In 2008, Covance got a lab and drug discovery assets from Eli Lilly & Co. as part of a larger outsourcing deal. The following year, Covance purchased a genomics lab in Seattle from Merck & Co. The facility, which provides high-throughput genomic sequencing, is equipped with Illumina systems and other NGS technology, Turi says, but they are used in conjunction with earlier generations of diagnostic technology, such as the Affymetrix GeneChip microarray system.

“Microarrays have not been displaced in their entirety,” he says. “They are still a very well utilized platform to ask specific questions. It can be more cost-effective to do an experiment first on a microarray as a feasibility check and then migrate to a more NGS-based approach.”

The market for genomics services is poised for growth, Turi says. The Lilly and Merck acquisitions, he argues, are an indication that drug companies are willing to outsource this kind of research.

The increase in genomics research will require greater informatics support as well, Turi notes. Covance recently announced a partnership with Ingenuity Systems, a provider of analytical software, under which the companies will develop protocols for obtaining insight into raw NGS data and a format for answering research questions. Covance will provide clients with a genomics knowledge management service incorporating Ingenuity software.

Informatics suppliers are also positioning themselves for the influx of genomics in drug industry labs. Accelrys, a leading life sciences informatics company, this year introduced a product called NGS Collection for Pipeline Pilot that is based on its core laboratory information system. According to Clifford Baron, director of business development, NGS Collection for Pipeline Pilot supports data from Illumina, Life Technologies, 454, and other sequencing platforms.

Credit: Genentech
Seshagiri

Some drug companies have amassed their own multigenerational bank of genome sequencing systems. “We are very technology agnostic,” says Somasekar Seshagiri, principal scientist in molecular biology with Genentech. The company uses its parent firm’s 454 system as well as Illumina’s. And it still uses microarray technology.

“We use whatever gives the best answer to the questions we ask,” Seshagiri says. The company has been an early adopter of systems going back to Sanger sequencing, and it places a heavy emphasis on doing the work in-house.

Investing in technology has been a challenge in recent years, given the rapid generational turnover in technology and the amount of data that is generated. And semiconductor-based sequencing represents yet another wave on the horizon. Seshagiri says Genentech is investigating participation in genomics research consortia. The firm also makes its data available to other researchers. “It’s in the academic tradition,” he says. “Obviously we can’t do this all by ourselves.”

Credit: Eli Lilly & Co.
Barber

Lilly, for its part, contracts with large-scale genome sequencing providers, such as BGI and Complete Genomics in Mountain View, Calif., according to Thomas D. Barber, senior research scientist and group leader of genetics at the company. Lilly submits tissue samples for sequencing and may get back a half-terabyte of data, Barber says. From there, the company uses open-source, or nonproprietary, software for referencing the human genome.
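
The article does not name the open-source software Lilly uses. BWA and SAMtools are representative nonproprietary tools for referencing reads against the human genome, so the sketch below simply strings them together, assuming both are installed and that hypothetical reads.fastq and human_ref.fa files exist locally.

# Representative open-source referencing pipeline (BWA + SAMtools);
# not confirmed as Lilly's actual toolchain. Assumes bwa and samtools
# are on the PATH and that the input files below exist.
import subprocess

def run(cmd, stdout_path=None):
    """Run one pipeline step, optionally redirecting stdout to a file."""
    print("$", " ".join(cmd) + (f" > {stdout_path}" if stdout_path else ""))
    if stdout_path:
        with open(stdout_path, "wb") as out:
            subprocess.run(cmd, check=True, stdout=out)
    else:
        subprocess.run(cmd, check=True)

run(["bwa", "index", "human_ref.fa"])                            # build the reference index once
run(["bwa", "aln", "human_ref.fa", "reads.fastq"], "reads.sai")  # align reads to the reference
run(["bwa", "samse", "human_ref.fa", "reads.sai", "reads.fastq"], "aligned.sam")
run(["samtools", "sort", "-o", "aligned.bam", "aligned.sam"])    # sort alignments for downstream tools
run(["samtools", "index", "aligned.bam"])                        # index for fast region queries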

It’s a challenging, computationally intense process. “While great minds have worked long and hard, and great progress has been made, it is an extraordinarily difficult task to identify the appropriate DNA variants,” Barber says.

Lilly is among the drug companies that have started working in research consortia, some including other drugmakers. For example, Lilly, Merck, and Pfizer are partners in the Asian Cancer Research Group, a genomics consortium associated with BGI. Such collaborations mark a “monumental change in philosophy” for large pharma, Barber points out, but also an important realization about intellectual property (IP).

“We don’t view the understanding of the genetic basis of a disease as proprietary intellectual property,” he says. “This is something that should be shared, and we will select our IP around the drugs we develop.”

Credit: AstraZeneca
Christianson

Anastasia Christianson, senior director of R&D information at AstraZeneca, says her company is also involved in consortia, including those sponsored by the Innovative Medicines Initiative, a program launched by the European Commission and the European Federation of Pharmaceutical Industries & Associations. IMI is currently deploying a genomics knowledge management system developed by Johnson & Johnson to support several working groups.

When it comes to informatics, Christianson, like Seshagiri at Genentech, professes agnosticism in regard to technology. “We are trying to take an active and innovative approach to bridging the gap between data and knowledge,” she says, “not getting lost in the details of local data analysis or particular technologies, but applying them to making decisions on drug projects.” AstraZeneca is also working on an information bridge to the clinic, she says.

Like Lilly, AstraZeneca contracts with external genomics labs for sequencing and focuses on the analysis in-house. The shift involves another kind of culture change for big pharma, Christianson says: an elevation of the role of information scientists. Every drug discovery team has an informatics specialist charged with channeling internal and external sources of data into a project and facilitating their analysis.

“All the data being produced is useful,” she says. “But we have not done enough of a holistic analysis across all of it, relating data from one domain to another, to understand the content.”

Rothberg claims that generational change in genomics technology will address the shortcomings in analysis. He sees his company’s low-cost technology as a viable option for drug companies to bring sequencing into any laboratory.

“What we launched last year was a chip that sequences 100 base pairs,” he says. “Internally, we are at 300 base pairs, and we have announced that we will have a commercial chip at 400 next year.” Rothberg says his experience developing the 454 technology to the point of sequencing hundreds of bases assures him that Ion Torrent will also be able to increase the capacity of the semiconductor method. “We will get to 700 too,” he says.

At the same time, Rothberg expects the trend toward personalized and clinically focused medicine will boost the demand for targeted sequencing. In genomics, he says, the “killer app is a machine that sequences hot spots or small sets of genes.”

Advances in sequencing will, in turn, accelerate the analysis of data, he says. “We are making the correlations between sequence changes and the genetic basis of disease,” he says. “Over the next 15 years, we will correlate the changes and outcomes that will allow us to give people the right medicine—to understand where to intervene and how.”

Genentech’s Seshagiri is also optimistic regarding the potential of genomics. He says his company is experimenting with semiconductor-based sequencing.

And data, at any volume, is a good thing, Seshagiri says. “It is a challenge at one level, but an opportunity at another,” he adds. “The community is rising up to it. Sequencing technology will become a commodity. It will be the analysis that matters.”
