The Big Picture

Drug firms forge an information management architecture to take on the research data glut

by Rick Mullin
October 1, 2007 | A version of this story appeared in Volume 85, Issue 40

Credit: Veer
Researchers push for targeted data retrieval.

"MY ROLE at Lilly is quite unusual, actually," says Susie M. Stephens, principal research scientist for discovery and development informatics at Eli Lilly & Co. "I sit inside of discovery IT, and my job is to identify interesting collaborations in the area of integrative informatics—ways for Lilly to integrate heterogeneous and diverse data, to bring it all together, to analyze it, and to visualize it."

Stephens, who has a Ph.D. in physiology and conducted postdoctoral research in molecular biology, came to Lilly this year after working for computer giants Sun Microsystems and Oracle for the past decade.

She is part of a new breed of pharmaceutical IT (information technology) managers, trained in science and computers and working to construct a new IT architecture for drug research by combining in-house systems development with outside collaborations. These managers are focused, in part, on bridging discovery and development, two very different scientific disciplines that have traditionally employed fundamentally different IT.

That tradition needs to be broken, according to Stephens, her industry colleagues, and the drug companies that employ them. Newly staffed by people like Stephens, the pharmaceutical sector is placing heavy emphasis on improving research efficiency through a systematic sharing of data between the discovery and clinical development stages of the drug creation process.

Their problem is that the industry is swimming in an ocean of data that need to be parsed and channeled into useful information. For Stephens, taking charge of the data begins with gaining access to it, a process that may involve breaking other industry traditions. "One of the most interesting aspects of my role is open innovation," she says. "Any software we produce is intended to be open source; any data we produce should be publicly available. Any methodology or ontology should be publicly available."

The other interesting aspect of her job is sitting down with the competition. "Pharma companies are under pressure to become more efficient," Stephens says. "So if we can work with other pharma companies to do something once rather than all of us needing to create things on our own and argue over which approach is better, we think that will help deliver better economies of scale."

Her counterparts at Pfizer, Merck & Co., Roche, and other major drug companies concur that the industry is coming to terms with the need to collaborate on IT standards. Most major drug companies have begun pursuing collaborative IT ventures with institutional, academic, and industrial partners.

Still, most IT development occurs in-house, in consultation with research scientists, in an effort to bridge not only discovery and development but also chemistry and biology. The focus is on databases and software, such as electronic laboratory notebooks (ELNs), that accommodate research. In fact, some observers of the research process say pharmaceutical companies, traditionally slow to adopt software commonly used in other industries, are now pressuring vendors to develop systems that meet their needs.

M. Vidyasagar, executive vice president for advanced technology at IT consultancy Tata Consultancy Services, says data mining has emerged as the crux in an industry that has stored too much data and is only beginning to develop technologies to analyze or even access it.

"There is a lack of standardization on terminology and a lot of preprocessing that has to be done," Vidyasagar says. Researchers themselves are unlikely to provide detailed categorical information on everything they enter into a database, he says. "The idea is to use computer programs to do this kind of categorization."
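The kind of automated categorization Vidyasagar describes can be illustrated with a minimal sketch. It assumes a small, hand-built synonym table; the names `CANONICAL_TERMS`, `categorize`, and `categorize_records` and the sample labels are hypothetical and do not come from any company's system. The code normalizes free-text labels and maps them to a shared category, so that records entered with inconsistent terminology end up grouped together.

```python
import re

# Hypothetical synonym table: normalized label -> shared category.
# A real controlled vocabulary would be far larger and curated by domain experts.
CANONICAL_TERMS = {
    "elisa": "ELISA",
    "enzyme linked immunosorbent assay": "ELISA",
    "gene expression": "gene expression",
    "expression profiling": "gene expression",
}


def _key(label):
    """Lowercase the label and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def categorize(label):
    """Return the shared category for a free-text label, or 'uncategorized'."""
    return CANONICAL_TERMS.get(_key(label), "uncategorized")


def categorize_records(records):
    """Group (record_id, label) pairs by their shared category."""
    grouped = {}
    for record_id, label in records:
        grouped.setdefault(categorize(label), []).append(record_id)
    return grouped
```

Labels that match nothing in the table fall into an "uncategorized" group rather than being guessed at, which leaves them visible for later review.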

De Graaf
Credit: Pfizer

The World Wide Web Consortium (W3C), an international group that has developed standards such as XML for Internet search technology, has developed a research standard for the Internet. Drug companies are working on several fronts to implement the W3C standard, called the semantic Web, for searching purposes in R&D. However, the bulk of the work is still being done within individual companies.

Credit: Michael Manning

This go-it-alone approach may not be ideal. "Ultimately, you want to push science as a frontier. You want everybody to take advantage of everybody else's knowledge base," Vidyasagar says. "But given how siloed the pharmaceutical industry is, even getting different units in the same company to exchange information is really the first step."

Credit: BMS

Alan S. Louie, research director at consulting firm Health Industry Insights, agrees that the amount of data in storage far exceeds the searching capability of IT systems for drug research. The volume and nonlinear organization of imaging data, in particular, hobble databases, according to Louie. "With imaging, you produce megabyte- to gigabyte-size files leading to terabytes of data," he says. "In addition, you have all the analysis, where someone addresses some key part of a particular tumor with specific dimensions and types of growth. All that metadata has to get captured in such a way that it isn't necessarily linked with particular files."

IN A PERFECT WORLD, specific biological characteristics would be made available to researchers making precise queries to a database that automatically collects information on relevant diseases or biological mechanisms. Such a database, Louie says, is essential to researchers in the field of systems biology, an emerging branch of science that focuses on the study of complex interactions in biological systems.

"We're not there yet," Louie says.

David de Graaf, director of systems biology at Pfizer's research technology center in Cambridge, Mass., agrees. Commercially available IT, he says, has not fully evolved for predictive disease modeling, for example. Nor, he says, can generic databases support bioinformatics, a computer-based technique that employs statistics, applied mathematics, and artificial intelligence to solve biological problems, generally at the molecular level. Part of the problem, according to de Graaf, is that bioinformatics and other new research disciplines are still in a state of flux.

"Our efforts suffer from a lack of definition in that different groups in different situations use the terminology of systems biology very differently," de Graaf says. "Even if they use the terminology consistently, they apply it at different parts of the value chain."

Researchers, therefore, don't generally know exactly what the things they need to collect "look like," he says. "In terms of establishing data models, areas like genomics and proteomics were blessed in that they were defined by their processes—researchers knew what they were going to collect and do," de Graaf explains. Systems biology, on the other hand, deals with heterogeneous data from, say, genomics, patients, and enzyme-linked immunosorbent assays—virtually any kind of research data in any form.

According to de Graaf, the challenge is developing a protocol for data storage, data integration, and database querying. "When people think about systems biology, they tend to think of querying: 'I have all this data, what am I going to do with it now? How do I infer across all this? Can you tell me what the multiple elements are that influence my disease process or whether my drug works or not?' "

The drug industry first attempted to meet this challenge with basic data mining, but it proved limited by its inability to take into account the dynamic nature of biological processes. Next came computer modeling programs, which were developed in-house at Pfizer. De Graaf emphasizes that IT development follows the science. "This whole process encompasses generating data in the lab," he says. "This is not an in silico discipline by any stretch of the imagination. It is very much based on data, data specifically produced to inform the model."

THE PROCESS cycles back to the lab, where researchers use IT tools to formulate hypotheses that are then tested. "You go through the cycle again and again until you have refined your model and your description of the biological system enough for the IT system to give you results that you are confident in," de Graaf says. In this way, modeling programs become tools for translational medicine, giving both discovery and development scientists a window into complex biological systems.

Models also bridge science disciplines. "We have to realize that chemists make tools that change biological systems," he says. "And our bread and butter is in understanding how those tools change biological systems. This new wave of technology will allow us to be far more careful in how we catalog information."

And Pfizer, de Graaf points out, has a lot of information. "As we go into what I can unabashedly call a new age of drug discovery," he says, "we'll find we can't solve these issues ourselves. We are looking in the academic and high-tech community for partnerships."

One such partnership is with Genstruct, a computational systems biology firm with which Pfizer has developed a biomarker used to investigate vascular injuries.

Keith Elliston, Genstruct's chief executive officer, says his company was launched in 2002 to work in computational research partnerships.

In addition to being extremely conservative, he explains, the drug industry accounts for a relatively small portion of the business software market and so has had little influence on software developers compared with the government and industries such as finance. "While many industries have been successful in working with software companies to develop technologies such as computer-aided design and modeling, the pharma industry has had a very difficult time," he says.

The research Genstruct does with partners, Elliston adds, is also fostering a shared knowledge base. Pfizer and other partners, including GlaxoSmithKline, contribute to a pool of publicly available data. Only a portion of the data, he notes, is treated as proprietary.

According to Elliston, drug companies are beginning to realize that data alone do not constitute a competitive edge. "There really is only one human biology out there," he says. The competitive front is moving away from databanks and libraries toward systems and processes for analyzing data in the development of therapies.

Bryn Roberts, head of research informatics at Roche, agrees. "Broadly speaking," he says, "most companies have access to the same kinds of compounds and the same targets. And so it's really down to what decisions you make, and this comes down to how you treat the knowledge and information that you have."

Roche is working in systems biology partnerships with Swiss universities, including the Swiss Federal Institute of Technology (ETH), to develop analysis techniques in metabolic disease and other therapeutic areas. The company wants to create an IT infrastructure with which researchers can understand cells of interest from an especially revealing perspective that incorporates proteomics, genomics, and disease pathway data. "We are going for a totally integrated environment," Roberts says, "putting together inferences from experiments of different domains to get a picture of the whole system."

Roche, like most drug companies, mixes commercially available software and IT tools developed in-house. At this point, about half the systems Roche uses are from commercial vendors. These firms include Genedata, which supplies data integration software, and MDL, the supplier of laboratory information management systems and ELNs that was recently purchased by Symyx.

Bristol-Myers Squibb is also integrating IT applications, but for now, the company is working on separate discovery and development tracks. In both cases, BMS is putting research on a service-oriented architecture, a network of software programs that operate independently but can be accessed centrally.

According to Alastair Binnie, the company's executive group director for discovery informatics and automation, while discovery and clinical development will have some programs in common, the architectures remain separate for practical reasons. "There are still very great differences between drug discovery and clinical development," he says. "The scientific disciplines are still quite different, and the questions scientists are asking are also quite different."

The data are also fundamentally different. "In discovery, there tends to be more heterogeneous data and experiment types, many different types of biology and chemistry going on in parallel," Binnie says. The "barrier for change" is also much lower because the process is not regulated by the Food & Drug Administration.

Jason Bronfeld, executive director of pharmaceutical development informatics at BMS, says the conversion to a service-oriented architecture reflects a change in how drug research organizations view data management. "Ten years ago, there was a notion that if we could just look at all the data, then we could get a look at just the data that we want," he says. Such thinking oriented the industry toward monolithic databases. The objective now, Bronfeld says, is to house data in such a way that researchers can see only what they want, an approach well-served by the loose aggregation of software in a service-oriented architecture.
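The service-oriented idea described here, independent programs reached through one central access point, can be sketched in a few lines. This is only an illustration under stated assumptions, not a description of the BMS system: the `ServiceRegistry` class, the two service functions, and the sample records are all hypothetical, and in-memory dictionaries stand in for what would be separate networked applications.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory data standing in for two independent research systems.
_ASSAY_RESULTS = {"CMPD-1": [{"assay": "binding", "ic50_nm": 12.0}]}
_TRIAL_OBSERVATIONS = {"CMPD-1": [{"study": "phase-1", "subjects": 24}]}


def assay_service(compound_id: str) -> List[dict]:
    """Discovery-side service: returns assay records for one compound."""
    return _ASSAY_RESULTS.get(compound_id, [])


def clinical_service(compound_id: str) -> List[dict]:
    """Development-side service: returns clinical records for one compound."""
    return _TRIAL_OBSERVATIONS.get(compound_id, [])


class ServiceRegistry:
    """Central access point; each registered service stays independent."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[str], List[dict]]] = {}

    def register(self, name: str, handler: Callable[[str], List[dict]]) -> None:
        self._services[name] = handler

    def names(self) -> List[str]:
        return sorted(self._services)

    def query(self, name: str, compound_id: str) -> List[dict]:
        """Ask one named service for one compound; no other data is touched."""
        if name not in self._services:
            raise KeyError(f"no service registered under {name!r}")
        return self._services[name](compound_id)


registry = ServiceRegistry()
registry.register("assay", assay_service)
registry.register("clinical", clinical_service)
```

A call such as `registry.query("assay", "CMPD-1")` returns only the assay records for that compound, which is the "see only what they want" behavior Bronfeld contrasts with a monolithic database.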


MEANWHILE, Internet applications are beginning to impact IT development in the pharmaceutical industry, but progress is slow. "I'm still surprised there are so many client-server applications out there being used as opposed to a move toward Web-based architecture," says Martin Leach, Merck & Co.'s director of basic research and biomarker IT.

A pharmacologist by training, Leach came to Merck earlier this year from business consultancy Booz Allen Hamilton, where he worked in the pharmaceutical IT practice. He says drug industry IT is trending toward highly configurable architectures that employ Web-based search functions such as the semantic Web and open-source programs like wikis. There is, however, a trade-off, he says. "The more generic you make the system, the more generic you have to make the data structures," Leach says. "You lose some features as you move toward the Web, but you gain some performance. Some people find a better middle ground than others."

At Merck, the emphasis is on IT to support translational research and biomarker-enhanced drug discovery, Leach says. "We are working to bridge the gap from development, where clinical trials and clinical work is performed, to discovery," he says. "We want to get the data and physical samples and push them upstream into the hands of the basic researchers so that they can do more relevant hypothesis-testing and experimentation up front."

According to industry participants, building a modern pharmaceutical IT system requires that researchers understand the importance of data management. "Drug discovery and development is an information discipline," says Peter Grandsard, executive director for research at Amgen and a board member at the Association for Laboratory Automation. But he makes a distinction between information and information technology. "I challenge people who say we need the latest, greatest IT to do our jobs better," Grandsard says. "What people struggle with more is the need for mind-sets that generate knowledge and bring it out in the open for everybody to use."

The central competitive battlefield today, he says, is the facility and speed with which drug companies access, manipulate, and share information in the lab.

Lilly's Stephens expects that software and Internet tools to support this knowledge-sharing mind-set will advance primarily at the behest of researchers. "Going forward, it will make sense for different groups to share tools," she says. "And shareable tools exist—there are some software companies offering solutions for both biology and chemistry."

More important, Stephens says, corporate management at the highest levels is promoting the idea of sharing information. "It is very interesting that Lilly is doing open innovation," she says. "The fact that we are looking to collaborate with pretty much anybody in a precompetitive landscape is a very notable thing."
