Volume 87 Issue 21 | pp. 10-14
Issue Date: May 25, 2009

Cover Stories

The New Computing Pioneers

With in-house information technology burdened to the breaking point, the traditionally conservative drug industry is putting cloud computing to the test
Department: Business
EXIT STRATEGY
Schadt will soon leave Merck, taking with him the drugmaker's Rosetta Inpharmatics computer cluster.
Credit: Merck

IT MAY NO LONGER BE FAIR to characterize large pharmaceutical firms as late adopters of information technology (IT).

Having spent the past five years catching up to other industries in the deployment of enterprise software systems that link researchers and laboratories companywide, big drug firms are now starting to push data storage and processing onto the Internet to be managed for them by companies such as Amazon, Google, and Microsoft on computers in undisclosed locations.

Pfizer, Eli Lilly & Co., Johnson & Johnson, and Genentech are among the drugmakers piloting projects in an emerging area of IT services called cloud computing, in which large, consumer-oriented computing firms offer time on their huge and dispersed infrastructures on a pay-as-you-go basis. These drug companies are among the first to gauge the cost- and time-saving pros and the potential management and security cons in this largely uncharted territory.

The concept of cloud computing, based on technologies that already support e-mail and search services, has burst onto the IT scene during the past year. Success stories have already been logged across a range of industries and government organizations, including the White House, which used Google cloud services to handle the questions sent to President Barack Obama during his March 26 town hall meeting. The White House was able to field a peak of 700 e-mail hits per second from 92,934 people submitting 104,073 questions and casting 3,605,984 votes in the 48 hours leading up to the meeting.

The advantages of cloud computing for drug companies include storage of large amounts of data as well as lower-cost, faster processing of those data. Users are able to employ almost any type of Web-based computing application. Researchers at the Biotechnology & Bioengineering Center at the Medical College of Wisconsin, for example, recently published a paper on the viability of using Amazon's cloud-computing service for low-cost, scalable proteomics data processing in the Journal of Proteome Research (DOI: 10.1021/pr800970z).

And Lilly has demonstrated the viability of cloud computing in pharmaceutical R&D, according to Dave Powers, the firm's associate information consultant for discovery IT. "We were recently able to launch a 64-machine cluster computer working on bioinformatics sequence information, complete the work, and shut it down in 20 minutes," he says, describing a project the firm executed using Amazon's Elastic Compute Cloud (EC2) service. "It cost $6.40. To do that internally—to go from nothing to getting a 64-machine cluster installed and qualified—is a 12-week process."
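The arithmetic behind that $6.40 figure is worth spelling out: EC2 billed each machine in whole instance-hours, so a 20-minute burst on 64 machines cost one billed hour per machine. A minimal sketch, assuming the 2009-era small-instance rate of $0.10 per instance-hour (the article does not say which instance type or rate Lilly actually used):

```python
import math

def ec2_burst_cost(instances, runtime_minutes, rate_per_hour=0.10):
    """Estimate the cost of a short EC2 cluster run.

    EC2 (circa 2009) billed each instance in whole instance-hours,
    so even a 20-minute job is charged one full hour per machine.
    The $0.10/hour rate is an assumed "small instance" price, not
    a figure taken from the article.
    """
    billed_hours = math.ceil(runtime_minutes / 60)
    return instances * billed_hours * rate_per_hour

cost = ec2_burst_cost(instances=64, runtime_minutes=20)
print(f"${cost:.2f}")  # 64 machines x 1 billed hour x $0.10 = $6.40
```

The same per-hour rounding explains why short, bursty scientific workloads suit the cloud so well: the bill scales with machines actually used, not with a 12-week internal provisioning cycle.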

Although Lilly has a sizable installed base of computers, the company's IT infrastructure is operating at full capacity, says Andrew Kaczorek, senior systems analyst for discovery IT. "Because we have hundreds of different users, what we see is spiky utilization," Kaczorek says. "The result is that for days at a time our clusters are at 100% of capacity. This means there are actually scientists who have work to be done that is literally sitting in a queue." Although exact cost savings are difficult to calculate, they are clearly significant, according to Powers and Kaczorek, as are the time savings.

Pfizer's Biotherapeutics & Bioinnovation Center (BBC) began using Amazon cloud services earlier this year to develop and refine models in antibody docking runs, according to Giles Day, head of informatics at BBC.

"We use the cloud to shorten the process to two to three hours from two to three days," Day says. "One run costs us $300, which is a small price to pay for the time savings it generates. But what really interests me is that it changes the way we do our science. Using the cloud lets us work in a more iterative way and keep the momentum of the research project going."

Applied Biosystems
Applied Biosystems' SOLiD system is among the next generation of DNA sequencing systems that will generate terabytes of data in laboratories.
Credit: Applied Biosystems

Pfizer is beginning to employ cloud computing in other research operations, but there are some downsides, Day says, one being that users must come up with their own programming to coordinate with cloud service providers. Pfizer is working with the BioTeam, a consulting firm, on connecting its work to the cloud. Lilly is using software and services from two suppliers, Cycle Computing and RightScale, to access Amazon's network and manage the transfer of data onto and off of the cloud.

Powers points out that security is a concern, limiting most if not all activity to the manipulation of public data that don't involve intellectual property or patents. Policing individual researchers' access to the cloud is an even bigger concern, according to Powers. "One of the pros of using the Web is that it is low friction—just a credit card account and you're off and running," he says. "But this is also a con for a large enterprise. We are trying to centralize a single point of entry into cloud space such that we have some ability to control it."

A lot needs to be worked out, agrees Wes Rishel, a vice president with the health care provider IT division of the Gartner Group, a consulting firm. Rishel says cloud computing is currently "very high on the hype cycle," with a rush of first-time-user success stories. "There is no doubt that the technology exists to get the costs of processing resources and disk resources down effectively to the price of electricity," he says. Although this is cause for enthusiasm, the lack of standards for entering and processing data makes cloud computing far more complicated than it might seem on the surface, according to Rishel.

INDUSTRY WATCHERS agree that cloud computing is best defined by the uses that the service firms' computers are put to. "All 'cloud' really means is the Internet," says John Wilbanks, executive director of Science Commons, a division of Creative Commons that works to establish protocols for collaborative work in science on the Web. "It's a fancy name for distributed storage and processing."


Wilbanks says cloud computing is currently hobbled by a lack of the communications standards that Science Commons and others have been promoting for research employing the Internet.

Drug companies are also still sorting out what kind of data will be appropriate for cloud storage and processing. "Where the data reside in the discovery process will dramatically affect the likelihood that they're ever going to be part of a cloud," Wilbanks says. The amount of sensitive data currently kept behind protective firewalls will likely limit the applicability of cloud computing in the pharmaceutical sector, he argues.

Yet the rapid creation of life sciences data keeps pointing to the use of cloud computing, and this is especially true in the area of genomics research. Advances in nanoscale and microfluidic chemistry now allow DNA to be monitored on tiny beads by photographic sensors that, according to Chris Dagdigian, principal consultant for the BioTeam, generate TIFF images in collections of up to 800 gigabytes. "This creates a massive data-capture and handling problem," he says. "We are now in an era where instruments that are showing up in very small wet laboratories are capable of producing a terabyte or more of data in a day."

Eric Schadt, executive scientific director for genetics at Rosetta Inpharmatics, a subsidiary of Merck & Co., agrees. "The next generation in technologies for gene expression profiling and DNA sequencing is going to be able to generate data so fast and at such a large scale that it is going to be overwhelming to many," he says. "We thought microarrays and high-density SNP arrays were generating high-dimensional data that were difficult to hold," he says, referring to single-nucleotide polymorphisms. "Well, these next-generation technologies are going to be one to two orders of magnitude beyond that."

The data should yield answers to questions about how complex disease systems manifest themselves in the human body, Schadt says. "It is not going to be DNA variation on its own that will tell us how genes interact with a given disease," he says. "More and more, you will see people integrating the DNA variation information with gene expression information or metabolite information or protein information." Cloud computing may be employed by researchers coming to grips with all of the data management involved, he says.

Merck, though, has not taken the plunge. The company has amassed a computer center at Seattle-based Rosetta, which it acquired in 2001, with about 10,000 processors and an elaborate Internet-based architecture allowing researchers working on thousands of projects anywhere at Merck to access data from storage. But that situation is about to change.

In the coming months, Merck will be handing the Rosetta computer cluster, and a majority of the data therein, to a nonprofit bioinformatics database called Sage Bionetworks being formed by Schadt and Stephen Friend, senior vice president and oncology franchise head at Merck Research Laboratories. The drug firm, which will have open access to Sage, will consolidate research computing at its new Center of Excellence for Molecular Profiling & Research Informatics, in Boston. Meanwhile, Sage will pursue partnerships with other public and private research centers in order to expand the database. Sage, as it develops, may well incorporate cloud computing, according to Schadt.

AS DRUG FIRMS come to grips with what can be accomplished in cloud computing, service suppliers are amassing a distributed computer utility infrastructure to accommodate booming demand. Amazon is seeing rapid growth in its cloud storage offering, Amazon Simple Storage Service (S3), which was introduced in 2006. The number of objects stored has increased from 18 billion to 52 billion over the past year, according to Adam Selipsky, vice president of Amazon Web Services.

Selipsky says Amazon's cloud service can be viewed as a virtual data server with flexible and nearly limitless capacity. "It looks and feels like the same raw iron in any company data center," he says. "But it's in our basement, not yours." And just as any company would add software and proprietary applications to its basic computing infrastructure, cloud users run their applications on the Web on Amazon's distributed infrastructure. Amazon, according to Selipsky, provides storage, message queuing, and high-performance computing on a pay-per-use basis.

The service has garnered a lot of interest from the life sciences community, where companies under budgetary constraints are dealing with increasing computing burdens, Selipsky notes. Speed of processing has also brought drug researchers to the cloud, he says, adding that Amazon recently introduced Amazon Elastic MapReduce, a cloud-computing utility that relies on Hadoop (the name comes from the developer's child's stuffed toy elephant), an open-source distributed data-processing framework from the Apache Software Foundation that accelerates processing by enabling clusters of computers to work with thousands of files and petabytes of data. A petabyte is 1,024 terabytes, or about 1 million gigabytes.
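The MapReduce pattern that Hadoop implements is simple to sketch: a map step emits key-value pairs from each input record, a shuffle groups the pairs by key, and a reduce step aggregates each group. A toy, single-process illustration in Python (Hadoop distributes these same phases across a cluster; nothing here is Hadoop's actual API):

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy, single-process illustration of the MapReduce pattern.

    Hadoop runs the same three phases -- map, shuffle, reduce --
    in parallel across many machines; here everything is sequential.
    """
    # Map: emit (key, value) pairs from every input record,
    # then shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: aggregate each key's values independently.
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count: each mapper emits (word, 1); the reducer sums.
lines = ["cloud computing", "cloud storage"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'cloud': 2, 'computing': 1, 'storage': 1}
```

Because each reduce group is independent, a framework like Hadoop can scatter the map and reduce work across thousands of machines, which is what makes petabyte-scale processing tractable.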

In addition to cost and time savings, Mike Naimoli, director of Microsoft's life sciences business in the U.S., says cloud computing may also enable data sharing among drugmakers and contract research organizations and other partners. The company introduced its Azure Services Platform for cloud computing last year.

"It is one thing to get the data up there, another to interact and work with it," he says. "That is done through applications hosted on Azure. Microsoft won't build those applications. We provide a framework and fabric that can host the user's applications. Anything users can do locally and adapt to the Internet can run on the cloud."

Rishi Chandra, senior product manager for Google's cloud-computing service, Google App Engine, says the intensive need for data storage and computing will drive drug research toward cloud services. "It makes more sense to operate in a distributed network, where you can handle spikes in demand," he says.

Chandra says Google enables users to put data behind a secure firewall in its cloud infrastructure. Managing access to the data requires some custom integration work, however. Although Google, like Amazon and Microsoft, manages the security of its physical computer infrastructure, users will be responsible for data encryption, data access control, and other security measures on the cloud.

Some of this work is being done by third-party software suppliers such as Cycle Computing, which began developing open-source software for high-performance computing four years ago. Cycle has since launched a business application and security management service for cloud computing, according to Jason Stowe, its chief executive.

The company has a partnership with Schrödinger, a computational chemistry software firm. And Stowe says Cycle is developing applications for next-generation genome sequencing that will allow researchers to use the cloud to process and condense raw data from their laboratories.

Lilly's Powers says Cycle has taken on the challenge of replicating the drugmaker's in-house IT operations online. "There is a lot of complexity to taking an environment that lends itself to static clusters and making it dynamic," he says. "Cycle has the expertise to develop browsers that assemble clusters on the cloud, handle scheduling, and move data to the point where we can submit an algorithm or scientific workload, indicate that we want to run it X number of times, and send it off."

ALTHOUGH USERS and vendors alike agree that these are early days for cloud computing, they also view such IT services as a viable option for a range of work beyond storage and processing of nonproprietary data. Some see it as an environment for collaborative work and as a secure environment for clinical trial data.

Karen Riley, a spokeswoman for the Food & Drug Administration, says cloud computing is clearly new territory. "If cloud services become the archive for clinical trial data, our concern would be to safeguard the system for write protection in order to prevent tampering," she says. Auditing companies such as Google would not be practical, she says, and the responsibility for data security would likely remain with the trial sponsors. Riley notes that FDA already trusts secure external servers for e-mail communication of clinical trial data.

"It is exciting to us to think we may be on the cutting edge," Powers says, acknowledging that Lilly's use of the cloud currently stops well short of collaborative research or clinical trial management. He says the proven cost savings and accelerated research afforded by cloud computing make it attractive to drug companies, as does the potential for more advanced uses in the future.

"From our CEO down, we are changing how we are doing business. Things are being done much more collaboratively," Powers says. "If we want to move forward, it will increase the burden on our infrastructure and IT to a scale that we are not familiar with. There is also a sense of urgency to make things happen quickly. So the question is, do we build that infrastructure ourselves, or do we, in the spirit of looking externally, turn to the cloud?"

 
Chemical & Engineering News
ISSN 0009-2347
Copyright © American Chemical Society