Gentlemen, you can’t place data in here. This is the Cloud! Or so was the attitude of many drug companies a couple of years ago, when C&EN published a story about the information technology (IT) infrastructure for the pharmaceutical “lab of the future.” The reality then was that major drug companies were not committing data storage and analysis to the huge banks of external computers known as the cloud. Operated by firms such as Google and Amazon, cloud computing services are touted as a means of eliminating the need for expensive internal data systems.
Unlike the financial services industry and many research-driven manufacturing sectors, the pharmaceutical industry was largely unwilling to risk housing its research data anywhere other than behind its own walls. Beyond a handful of large companies that pioneered internet IT strategies, drugmakers were standing back from the cloud.
At the time, IT managers speculated that the sector’s resistance would soon break down in the face of a shifting research landscape. A rampant increase in the volume of data produced in drug discovery would make storage in the cloud an attractive option, they said. And cloud storage would also grow as a means of sharing data with collaborators and contract research organizations (CROs) without having to establish direct computer links. Increased confidence in the security of cloud computing would also urge pharmaceutical research cloudward.
Look to the sky
Today, the link between the lab and the cloud is indeed solidifying. Big drug companies and research institutions have launched cloud computing programs, and major vendors of research software have debuted cloud-enabled versions of their core products.
Meanwhile, a new generation of software suppliers spawned in the early days of the genomics revolution is offering data analysis products designed to use the cloud. And many of the IT functions that were traditionally delivered via purchased computers and software are being offered under software-as-a-service arrangements in which vendors charge for use of tools accessed over the internet.
Vendors and IT managers now speak of the ability to “spin up” drug discovery data and analysis to the cloud. The information can be accessed by multiple partners and then simply shut off at the end of a project, with the desired data transferred for storage at a partner’s in-house data banks or on the cloud. The cloud enables projects to come and go without the use of dedicated IT assets.
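The project lifecycle described above can be sketched in code. The sketch below is purely illustrative: the names (`ProjectWorkspace`, `archive_to`, `tear_down`) are hypothetical, and a real deployment would call a cloud provider's SDK rather than this in-memory model. It shows the pattern the article outlines: spin up a shared workspace, grant partners access, transfer the desired data to an in-house store at project end, then shut the workspace off.

```python
# Illustrative sketch of the "spin up, share, shut off" lifecycle.
# All class and method names here are hypothetical, for illustration only;
# a real system would use a cloud provider's SDK instead of this in-memory model.

class ProjectWorkspace:
    """A temporary cloud workspace shared by project partners."""

    def __init__(self, name):
        self.name = name
        self.partners = set()
        self.data = {}      # dataset name -> contents
        self.active = True

    def grant_access(self, partner):
        # Partners can be added for the duration of the project.
        self.partners.add(partner)

    def upload(self, partner, dataset, contents):
        if not self.active or partner not in self.partners:
            raise PermissionError(f"{partner} cannot write to {self.name}")
        self.data[dataset] = contents

    def archive_to(self, in_house_store, keep):
        # Transfer only the desired datasets back to the in-house data bank.
        for dataset in keep:
            in_house_store[f"{self.name}/{dataset}"] = self.data[dataset]

    def tear_down(self):
        # Shut the workspace off at project end; no dedicated IT assets remain.
        self.data.clear()
        self.partners.clear()
        self.active = False


# Usage: spin up, collaborate, archive the results, shut off.
in_house = {}
ws = ProjectWorkspace("kinase-screen")
ws.grant_access("cro-alpha")
ws.upload("cro-alpha", "assay-results", [0.2, 0.7, 0.9])
ws.archive_to(in_house, keep=["assay-results"])
ws.tear_down()
print(in_house)  # only the archived data survives the teardown
```

The key design point, in the article's terms, is that the workspace itself is disposable: the in-house data bank is the only durable asset, so projects can come and go without dedicated IT.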
The shift of research IT to cloud-hosted computing and storage has not been wholesale, however. As they begin connecting data systems with contractors on the cloud, drug companies continue to keep their own mainframes busy. And the IT personnel that run them are well aware that taking full advantage of cloud computing could put them, and those mainframes, out on the street.
According to Michael Elliott, head of the life sciences IT consulting firm Atrium Research, cloud computing has made sense for drug research for several years, given the cost of maintaining in-house IT systems and the difficulty of integrating those systems after mergers. The trend toward connecting research data systems with partners has provided a big push skyward.
“Externalization—basically the virtualization of research—has been the prime driver to get people over some of the resistance and move to the cloud,” Elliott says. “People are beginning to think that it’s time to look at a different operating model.”
Still, Elliott claims that many of the technologies that are promoted as cloud-compatible were not built that way. Rather than being designed to take advantage of what the cloud offers, some centralized data management systems are moving unchanged from the lab to cloud hosting services such as Amazon, he says. Many of the traditional laboratory software vendors are “taking a big leap” with claims that their systems are cloud-ready, Elliott says.
“There is a lot of catch-up going on now,” he contends. “Vendors for many years denied there was a movement to the cloud because customers said they didn’t want it. But now, many of the vendors are caught kind of flat-footed. Many of the laboratory information management system vendors and some of the lab notebook vendors are now saying, ‘We have to do something about this,’ and the way they are doing it isn’t truly what you would expect of a cloud-based platform.”
Elliott adds that work culture may also weigh companies down as they move toward cloud computing. “Surveys I have done show the people most resistant to the cloud are the IT folks,” he says. “I think that has quite a lot to do with job protection.”
The second-most opposed to the cloud, he says, are the lawyers. “They get worried about everything getting stolen because they don’t understand that it’s just as easy to get it stolen from inside as it is from outside.”
There is also resistance in the lab itself, where researchers tend to be skeptical of IT employees and protocols that dictate work procedures, Elliott points out. Most prefer having computer applications that run on their desktops as backup to central data systems. Going to the cloud will eliminate that option, many researchers fear.
New in the cloud
Despite lingering resistance, software vendors agree that a shift to the cloud is under way in drug industry labs. Most claim they have been adapting products for the cloud in anticipation, with several introducing software as a service and other cloud options for products they already sell to be run on laboratory computers. Others, especially in the area of genomics analysis software, have designed systems specifically for use in the cloud.
Seven Bridges was among the first companies to develop cloud-based software for drug research. Launched in 2009 by a scientist involved with the 1000 Genomes Project, the company saw a problem in the mountains of data being generated through genomics research, says Ameya Phadke, marketing strategist with the company. “To extract real value and insight, you need a scalable platform to analyze massive data sets in a collaborative way. Back then, there was nothing to do that.”
Seven Bridges’ software is designed to process and analyze gene sequences. “The cloud is what our software uses to conduct the computation to store and analyze data,” Phadke says. The company began marketing a product that connects to Amazon’s cloud service in 2012. It now also works with Google’s service.
Phadke says it makes sense for drug researchers to shift their computing to the cloud. “The strength of a pharmaceutical company is its focus on discovering and developing therapeutics and bringing them to market as opposed to all the stuff that goes along with running a massive computer cluster,” he says. The sector “is really warming up to the cloud.”
Rob Brown, vice president of global informatics at Dotmatics, says business is picking up for his company’s cloud-based data analytics service with small biotech and drug companies that lack internal data analysis systems. Also interested are large firms seeking to facilitate external research relationships in which data are shared and analyzed.
Working in the cloud is no less secure than using an in-house data system, Brown contends. And any security concerns are outweighed by a cloud-based system’s ability to handle a diverse set of data types. Moreover, the cloud offers better options for sharing data than SharePoint, Dropbox, and other tools used in-house to exchange data with research partners.
“We host our software on Amazon,” Brown says. “The nice thing is the level of security and scalability that they provide.” Amazon also allows users some discretion as to which of its computer banks around the world will host their data. “Someone working with a Chinese partner may want data in the Far East, but outside China,” he says.
Arxspan, another provider of cloud-based life sciences research software, operates its own cloud rather than contracting with Amazon or Google, according to James Martin, vice president of business development. Maintaining its own data repository gives customers a feeling of greater security and gives Arxspan greater control of its customers’ data.
The company, which offers a menu of data analysis and management programs built around an electronic laboratory notebook (ELN), says it designs software exclusively for cloud applications.
Lab system migration
The traditional big providers of broad laboratory information management systems are also embracing the cloud. In 2014, one of the biggest, Dassault Systèmes’ Biovia health care sciences division, introduced ScienceCloud, which incorporates several cloud-compatible products that had been purchased by the laboratory IT firm Accelrys. Dassault acquired Accelrys that year to form the core of Biovia.
Ton van Daelen, senior director of Biovia’s ScienceCloud business, says the drug industry’s labs not only lag other industries in implementing cloud computing strategies, but they are also behind clinical pharmaceutical research, where security concerns are not seen as a hindrance, despite the sensitivity of patient data.
“But this is changing rapidly,” he says. “As much as 50% of research budgets are spent on collaborations with CROs. So, data already need to go outside the firewall. This is currently done using e-mail and flash drives, which can easily be hacked or stolen. After looking into the cloud again, companies are realizing it can be at least as secure as methods being used today.”
As van Daelen points out, a hacker can drive up next to a lab and tap into the wireless network. “There is no way to drive up to that company’s data at an Amazon data center,” he says, “because they don’t tell you where their centers are. And there is no wireless signal.”
Accelrys began assembling an internet-compatible product line in 2011 with the acquisition of the ELN supplier Contur Software. The next year, it acquired the Hit Explorer Operating System (HEOS), a software-as-a-service lab management platform. Accelrys first marketed HEOS as a web-compatible facet of its cheminformatics IT system, which is centered on the data management tool Pipeline Pilot.
Accelrys took HEOS, which had been designed as a simple data-sharing platform, and added customized applications to handle chemical registration, biological registration, assay data management, inventory management, and data analysis. Some of this was done through partnerships with software vendors including Amplified Informatics for data visualization, Notiora for text analysis, and Discngine for assay technology.
Biovia has not, however, adapted all its systems to cloud computing. Its primary ELN, acquired from Symyx Technologies in 2010, is paired with in-house IT. In fact, the cloud computing applications available through Biovia are directly connected to its traditional in-house hardware and software. “Pipeline Pilot is the glue, the integration layer that allows people to take ScienceCloud and make sure it is still connected with on-premises data sources and applications,” van Daelen says.
By 2012, Thermo Fisher Scientific and IDBS, two other major research IT vendors, also began pushing software tools to the internet with cloud computing adaptations. Thermo Fisher Cloud is a web-based data repository for genomics, proteomics, and other research data offered on Amazon. IDBS introduced E-WorkBook Cloud, a web-ready data management software package, and E-WorkBook Connect, which facilitates collaborations in the cloud.
IDBS is moving its software from a client-server-based architecture, in which computers in a lab are connected to a central database or server, to a web-browser-based architecture, says Paul Denny-Gouldson, a vice president and product manager at IDBS. The firm hopes to finish a four-year product-conversion project by the end of the year.
Learning to fly
Major drug companies are investing in cloud software—but cautiously. James Connelly, who retired this month as Sanofi’s head of research data management, says he realized years ago that the French drug company was going through a fundamental change in research that would require a rewiring in the IT department.
“Sanofi had been 100% internal research 10 years ago,” he says. “Now, it’s reoriented. The company wants to take advantage of all the biotech research being done outside the company and manage its risk.”
This shift could give Sanofi the opportunity to manage a lot less IT. “Historically, companies like Sanofi built huge infrastructures to handle research data—multiple databases and data warehouses,” Connelly says. “It turned into this big, monstrous entity of expensive-to-keep old technology.”
Connelly envisioned an infrastructure in which companies would maintain a sizable in-house data bank while placing project-related data analysis and storage in the cloud. Ideally, the setup would be facilitated by software from multiple vendors. New projects would be launched and completed projects turned off once the required data was transferred to the in-house data bank.
But corporate policy stood in the way of implementing what Connelly saw as a flexible and agile system, he says. The company had started using HEOS about five years ago. In the end, it standardized on Biovia’s ScienceCloud to support collaborations with outside partners.
“The way it’s set up today, there is a ScienceCloud installation in the cloud, and they have segments of it that are accessed by specific collaborations,” Connelly says. He would have preferred having a free hand in selecting software on a project-by-project basis, noting that he liked some of the products from Dotmatics and Arxspan. Yet he acknowledges that using a single cloud supplier does offer the advantage of data standardization across all projects.
The Belgian drugmaker UCB also began its cloud campaign with HEOS five years ago, according to Sarah Archibald, a principal scientist in informatics. “A partnership started up, and we needed a solution to securely share data externally,” she says. “We also needed to provide some capability for analysis for a small organization without much IT and informatics.”
“We looked around, and the only thing we came across was HEOS,” Archibald says. “That partnership did not continue, but we thought HEOS was an interesting tool. So we used it for a couple of small lab projects. A couple of years ago, a more substantial collaboration came along and we used it.”
UCB began using HEOS to facilitate the exchange of structural scientific data, Archibald says, and has since implemented analytical applications.
In Archibald’s view, the adoption of cloud computing has had little effect on UCB’s traditional informatics infrastructure. Although cloud computing facilitates research partnerships involving off-site data management, both data mining and storage of key data will continue to take place in-house, she says.
Genentech is using Dotmatics’ Studies Notebook software as an ELN-based support for working with chemistry CROs, according to Jeff Blaney, Genentech’s director of computational chemistry and cheminformatics. The information shared in the cloud, he says, is “low sensitivity”—primarily chemical structures and synthesis data, with no information on biological targets.
Blaney says Genentech has no plan to move away from in-house computers, which are already sharing data via the internet with no storage or analysis in the cloud. He describes a “surgical” approach to cloud computing.
“We recognize the need for a practical approach for dealing with a dynamic mix of CROs and collaborators,” he says. “Importing data from all of them is not easy, and plopping down software and making partners run it at their site is not practical. So the cloud approach seems like a pretty clear win for a chemistry ELN.” There is no strategy to go beyond chemistry on the cloud, “but it’s safe to say we are thinking about it.”
Blaney says he has seen little resistance to the cloud on the part of Genentech scientists. Rather, researchers and managers alike are interested in “seeing how it works” for solving technical problems, he says.
Genentech’s genomics research operation, on the other hand, is currently standing back from the cloud, according to Matt Brauer, senior scientist for bioinformatics and computational biology. The group is studying options both for an internal cloud—hosting its own computers for spinning up data—and for an external cloud. At the moment, however, the company’s data center is capable of handling the volume of data generated during research.
Genentech researchers are not entirely comfortable with letting any data related to patients go outside the firm’s walls, Brauer says. He acknowledges that many companies are using cloud computing specifically for clinical research, but he figures that larger companies with full IT assets are likely taking a more cautious approach than smaller firms. “It is all a matter of how much risk you are willing to assume,” he says.
Meanwhile, the Broad Institute is crafting its own connection to the cloud for genomics research through a series of partnerships. Broad announced in April that it is collaborating with Amazon, Cloudera, Google, IBM, Intel, and Microsoft to enable access to GATK4, a cloud-based version of its Genome Analysis Toolkit. The announcement marks an expansion of a program launched last year using Google Cloud Platform as a provider.
When Broad started licensing the GATK software to genomics researchers in 2012, it struggled to support their use of its huge data sets. Moving to the cloud gives licensees data storage and analysis capacity and also expands general access to the software, according to Geraldine Van der Auwera, manager of outreach and communications for data sciences and data engineering at Broad.
“There are amazing advantages to the cloud infrastructure,” Van der Auwera says. “External people can walk up and purchase sequencing as a service. In the past, to deliver results and data, you needed a fairly complicated system dedicated to the user. Now, we are moving toward a bucket that holds all the results.”
Van der Auwera says the cloud offers an elastic approach to researchers whose analysis and data volume requirements change regularly. Researchers look at the security question not as an obstacle, she says, but as a problem to be solved or a risk to be managed—something well worth the effort, given the potential benefits of accessing data and analytical services as needed from that bucket.
Connelly, the former Sanofi manager, agrees. He watched in recent years as researchers at Sanofi began comparing the traditional ways of managing data and working with partners to practices enabled by cloud computing. As time went on, he says, they became more comfortable with the security issues and more interested in the scientific possibilities.
Resistance is wearing down, he says. “I think, all of a sudden, that pharma gets it.”