Giving drug researchers control of their data

A transition empowering scientists is underway in data management for drug discovery

by Rick Mullin
November 7, 2021 | A version of this story appeared in Volume 99, Issue 41


Researchers in a pharmaceutical lab.
Credit: Benchling
Drug discovery researchers, teaming with data scientists, are taking on increased responsibility for managing data at the laboratory bench. Shown is a lab run by a Benchling customer.

“I guess I’ve been digitizing for my entire career, 20 years,” said Michael Montello, senior vice president of R&D technology at GlaxoSmithKline. “I remember the first data warehouse I implemented in discovery . . . And here we are still talking about digital transformation.”

Montello was reminiscing at the Bio-IT World Conference in September, where he discussed the past 3 years of GSK’s digitization initiative. Peers from Pfizer, Roche, and Eli Lilly and Company joined him, describing similar programs at their firms in drug discovery data management.

Their discussion had a familiar ring to it: break down function silos, establish central data management, develop standard terminologies. In fact, a regular conference attendee may well have shared Montello’s observation: “Here we are again.”

But there is no question that equipping labs with information technology is a rapidly changing endeavor given the now-ubiquitous use of cloud computing and the soon-to-be ubiquitous deployment of artificial intelligence and machine learning. And the velocity of change in drug discovery IT is increased by something advancing even more rapidly—science.

New technologies such as gene editing and genomics place drug discovery on the front line of a transformation in data management. As laboratories produce petabytes of data, responsibility for managing and curating information in the hunt for new drugs is increasingly shifting from centralized IT departments to research teams at the bench. It’s a big change for both the scientist and the IT department.

“It is absolutely clear that the rate of scientific innovation far outstrips the speed of IT innovation. It also moves faster than I, an IT person, can build a storage system or build an analytical environment or build a database,” says Chris Dagdigian, senior director of BioTeam, a life sciences IT consultancy.

Scientists in the lab have inevitably become directly involved with IT system development, assuming responsibilities traditionally housed in corporate IT departments, Dagdigian says. In turn, the traditionally centralized laboratory data management infrastructure has morphed into bench-level IT shared by drug researchers working with data scientists and software engineers.

“The message in recent years is that the way we use and access data is very nuanced and very fluid and not amenable to simple rules-based decisions that the IT department can simply deploy,” Dagdigian says. “There are certain things when it comes to scientific data that are wildly inappropriate for IT to own and control.”

Dagdigian advocates technology that allows scientists to “self-service”: accessing, using, and filing data on their own. “On the flip side, I don’t want scientists to become IT experts just to do their job.” The IT department isn’t going to fold its tents.

Industry watchers agree that a transition is underway that can’t simply be characterized as a dispersion of centralized IT. And the approaches taken by large drug companies, start-ups, and academic research labs are bound to differ according to the amount of IT infrastructure already in place. But labs in all segments of research are forming multidisciplinary teams to develop data management systems that afford researchers far greater agency than ever before.

Fast action

The rate of change is the most challenging aspect of implementing data management IT at Pfizer, according to Holly Soares, a molecular scientist and head of precision medicine at the large drug company. “When we get offered new systems more rapidly and with increasing technological complexity, it is sometimes hard to adapt and be nimble.” Deploying new technologies, which once occurred over yearslong periods, now has to be done in months—or even days—as the company advances research on antivirals and vaccines, she says.

Pfizer’s IT management is also evolving. “We are seeing data scientists now embedded with the research teams,” Soares says, noting that the disciplines are learning from each other. Research scientists are becoming more involved in curating data, and data scientists are picking up on the expertise of research scientists, which helps them to apply IT tools in the lab more effectively. The mutual learning is prevalent in genomics research, she says, where IT is central to identifying drug targets and understanding target differentiation. “The amount of data can be in the terabyte territory.”

Inside a serious working industrial pharmaceutical laboratory with a woman in the foreground and a man laboring at the bench.
Credit: Pfizer
Holly Soares, foreground, head of precision medicine at Pfizer, sees traditional research scientists becoming more involved in curating data.

Managing the data has also become a “team sport” at Lilly, says Ramesh Durvasula, the firm’s vice president for research IT. Subject-matter experts, IT technicians, and data scientists are huddling at the lab bench, he says.

Durvasula notes that chemists and biologists coming into the lab in recent years are well prepared for interdisciplinary research. “There are a lot more PhDs coming out of the sciences who are a lot more software and computational savvy than they were decades ago,” he says. “We need to repurpose many of our systems to empower them to leverage all the skills they bring to bear rather than assuming they have the skills of years ago, where they were strictly bench scientists.”

The challenge in equipping multidisciplinary teams centers on differences between the disciplines. “We need to understand that the consumption of data and leveraging of models by a bench scientist is very different from the consumption of data and leveraging of models by a pure data scientist,” Durvasula says.

Standardization of data is another uphill climb. “With the advancement of scientific instrumentation, the nature of the data being collected is outpacing the speed at which we can standardize,” he says.

Durvasula says research teams are working through the obstacles, however, focusing on computational models and turning the vice of hoarding data into a virtue. “It is not that the scientists need us to hoard the data; the models need us to hoard the data,” he says. The computational models established for research are emerging as a competitive distinction in laboratories, according to Durvasula. “We don’t talk about killer apps anymore,” he says. “What we are more interested in is killer data sets.”

Tom Plasterer, AstraZeneca’s senior director for strategy and collaborations in oncology translational medicine, agrees that standardization continues to vex the field but says that the pharmaceutical industry has made headway with the help of data standard repositories such as the National Center for Biomedical Ontologies’ BioPortal.

Drug industry–led efforts, like the Allotrope Foundation, have advanced common terms for data management, Plasterer says. Most recently, the FAIR principles—guidelines for ensuring data in storage are findable, accessible, interoperable, and reusable—have been adopted by drug companies including AstraZeneca and Pfizer.

Lately, Plasterer’s organization is building data modeling for oncology research. The data come from the bench.

Three data scientists huddled around a laptop in a public space of a biotech firm in Utah.
Credit: Recursion Pharmaceuticals
Mason Victors, chief product officer at Recursion (center), huddles with Lina Nilsson, vice president of product, and Juan Rodríguez, data scientist at the Salt Lake City–based start-up.

“We have a ton of really smart scientists,” Plasterer says. “They’re reading papers all the time, and it’s all in their heads. And you need to pull it out and put it in a model that a computer can use. That is a lot of what I do. Taking stuff I pull out of people’s heads and putting that data in the model.”

Smaller companies, especially research-intensive biotechs that launched in a world of cloud computing and software as a service (SaaS), can address the growing agency of the scientist without having to accommodate entrenched IT systems and management.

Recursion, a biotech company that outsources its wet chemistry, puts a lot of stock in its ability to craft data management technology that fits specific projects in individual labs. Mason Victors is the chief product officer for the 8-year-old firm. The word product in his title refers not to drugs but rather to software and protocols that make up the company’s data management system.

Victors says the rise of multidisciplinary teams in drug research indicates that “the status quo is not very good—not for companies living in a digital era. Not for 21st-century biotechnology companies.”

With a background in applied mathematics, data science, and machine learning, Victors is also critical of how scientists have handled data in the lab. “Scientists want to run an experiment because they are trying to answer a very specific question,” he says. “When they get the answer, they want to stop messing with the data.” Raw data end up not being captured and curated in any repository where they can be accessed later by other researchers.

Recursion has designed a cloud-based storage system to address this problem. “The majority of Recursion’s data is in high-content, fluorescent microscopy imaging,” he says. “We run about 1.7 million experimental wells every week through our fluorescent microscopy platform, and those images, as they get captured off the microscopes, automatically get synced up to Google Cloud.”

There, the images are turned into mathematical representations that define insights on the data and can be delivered to users on request at any time, Victors says.

New tools for the bench

Vendor-supplied data management software is also moving into discovery labs. Benchling, a start-up crafting SaaS applications for drug discovery, has its eye on how scientists work and their preferences in IT tools, according to Ashu Singhal, president and cofounder of the company, which launched 9 years ago.

Singhal points to the “consumerization” of IT, epitomized by the iPhone, as well as the steady movement of data management systems to the cloud. “I think we are seeing the emergence of full SaaS platforms that obviate the LIMS,” Singhal says, referring to traditional central laboratory information management systems.

Benchling develops applications for specific areas of research, such as genomics or gene editing, putting software configuration in the user’s hands, Singhal says. “That means that they can adjust how the software is set up and how it works,” he says. “It’s software compatible with the speed at which R&D is currently moving.”

Still, the company relies on a high level of standardization. “I think 80% of what users need from us comes out of the box,” Singhal says.

Benchling claims its tools are used by 200,000 scientists at 600 companies ranging from small biotechs building IT from the ground up to large industrial and institutional research organizations with well-established data management systems.

New data storage and search options are also emerging for the bench scientist. WekaIO is one of several new companies with products designed for shared storage of large volumes of complex life sciences data. The company offers cloud-based data storage with an on-premises option in partnership with hardware suppliers including Hewlett Packard Enterprise and Hitachi Vantara. Using the firm’s WekaFS, research departments can scale storage beyond the dozens of petabytes that currently define the high end in drug discovery labs, according to CEO Liran Zvibel.

In addition to high volume, WekaFS operates at high speed. Any scientific tool can access a single data source, whereas most traditional systems store data redundantly in areas assigned to particular projects or kinds of research. Access to a single source also accommodates project changes without having to adjust dedicated data sourcing. “The future comes fast in this game,” Zvibel says. “One of our strongest abilities is flexibility.”

New software for data search has also debuted. Seven-year-old Genomenon focuses on genomics research, which has experienced explosive growth in data generation. The company uses artificial intelligence to implement a genomics search engine for clinical research and a genomics database called Genomic Landscapes for drug discovery, says CEO Mike Kline.

“We ID any variant ever published in association with a disease. We curate those and provide an interpretation with the scientific evidence that goes with it,” Kline says, describing Genomic Landscapes.

The center can hold

New technology is not obviating the workhorse LIMS and similar centralized lab IT, LIMS proponents say. In fact, Robert Voelkner, vice president of sales and marketing at the LIMS supplier LabVantage Solutions, notes that the key goals in data management may actually strengthen the position of LIMS in drug discovery. Researchers need to consolidate, harmonize, and standardize data, core competencies of a traditional LIMS. “Companies don’t want bespoke IT systems,” Voelkner says.

Voelkner says he has seen LIMS roll with massive changes in his nearly 40 years implementing the systems in drug discovery. “In the early days, you had IT organizations that were responsible for everything IT related. The lab was only one of them,” he says. The same management division was also in charge of human resources computing, email, and personal computers. Lab research IT is now a distinct entity.


“The CIO-level person still has strategic vision, looking to deploy standards and consolidate platforms,” Voelkner says, referring to the chief information officer. “But the people who are running the day-to-day IT operations are members of the lab organization now.”

IT systems have evolved as well to accommodate greater autonomy in developing new tools. LabVantage recently introduced a version of its LabVantage Analytics system that incorporates machine-learning algorithms for the creation of data sets for use by researchers.

Ned Haubein, LIMS senior application manager at Penn Medicine, University of Pennsylvania Health System, says his group is working to accommodate data-savvy scientists with a centralized storage architecture allowing data curation and management in the lab.

“The shift to doing more analysis in individual labs is growing,” he says. “You are getting people who have more of the skills necessary to do that work—people with statistical backgrounds.” Haubein’s department is focused on making data accessible to multiple, unrelated laboratories from the central repository.

“Part of the role we play is to define how researchers name data. We play the role of data stewards,” he says. “It’s important that most of the people on our LIMS team have a background that crosses science and technology. It allows us to talk to the labs in their language.” Haubein has a PhD in chemical engineering.

The LIMS system is updated with advanced configurations that allow customization for laboratory-level employees, Haubein says. The trend of researchers taking more control of their IT does not pose an existential threat to large data storage systems, he argues.

The largest of drug companies are also working toward a balance of traditional and emerging data management techniques that supports the needs of multidisciplinary drug discovery teams. Montello, speaking at the conference in Boston, described how GSK’s digital transformation program started small in one research area and went large across all IT development.

“What we saw was that these teams were moving faster and also managing their backlogs and priorities better. All IT products are developed in collaboration with researchers,” he said. “We have 80 product teams right now. Every single one [works] in collaboration with scientists, medical doctors, regulatory staff. That culture of collaboration and getting ahead together is extremely important to success.”

