Making lab data work better
Centralized and standardized analytical data management is changing drug research and development, and all of chemistry, bit by bit
By Poornima Apte, C&EN BrandLab Contributing Writer
The COVID-19 pandemic, which triggered a widespread and sudden shift to remote work, thrust a problem to the forefront: laboratories urgently need a centralized and searchable library of all analytical test results and associated metadata, accessible on demand to everyone who needs it.
Sharing the results of an experiment among colleagues has traditionally been inefficient, with scientists either emailing large files back and forth or manually assembling documents and passing them along. An alternative approach is to centralize all results in a unified library. Scientists in the structure elucidation group at Pfizer have implemented such a strategy and can now access results from wherever they are. Pankaj Aggarwal, a principal scientist at Pfizer, says, “We can do the data analysis and even control our instruments from home.”
Having all a laboratory’s data in one place has other benefits, according to David Foley, a senior principal scientist at Pfizer. A scientist unaware of or unable to find data from a particular experiment too often simply runs that test again, wasting time and laboratory resources. A centralized library “removes the duplication of work,” Foley says.
The time a central library of analytical results saves is critical because drug discovery and production cycles in the pharmaceutical industry are becoming increasingly compressed. “In the past, projects would take 10–20 years,” Aggarwal says. “Now we are talking about products getting to market in 5 years.”
Wrangling data
The data problem is not limited to access. Results from analytical experiments arrive in so many different formats that cataloging them all, and retrieving the desired files later, becomes time-consuming and tedious.
ACD/Labs has been standardizing analytical data—converting disparate data sets from myriad instruments—via its software for over 20 years, says Sanji Bhal, the firm’s director of marketing and communications. More recently, the Allotrope Foundation has been working to create a unified ontology for analytical techniques in the life sciences and pharmaceutical industries.
Analytical data originate from a large number of manufacturers that serve the markets for mass spectrometers, liquid chromatography systems, nuclear magnetic resonance (NMR) spectrometers, and other laboratory instruments. Each manufacturer typically delivers data sets in its own format, through proprietary software that is incompatible with everyone else’s. And because scientists don’t confine themselves to a single technique when assaying a compound during, for example, structure identification or compound registration, they end up running a separate piece of software for each technique and assembling the disparate outputs by hand before they can make decisions.
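To make the standardization idea concrete, here is a minimal Python sketch of the kind of mapping such software performs. Everything in it, from the record layout to the vendor field names, is invented for illustration; it does not represent ACD/Labs’ software, the Allotrope ontology, or any instrument’s real output.

```python
# Illustrative sketch only: the two "vendor" export formats and every
# field name below are hypothetical assumptions for this example.
from dataclasses import dataclass, field

@dataclass
class AnalyticalRecord:
    """A technique-agnostic record for a centralized library."""
    sample_id: str     # the sample's tracking number
    technique: str     # e.g., "NMR" or "LC-MS"
    instrument: str    # which system produced the data
    data: dict         # normalized measurement payload
    metadata: dict = field(default_factory=dict)

def normalize_vendor_a_nmr(raw: dict) -> AnalyticalRecord:
    """Map a hypothetical vendor A NMR export onto the common record."""
    return AnalyticalRecord(
        sample_id=raw["SampleRef"],
        technique="NMR",
        instrument=raw["Spectrometer"],
        data={"ppm": raw["PpmAxis"], "intensity": raw["Intensities"]},
        metadata={"solvent": raw.get("Solvent")},
    )

def normalize_vendor_b_lcms(raw: dict) -> AnalyticalRecord:
    """Map a hypothetical vendor B LC-MS export, with entirely
    different field names, onto the same common record."""
    return AnalyticalRecord(
        sample_id=raw["sample"],
        technique="LC-MS",
        instrument=raw["system"],
        data={"rt_min": raw["retention_times"], "mz": raw["mz_values"]},
        metadata={"column": raw.get("column")},
    )
```

Once every export is funneled into one record shape, the downstream library can catalog, search, and compare results without caring which instrument produced them.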
Such a process leaves scientists mired in mundane tasks related to data management instead of moving on to more meaningful data analysis. “Scientists have generally learned to live with it, but the hope is that standardization simplifies the process from data acquisition to decision,” Bhal says.
Get smart
Another benefit of data standardization is that it shifts the scientist’s focus from the analytical technique to the molecule, which is where it should be. With ACD/Labs’ help, Pfizer is developing a central library of scientific information about molecules. “It’s like when you walk into a library and you have different sections on historical fiction or personal finance,” says Vijay Bulusu, Pfizer’s head of data and digital innovation for pharma science R&D. “Similarly, we’re bringing all this analytical and scientific information—with the multiple sections being NMR, LC-MS [liquid chromatography–mass spectrometry], etc.—into one library.”
In the pharmaceutical industry, scientists are using software for each of those library sections to find subtle patterns in data: for example, a suspicious peak in a spectrum arising from an impurity that consistently appears in samples from one lot. Algorithms can also find anomalies and similarities in analytical data, drawing scientists’ attention to correlations that might otherwise be lost. Such algorithms act like navigation aids, letting chemists maneuver rapidly through large volumes of data and surface insights that could have been missed. “If you had a paper road map and hit a roadblock, you might not immediately find a way around it,” says Karim Kassam, ACD/Labs’ senior director of customer success. “But a live, interactive map would make navigation much easier.”
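As a toy illustration of that kind of pattern finding, and not any vendor’s algorithm, the hypothetical routine below flags a peak that recurs in spectra from one lot but never appears in the others. The peak-matching tolerance, the 90% threshold, and the data are all arbitrary choices made for this sketch.

```python
# Toy lot-specific anomaly flagging; thresholds and data are arbitrary.
from collections import defaultdict

def near(a, b, tol):
    return abs(a - b) <= tol

def flag_lot_specific_peaks(spectra, tol=0.05, min_fraction=0.9):
    """spectra: iterable of (lot_id, peak_positions) pairs, one per sample.
    Return {lot_id: [peaks]} for peaks found in at least min_fraction of a
    lot's samples but in no sample from any other lot."""
    by_lot = defaultdict(list)
    for lot, peaks in spectra:
        by_lot[lot].append(peaks)

    flagged = defaultdict(list)
    for lot, samples in by_lot.items():
        others = [p for l, ss in by_lot.items() if l != lot
                  for s in ss for p in s]
        # Simplification: take candidate peaks from the lot's first sample.
        for peak in samples[0]:
            in_lot = sum(any(near(peak, p, tol) for p in s) for s in samples)
            seen_elsewhere = any(near(peak, p, tol) for p in others)
            if in_lot / len(samples) >= min_fraction and not seen_elsewhere:
                flagged[lot].append(peak)
    return dict(flagged)

# A peak near 3.1 shows up only in lot "B" samples and gets flagged.
spectra = [("A", [1.2, 2.5]), ("A", [1.2, 2.5]),
           ("B", [1.2, 2.5, 3.1]), ("B", [1.2, 2.5, 3.1])]
print(flag_lot_specific_peaks(spectra))  # {'B': [3.1]}
```

Real analytical software works on far richer spectra, but the principle is the same: a machine can scan every lot exhaustively, while a person paging through files cannot.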
Data standardization also sets the stage for advanced technologies, such as artificial intelligence (AI) and machine learning (ML). “We’re starting to develop AI and ML models and algorithms with these big data repositories we are creating,” Bulusu says.
Every sample analyzed by the structure elucidation group at Pfizer is tagged with an electronic number, which can be used to track all its associated tests. “We envision that in the future, any analyst will be able to use a compound number and structure and get all the [associated] analytical data, irrespective of the technique,” Aggarwal says. “We are creating individual data libraries, but we are not creating data silos.”
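A minimal sketch of what that cross-technique lookup could look like follows; the class, its methods, and the file names are invented for illustration and do not depict Pfizer’s actual system.

```python
# Hypothetical sketch: index every result by compound number so that
# one query spans all techniques. Names are invented for illustration.
from collections import defaultdict

class AnalyticalLibrary:
    """Keyed by compound number; each technique adds to the same entry."""

    def __init__(self):
        self._by_compound = defaultdict(list)

    def add(self, compound_no, technique, result):
        """File a result under the compound's tracking number."""
        self._by_compound[compound_no].append(
            {"technique": technique, **result})

    def all_results(self, compound_no):
        """Every test for a compound, irrespective of technique."""
        return list(self._by_compound.get(compound_no, []))

lib = AnalyticalLibrary()
lib.add("CPD-0001", "NMR", {"file": "cpd0001_nmr.jdx"})
lib.add("CPD-0001", "LC-MS", {"file": "cpd0001_lcms.mzml"})
print([r["technique"] for r in lib.all_results("CPD-0001")])
# ['NMR', 'LC-MS']
```

The design choice is the point of the quote: the libraries stay separate by technique, but because every entry shares the compound number as its key, a single query reaches all of them, so no silo forms.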
ABOUT SPONSORED CONTENT
Sponsored content is not written by and does not necessarily reflect the views of C&EN’s editorial staff. It is authored by C&EN BrandLab writers or freelance writers approved by the C&EN BrandLab. C&EN BrandLab’s sponsored content is held to editorial standards expected in C&EN stories, with the intent of providing valuable information to C&EN readers. This sponsored content feature has been produced with funding support from ACD/Labs.