

Drug Discovery

Hunting for drugs in chemical space

As more chemicals are available to test in the search for drugs, researchers have to decide where to look

by Laura Howes
June 26, 2022 | A version of this story appeared in Volume 100, Issue 23

Credit: Benjamin Currie


In brief

Connecting chemical building blocks allows drug hunters to explore a much bigger chemical space than before. The challenge is to narrow this field of compounds to something manageable. To do that, chemists are turning to new computational tools to navigate this increasingly huge chemical universe, and they are combining technologies. Experts say these new approaches should speed up the identification process, and industry is investing time and money to optimize the hunt.

Back in 1996, a paper came out that has been cited time and time again. In a footnote to a figure caption, Regine S. Bohacek, Colin McMartin, and Wayne C. Guida of Ciba-Geigy estimated that at least 10^63 small, drug-like molecules could be produced by stitching together up to 30 carbon, nitrogen, oxygen, and sulfur atoms in different arrangements (Med. Res. Rev. 1996, 16, 3).

And yet a look at drugs that have made it into the clinic reveals that the pharmaceutical industry has explored only a fraction of that universe—in terms of the number of approved drugs and the chemical diversity these molecules represent. As medicinal chemists tweak compounds along the way to a final drug, small molecular changes can make big differences in activity and toxicity. But as a first step, scientists have to know what direction they want to explore. “In the drug design process, you have to start with something,” says Matthias Rarey of the University of Hamburg. Rarey works in cheminformatics, meaning he describes molecules computationally. He says drug hunters don’t want to waste time and money making the wrong compounds at the very beginning.

The scale of the chemical universe
A graphic showing circles in various sizes and numbers with some descriptive text to represent the scale of the chemical universe.
Sources: Med. Res. Rev. 1996, 16, 3; European Space Agency; Drug Discovery Today 2019, DOI: 10.1016/j.drudis.2019.02.013; DrugBank Online.

Many diseases are caused by malfunctioning proteins, and these are what drugs often target, treating the disease by modifying the protein or how it works. Finding the right molecule to do that is the job of the drug hunters. To start, they need to find a “hit” that they can build on and improve through rounds of experimentation.

One way drug hunters have traditionally looked for a starting point has been high-throughput screening, which relies on arrays of small quantities of compounds, usually stored in organic solvents, that are tested against a target. Pharmaceutical consultant Wendy Warr of Wendy Warr & Associates recalls how in the mid-1990s, chemists were focused just on making or acquiring more and more compounds to feed into these assays.

“And then they began to realize, well, you don’t just make everything,” she says. “You’ve got to use some common sense. You’ve got to design a library of diverse, drug-like compounds.”

But the number of screening compounds available is now growing again and becoming bigger than ever, through both physical and virtual libraries. Companies like Ukraine’s Enamine and OTAVAchemicals and China’s WuXi AppTec offer catalogs of billions of synthetically available compounds. Even larger are the in-house virtual libraries, or spaces, owned by big pharmaceutical firms. Merck KGaA, for example, has the Merck Accessible Inventory (MASSIV), a virtual space of 10^20 molecules. That’s similar in scale to the number of stars in the universe.

These libraries virtually store compounds that the labs can create from building-block molecules they keep on hand. When chemists order these compounds, they aren’t flicking through a catalog but are searching computationally, and the compounds that are returned are created dynamically from the data about the constituent building blocks and the reactions they can undergo. To navigate this growing complexity, researchers have to be smart in their choices: in how they design new compounds, build those compounds, and combine screening methods. They can’t screen everything, so they need to search where they think they’ll have the most success. And that means the importance of computation has increased.

“I think we really see a shift now because of these large, make-on-demand compound catalogs,” Rarey says. The number of possibilities is just too large, he says. “So there is always a computational element in early-phase drug discovery now, and I think this will remain also in the future.”

Combining possibilities

Enamine, for example, has been supplying screening compounds to drug hunters since the early 1990s. But what started as a library of a few thousand physical compounds that customers could order has now grown to a catalog of 23 billion possibilities. The firm doesn’t have them all on hand, but it has the pieces and expertise available to build what customers request.

Yurii Moroz, of the screening-chemical supplier Chemspace, says that in 2021, Chemspace customers ordered over 200,000 Enamine compounds that did not physically exist before they were ordered. These libraries of make-on-demand compounds have become possible only because chemical makers have encoded both their building blocks and experimental data on reactions in computer-accessible form. The information in these libraries has also enabled growth in fields such as automated synthesis planning and screening.

“For initial hit discovery, you want to make compounds financially accessible,” Moroz says. Large high-throughput screening for the early stage of drug hunting can cost up to $1 million. That’s one reason to reduce the price point for the chemicals to screen, he says.

Snapping together compounds on demand can help reduce costs, bringing the price per compound to $100–$150 rather than $1,000. After that first hit, chemists can return to the building blocks to expand from the initial molecule and design more compounds in new areas of chemical space with confidence.

These virtual catalogs work a bit like an ice cream shop, Rarey explains. If you have 10 ice cream flavors and 10 toppings, you can very quickly make a lot of different ice cream sundaes. The ice cream shop doesn’t store the sundaes premade. It keeps the components separate until an order comes in; then the different scoops and components are assembled.
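The ice cream shop can be written down in a few lines. The sketch below is only an illustration of the analogy: the flavor and topping names are placeholders, and nothing here reflects how any supplier's catalog is actually implemented. It shows that the size of the menu is known from the component counts alone and that a sundae is assembled only when someone asks for it.

```python
from itertools import islice, product

# Hypothetical component lists; the analogy only specifies 10 flavors and 10 toppings.
flavors = [f"flavor_{i}" for i in range(10)]
toppings = [f"topping_{i}" for i in range(10)]

def sundaes(flavors, toppings):
    """Lazily yield every flavor/topping pairing without storing them all."""
    for flavor, topping in product(flavors, toppings):
        yield f"{flavor}+{topping}"

# The menu size follows from the component counts; nothing is premade.
catalog_size = len(flavors) * len(toppings)

# Only the sundaes that are actually ordered get assembled.
first_three = list(islice(sundaes(flavors, toppings), 3))
```

Twenty stored components describe 100 sundaes. Adding a third list of 10 sauces would raise that to 1,000 while storing only 30 items, which is why the virtual catalogs grow so much faster than the shelves behind them.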

There is always a computational element in early-phase drug discovery now, and I think this will remain.
Matthias Rarey, head of the Center for Bioinformatics at the University of Hamburg, Germany.

Where the analogy falls apart is that in an ice cream parlor, you can choose to combine every scoop and topping, even if they don’t taste good together. Not all building blocks will react to form new compounds, though, and not all compounds will make sense for a particular drug target. The trick is to develop ways to ensure the potential chemical space reflects the chemical reality. For example, scientists can encode reaction data so that chemists look in the right place.
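One simple way to picture encoded reaction data is as a lookup table that says which kinds of building blocks can be joined. The sketch below assumes a made-up set of blocks, each tagged with a single reactive group, and a one-entry rule table; the names and the table are hypothetical, and real reaction encodings are far richer. It shows how such rules keep impossible pairings out of the virtual space.

```python
from itertools import product

# Hypothetical building blocks, each tagged with one reactive group.
acids = [("A1", "carboxylic_acid"), ("A2", "carboxylic_acid")]
partners = [("B1", "amine"), ("B2", "alkane"), ("B3", "amine")]

# Hypothetical encoded reaction knowledge: which group pairs can be coupled.
ALLOWED_COUPLINGS = {("carboxylic_acid", "amine"): "amide_coupling"}

def feasible_products(left_blocks, right_blocks):
    """Yield (left, right, reaction) only for pairs with an encoded reaction."""
    for (l_name, l_group), (r_name, r_group) in product(left_blocks, right_blocks):
        reaction = ALLOWED_COUPLINGS.get((l_group, r_group))
        if reaction is not None:
            yield (l_name, r_name, reaction)

# Pairings with the unreactive alkane block never enter the catalog.
products = list(feasible_products(acids, partners))
```

Of the six possible pairings, only the four that match a known coupling survive.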

While these database approaches have expanded the library of compounds that customers can order, other virtual tools can work as a first pass for predicting how molecules could dock or bind to a target protein, allowing researchers to focus their first wet-chemistry screens only on molecules that they think have a good chance of working on the target.

“Those are the programs that are increasing,” says Petro Borysko, Enamine’s director of biology, who oversees the screening experiments that the firm runs on behalf of clients. He says he’s seen a change in high-throughput screens toward more “pointed actions” that are “usually the results of some kind of virtual screen.”

Building the libraries

Key to the development of these virtual libraries is the drug-hunting approach called fragment-based drug discovery, which Rarey says “paved the way to where we are now.”

In fragment-based drug design, researchers test chemical groups against a target that are much smaller than the compounds used in high-throughput screening. Once they find one that binds, they build a more complicated chemical scaffold around the initial binding fragment with the goal of filling in the binding site and engineering a strong bond to the target.

Finding a fit
A group led by Vsevolod Katritch at the University of Southern California virtually combined building-block molecules to look for drugs that could fit in a protein pocket. From a potential 11 billion screening compounds, the researchers designed around 600,000 compounds (A) then computationally tested them against a protein target (B). They then added more building blocks and tweaked the best-fitting molecules with more computational reactions (C and D) to find around 60 that were made physically for screening.
Small chemical structures with arrows in between showing how they can be snapped together to make different compounds.
Credit: Vsevolod Katritch/C&EN
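The staged funnel in the figure above, which narrows 11 billion possibilities to around 600,000 and then to around 60, can be mimicked with a toy example. This is a sketch under loud assumptions, not the Katritch group's method: building blocks are plain integers, and `toy_score` is an invented stand-in for a docking calculation. What it shows is the shape of the search: score a reduced set, keep the best, elaborate only those, and score again.

```python
from itertools import product

def toy_score(compound):
    """Stand-in for a docking score (lower is better). Purely illustrative:
    it rewards blocks whose numeric labels sum close to 25."""
    return abs(25 - sum(compound))

# Hypothetical building-block pools, labeled with integers for simplicity.
pool_a = range(1, 11)
pool_b = range(1, 11)
pool_c = range(1, 11)

# (A) and (B): build and score a reduced set from pool_a and pool_b only.
seeds = sorted(product(pool_a, pool_b), key=toy_score)

# Keep only the best-fitting seeds.
top_seeds = seeds[:5]

# (C) and (D): elaborate each kept seed with a block from pool_c, then rescore.
elaborated = [seed + (c,) for seed in top_seeds for c in pool_c]
shortlist = sorted(elaborated, key=toy_score)[:3]

full_space = len(pool_a) * len(pool_b) * len(pool_c)  # every possible product
scored = len(seeds) + len(elaborated)                 # what was actually scored
```

Here 150 evaluations stand in for the 1,000 a brute-force search would need. The trade-off is that a staged search like this is greedy: a seed that scores poorly early on is discarded even if one of its elaborations would have scored well.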

The initial fragment needs a moiety that will bind to the protein, but it’s best to keep the initial fragment small so you can expand the fragment in as many ways as possible, according to Gianni Chessari, head of chemistry at Astex Pharmaceuticals, a fragment-based screening specialist. Starting from weakly binding first fragments helps drug hunters anchor at a starting point for exploring chemical space.

Another efficient way that researchers have explored the vastness of chemical space is with DNA-encoded libraries (DELs). These libraries are made up of screening molecules attached to unique DNA sequences that identify them. The approach allows all the screening molecules to be mixed in a single tube along with the target protein. DELs have traditionally focused on large, flexible molecules like peptides, which have more DNA-compatible chemistry.
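Because each DNA tag records how its molecule was built, reading the tags that remain with the target is enough to identify the binders. The sketch below assumes an invented coding scheme (three bases per building block, two synthesis cycles) and made-up reads; real libraries use longer tags and many more reads. It shows the decoding and counting step only.

```python
from collections import Counter

# Hypothetical 3-base codes for the building blocks used at each synthesis cycle.
CYCLE_1 = {"AAC": "block_1a", "AGT": "block_1b"}
CYCLE_2 = {"CCA": "block_2a", "CTG": "block_2b"}

def decode(tag):
    """Translate a 6-base tag into the pair of building blocks it encodes."""
    return (CYCLE_1[tag[:3]], CYCLE_2[tag[3:]])

# Made-up tags recovered after the pooled library was mixed with the target.
reads = ["AACCCA", "AGTCTG", "AACCCA", "AACCTG", "AACCCA"]

# Tags that turn up most often point to the compounds that stayed bound.
enrichment = Counter(decode(read) for read in reads)
best_compound, best_count = enrichment.most_common(1)[0]
```

In this toy run the pairing of `block_1a` and `block_2a` appears three times out of five reads, so it would be the first candidate to resynthesize without its tag and test on its own.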

At the DEL firm X-Chem, Ying Zhang, vice president of chemistry, has been assembling a library of smaller molecules by accumulating as many building blocks as she can but running only two or three coupling reactions between them. The result has significantly broadened the chemical space explored by the firm’s DELs (Bioorg. Med. Chem. 2021, DOI: 10.1016/j.bmc.2021.116189). Today, X-Chem has billions of compounds in multiple targeted libraries.

But with DELs too, researchers have to be smart to narrow the vast space they cover. They are being strategic about which compounds to put in the tube to begin with, often using machine learning to help guide their decisions (J. Med. Chem. 2020, DOI: 10.1021/acs.jmedchem.0c00452). Combining screening technologies is also increasingly common. A firm may begin with a hit from a DEL, for instance, and then use a fragment-based search, perhaps one guided by computation, to refine the hunt.

Encoding the chemistry

BioSolveIT is a company cofounded by Rarey that offers multiple pieces of software to help chemists navigate these new chemical spaces. Director of Application Science Marcus Gastreich says it would be a mistake to think that this computational expansion of chemical space is due only to an increase in computing power. The work chemists have done to encode chemistry in computers has also massively enabled the field.

In fact, Gastreich says, just as chemists would take too long to synthesize every potential compound, it would require far too much computing power to compute every possible compound from all the building blocks available.

Instead, he says, recently developed algorithmic technologies based on the field of computational chemistry called cheminformatics are needed to search the huge chemical spaces without huge computational cost. These tools can quickly search the building blocks and compounds that are described using strings of characters or databases. For example, he says, if you have a small fragment sitting in a protein pocket, computers can quickly search for other building blocks that could be added or related fragments that might also fit. And that search can help a creative chemist trying to solve a problem.
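The core idea, searching the building blocks rather than the products, can be illustrated with ordinary string matching. The sketch below is not BioSolveIT's algorithm or any published method: the pools are a handful of SMILES-like strings chosen for illustration, and the similarity measure is Python's generic `difflib` ratio, which knows nothing about chemistry. It only shows why the cost can grow with the sum of the pool sizes instead of their product.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Generic string similarity between 0 and 1; a crude stand-in for a
    chemically meaningful comparison."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical building-block pools written as SMILES-like strings.
left_pool = ["CCO", "CCN", "CCCl"]
right_pool = ["c1ccccc1", "c1ccncc1", "C1CCCCC1"]

# A two-part query: the kind of left and right pieces the chemist wants.
query = ("CCN", "c1ccncc1")

def best_match(part, pool):
    """Return the block in the pool that is most similar to the query part."""
    return max(pool, key=lambda block: similarity(part, block))

# Each half of the query is matched against its own pool independently.
best = (best_match(query[0], left_pool), best_match(query[1], right_pool))

comparisons = len(left_pool) + len(right_pool)        # work actually done
implied_products = len(left_pool) * len(right_pool)   # products never built
```

With three blocks per pool the saving is trivial, six comparisons instead of nine. With a million blocks per pool, the same approach would need about two million comparisons to cover a trillion implied products. Matching the parts separately is a simplification, since the similarity of two whole molecules is not just the sum of the similarity of their pieces.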

Because chemists have built ways to encode the properties of these building blocks and how they can react, those well-validated data can also feed into artificial intelligence and machine learning (AI/ML) applications. The drug industry is investing heavily in this new breed of computer-powered applications, which can offer services such as chemical design, synthesis planning, and automated synthesis. AI/ML services may also help chemists navigate the ever-growing constellation of available compounds.

“The main driving force here is the convergence of what we call traditional computational chemistry and AI/ML,” says Ashwini Ghogare, head of AI-enabled drug discovery at MilliporeSigma, part of the life sciences business of Merck KGaA. “This combination is the winning formula.”

Ghogare and her team have been part of that investment at MilliporeSigma. For example, they have been building AI-powered drug discovery software that she says will allow users to search ultralarge chemical space and then design new screening compounds. Most large drug companies are trying to build these sorts of systems internally, she says. The new effort aims to make the same tools available to small and midsize companies as well.

The final frontier

There’s a lot of chemistry out there still to explore, but the growing chemical space means chemists and machines need to work hand in hand to make sense of it. By combining screening technologies and computationally optimizing screens, chemists are making the hunt quicker, cheaper, and more efficient.

At Enamine, Moroz is convinced of this trend of combining and optimizing because of the orders for screening compounds he’s seeing. “The change is here,” he says. “It’s definitely here.”

Experts cannot yet point to a drug on the market and show that the hunt was started by combining computationally described building blocks. “That, of course, would be the acid test,” says Warr, the consultant. But, she continues, this is a growth area for industry. “There’s a lot of recruitment going on in AI and data science,” she says. “When a lot of big money is spent on these things, it means that management is actually taking it seriously.”

On July 5, 2022, a new opening image was added with 1,1-dimethylcyclobutane removed.

This story was updated on June 29, 2022, to remove the opening image because the chemical structure of 1,1-dimethylcyclobutane resembles an offensive symbol.


