

Computational Chemistry

Synthesis-planning program relies on human insight and machine learning

The hybrid program outperforms machine-learning-only programs and shows promise in use of rare reactions

by Sam Lemonick
November 27, 2019 | A version of this story appeared in Volume 97, Issue 47

Structure of the molecule bimatoprost.

Computer-aided synthesis planning (CASP) programs aim to replicate what synthetic chemists do when tackling a synthesis: start with a target molecule and then work backwards to trace a synthetic route, including an efficient and achievable series of reactions and reagents. Work in this field stretches back 50 years, but successful examples have emerged only in the last several years. These rely either on chemistry rules written by human chemists or on machine-learning algorithms that have assimilated synthesis knowledge from databases of reactions.

Now researchers report that one CASP program that combines human knowledge and machine learning performs better than those using only artificial intelligence, particularly for synthetic routes involving rarely used reactions (Angew. Chem. Int. Ed. 2019, DOI: 10.1002/anie.201912083).

The program is an update to Chematica, which was developed by Bartosz A. Grzybowski of the Ulsan National Institute of Science and Technology and the Polish Academy of Sciences, and is marketed by MilliporeSigma as Synthia. Grzybowski says the program now includes almost 100,000 rules that he and colleagues have encoded over 15 years. Last year, they demonstrated that Chematica’s synthetic plans are as good as or better than human chemists’ in laboratory syntheses. To this point, Grzybowski has been “perhaps the staunchest proponent of the expert approach” to synthesis planning software, says Connor W. Coley of the Massachusetts Institute of Technology, who has developed a machine learning–based CASP program.


The percentage of time that the new hybrid Chematica program found a preferred synthetic step, even with few examples of it in the literature.

Now Grzybowski and colleagues have incorporated machine learning into Chematica. They trained machine-learning algorithms called neural networks on about 1.4 million product molecules that match one or more of Chematica’s expert-coded reactions. Grzybowski says this hybrid approach teaches the algorithms which of those expert rules chemists actually use. That can help Chematica avoid a synthetic step that is possible but impractical, or favor a reaction that is rarely seen in the literature but is necessary for certain transformations.

Grzybowski says human insight is important to include in a CASP program because chemical synthesis poses a more difficult challenge for machine-learning algorithms than playing chess or Go, games at which these programs consistently beat humans. For one, successful synthetic-route planning often involves considering two or three steps simultaneously. And unlike making a move in those games, calculating the effects of a given synthetic transformation (for example, the effects on electron density or stereochemistry) takes significant computing time.

The researchers compared the abilities of their hybrid algorithm with those of a purely neural network–based approach published last year (Nature 2018, DOI: 10.1038/nature25978). The two methods were about equally effective at proposing synthetic steps that matched published reactions when their training data included thousands of examples of those reactions. But when there were fewer than 100 examples, the neural network approach rarely identified a verified transformation, while the hybrid version of Chematica found it more than 75% of the time. Several of the hybrid program’s proposed reactions to synthesize the glaucoma drug bimatoprost were not represented in its training data, demonstrating its ability to use unusual reactions.

Chemists agree that this human-machine partnership shows promise, especially for less common reactions. “This is important because there has been a preference of modern retrosynthetic algorithms to favor well-precedented reactions,” says Timothy A. Cernak, whose lab at the University of Michigan is sponsored by MilliporeSigma and uses Synthia. But Coley cautions that a fair comparison of a hybrid approach and a neural network alone is difficult because there’s greater potential for human experts to bias the data that the system is trained and tested on.

The researchers have not verified the generated synthetic routes in lab experiments, but Grzybowski says his group will publish new, lab-tested natural product syntheses from this program soon. He also says there are plans to incorporate the hybrid system into Synthia.

