Free tool uses machine learning to pick better molecules for testing new reactions

Software could reduce bias and improve comparisons between reaction methods

by Sam Lemonick, special to C&EN
April 22, 2024


A chart features thousands of dots distributed in an amorphous shape. Each dot represents a molecule in the DrugBank database, and the dots are grouped into about a dozen clusters of different colors representing the structural similarity of their corresponding molecules.
Credit: Debanjan Rana
The software powering a new web-based tool can recommend a diverse set of molecules (left) to test new reactions on. The algorithm combs a compound library, groups the compounds by structural similarity (right), and identifies major clusters (letters).

A new chemical reaction method is only as valuable as the chemistry it makes possible. So a paper that describes a new reaction will almost always include a table of compounds that researchers tested their protocol on. Ideally, that table reveals how well the method works on molecules with various steric and electronic properties.

But a chemist’s choice of substrates can introduce bias into their study. The chemist might avoid testing a potentially useful substrate because it’s expensive or might reach for a substrate lying around their lab rather than order something new. Plus, synthesis papers don’t typically report negative results, meaning the substrates that researchers tested but that didn’t work with their reaction. Those results can provide valuable information about a protocol, but they might also make a method seem less useful.

Frank Glorius and his group at the University of Münster are among the researchers who have been thinking about ways to improve substrate selection. The group is now releasing a web tool to help chemists pick substrates (ACS Cent. Sci. 2024, DOI: 10.1021/acscentsci.3c01638). The researchers think that if the tool is widely used, it will make substrate selection less biased and enable more meaningful comparisons between different reaction protocols.

The researchers’ first step in the process was to use a machine learning algorithm to peruse the molecule database DrugBank and create a map of the database’s compounds grouped by structural similarities. From that map, chemists can use the tool to home in on a list of substrates and then filter through that list using criteria such as substrate cost and availability as well as the kinds of reactions they want to perform. After this filtering, the tool helps to ensure that the substrate recommendations come from distant parts of the molecular map so that a diverse pool is maintained.
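To make the idea of drawing recommendations from distant parts of a molecular map concrete, here is a minimal Python sketch. It is not the Münster group's algorithm. It swaps in a simple greedy max-min selection over Tanimoto distances, and the fingerprints, molecule names, and the `pick_diverse` helper are all hypothetical stand-ins chosen for illustration.

```python
from typing import Dict, FrozenSet, List


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Return 1 minus the Tanimoto (Jaccard) similarity of two feature sets."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def pick_diverse(
    fingerprints: Dict[str, FrozenSet[int]], k: int, seed: str
) -> List[str]:
    """Greedy max-min selection (an assumed stand-in, not the tool's method).

    Starting from `seed`, repeatedly add the molecule whose nearest
    already-picked neighbor is farthest away.
    """
    picked = [seed]
    remaining = [name for name in fingerprints if name != seed]
    while remaining and len(picked) < k:
        best = max(
            remaining,
            key=lambda name: min(
                tanimoto_distance(fingerprints[name], fingerprints[p])
                for p in picked
            ),
        )
        picked.append(best)
        remaining.remove(best)
    return picked


# Hypothetical toy fingerprints: three loose structural "clusters" (A, B, C).
TOY_FINGERPRINTS: Dict[str, FrozenSet[int]] = {
    "A1": frozenset({1, 2, 3}),
    "A2": frozenset({1, 2, 4}),
    "B1": frozenset({10, 11, 12}),
    "B2": frozenset({10, 11, 13}),
    "C1": frozenset({20, 21, 22}),
}

selection = pick_diverse(TOY_FINGERPRINTS, k=3, seed="A1")
print(selection)
```

With this toy data, the selection takes one molecule from each of the three groups instead of two near-duplicates from the same group, which is the behavior the article describes at a much larger scale.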

“Developing a web application is a great way to bring these ideas to a larger part of the community,” says Connor Coley, a computational chemist at the Massachusetts Institute of Technology. He has previously helped develop approaches for reducing bias in reactivity models (ACS Cent. Sci. 2023, DOI: 10.1021/acscentsci.3c01163).

Glorius’s group demonstrated the tool on two reactions that act on alkenes: a photochemical iminocarboxylation of alkenes developed in their lab (Nat. Chem. 2022, DOI: 10.1038/s41557-022-01008-w) and a widely used osmium-catalyzed dihydroxylation. For both reactions, the researchers used the same set of 15 substrates, filtered to include only molecules cheaper than €100 (about $107) per gram and less than 700 Da.
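The two filters mentioned here, price below €100 per gram and mass below 700 Da, are easy to picture as a simple screen over a candidate list. The sketch below is only an illustration under assumed inputs: the `Candidate` record, its field names, and the example entries are hypothetical and do not come from the tool or from DrugBank.

```python
from dataclasses import dataclass
from typing import List

# Thresholds reported in the article for the demonstration set.
MAX_PRICE_EUR_PER_GRAM = 100.0
MAX_MASS_DA = 700.0


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate substrate."""
    name: str
    price_eur_per_gram: float
    mass_da: float


def passes_filters(candidate: Candidate) -> bool:
    """Keep only molecules strictly cheaper and lighter than the thresholds."""
    return (
        candidate.price_eur_per_gram < MAX_PRICE_EUR_PER_GRAM
        and candidate.mass_da < MAX_MASS_DA
    )


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that pass both the price and mass filters."""
    return [c for c in candidates if passes_filters(c)]


# Made-up entries, for illustration only.
CANDIDATES = [
    Candidate("cheap_small", price_eur_per_gram=12.0, mass_da=250.0),
    Candidate("pricey_small", price_eur_per_gram=450.0, mass_da=300.0),
    Candidate("cheap_large", price_eur_per_gram=30.0, mass_da=820.0),
    Candidate("borderline", price_eur_per_gram=100.0, mass_da=699.9),
]

kept = [c.name for c in filter_candidates(CANDIDATES)]
print(kept)
```

Whether a molecule priced at exactly €100 per gram counts as passing is a design choice. This sketch reads "cheaper than" strictly, so the borderline entry is excluded.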

The researchers report that 7 of the 15 substrates were successful for the iminocarboxylation and 8 worked in the dihydroxylation, with success meaning better than 10% yield. They say the dihydroxylation results match what chemists have learned about that method through experiment. But in the case of the iminocarboxylation, the substrate-selection technique revealed limits in scope that hadn’t been apparent in the group’s original study. Glorius says the new study’s higher failure rate reflects the greater diversity and structural complexity of the substrates the software selected.

The new tool is just one approach for reducing bias in substrate selection, and Coley says it is hard to judge whether any one approach is better than the others. Glorius’s group seems to agree. Niklas P. Hölter, a master’s candidate in the group, says chemists will likely develop the most useful substrate lists by incorporating several of these new methods.

