IBM researchers have added enzymatic reactions to their artificial intelligence–powered synthesis planning software RXN. The algorithm can work backward to propose a series of reactions that use enzymes instead of classical chemical catalysts to arrive at a target molecule (Nat. Commun. 2022, DOI: 10.1038/s41467-022-28536-w). IBM first launched the software, RXN for Chemistry, to predict the outcome of chemical reactions. It has since been updated so that users can draw a molecule and ask the program to suggest synthetic routes to produce it, a process called retrosynthesis. Switching these synthetic organic reactions to enzymatic catalysis is a focus of industry because enzymes can mean faster, greener reactions.
Accurately predicting the products of biotransformations has been a stumbling block for retrosynthetic planning, however. The IBM team got around this with their new approach, training their algorithm on four biochemical reaction databases. RXN can now correctly plan a synthesis using enzymes around 40% of the time, and the system has also pointed out some errors in the databases used to train it.
Elaine O’Reilly of University College Dublin, who was not involved in the study, says the idea behind this new work is very important. But, she says, RXN’s predictions currently achieve a similar level of success to the RetroBioCat program, which uses a set of reaction rules as templates to plan syntheses and was published last year by University of Manchester researchers.
Improving prediction tools that learn directly from databases will rely on better training data. And that requires more stringent characterization of enzymatic reaction products and selectivity, which are often not rigorously reported, O’Reilly says. “This is something the biocatalysis community should seek to resolve,” she adds.
Daniel Probst, who helped build the new ability into RXN, agrees that there is room for improvement. Still, he stresses that improving quickly is something machine learning algorithms do well, in contrast to rules-based approaches that rely on individually coded reactions. He says there’s space for both methods, but RXN can scale and improve itself when given more training data.
He also suggests some other longer-term advantages. For example, RXN might help researchers discover new rules for biosynthesis “because we’re really based on data. So we might spot things based on the data that humans couldn’t,” he says. “On the other hand, I also like that we can maybe help a couple of PhD students that don’t have to handwrite rules during their PhD.”