When chemists develop new types of reactions, they generate a lot of data on what works and how well, along with what doesn’t work at all. Much of the data are never used, says Abigail G. Doyle, a chemistry professor at Princeton University. “We publish only a small fraction and usually only the best results,” she says. Doyle thinks that by using machine learning—in which computer algorithms find patterns in data—it might be possible to use all the data chemists generate to predict the best conditions for a reaction even when the substrate has never been used in that transformation before.
Doyle and Princeton’s Derek T. Ahneman and Jesús G. Estrada, along with Merck & Co.’s Spencer D. Dreher and Shishi Lin, take a step in this direction by using machine learning to predict the yield of a Buchwald-Hartwig amination (example shown). Their model accounted for variation in the aryl halide substrate, the palladium catalyst ligand, the base, and an isoxazole additive (Science 2018, DOI: 10.1126/science.aar5169). The chemists added isoxazoles to the mix because the motif is popular in druglike molecules but sometimes poisons these reactions. The team hoped to get a better idea of which conditions and specific isoxazole structures were problematic.
Using Merck’s ultra-high-throughput reaction technology, the chemists performed 4,608 reactions and used the data from a portion of them to train a model that would predict the outcomes of the remaining reactions. After testing several algorithms, the chemists found that a random forest model, which averages the predictions of many decision trees, performed best.
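To make the train-on-a-portion, predict-the-rest workflow concrete, here is a minimal sketch of a random forest regressor in plain Python. It is not the team's code or data: the yields are synthetic, the integer-coded features (`h`, `l`, `b`, `a` for halide, ligand, base, and additive) are hypothetical stand-ins for whatever descriptors a real study would use, and the 70/30 split and the forest settings are arbitrary assumptions.

```python
import itertools
import random
from statistics import mean

def sum_sq(values):
    """Sum of squared deviations from the mean (the split-quality score)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, depth, n_try, rng):
    """Grow one regression tree: a float leaf or (feature, threshold, left, right)."""
    if depth == 0 or len(set(targets)) == 1:
        return mean(targets)
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):  # random feature subset
        for t in sorted({r[f] for r in rows})[:-1]:   # candidate thresholds
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = sum_sq([targets[i] for i in left]) + sum_sq([targets[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # the sampled features were constant in this node
        return mean(targets)
    _, f, t, left, right = best
    grow = lambda idx: build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                  depth - 1, n_try, rng)
    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, targets, n_trees=30, depth=6, n_try=2, seed=0):
    """Train each tree on a bootstrap resample of the training reactions."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                 depth, n_try, rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

def rmse(pred, true):
    return mean((p - t) ** 2 for p, t in zip(pred, true)) ** 0.5

# Synthetic "yields" (made up for illustration): additives 4 and 5 act as poisons.
rng = random.Random(0)
rows, yields = [], []
for h, l, b, a in itertools.product(range(5), range(4), range(3), range(6)):
    y = 20 + 2 * h + 15 * l + 5 * b - (30 if a >= 4 else 0)
    rows.append((h, l, b, a))
    yields.append(max(0.0, y + rng.gauss(0, 3)))

# Train on a random 70% of the reactions and predict the remaining 30%.
order = list(range(len(rows)))
rng.shuffle(order)
cut = int(0.7 * len(order))
train, test = order[:cut], order[cut:]
forest = fit_forest([rows[i] for i in train], [yields[i] for i in train])

test_y = [yields[i] for i in test]
rmse_forest = rmse([predict_forest(forest, rows[i]) for i in test], test_y)
rmse_baseline = rmse([mean(yields[i] for i in train)] * len(test), test_y)
print(f"forest RMSE: {rmse_forest:.1f}  mean-only baseline RMSE: {rmse_baseline:.1f}")
```

Comparing against a baseline that always predicts the average training yield shows whether the forest has learned anything from the reaction components. In practice one would likely reach for a library implementation such as scikit-learn's `RandomForestRegressor` rather than hand-rolled trees.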
This algorithm accurately predicted which isoxazole additives would poison the reaction, even those that weren’t included in the data used to build the model. The results could help chemists pick which ligand and base combination to use to maximize yields for the C–N coupling when a given isoxazole motif is part of their substrate.
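Testing a model on additives it has never seen requires a different split from a random one: every reaction involving a held-out additive must be kept out of the training set. The sketch below shows one way such a split might be set up. The component names and counts are hypothetical placeholders, not the compounds from the study.

```python
import itertools
import random

# Hypothetical component labels, for illustration only.
halides = [f"halide_{i}" for i in range(3)]
ligands = [f"ligand_{i}" for i in range(2)]
bases = [f"base_{i}" for i in range(2)]
additives = [f"isoxazole_{i}" for i in range(6)]

# Every combination of components is one reaction.
reactions = list(itertools.product(halides, ligands, bases, additives))

# Hold out whole additives, not random reactions, so the test set
# contains only isoxazoles that the model never sees during training.
rng = random.Random(0)
held_out = set(rng.sample(additives, 2))
train = [r for r in reactions if r[3] not in held_out]
test = [r for r in reactions if r[3] in held_out]

print(f"{len(train)} training reactions, {len(test)} test reactions")
print("held-out additives:", sorted(held_out))
```

A random split would scatter each additive across both sets, so good test scores would say little about how the model handles a genuinely new isoxazole.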
“We’re most excited by the idea that you can apply this method to any sort of new problem that you identify in reactivity,” Dreher says, although both he and Doyle say that kind of predictive power is still a long way off.
The team’s use of machine learning “is marvelous and long overdue for the field of homogeneous catalysis and chemical synthesis in general,” says Richmond Sarpong, an expert in organic synthesis at the University of California, Berkeley.