A machine-learning technique first developed for understanding language can accurately classify chemical reactions according to type (Nat. Mach. Intell. 2021, DOI: 10.1038/s42256-020-00284-w). The model also tags reactions with computer-readable codes that allow chemists to search for similar reactions.
Transformers are a type of machine-learning algorithm useful for interpreting sequences of information. They’re widely used in translation software and voice assistants like Amazon’s Alexa, and chemists have recently shown their utility in chemistry (see page 19). Philippe Schwaller of IBM Research–Zurich and the University of Bern and colleagues show that the transformer approach can classify reactions by type, identifying broad categories like carbon-carbon bond formation or deprotection and finer-scale groups like chloro or bromo Suzuki couplings. The researchers tested their model on data sets containing tens of thousands of reactions described using the SMILES (simplified molecular-input line-entry system) line notation, which describes chemical structures with a string of characters. When categorizing reactions, the machine-learning model matched the classifications assigned by software that used human-coded rules 98% of the time. Schwaller says the group traced many of the classification disagreements to tautomeric differences in molecules or simple typos in the data.
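To make the idea of a reaction written as a string of characters concrete, here is a minimal Python sketch. It assumes the common reaction SMILES layout of reactants, agents, and products separated by ">" characters, with individual molecules separated by ".". The example reaction (acetic acid plus ethanol giving ethyl acetate and water) and the simplified token pattern are illustrative choices, not taken from the paper, and the tokenizer is not the one the researchers used.

```python
import re

# Simplified, illustrative token pattern (an assumption, not the paper's tokenizer):
# bracket atoms such as [NH4+], the two-letter halogens Cl and Br,
# then any other single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def split_reaction(reaction_smiles):
    """Split 'reactants>agents>products' into three lists of molecule SMILES."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    # An empty section (for example, no agents) becomes an empty list.
    return [part.split(".") if part else [] for part in parts]


def tokenize(smiles):
    """Break a SMILES string into the character-level tokens a sequence model reads."""
    return TOKEN_PATTERN.findall(smiles)


# Hypothetical example: acetic acid + ethanol >> ethyl acetate + water
reaction = "CC(=O)O.OCC>>CC(=O)OCC.O"
reactants, agents, products = split_reaction(reaction)

print(reactants)
print(products)
print(tokenize(reactants[0]))
```

A sequence model such as a transformer reads tokens like these in much the same way that it reads words in a sentence, which is why a technique built for language can be applied to reactions.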
The group’s model generates unique codes to identify different reactions, and Schwaller says these codes are useful for more than just classifying reactions: they can also identify very similar reactions via a database search. This function could help chemists screen a database for alternative reactions or help them optimize a reaction by pointing them to published procedures for similar transformations, he says. And he adds that these models can be modified for other tasks, like predicting reaction yields.
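The article does not describe how the database search works. As a rough sketch, assume each reaction's code is a list of numbers and that two reactions count as similar when their codes point in nearly the same direction (cosine similarity). The three-number codes and the reaction names below are made up for illustration. They are not output from the group's model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, database, top_k=2):
    """Return the names of the top_k database entries closest to the query code."""
    scored = [(cosine_similarity(query, code), name) for name, code in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical codes: real ones would come from a trained model.
database = {
    "reaction_A": [0.9, 0.1, 0.0],
    "reaction_B": [0.8, 0.2, 0.1],
    "reaction_C": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

nearest = most_similar(query, database)
print(nearest)
```

In this toy example the query sits closest to reaction_A and reaction_B, so those are the entries a chemist would be pointed to when looking for published procedures for similar transformations.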