Machine learning, in which computers train on large data sets to make predictions, can be a fast way to find promising molecules for various applications, but it’s only as good as the data it trains on. A new strategy could make the method more useful for identifying leads among inorganic complexes, for which reliable data can be harder to come by (J. Phys. Chem. Lett. 2018, DOI: 10.1021/acs.jpclett.8b00170).
Heather J. Kulik and colleagues at the Massachusetts Institute of Technology wanted to use machine learning to find new inorganic compounds with a small energy gap between their high- and low-electron-spin states. Because light or heat can boost these molecules, called spin-crossover complexes (SCCs), into a high-spin state, they could be useful as switches and sensors. Finding new SCCs computationally presents a particular challenge for machine-learning models because spin states and other properties of inorganic molecules are complicated and less data is available to teach the models.
To overcome this limitation, the researchers combined a standard search algorithm with a type of machine learning called an artificial neural network to explore octahedral SCCs. The network was trained to recognize complexes with a spin-state energy gap of 5 kcal/mol or less. It also provided a check that confined the algorithm’s exploration to complexes similar to those the network had been trained on.
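The idea of a search that refuses to leave the region where its model is trustworthy can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the article does not name the search algorithm, so a simple random walk stands in for it; `predicted_gap` is a made-up function standing in for the neural network; and the two-number feature vectors, training points, and distance cutoff are hypothetical. Only the 5 kcal/mol threshold comes from the article.

```python
import math
import random

# Hypothetical training set: feature vectors the surrogate model has "seen".
TRAINING_POINTS = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (1.5, 2.0)]

GAP_TARGET = 5.0       # kcal/mol, the threshold quoted in the article
DISTANCE_CUTOFF = 1.0  # assumed familiarity cutoff, arbitrary units


def predicted_gap(x):
    """Stand-in for the neural network: a made-up smooth function."""
    return abs(4.0 * x[0] - 3.0 * x[1] - 2.0)


def distance_to_training(x):
    """Distance from x to the nearest training point (a familiarity measure)."""
    return min(math.dist(x, p) for p in TRAINING_POINTS)


def is_lead(x):
    """A lead needs a small predicted gap AND must lie near the training data."""
    return (predicted_gap(x) <= GAP_TARGET
            and distance_to_training(x) <= DISTANCE_CUTOFF)


def search(n_steps=200, step_size=0.5, seed=0):
    """Random-walk search that never steps outside the trusted region."""
    rng = random.Random(seed)
    current = TRAINING_POINTS[0]
    leads = []
    for _ in range(n_steps):
        candidate = (current[0] + rng.uniform(-step_size, step_size),
                     current[1] + rng.uniform(-step_size, step_size))
        if distance_to_training(candidate) > DISTANCE_CUTOFF:
            continue  # too unfamiliar: skip the prediction and stay put
        current = candidate
        if predicted_gap(candidate) <= GAP_TARGET:
            leads.append(candidate)
    return leads
```

Calling `search()` returns the candidates that pass both tests. Tightening `DISTANCE_CUTOFF` makes the search more conservative, trading coverage of chemical space for predictions that are more likely to agree with a rigorous calculation.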
Their method turned up 372 leads in minutes; the same search would have taken about four days with a rigorous computational method, density functional theory (DFT). Kulik concluded that about 70% of the leads were viable targets.
“By being a little bit conservative about walking away from spaces where the model was completely untrustworthy, we were able to be right—where right was reproducing the DFT result—a good amount of the time,” Kulik says. “This is a paradigm that can be extrapolated to very, very large explorations of chemical space very, very rapidly.” She’s hoping to evaluate millions of molecules in the next iteration.
Kendall N. Houk, a computational chemist at the University of California, Los Angeles, praises the research for its speed and accuracy. Houk says the paper “fits into contemporary excitement about the use of machine learning.”