

Computational Chemistry

Machine learning can have human bias

Algorithm performance suffers when humans choose how to train it

by Sam Lemonick
September 11, 2019 | A version of this story appeared in Volume 97, Issue 36


Credit: Alexander J. Norquist
Crystal structure of a vanadium borate complex. Red = O, blue = B, gray = H, green = VO5.

Machine learning is often touted as a way to replace human chemists in certain research tasks. For instance, machine learning algorithms could predict new reagents likely to make desired materials, rather than a chemist searching the literature or relying on intuition to find one. But humans still build and train these algorithms. A team of researchers has now shown that during those steps humans can smuggle in biases that infect machine learning and degrade its performance (Nature 2019, DOI: 10.1038/s41586-019-1540-5).

The team, consisting of Haverford College’s Sorelle A. Friedler, Alexander J. Norquist, and Joshua Schrier (who is also affiliated with Fordham University), along with a host of Haverford undergraduates, looked at how well machine learning algorithms could predict the reactants and reaction conditions used to make amine-templated metal oxides. These compounds can form zeolites or metal-organic frameworks.

The researchers had noticed that human chemists trying to make these compounds used a relatively small number of amines and a narrow range of reaction conditions. In their own lab, the team found that other amines and conditions worked just as well, suggesting that human biases prevented chemists from fully exploring the possible chemical space for these reactions.

In theory, machine learning could search this space faster and more broadly than a human, and possibly identify patterns that people had missed. But these algorithms can only recognize desired characteristics of molecules or conditions by being trained on relevant datasets, which are built by humans.

To explore humans’ effect on this training, the team tested two algorithms on 110 possible reactions to make vanadium borates. One was trained on datasets built by people, and the other was trained on randomly selected data. The algorithm trained on randomly selected data was more accurate than its counterpart at identifying reagents and conditions that worked in the lab, and it produced fewer false positives. The algorithm trained on human-selected data also missed an entire class of crystals that the algorithm trained on random data found.

Considering machine learning’s promise, Schrier says, it’s a shame to make an algorithm “that’s just as stupid as humans because of the way it’s trained.” The team did not try to identify the roots of human bias; the researchers just wanted to show that it limited discovery.

Leroy Cronin, a computational chemist at the University of Glasgow, says the results aren’t totally unexpected. But he cautions that there may be some areas of chemical space where human intuition is useful for making leaps a computer might not predict.


