Chemists turn to machine learning to get the most out of data

Publishers and others apply standard artificial intelligence techniques to synthesis planning and education

January 23, 2017 | A version of this story appeared in Volume 95, Issue 4

COVER STORY

Getting the most out of chemistry data with machine learning

New directions for machine learning

Chemists pin hopes on deep learning for drug discovery

Pharma partnership applies deep learning to very big data

Although chemists are excited by the potential of so-called deep-learning computational tools to make a splash in drug discovery, publishers and others are still looking to squeeze findings out of earlier, less sophisticated versions of these tools. With machine-learning techniques that “teach” themselves from large data sets, they hope to get more out of scientific information, whether in the lab or the classroom.

“It’s about how you make discoveries consumable,” says Conal Thompson, chief technology officer for CAS, a division of the American Chemical Society that’s looking into how to get more out of its chemistry databases. “What’s going to become more valuable is insight from your data or content, rather than just the content itself.” ACS publishes C&EN.

One emerging area of chemistry that is capitalizing on machine learning is computer-aided synthesis design: feeding software a target molecule and getting back possible routes chemists might use to make it.

Synthesis deconstruction

Schemes showing the core and extended core of an example synthetic reaction.

Credit: ChemPlanner

To allow computers to find synthetic pathways, expert chemists write rules to home in on the core and extended core of a reaction (hydrogens omitted for clarity). Machine-learning algorithms can then help the computer navigate synthetic possibilities and rank solutions.

“Eight years ago, there was a lot of skepticism and resistance of chemists to the whole notion” of artificial intelligence being used to solve chemical problems, says Orr Ravitz, product manager for Wiley’s ChemPlanner, one platform offering help with synthesis design. “I think a lot has changed since then, and I think that’s related to us using so many computational tools in our daily life. People are starting to expect it.” Also, the falling cost of computing power has made chemistry applications faster and less expensive.

But machine learning does not help in every situation. For example, the basic reaction rules underlying ChemPlanner and a similar program developed by a start-up company, Chematica, do not come from computers automatically extracting information from journals or patents. Instead, humans are extensively involved in identifying key reactions and writing the reaction rules on which the programs run. This is in part because artificial intelligence programs learn best when they can train on hundreds of examples, and many useful reactions are reported far less often than that.

If chemists just want to use very common, well-established reactions, then machine learning likely could extract them from the literature, says Bartosz A. Grzybowski, developer of Chematica and a chemistry professor at Ulsan National Institute of Science & Technology and at the Polish Academy of Sciences. “But for complex synthetic planning, very rare reactions can be very important. A reaction that might appear in the literature only three times may be key to making a natural product,” Grzybowski adds.

Consequently, expert chemists write the rules that allow the software to identify the core of a reaction: the bonds that change during the reaction and the atoms they connect. Chemists also write the rules that dictate when and how to include other structural features of the reagents that may influence reactivity, such as aromaticity or electron-donating or -withdrawing groups.
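To make the idea of a reaction core concrete, here is a minimal Python sketch; it is not taken from ChemPlanner or Chematica. It assumes each side of a reaction is given as an atom-mapped set of bonds between heavy atoms (hydrogens omitted, as in the scheme), with integer bond orders. The core is every bond whose order changes, including bonds that form or break. The "extended core" here is simply the core atoms plus their neighbors within a fixed number of bonds, a hypothetical stand-in for the expert-written rules described above.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A bond is keyed by the pair of atom-map numbers it joins; the value is its bond order.
Bonds = Dict[FrozenSet[int], int]


def reaction_core(reactants: Bonds, products: Bonds) -> Tuple[Set[FrozenSet[int]], Set[int]]:
    """Return the bonds that form, break, or change order, and the atoms they touch."""
    changed = {
        pair
        for pair in reactants.keys() | products.keys()
        if reactants.get(pair, 0) != products.get(pair, 0)  # order 0 means "no bond"
    }
    atoms = {atom for pair in changed for atom in pair}
    return changed, atoms


def extended_core(core_atoms: Set[int], reactants: Bonds, radius: int = 1) -> Set[int]:
    """Grow the core by `radius` bonds through the reactant graph.

    Hypothetical: a fixed-radius expansion standing in for hand-written expert rules.
    """
    shell = set(core_atoms)
    for _ in range(radius):
        # Add both atoms of every reactant bond that touches the current shell.
        shell |= {atom for pair in reactants if pair & shell for atom in pair}
    return shell


if __name__ == "__main__":
    # Ethyl bromide + hydroxide -> ethanol + bromide (heavy atoms only).
    # Atoms: 1 = CH3 carbon, 2 = CH2 carbon, 3 = Br, 4 = O.
    before = {frozenset({1, 2}): 1, frozenset({2, 3}): 1}
    after = {frozenset({1, 2}): 1, frozenset({2, 4}): 1}
    bonds, atoms = reaction_core(before, after)
    print(sorted(sorted(b) for b in bonds), sorted(atoms))
    print(sorted(extended_core(atoms, before)))
```

In this substitution example, the core is the broken C-Br bond and the new C-O bond, and one bond of extension pulls in the methyl carbon that a rule might need to consider.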

Photo of two people in front of a computer screen.

Credit: Bartosz A. Grzybowski

Grzybowski (left) and Karol Molga explore synthetic pathways using Chematica.

The software then uses those rules to identify candidate reactions, based on whether the chemical structures involved share the properties of those “extended cores.” Machine learning then helps the software navigate the huge network of synthetic possibilities that results.
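The search side can be sketched as a best-first walk through that network. The Python below is a hypothetical illustration, not either program's algorithm: it runs a uniform-cost search over sets of molecules still to be made, applying one disconnection at a time until every remaining compound is on a purchasable list. The rule table, compound names, and step costs are invented; in a machine-learning-guided planner, a trained model might supply the step costs or decide which branches to expand first.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical retrosynthetic rules: target -> list of (precursors, step cost).
# In a real planner the step cost might come from a trained model; here it is hand-assigned.
Rules = Dict[str, List[Tuple[Tuple[str, ...], float]]]


def plan_route(target: str, rules: Rules, purchasable: Set[str],
               max_expansions: int = 10_000) -> Optional[Tuple[float, List[str]]]:
    """Uniform-cost search for the cheapest set of disconnections ending in purchasable compounds."""
    tie = count()  # tie-breaker so heapq never has to compare frozensets
    frontier = [(0.0, next(tie), frozenset([target]), [])]
    seen: Set[FrozenSet[str]] = set()
    for _ in range(max_expansions):
        if not frontier:
            return None  # no route found
        cost, _, open_set, route = heapq.heappop(frontier)
        todo = sorted(m for m in open_set if m not in purchasable)
        if not todo:
            return cost, route  # everything left can be bought
        if open_set in seen:
            continue  # already expanded this state at equal or lower cost
        seen.add(open_set)
        mol = todo[0]  # expand one unsolved molecule at a time
        for precursors, step_cost in rules.get(mol, []):
            new_open = (open_set - {mol}) | frozenset(precursors)
            step = f"{mol} <= {' + '.join(precursors)}"
            heapq.heappush(frontier, (cost + step_cost, next(tie), new_open, route + [step]))
    return None  # search budget exhausted


if __name__ == "__main__":
    rules = {
        "target": [(("A", "B"), 2.0), (("C",), 1.0)],
        "C": [(("D",), 5.0)],
        "A": [(("E",), 1.0)],
    }
    print(plan_route("target", rules, purchasable={"B", "D", "E"}))
```

Here the route through C looks cheaper after one step but costs more in total, so the search returns the two-step route through A and B. Because step costs are nonnegative, the first fully purchasable state popped from the queue is the cheapest one.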

Machine-learning algorithms also help score synthetic pathways, which determines the order in which they’re shown to the user. Scoring is not a one-size-fits-all process, the software developers have found: Different chemists prioritize factors such as cost, yield, number of steps, or use of protecting groups differently. “What we hope to do with machine learning in the future is to basically learn from the user’s interaction with the system and try to tailor prioritization to their taste, similar to what Netflix does based on your viewing history,” Ravitz says.
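One simple way to tailor ranking to a user's taste, offered here as an assumption rather than as ChemPlanner's approach, is a linear score over route features whose weights are nudged whenever a user picks a route the system ranked lower (a perceptron-style pairwise update). The feature names, units, and learning rate below are hypothetical, and the features are not normalized, which a real system would need to handle.

```python
from dataclasses import dataclass
from typing import Dict, List

FEATURES = ("cost", "yield_pct", "steps", "protecting_groups")


@dataclass
class Route:
    name: str
    cost: float             # hypothetical units, e.g. dollars per gram
    yield_pct: float        # overall yield, percent
    steps: int
    protecting_groups: int


def features(r: Route) -> Dict[str, float]:
    # Sign convention: larger is better for every feature.
    return {"cost": -r.cost, "yield_pct": r.yield_pct,
            "steps": -r.steps, "protecting_groups": -r.protecting_groups}


def score(r: Route, w: Dict[str, float]) -> float:
    f = features(r)
    return sum(w[k] * f[k] for k in FEATURES)


def rank(routes: List[Route], w: Dict[str, float]) -> List[Route]:
    return sorted(routes, key=lambda r: score(r, w), reverse=True)


def learn_from_choice(chosen: Route, shown: List[Route], w: Dict[str, float],
                      lr: float = 0.1) -> None:
    """Nudge weights (in place) toward the chosen route whenever another
    shown route scored at least as high as it."""
    fc = features(chosen)
    for other in shown:
        if other is not chosen and score(other, w) >= score(chosen, w):
            fo = features(other)
            for k in FEATURES:
                w[k] += lr * (fc[k] - fo[k])
```

Starting from weights that rank purely by yield, a single choice of a cheaper, shorter route that the model had ranked second is enough in this sketch to flip the ordering, roughly the "learn from the user's interaction" behavior Ravitz describes.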

Researchers are also actively applying machine learning to materials science. Northwestern University professor Chris Wolverton and colleagues recently published a general framework for using machine-learning approaches to predict properties of inorganic materials (npj Comput. Mater. 2016, DOI: 10.1038/npjcompumats.2016.28). Separately, a team led by Sorelle A. Friedler, Joshua Schrier, and Alexander J. Norquist of Haverford College used machine-learning models to predict conditions for successful crystallization of inorganic-organic hybrid materials (Nature 2016, DOI: 10.1038/nature17439).

Notably, the Haverford researchers report in their paper that they trained their machine-learning model in part on “dark” reactions (failed syntheses) drawn from their archived laboratory notebooks. They have also set up a “Dark Reactions Project” website, darkreactions.haverford.edu, to gather similar information. Such “dark” data will become increasingly important as people develop machine-learning applications, experts say.

“Nobody likes publishing negative results, but a machine and its intelligence would be much more informed by having positives and negatives,” CAS’s Thompson says. “It’s not a mistake anymore, it’s valuable information.” However, making that valuable information accessible is an unsolved problem in a scientific culture that prizes positive findings and largely ignores so-called negative results in its publications.
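Why negative results matter can be shown with a toy classifier. The sketch below trains a logistic-regression model by batch gradient descent on two invented, scaled reaction descriptors; it is not the Haverford group's model, and the data points are made up. Trained on successes and failures together, the model learns a boundary between them. Trained on successes alone, it has nothing to push against and predicts success everywhere.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[Sequence[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one reaction (1 = success, 0 = failure).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def p_success(x: Sequence[float], w: List[float], b: float) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Hypothetical scaled descriptors (e.g. reagent ratio, temperature).
    successes = [[0.8, 0.7], [0.9, 0.6], [0.7, 0.9]]
    failures = [[0.2, 0.3], [0.1, 0.5], [0.3, 0.1]]
    w, b = train_logistic(successes + failures, [1, 1, 1, 0, 0, 0])
    w_pos, b_pos = train_logistic(successes, [1, 1, 1])
    probe = [0.15, 0.2]  # sits among the failed reactions
    print(round(p_success(probe, w, b), 3), round(p_success(probe, w_pos, b_pos), 3))
```

With the failed reactions included, the probe point near them gets a low success probability; the positives-only model calls it a likely success, which is the gap that archived "dark" reaction data helps fill.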


Another area that may benefit from machine learning is science education, where algorithms could help improve student learning outcomes. “One of our divisions creates educational materials for nurses, but many of our students get frustrated with the challenging material, drop out of the course, and never take their certification exam,” Dan Olley, chief technology officer at Elsevier, told CIO magazine last year. “We are using algorithms that learn how students actually use the course material,” he continued. “This way, we can create adaptability and personalization within the course to engage the students and drive better pass rates.”

Where machine learning will take scientific learning and research in the future remains to be seen. But just as computing technology has brought into daily life activities once seen only on “Star Trek” (“Alexa, lower the temperature to 68 degrees”), it has the potential to let the scientific enterprise do things researchers previously only dreamed about.

CORRECTION: This story was updated on Jan. 23, 2017, to correct the credit on the reaction scheme. It should be credited to ChemPlanner, not Chematica.

by Jyllian Kemsley

Chemical & Engineering News

ISSN 0009-2347