A depressing trend looms large for today’s pharmaceutical chemist: For the past 60 years, the cost of developing a new drug—currently about $2.5 billion—has roughly doubled every nine years. Many have placed the blame partly on cheminformatics: Even though lots of time and money have been spent on computationally searching for and docking drugs into proteins and other target molecules, the technique hasn’t lived up to the hype and hasn’t produced a prolific stream of good drug candidates.
But some chemists believe they’ve found a new strategy that may help solve their drug discovery woes. They think that a serendipitous combination of computer processing advances, access to huge chemical data sets, and a groundbreaking computational strategy called deep learning could usher in a way to quickly and efficiently teach computers to find successful drugs in ways that far surpass current computer-based methods.
Only in use for about five years, deep learning is a specialized, more sophisticated type of machine learning, a computational technique that generally uses data sets to teach itself and then applies its newly acquired “knowledge” to make predictions. Chemists have been using traditional machine learning for several decades to train computers to search libraries of compounds for possible drug candidates.
Deep learning burst on the scene in 2012 as a computer science breakthrough and has led to remarkable advances in technologies such as voice and face recognition. Once clumsy and error-prone, voice and face recognition are now accurate and ubiquitous on computers, tablets, and smartphones thanks to the computational strategy. Self-driving cars have benefited too: Deep learning lets them learn to navigate roadways by training on radar, sonar, imaging, and other data. As evidence, the transportation company Uber announced in December 2016 that it would begin pilot-testing its self-driving cars in San Francisco (a project later moved to Arizona).
Now, a growing number of chemists are hoping this strategy might work for them, too.
In drug discovery, the current collection of 5,000 or so molecular properties, such as aromaticity or bond strength, that computers use to find potential therapeutic compounds has been hand-selected by chemists over the years. Deep learning, however, has the potential to find, all by itself, combinations of druglike properties needed for therapeutic candidates. And some of these combinations could very well be parameters no human could discern.
Deep learning applied to chemistry has almost evangelical support in some research circles, made up of academicians and industry scientists alike. These researchers hope the new strategy will reverse the declining success of drug discovery. Money is pouring into start-ups that offer different types of deep-learning platforms. For instance, Palo Alto-based start-up TwoXAR recently raised $3.4 million in seed money, and San Francisco-based Atomwise raised $6 million.
As with any new field that generates great excitement, some drug discovery chemists warn that the enthusiasm needs to be tempered with realism. Big data sets, a requirement for good deep-learning performance, are still hard to come by in chemistry. And researchers need to develop ways to ensure the molecules “discovered” by deep-learning algorithms are compounds that chemists can realistically synthesize.
Deep learning has yet to show that it’s significantly better at finding drug candidates than other machine-learning methods, maintains Mark Murcko, chief scientific officer of Relay Therapeutics.
Nonetheless, an increasing number of groups that blend chemists with computer scientists are banking on continuing advances in deep learning as the drug discovery tool of the future. The use of deep learning in chemistry is less than two years old, says Olexandr Isayev, a computational chemist at the University of North Carolina, Chapel Hill, “but despite that we’re seeing tremendous progress.”
For example, Atomwise is hoping its deep-learning architecture, known as AtomNet, will allow the firm to repurpose a drug, already evaluated by the Food & Drug Administration for a different use, to prevent Ebola infection.
AtomNet screened 7,000 already evaluated medications for their ability to bind strongly to a protein in the Ebola virus known as glycoprotein 2. This protein drives Ebola infection with a clawlike structure that tears into cell membranes, allowing the virus to slip in and infect cells. AtomNet pinpointed 17 promising small molecules that block this action, one of which prevented Ebola infection of cells in the lab, says Abraham Heifets, CEO of Atomwise. The firm is not yet disclosing this top candidate’s identity.
The invention of deep learning, which aided AtomNet in finding the promising candidate, can be traced back as far as 1998. But it wasn’t until 2006 that pioneering computer scientist Geoffrey Hinton, now professor emeritus at the University of Toronto, began publishing papers that laid serious groundwork for deep learning.
There are many varieties of traditional machine-learning algorithms—with names like decision trees, nearest neighbors, and neural networks. Neural networks—first described in the 1950s—model the action of neurons in the brain and are one of the most popular machine-learning methods.
Deep learning takes the architecture of neural networks to a new level of accuracy. Instead of passing data through a single stage of calculation to reach an output, deep learning passes the data through multiple layers of calculations. For example, in training a computer to recognize images of cats, each image in the data set fed into a deep-learning algorithm gets chopped up into individual pixels.
The algorithm will begin collecting those pieces into bigger chunks that identify basic features such as outlines, or edges, of cats. Further iterations of calculations, each carried out by a different layer in the architecture, begin to close in on eyes, ears, and whiskers. Soon the computer has learned what collections of data represent cats in general. Humans learn this way too: Scientists have estimated that our cerebral cortexes process information with a six-layer architecture.
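The layered idea can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not a trained network: the layer sizes, the random weights, and the helper names `dense_layer`, `random_layer`, and `forward` are all invented for the example. Real image models learn their weights from data and typically use convolutional layers rather than the plain fully connected layers shown here.

```python
import random

def relu(values):
    """Pass positive values through and clamp negatives to zero."""
    return [max(0.0, v) for v in values]

def dense_layer(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def random_layer(n_in, n_out, rng):
    """Hypothetical untrained weights; a real network would learn these."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(pixels, layers):
    """Feed pixel values through each layer in turn, one after another."""
    activations = pixels
    for weights, biases in layers:
        activations = relu(dense_layer(activations, weights, biases))
    return activations

rng = random.Random(0)
# Assumed sizes: 16 "pixels" -> 8 values -> 4 values -> 1 final score.
# In a trained network the middle layers would loosely correspond to
# edges and then to larger parts; here they are just random numbers.
sizes = [16, 8, 4, 1]
layers = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
pixels = [rng.random() for _ in range(16)]
score = forward(pixels, layers)
print(len(layers), "layers, output:", score)
```

The point of the sketch is only the shape of the computation: the output of each layer becomes the input of the next, so later layers work with combinations of what earlier layers produced.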
Several factors are responsible for the current deep-learning revolution. First, computer hardware has gotten faster, thanks in particular to graphical processing units, or GPUs. Originally designed for video games, GPUs are now in common use by scientists for rapid, massively parallel computations.
In addition, data sets are getting bigger. It’s now trivial to collect millions of cat images off the internet, for example, or to scan millions of tweets. In fact, without huge amounts of data, deep learning can actually perform worse than other algorithms. Deep learning is so good at learning to recognize patterns that if an initial training data set is too small, the algorithm will just memorize it, a problem known as overfitting.
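Memorization can be shown with a deliberately tiny example. The sketch below is not deep learning; it swaps in a one-nearest-neighbor "memorizer" because it makes the effect easy to see. The data are invented: each point is a single number, the true rule is "label 1 if the number is above 0.5," and two of the six training labels are wrong on purpose to stand in for noise.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the closest memorized training point."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

def threshold_predict(x):
    """A much simpler model: one fixed cutoff at 0.5."""
    return 1 if x > 0.5 else 0

def accuracy(predict, data):
    """Fraction of (x, label) pairs that the predictor gets right."""
    correct = sum(1 for x, label in data if predict(x) == label)
    return correct / len(data)

# Invented training set; the labels at 0.2 and 0.8 are deliberately noisy.
train = [(0.1, 0), (0.2, 1), (0.4, 0), (0.6, 1), (0.8, 0), (0.9, 1)]
# Invented fresh data, labeled by the true rule (1 if x > 0.5).
fresh = [(0.12, 0), (0.22, 0), (0.35, 0), (0.55, 1), (0.78, 1), (0.95, 1)]

def memorizer(x):
    return nearest_neighbor_predict(train, x)

memorizer_train = accuracy(memorizer, train)
memorizer_fresh = accuracy(memorizer, fresh)
simple_train = accuracy(threshold_predict, train)
simple_fresh = accuracy(threshold_predict, fresh)

print(f"memorizer: train {memorizer_train:.2f}, fresh {memorizer_fresh:.2f}")
print(f"threshold: train {simple_train:.2f}, fresh {simple_fresh:.2f}")
```

On these made-up numbers the memorizer is perfect on its training set but gets only four of six fresh points right, because it has faithfully learned the two noisy labels. The simple cutoff looks worse in training and better on fresh data, which is the signature of overfitting.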
In a classic example of this problem, the U.S. military in the 1980s wanted to train computers to recognize tanks in aerial footage. It prepared training sets consisting of footage of forested areas with and without tanks. The computer learned its task on that data set without a hitch. But in real life, the project failed. It turned out that the training footage with the tanks had been taken in the morning, and the footage without the tanks had been taken in the afternoon. So the computer trained itself to recognize tanks by looking for differences in light rather than identifying the tanks themselves.
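The tank story can be mimicked with a toy learner. Everything in this sketch is an assumption made for illustration: the two features (`brightness` and a noisy `shape_score`), all of the numbers, and the one-rule learner `fit_stump` are invented, and nothing here is meant to describe how the military's actual system worked. The learner simply picks whichever single feature and cutoff best separates the training rows.

```python
def stump_predict(row, feature, threshold):
    """Predict 'tank' (1) when the chosen feature reaches the threshold."""
    return 1 if row[feature] >= threshold else 0

def stump_accuracy(rows, feature, threshold):
    """Fraction of rows classified correctly; the label is the last entry."""
    hits = sum(
        1 for row in rows if stump_predict(row, feature, threshold) == row[-1]
    )
    return hits / len(rows)

def fit_stump(rows, n_features):
    """Try every feature and every observed value as a cutoff; keep the best."""
    best = (None, None, -1.0)
    for feature in range(n_features):
        for row in rows:
            threshold = row[feature]
            acc = stump_accuracy(rows, feature, threshold)
            if acc > best[2]:
                best = (feature, threshold, acc)
    return best

# Rows are (brightness, shape_score, label). Invented training footage:
# every tank was filmed in bright morning light, every empty scene later.
training = [
    (0.90, 0.70, 1), (0.80, 0.40, 1), (0.85, 0.60, 1), (0.95, 0.30, 1),
    (0.30, 0.20, 0), (0.20, 0.50, 0), (0.35, 0.10, 0), (0.25, 0.35, 0),
]
# Invented field footage in which lighting no longer tracks the label.
field = [
    (0.30, 0.70, 1), (0.25, 0.60, 1),
    (0.90, 0.20, 0), (0.85, 0.10, 0),
]

feature, threshold, train_acc = fit_stump(training, n_features=2)
field_acc = stump_accuracy(field, feature, threshold)
names = ["brightness", "shape_score"]
print(f"learned rule: {names[feature]} >= {threshold}")
print(f"training accuracy {train_acc:.2f}, field accuracy {field_acc:.2f}")
```

Because brightness separates the training rows perfectly and the noisy shape score does not, the learner settles on brightness, scores 1.00 in training, and then gets every row of the constructed field set wrong. The field data were chosen to make the failure stark; the general lesson is that a model can ace its training set by latching onto whatever happens to correlate with the label.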
The final key advance that made deep learning practical was the development of sophisticated algorithms. Deep learning made its major debut at the 2012 annual ImageNet Large Scale Visual Recognition Challenge, a competition that pits new machine-learning algorithms against each other. A neural network architecture that went eight layers “deep,” written by Hinton, Alex Krizhevsky, and Ilya Sutskever now at the Stanford Vision Lab, had an image recognition error rate of only 15.4%, compared with 26% for the next best competitor.
The performance of this algorithm, dubbed AlexNet, was significant enough to astonish the field. The paper that resulted has been cited more than 8,000 times.
Chemists have quickly taken note of deep learning’s potential to help them recognize drug candidates and have begun developing deep-learning programs for their own use.
In drug design, a computer would look at aspects of molecules rather than pixels in images, explains Bartosz Grzybowski, chemistry professor at Ulsan National Institute of Science & Technology. Instead of identifying humans, it would identify, say, molecules that block G-protein-coupled receptors, which are popular drug targets. “It’s very difficult to wrap your head around all those data,” he says.
Stanford chemistry professor Vijay Pande’s group is collaborating with Google, using the company’s massive neural network system to search for drug candidates. The group has been training computers to model molecules that bind to and thwart the action of the enzyme β-secretase 1 (BACE1), which is believed to help produce the hallmark brain plaques of Alzheimer’s disease (J. Chem. Inf. Model. 2016, DOI: 10.1021/acs.jcim.6b00290).
Deep-learning research is already popping up in computational chemistry journals. For instance, Insilico Medicine chemist Alexander Aliper and coworkers used the technique to predict drug pharmacological properties and whether approved compounds could be repurposed (Mol. Pharmaceuticals 2016, DOI: 10.1021/acs.molpharmaceut.6b00248). And a group led by Cicero Nogueira dos Santos at IBM Watson is working on the use of deep learning to improve the success of virtually screening libraries of molecules that dock strongly to their targets (J. Chem. Inf. Model. 2016, DOI: 10.1021/acs.jcim.6b00355).
Drug development still faces more hurdles with deep learning than image or voice recognition does. Even large compound libraries strain to provide the huge amounts of data required for deep learning to be effective. “The reality is that chemistry lags in the amount of data available,” the University of North Carolina’s Isayev says. “Only recently have we begun to see big data sets.”
Some hope that the pharmaceutical industry, which is rabidly protective of its data, might find ways to share data sets. There’s much talk about the need to incorporate more negative results (molecules that failed to pan out as drug candidates) to provide more robust training sets. Pharma scientists and academicians alike have troves of negative results that don’t get published.
Isayev believes a key development that will boost deep learning will be to find ways to feed representations of entire molecules into a computer, rather than submitting human-crafted features. The machines will then do a much better job of discovering significant combinations of properties. “There’s no way to do that yet,” he says. “But people are actively working on it. It’s like the Wild West.”
CORRECTION: This story was updated on Jan. 23, 2017, to correct Mark Murcko’s affiliation from Relay Pharmaceuticals to Relay Therapeutics.