Programs built by Google’s artificial intelligence company DeepMind have beaten humans playing chess, Go, and some Atari computer games. Biologists and computer scientists say the company has now done the same with the protein-folding puzzle. At an international competition, the company’s program predicted how proteins fold in three dimensions given only their amino acid sequences. “A 50-year-old grand challenge in computer science has been to a large degree solved,” says John Moult, a structural biologist at the University of Maryland, who announced the results of the competition this week.
The shapes and functions of proteins result from how the amino acids that make up each protein interact with each other and their environment. There are a vast number of these interactions to consider for even a short stretch of protein, making predicting how proteins fold an enormous challenge for scientists. In the early 1990s, Moult helped set up the Critical Assessment of protein Structure Prediction, or CASP, competition to push researchers to rise to the prediction challenge. But researchers, including Moult, admit that they had given up hope they would live to see a solution.
Then in 2018, at the last CASP conference in Cancun, researchers could be found walking around in a daze. Newcomers AlphaFold, the protein-folding team from DeepMind, had just bested long-running groups with many years of experience. AlphaFold didn’t just win the competition; they put clear daylight between themselves and the next-best team. But their predicted structures still couldn’t match the actual ones obtained through structural biology experiments, like X-ray crystallography or cryogenic electron microscopy.
At this year’s competition, two-thirds of the protein structures predicted by AlphaFold were within experimental error. In other words, these structures were as good as the ones researchers could obtain through their laboratory techniques.
Many groups have turned to machine learning techniques to try to predict protein structures. They train their algorithms on known protein folds, hoping that the programs can find patterns that translate into specific folds. But not content with their results in 2018, the AlphaFold team led by John Jumper went back to the drawing board and completely rebuilt their machine learning approach for this year. It was not smooth sailing, Jumper told a press conference last week, but it works. The new AlphaFold approach uses different machine learning techniques, including an attention-based algorithm, to solve protein structures in small chunks, a process that Jumper likens to solving a jigsaw puzzle: the program finds different “islands of solution” and then has to figure out how to join them up. “We really didn’t know until we saw the CASP results how far we had pushed the field,” Jumper says.
One of the researchers who assessed results from the different teams’ programs for this year’s CASP competition was Andrei Lupas at the Max Planck Institute for Developmental Biology. He says it was immediately apparent that AlphaFold had made an incredible improvement from their 2018 efforts. Not only did they have a large lead over the other groups overall, he says, but while the accuracy of the other teams’ predictions fell away as the structures became more difficult to solve, AlphaFold barely registered a difference. “They don’t care whether the target is easy or hard,” he explains.
To test just how good AlphaFold was, Lupas dug out a protein whose full structure his research team did not know. Lupas’ group had a good data set for the protein, he says, but over the last 10 years, they had exhausted various structural biology approaches without managing to translate it into a 3-D structure. “And so, we gave this as a target and asked for models,” he explains. AlphaFold’s model “solved our structure within half an hour.”
“The ultimate vision behind DeepMind has always been to build general AI and then use it to help us better understand the world around us,” says Demis Hassabis, CEO and cofounder of DeepMind. Hassabis says he first became interested in the problem of protein folding in college. The company has “had big breakthroughs with games like Go, StarCraft, and Atari,” he adds. “But it’s important to realize that they were always meant as a stepping stone on the path towards this overall aim.”
This new AlphaFold program has not entirely solved the protein-folding problem. There are still some protein structures that can’t yet be solved by AlphaFold, such as complexes with many protein-protein interactions or proteins in cell membranes. However, Lupas says, for many biologists, the solutions will be good enough for their needs. For example, Lupas could use quickly solved structures to compare different proteins and find specific shapes, or domains, that suggest they evolved from a common ancestor protein or peptide.
Lupas thinks that within the next 10 years, AI will advance to the point at which biologists will need just a data set and an algorithm to solve a protein structure, allowing them to spend less time on experiments and more time thinking about what the results mean.
“Proteins are the most beautiful, gorgeous structures and the ability to follow them to predict exactly how they fold up in three dimensions is really very, very challenging,” says Janet Thornton, director emeritus and senior scientist at the European Bioinformatics Institute, part of the European Molecular Biology Laboratory. “This is an ideal problem for machine learning,” she adds. “But I think there are also many problems, particularly in medicine and in the environment, which will really benefit from these machine learning approaches.”
While the full details of the new AlphaFold system aren’t yet available for review, Jumper says the team plans to submit a full paper describing their work to a peer-reviewed journal just as they did after their success in 2018 (Nature 2020, DOI: 10.1038/s41586-019-1923-7). The team has also started several collaborations with research groups to see how AlphaFold could be useful. They are also exploring how they might make their services available to industry.