The UK company DeepMind and its research partners have made predicted structures for nearly every protein expressed in the human body, more than 20,000 of them, freely available online. The new database, which also contains 3D structures of the proteomes of 20 other organisms, was made possible by AlphaFold, an artificial intelligence tool developed by DeepMind, a subsidiary of Google’s parent company, Alphabet. Along with the 365,000 total protein-structure predictions released with the new study, DeepMind’s research team made public the source code for AlphaFold (Nature 2021, DOI: 10.1038/d41586-021-02025-4).
AlphaFold shocked the science world last winter when it swept the Critical Assessment of protein Structure Prediction, or CASP, contest by accurately predicting the structures of two-thirds of the contest’s target proteins. Predicting a protein’s conformation from its sequence alone has been notoriously difficult because of the many ways a chain of amino acids might arrange itself. And determining a protein structure by experiment is cumbersome. With less than 17% of the structures in the human proteome thus far confirmed, the new release of human proteins marks a huge advance, researchers say, and more is to come. DeepMind aims to release 130 million protein structure predictions by the end of the year.
“Having all these structures available is pushing the field forward significantly,” says Alberto Perez, an integrative structural biologist at the University of Florida who was not involved with the research and has participated in CASP for over a decade. In addition to enabling insight into how these proteins function, Perez says, the release opens the door for experimental biologists to validate the structures using techniques that on their own might not have been enough, “so that’s going to be exciting to see.”
To make these vast data resources available, DeepMind has partnered with the European Molecular Biology Laboratory (EMBL). Sameer Velankar, a structural bioinformatician who leads EMBL’s Protein Data Bank in Europe, describes this moment as “a human genome movement for structural biology.” While many of AlphaFold’s structures are highly accurate, Velankar notes that they are still predictions, which must be experimentally confirmed. In addition, many of these proteins work bound to other proteins, nucleic acids, or small molecules, and the protein conformations associated with these pairings are not represented in this database. “It comes with a caveat,” Velankar says, “but it is still very, very significant.”
Each AlphaFold prediction is accompanied by an accuracy score, “so that’s really reassuring,” says Amy Diallo, a scientist who studies infectious disease at the University of California, San Francisco (UCSF). Even a low-scoring prediction can help researchers by providing an initial structure, and obtaining one is among the biggest bottlenecks in structural biology, Diallo says. This first draft can serve as a starting point for experimental studies to confirm the structure.
Diallo was part of a UCSF group that recently used AlphaFold to better understand how one of the key proteins of SARS-CoV-2 interacts with human proteins (bioRxiv 2021, DOI: 10.1101/2021.05.10.443524v1). She has also studied proteins involved in chlamydia and tuberculosis, but her research on chlamydia hit a roadblock because she lacked the structure of a key human protein targeted by a chlamydia bacterial protein. Now, with AlphaFold’s predicted structure, Diallo is taking the chlamydia research back up.
The newly available structures will help with understanding existing proteins. Since biologists use protein structures to gain insight into how proteins function, “these new deep learning methods will have a big impact on protein design,” says David Baker of the University of Washington. His group recently released its own AI-based protein structure prediction software, RoseTTAFold. “It’s an exciting time, for sure, that’s definitely opening up a new era for biology,” Baker says.