Structural Biology

AlphaFold ‘pushes science forward’ by releasing structures of almost all human proteins

DeepMind’s AI predicted over 365,000 protein structures, which are now freely available online

by Emily Harwitz
July 29, 2021 | A version of this story appeared in Volume 99, Issue 28


A colorful 3D protein structure.
Credit: DeepMind
AlphaFold's predicted structure for Mediator of RNA polymerase II transcription subunit 23, a subunit of the human Mediator protein complex.

The UK company DeepMind and its research partners have released predicted protein structures for nearly every protein expressed in the human body—more than 20,000 of them—for free online. The new database—which also contains 3D structures of the proteomes of 20 other organisms—was made possible by AlphaFold, the artificial intelligence tool of DeepMind, which is a subsidiary of Google’s parent company, Alphabet. Along with the 365,000 total protein-structure predictions released with the new study, DeepMind’s research team made public the source code for AlphaFold (Nature 2021, DOI: 10.1038/d41586-021-02025-4).

AlphaFold shocked the science world last winter when it swept the Critical Assessment of protein Structure Prediction, or CASP, contest by accurately predicting the structures of about two-thirds of the contest's target proteins. Predicting a protein’s conformation from its sequence alone has been notoriously difficult because of the many ways a chain of amino acids might arrange itself. And determining a protein structure by experiment is cumbersome. With less than 17% of the structures in the human proteome thus far confirmed, the new release of human proteins marks a huge advance, researchers say, and more is to come. DeepMind aims to release 130 million protein structure predictions by the end of the year.

“Having all these structures available is pushing the field forward significantly,” says Alberto Perez, an integrative structural biologist at the University of Florida who was not involved with the research. He has participated in CASP for over a decade. In addition to enabling insight into how these proteins function, Perez says, this opens the door for experimental biologists to validate the structures using techniques that alone might not have been enough, “so that’s going to be exciting to see.”

To make these vast data resources available, DeepMind has partnered with the European Molecular Biology Laboratory (EMBL). Sameer Velankar, a structural bioinformatician who leads EMBL’s Protein Data Bank in Europe, describes this moment as “a human genome movement for structural biology.” While many of AlphaFold’s structures are highly accurate, Velankar notes that they are still predictions, which must be experimentally confirmed. In addition, many of these proteins work bound to other proteins, nucleic acids, or small molecules, and the protein conformations associated with these pairings are not represented in this database. “It comes with a caveat,” Velankar says, “but it is still very, very significant.”

Each AlphaFold prediction is accompanied by an accuracy score, “so that’s really reassuring,” says Amy Diallo, a scientist who studies infectious disease at the University of California, San Francisco (UCSF). Even a low-scoring prediction can help researchers by providing an initial structure, and obtaining one is among the biggest bottlenecks in structural biology, Diallo says. This first draft can serve as a starting point for experimental studies to confirm the structure.

Diallo was part of a UCSF group that recently used AlphaFold to better understand how one of the key proteins of SARS-CoV-2 interacts with human proteins (bioRxiv 2021, DOI: 10.1101/2021.05.10.443524v1). She has also studied proteins involved in chlamydia and tuberculosis, but her research on chlamydia hit a roadblock without the structure of a key human protein that a chlamydia bacterial protein targeted. Now, with AlphaFold’s predicted structure, Diallo is taking the chlamydia research back up.

The newly available structures will help with understanding existing proteins. Since biologists use protein structures to gain insight into how proteins function, “these new deep-learning methods will have a big impact on protein design,” says David Baker of the University of Washington. His group recently released its own AI-based protein structure prediction software, RoseTTAFold. “It’s an exciting time, for sure, that’s definitely opening up a new era for biology,” Baker says.

