

Protein Folding

DeepMind releases structure predictions for nearly every known protein

Database powered by AlphaFold algorithm now boasts predicted structures for over 200 million proteins

by Laura Howes
August 5, 2022 | A version of this story appeared in Volume 100, Issue 27


Copies of the predicted structure of the human protein disco-interacting protein 2 homolog B.
Credit: Karen Arnott/EMBL-EBI
The human protein disco-interacting protein 2 homolog B is just one of the 200 million proteins with a predicted structure in the AlphaFold database.

The Alphabet-owned company DeepMind has massively increased the number of predicted protein structures available to researchers through a free database. The release, on July 28, boosted the number of structures predicted with DeepMind’s machine learning algorithm AlphaFold to over 200 million, covering proteins from over 1 million species.

“From fighting disease to tackling plastic pollution, AlphaFold has already enabled incredible impact on some of our biggest global challenges. Our hope is that this expanded database will aid countless more scientists in their important work and open up completely new avenues of scientific discovery,” DeepMind CEO Demis Hassabis says in a statement.

AlphaFold has captured researchers’ attention in recent years for dramatically advancing their ability to predict the structures of proteins on the basis of their amino acid sequences. Since the algorithm won an international competition in 2020, many researchers have used it to help them understand the shapes that proteins form. In July 2021, DeepMind launched the database in collaboration with scientists at the European Molecular Biology Laboratory’s European Bioinformatics Institute. At the time, the database contained the predicted structures of around 350,000 proteins, including almost all known human proteins.

Now almost all known proteins have a structure for researchers to study, although exceptions exist. Very short or long proteins and those with unnatural amino acids aren’t yet included, and neither are viral proteins. The team says it will continue to add predicted structures of new proteins as they are found.


