Protein Folding

Meta AI releases models of over 600 million potential proteins

AI lab from tech company Meta joins the protein structure prediction game and creates models based on metagenomic data

by Laura Howes
November 3, 2022


This week, scientists at Meta, the firm behind Facebook and Instagram, released the structures of more than 600 million putative proteins in a database called the ESM Metagenomic Atlas. The structures are for proteins predicted to exist based on genetic data from large-scale metagenomic screens of soil, seawater, and other sources. The proteins themselves have yet to be isolated or identified using proteomic methods.

A ribbon depiction of the structure of PETase that came out of the ESMFold prediction algorithm.
Credit: Meta AI
The predicted structure of the plastic-degrading enzyme PETase, using the ESMFold algorithm. The ribbon is colored to show the confidence of the algorithm per amino acid location, with dark and light blue indicating higher confidence and orange and yellow indicating lower confidence.

The team describes the method used to perform this feat in a preprint (bioRxiv 2022, DOI: 10.1101/2022.07.20.500902), which has yet to undergo peer review.

In July, the Alphabet-owned company DeepMind announced that it had filled a database with predicted structures for almost all known proteins. That database holds around 200 million models made using AlphaFold, DeepMind’s algorithm for predicting protein structures. The Meta AI algorithm used to make the new protein models, ESMFold, is not as accurate as AlphaFold, but it is quicker, researchers say. The speed comes from how the tool works: it predicts protein structures using a language model trained on sequence data, meaning the order of amino acids in the linear chain that makes up a protein. That speed meant the researchers could predict the 600 million structures in just 2 weeks, using a cluster of approximately 2,000 graphics processing units.
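
To put those figures in perspective, here is a rough back-of-envelope calculation in Python. It is a sketch built only on the rounded numbers quoted above (600 million structures, 2 weeks, about 2,000 GPUs) and assumes the cluster ran continuously with the work spread evenly across GPUs, which the article does not state. The function name and its parameters are illustrative and do not come from Meta's code.

```python
# Back-of-envelope throughput from the rounded figures quoted in the article.
# Assumption (not from the source): the cluster ran nonstop for the full
# two weeks and every GPU handled an equal share of the predictions.

STRUCTURES = 600_000_000  # "600 million structures"
DAYS = 14                 # "just 2 weeks"
GPUS = 2_000              # "approximately 2,000 graphics processing units"


def estimate_throughput(structures: int, days: float, gpus: int) -> dict:
    """Return rough cluster-wide and per-GPU prediction rates."""
    seconds = days * 24 * 60 * 60
    cluster_rate = structures / seconds   # structures per second, whole cluster
    per_gpu_rate = cluster_rate / gpus    # structures per second, one GPU
    return {
        "cluster_structures_per_second": cluster_rate,
        "gpu_seconds_per_structure": 1 / per_gpu_rate,
    }


if __name__ == "__main__":
    result = estimate_throughput(STRUCTURES, DAYS, GPUS)
    print(f"Cluster rate: {result['cluster_structures_per_second']:.0f} structures/s")
    print(f"Per GPU: {result['gpu_seconds_per_structure']:.1f} s per structure")
```

Under those assumptions, the cluster would have averaged roughly 500 predictions per second, or about 4 seconds per structure on each GPU. The real per-structure time would vary with protein length and hardware.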

The Meta AI researchers have also published the code that they used to create the new database. They intend for other scientists to use the tool for their own research.

Pernilla Wittung-Stafshede, a protein folding expert at Chalmers University of Technology, says the new database “gives a really broad view of [the] protein universe on Earth.” But she cautions that structure prediction algorithms are just the beginning, with more work needed to tease out each protein’s function, which she says is the next challenge.


