Big Data

AlphaFold-derived AI predicts genetic mutations’ impact

Google research team uses protein sequences and structures to classify 71 million possible missense variants as harmful or benign

by Laurel Oldach
September 19, 2023


Genetic data is more readily available than ever before—but interpreting it can be a challenge. Without months of work in the laboratory, it can be tough to tell what one of the millions of single–amino acid variants observed in humans does to protein function, let alone whether it might have a role in disease.

A photo of a printed-out DNA sequence alignment showing variation at one point in the sequence.
Credit: Shutterstock
The new tool focuses on single–amino acid changes in proteins, which can be caused by changing one DNA base in a genome.

Researchers in the fast-moving field of protein machine learning are trying to speed up variant interpretation. This week, DeepMind, the Google team that developed the popular AlphaFold2 protein structure prediction engine, published a new artificial intelligence model that predicts whether protein-altering gene variants will have a benign or harmful effect (Science 2023, DOI: 10.1126/science.adg7492).

The model, dubbed AlphaMissense, works by combining structural predictions from AlphaFold2 with a technique called protein language modeling, which involves training an algorithm on an enormous number of amino acid sequences and using it to make statistical inferences about other sequences. It outputs a predicted pathogenicity score between 0 and 1 for each possible amino acid substitution at each point in a protein.
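To make the shape of that output concrete, here is a minimal Python sketch that lists every single-residue substitution for a short sequence and sorts a 0-to-1 score into a label. It is an illustration only, not DeepMind's code: the function names, the toy sequence, and both cutoff values are assumptions made up for this example, and no real scores are computed.

```python
# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_substitutions(sequence):
    """Yield (position, original, replacement) for every single-residue swap."""
    for position, original in enumerate(sequence, start=1):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                yield position, original, replacement


def classify(score, benign_below=0.3, harmful_above=0.7):
    """Map a 0-to-1 score to a label. Both cutoffs are hypothetical placeholders."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score < benign_below:
        return "likely benign"
    if score > harmful_above:
        return "likely harmful"
    return "ambiguous"


toy_sequence = "MKCV"  # made-up four-residue peptide, not a real protein
variants = list(enumerate_substitutions(toy_sequence))
print(len(variants))   # 4 positions x 19 alternatives = 76
print(classify(0.92))  # a made-up score, used only to show the labeling step
```

Each position has 19 possible replacements, so the number of scores grows in step with sequence length; a real pipeline would attach a model-derived score to each tuple rather than a hand-typed one.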

The team ran the model on the whole human proteome and posted a publicly accessible database of some 71 million single–amino acid substitutions. Proteome-wide, the model predicted that harmful variants would be concentrated in structured regions and at residues like cysteine that play an outsized role in maintaining structure.

What the model does not do, the DeepMind researchers emphasized at a press conference, is predict how mutations change that structure, protein stability, or interaction with binding partners, a task recognized as a major challenge in the field. “The model predicts pathogenicity in the abstract,” says senior author Žiga Avsec. “But it doesn’t tell us the biophysical nature of what this mutation does.”

David Taylor, a structural biologist at the University of Texas at Austin, called the work “a game changer” in combining structures with protein language modeling and making the results available to all. The model doesn’t say why mutations might be harmful, but he says that biochemists might use it to help identify regions important to protein function for further study.

According to geneticists working in variant prediction, however, this is just the latest entry in a fast-moving field. Researchers at Illumina and a number of universities published a similar algorithm called PrimateAI-3D in June (Science 2023, DOI: 10.1126/science.abn8197). Like AlphaMissense, the Illumina model combines a protein language model with structure data. “I’m very surprised that AlphaMissense was published in Science after PrimateAI-3D,” says Vasilis Ntranos, senior author of a third recent algorithm, which classified variants based strictly on a protein language model (Nat. Genet. 2023, DOI: 10.1038/s41588-023-01465-0).

Experts also stress that it is difficult to assess these algorithms’ accuracy. According to Michael Sternberg, a bioinformatics expert at Imperial College London, benchmarking tests suggest that AlphaMissense and its competitors are, for now, accurate enough to help researchers prioritize studies at the bench but not trustworthy enough to conclusively link a gene variant to a disease.


This story was updated on Sept. 28, 2023, to correct the name of a quoted DeepMind researcher. His name is Žiga Avsec, not Ziga Avosec.

