In biology, proteins don’t go it alone. They fold up in complexes and interact with each other to get stuff done. Those interactions create challenges for scientists working to predict protein structures from their amino acid sequences. It’s one thing to work out how a single string of amino acids folds up, and it’s another to figure out how a couple of folded-up proteins will fit together. Now, two teams of researchers have combined deep-learning algorithms to provide a clearer view of these complexities.
The first paper comes from a collaboration led by David Baker’s lab at the University of Washington and uses both the Baker lab’s RoseTTAFold algorithm and DeepMind’s AlphaFold software to directly predict the structures of protein complexes in a model eukaryote, the yeast Saccharomyces cerevisiae (Science 2021, DOI: 10.1126/science.abm4805).
In 2019, Baker’s group combined statistical analysis of bacterial genes and structural modeling to predict the structures of protein complexes in E. coli (Science 2019, DOI: 10.1126/science.aaw6718). But when they tried the same technique for proteins in more complicated lifeforms, they became stuck. There was too much data for the algorithm to handle. RoseTTAFold, a new tool developed by the lab this year, does a better job modeling protein complexes, and it does better still when combined with the AlphaFold algorithm.
To start, the researchers assembled a database of millions of potential protein pairs from the yeast proteome. Then they refined the data set using both RoseTTAFold and AlphaFold. This multistep approach yielded models of 912 assemblies, many of whose structures were previously unknown.
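The general shape of such a multistep screen can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers’ pipeline: the function `screen_pairs`, the two scoring callables, the cutoffs, and the toy scores are all hypothetical, and nothing here calls RoseTTAFold or AlphaFold. The idea is simply that a cheap score prunes the full list of candidate pairs before a costlier score is run on the survivors.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def screen_pairs(
    proteins: Iterable[str],
    fast_score: Callable[[Pair], float],
    slow_score: Callable[[Pair], float],
    fast_cutoff: float,
    slow_cutoff: float,
) -> List[Pair]:
    """Hypothetical two-stage screen over all unordered protein pairs.

    Stage 1 applies a cheap scorer to every pair; stage 2 applies a more
    expensive scorer only to pairs that passed stage 1.
    """
    # Every unordered pair of distinct proteins: n * (n - 1) / 2 candidates.
    all_pairs = combinations(sorted(proteins), 2)
    survivors = [p for p in all_pairs if fast_score(p) >= fast_cutoff]
    return [p for p in survivors if slow_score(p) >= slow_cutoff]


# Toy scores standing in for real model output (made-up values).
toy_fast = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.7}
toy_slow = {("A", "B"): 0.95, ("B", "C"): 0.3}

hits = screen_pairs(
    ["A", "B", "C"],
    fast_score=lambda p: toy_fast.get(p, 0.0),
    slow_score=lambda p: toy_slow.get(p, 0.0),
    fast_cutoff=0.5,
    slow_cutoff=0.8,
)
print(hits)
```

Because the number of candidate pairs grows roughly with the square of the number of proteins, pruning early is what keeps the expensive step tractable in a sketch like this.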
One complex illuminated by the team is the glycosylphosphatidylinositol transamidase enzyme. Mutations in this complex are related to diseases including neurodevelopmental disorders and cancer. While the full complex was too big to model, the team built a simplified five-component version to help researchers understand how the different parts interact. Other insights concern complexes involved in chromosome segregation, DNA repair, and DNA transcription and translation, among other processes. These protein structures are available to download from a protein database called ModelArchive.
Although Baker’s team used yeast proteins as their starting point, many of the modeled proteins are conserved across eukaryotes, right up to humans in some cases. But the human proteome is much bigger than that of yeast. In a paper published this week on the preprint server bioRxiv before peer review (DOI: 10.1101/2021.11.08.467664), scientists describe using AlphaFold to model thousands of binary protein interactions from the human proteome. The research, led by Pedro Beltrao at the European Molecular Biology Laboratory’s European Bioinformatics Institute and Arne Elofsson at Stockholm University, begins to paint a picture of the complexities of human protein interactions. In some cases, the researchers found, multiple proteins interact with the same face of a key protein, suggesting the proteins must act separately. Other examples point to larger groups of proteins acting together. The team also explored how disease-causing mutations might change those interactions, suggesting potential disease mechanisms.
Tristan Croll, a structural biologist at the Cambridge Institute for Medical Research, says the move from predicting single protein structures to predicting interactions and complexes is very promising but comes with big challenges. One tricky area, he says, is that evolution often creates new copies of a gene that may or may not retain the functions and interactions of the original, so researchers doing these studies need to be careful to pair up only genes that do interact. “I suspect there’s a lot of work still to be done to come to terms with that,” he adds.
Elofsson says he’s keeping this in mind as his group continues to scale up their research. Neither his group nor Baker’s can predict huge complexes yet, but Elofsson says his next step is to work towards complexes with more and more components. “It should be possible,” he says.
This story was updated on Nov. 19, 2021, to clarify that the protein structures are available from the database ModelArchive, which is separate from the Protein Data Bank.