In the past few years, huge advances have been made in predicting protein structures using artificial intelligence. But the reverse problem—taking a protein shape and then predicting how to build it from a sequence of amino acids—has proved trickier. A series of three papers by biologists at the University of Washington School of Medicine now shows that a new machine learning algorithm can design protein molecules faster and more accurately than before. Designed proteins could help build new vaccines, drugs, and sustainable biomaterials.
At their simplest, proteins are chains of amino acids strung together with peptide bonds. The interplay of the various side chains along the length of the polymer, both with one another and with the surrounding environment, causes the floppy chains to twist and curl into different 3D shapes. But the forms found in nature are just a fraction of what UW’s David Baker thinks is possible. Baker has founded several companies to take designed proteins in different useful directions; the firms include Monod Bio, a protein-based diagnostics spin-off, and Vilya, for designing therapeutics.
The new algorithms created by Baker’s lab offer what postdoc Basile I. M. Wicky calls a “one-pot approach,” which can help researchers design a useful protein shape and then predict the amino acid sequence that will make it.
The first paper, published in July, describes a new tool that can produce protein designs in one of two ways. In the first, the AI creates a design by iteratively improving on simple prompts, such as a requirement for a particular type of fold or binding motif. The AI is “trying to dream up a structure,” as postdoc Jue Wang puts it. The alternative approach involves taking parts of an existing structure and then asking the AI to fill in the gaps (Science 2022, DOI: 10.1126/science.abn2100).
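The idea of iterative improvement can be illustrated with a toy sketch. The code below is not the Baker lab's method: it swaps in a simple greedy search that mutates one amino acid at a time and keeps changes that do not lower a score. The names `toy_score` and `iterative_design`, and the string-matching objective, are hypothetical stand-ins; a real tool would score a predicted 3D structure against the requested fold or motif.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def toy_score(sequence, motif):
    """Hypothetical objective: fraction of positions matching a desired motif.

    Assumption for illustration only; a real pipeline would evaluate a
    predicted structure rather than compare letters.
    """
    return sum(a == b for a, b in zip(sequence, motif)) / len(motif)


def iterative_design(motif, steps=5000, seed=0):
    """Greedy refinement: mutate one position, keep changes that do not hurt."""
    rng = random.Random(seed)
    # Start from a random sequence of the same length as the motif.
    current = [rng.choice(AMINO_ACIDS) for _ in motif]
    best = toy_score(current, motif)
    for _ in range(steps):
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(AMINO_ACIDS)
        new = toy_score(current, motif)
        if new >= best:
            best = new          # accept the mutation
        else:
            current[pos] = old  # revert the mutation
    return "".join(current), best


if __name__ == "__main__":
    design, score = iterative_design("ACDEFGHIKL")
    print(design, score)
```

Because the score here is a plain letter comparison, the search converges quickly; with a structure-based objective, each scoring step would be far more expensive and the search landscape much rougher.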
In two follow-up papers just published, researchers from the same lab demonstrate how another algorithm, called ProteinMPNN, can start from designed 3D shapes and assemblies of multiple protein subunits and determine in about 1 second the protein sequences needed to make them efficiently. The team tested and refined the predicted sequences by running them through protein structure prediction algorithms and by synthesizing the proteins in the lab. The researchers then verified the protein structures using X-ray crystallography and used cryo-electron microscopy to determine the shapes of the assemblies that the proteins formed (Science 2022, DOI: 10.1126/science.add2187; 10.1126/science.add1964).
“The protein design field is undergoing a tremendous revolution,” says Christian Dallago, a computational biologist at NVIDIA who was not involved in the work. “Ultimately, through these new tools, we can become less reliant on time-consuming, classic approaches.” He says the hope is that such tools will reduce the distance between generating a hypothesis and providing solutions.
Researchers in Baker’s lab say that ProteinMPNN has become their go-to algorithm. They are now working to improve the tools and create proteins for a variety of uses.