Date: 2023-10-30

Proteins are important molecules, but they aren’t the only molecules in a cell, and they don’t operate alone. In a new preprint, the team behind the protein structure prediction software RoseTTAFold has announced a tool that expands the types of chemistry that protein designers using deep learning can incorporate, so that their models better reflect proteins’ environment (bioRxiv 2023, DOI: 10.1101/2023.10.09.561603).


Protein structure prediction algorithms, such as AlphaFold and RoseTTAFold, have swept through the field of structural biology in recent years. These machine learning tools, trained on protein structures that have been solved experimentally, predict new 3D structures based only on proteins’ amino acid sequences. Biochemists use those predictions to develop hypotheses about how proteins work and how they fit together, and they have also used the tools to design new proteins with desired functions.

The trouble is, these models overlook many types of chemistry that can influence a protein’s structure. “A lot of biology involves, for example, proteins interacting with small molecules,” says senior author David Baker, a University of Washington professor whose lab developed RoseTTAFold. The latest update to the model, RoseTTAFold All-Atom, can handle the more diverse chemistry that occurs when proteins bind small molecules or undergo covalent modifications that can dramatically influence their structure and function.

“It’s kind of a hypothesis that we wanted to test: Is it possible, even, to train a model that can represent all these different types of molecules?” says Rohith Krishna, a graduate student and first author of the paper. After about 2 years of tinkering, he and his colleagues found that it is possible, by combining two different approaches to modeling molecules. Like previous RoseTTAFold iterations, the new network represents each polymer subunit, such as an amino acid or a nucleobase, as a single unit. But it also represents every atom in each covalent modification and small-molecule binding partner as a unit of its own. It’s hard to say exactly how the network works, since machine learning models are notoriously opaque, but the researchers think that it arranges all the units present until it reaches a plausible structure.
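The mixed representation described above can be pictured with a toy example. The Python sketch below is purely illustrative and is not code from RoseTTAFold All-Atom: the `Unit` class, the `build_units` function, and the inputs are hypothetical names invented here. It assumes only what the researchers describe, namely that each amino acid becomes one unit while each atom of a small molecule becomes its own unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One modeling unit (hypothetical name, for illustration only)."""
    kind: str   # "residue" for a polymer subunit, "atom" for a single atom
    label: str  # one-letter amino acid code or chemical element symbol


def build_units(protein_sequence, ligand_atoms):
    """Combine residue-level and atom-level units into one list.

    protein_sequence: string of one-letter amino acid codes.
    ligand_atoms: list of element symbols, one per small-molecule atom.
    """
    residue_units = [Unit("residue", aa) for aa in protein_sequence]
    atom_units = [Unit("atom", element) for element in ligand_atoms]
    return residue_units + atom_units


# A three-residue peptide plus a three-atom fragment (made-up inputs).
units = build_units("MKV", ["C", "C", "O"])
for unit in units:
    print(unit.kind, unit.label)
```

In this sketch the protein stays coarse, with one unit per residue, while the small molecule is described atom by atom, and both kinds of unit sit in a single list that a model could then process together. How the actual network encodes bonds, coordinates, and chemistry is not covered here.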

While prior updates to RoseTTAFold focused on specific problems, like predicting how proteins and nucleic acids interact, the team trained this one more broadly. “In principle, a network that’s been trained on more diverse sets of data should be able to generalize better,” Baker says, adding that the researchers plan for this network to supersede task-specific versions of RoseTTAFold.

The team used RoseTTAFold All-Atom to generate more accurate structure predictions for proteins such as enzymes bound to both their substrates and cofactors, and for proteins with numerous covalent modifications. They also designed from scratch several proteins that bind to small molecules, using only the small molecules as inputs. Though he hasn’t tried more complex arrangements yet, Krishna predicts that the model may be able to help design proteins that undergo complicated interactions, for example with both a nucleic acid and a small-molecule cofactor.