Clumps of protein fibrils known as amyloids play a role in a number of diseases, most famously Alzheimer’s and Parkinson’s. These fibrils are rich in β-sheets, a type of flat secondary protein structure. Being able to predict which proteins are likely to form amyloids could provide a way to discover drugs that prevent potential damage.
Scientists have now moved closer to being able to predict what portions of proteins have the ability to form amyloids and which of those are likely to actually carry through to disease. Two independent research teams have developed and validated algorithms for predicting amyloid formation.
Over the past decade, scientists have discovered that only a small part of a protein—as short as a six- or seven-amino-acid-residue stretch—is involved in amyloid formation. “For us, the question became ‘What bit of the protein is forming the spine?’ ” says David Eisenberg, an amyloid researcher at the University of California, Los Angeles.
Previous sequence-based algorithms for predicting amyloid formation suffered from shortcomings that prevented scientists from distinguishing amyloids from other types of aggregates. “Computational methods for making predictions from primary sequence information have been based either on biophysical properties of the amino acids or on a learning set of known amyloid sequences,” says Ronald Wetzel, an amyloid expert at the University of Pittsburgh. “The new methods incorporate knowledge of the discrete packing interactions observed in crystal structures of amyloidogenic peptides.”
The first algorithm comes from Eisenberg’s group and incorporates three-dimensional structural information, using the structure of a fibril-forming hexapeptide as a template (Proc. Natl. Acad. Sci. USA, DOI: 10.1073/pnas.0915166107). The algorithm scans a protein sequence six residues at a time and maps the side chains of the amino acids onto the template peptide. It calculates the energy of fibril formation for each hexapeptide. Any sequence with energy lower than −23 kcal/mol is considered to have a high propensity to form fibrils.
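The scan-and-threshold step can be sketched in a few lines of Python. This is only an illustration of the sliding six-residue window and the −23 kcal/mol cutoff described above. The names `scan_hexapeptides` and `toy_energy` are hypothetical, and `toy_energy` is a made-up stand-in that does not reproduce the structure-based energy calculation used by Eisenberg’s group.

```python
from typing import Callable, List, Tuple

ENERGY_THRESHOLD = -23.0  # kcal/mol, the cutoff quoted in the article
WINDOW = 6                # the algorithm scans six residues at a time


def scan_hexapeptides(
    sequence: str,
    energy_fn: Callable[[str], float],
    threshold: float = ENERGY_THRESHOLD,
) -> List[Tuple[int, str, float]]:
    """Return (start, segment, energy) for every window scoring below threshold."""
    hits = []
    for start in range(len(sequence) - WINDOW + 1):
        segment = sequence[start:start + WINDOW]
        energy = energy_fn(segment)
        if energy < threshold:  # lower energy means higher fibril propensity
            hits.append((start, segment, energy))
    return hits


# ASSUMPTION: an invented scoring rule, for demonstration only. The real
# method threads each segment onto a crystal-structure template.
FAVOURABLE = set("VIFYWLNQ")


def toy_energy(segment: str) -> float:
    """Assign -4.0 kcal/mol per residue from an arbitrary 'favourable' set."""
    return sum(-4.0 for residue in segment if residue in FAVOURABLE)


if __name__ == "__main__":
    for start, segment, energy in scan_hexapeptides("GGNNQQNYGG", toy_energy):
        print(start, segment, energy)
```

Keeping the energy function as a parameter separates the scanning logic, which the article describes, from the scoring model, which is not detailed here.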
All the sequences that form fibrils have a common characteristic: Their side chains jut out in such a way that when two sheets of this sequence face one another, the side chains can interdigitate to form a “steric zipper.”
Confident that the algorithm actually works, Eisenberg’s team used it to scan the entire genomes of several organisms, as well as a collection of proteins with known structures, looking for sequences able to form amyloids. They found that such sequences are ubiquitous. Although only about 15% of all possible hexapeptide sequences are capable of forming amyloids, one or more of those sequences are found in nearly all proteins.
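A rough calculation shows why a 15% hit rate per hexapeptide translates into near-ubiquity. The sketch below assumes, purely for illustration, that every six-residue window is an independent draw with a 15% chance of being amyloid-capable. Real protein sequences are not random and overlapping windows are not independent, so the numbers are an order-of-magnitude guide, not a result from the study. The function name is hypothetical.

```python
def chance_of_at_least_one(protein_length: int,
                           fraction: float = 0.15,
                           window: int = 6) -> float:
    """Probability that at least one window is amyloid-capable.

    ASSUMPTION: windows are treated as independent, each with the same
    probability `fraction`; this is a simplification, not the paper's model.
    """
    windows = max(protein_length - window + 1, 0)
    return 1.0 - (1.0 - fraction) ** windows


if __name__ == "__main__":
    for length in (6, 30, 100, 300):
        print(length, chance_of_at_least_one(length))
```

Under these assumptions, a 100-residue protein has 95 windows, and the chance that none of them qualifies is 0.85 raised to the 95th power, which is vanishingly small.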
But if such sequences are found throughout the genome, why don’t more proteins form amyloids?
“We believe that to form a fiber, it’s necessary to have one of these segments in a protein, but that’s not sufficient,” Eisenberg says. For example, ribonuclease A, a protein that doesn’t form fibrils, nevertheless contains several segments that the algorithm predicts should form them. Indeed, those sequences do form fibrils when they are not in the protein.
For such a segment to drive its parent protein to form an amyloid, “it has to be exposed and have enough flexibility that it can enter a steric zipper,” Eisenberg says.
Eisenberg’s team tested this hypothesis by finding a way to make even the usually recalcitrant ribonuclease A form fibrils by putting a fibril-forming segment in an exposed loop. “In fibril formation, just like in retailing, it’s location, location, location,” Eisenberg quips. If a segment has enough freedom, it will form a fibril. If it’s constrained, it won’t, he explains.
Eisenberg proposes that most proteins protect themselves against fibril formation by burying these high-propensity sequences in the middle of the protein. “Proteins have evolved to be self-chaperoning by constraining these segments so they can’t stack on one another,” he says. He and his group plan to use the algorithm to identify problem segments and then to find molecules that might block or reverse fibril formation.
Another new algorithm corroborates the ubiquity of amyloid-forming sequences. Joost Schymkowitz and Frederic Rousseau of the Free University of Brussels and coworkers developed the new algorithm, called Waltz, to specifically identify proteins capable of forming fibrils. The team’s earlier algorithm, Tango, found aggregating sequences regardless of morphology.
With Tango, “all we’re doing is looking at those parts of the sequence that are more hydrophobic, have a high β-sheet propensity, and have a lower net charge. You’re not putting in specific requirements for any structure except that it’s got to be a hydrophobic β-sheet,” Schymkowitz says. As a result, Tango turned up unwanted amorphous aggregates, not just those with the characteristic cross-β-sheet structure of amyloids that the team was seeking.
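The kind of property-based filter Schymkowitz describes can be illustrated with a toy example. The sketch below is not Tango. It simply checks the three properties named in the quote, using the Kyte-Doolittle hydropathy scale, a crude and assumed set of β-sheet-favoring residues, and arbitrary cutoffs chosen for demonstration. All function names and thresholds are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# ASSUMPTION: a crude proxy for beta-sheet propensity, not a published scale.
BETA_FORMERS = set("VIYFWT")

# Side-chain charges at neutral pH (histidine treated as neutral here).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_features(segment: str):
    """Return (mean hydropathy, fraction of beta formers, absolute net charge)."""
    hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)
    beta_fraction = sum(aa in BETA_FORMERS for aa in segment) / len(segment)
    net_charge = abs(sum(CHARGE.get(aa, 0) for aa in segment))
    return hydropathy, beta_fraction, net_charge


def aggregation_prone(segment: str,
                      min_hydropathy: float = 1.5,
                      min_beta: float = 0.5,
                      max_charge: int = 1) -> bool:
    """Flag hydrophobic, beta-prone, low-charge segments (arbitrary cutoffs)."""
    hydropathy, beta_fraction, net_charge = window_features(segment)
    return (hydropathy >= min_hydropathy
            and beta_fraction >= min_beta
            and net_charge <= max_charge)


if __name__ == "__main__":
    for segment in ("VIVIVI", "KKEDGS"):
        print(segment, aggregation_prone(segment))
```

Nothing in such a filter asks how the residues would pack against one another, which is consistent with the point that property-based screens flag amorphous aggregates along with true amyloids.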
So they set out to develop a predictive algorithm that is specific for amyloids (Nat. Methods, DOI: 10.1038/nmeth.1432). The researchers built a “sequence mask” using all known amyloid-forming peptides and then employed the mask to scan disease-causing and non-disease-causing proteins. They found approximately 50 new amyloid-forming peptides, which they incorporated into a new iteration of the mask that also includes physicochemical properties. They validated the new mask by predicting new amyloids in proteins not known to form such structures and then verifying amyloid formation experimentally. In this validation step, the research team found another 60 amyloid-forming peptides.
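One common way to build a position-specific “mask” from a set of known peptides is a log-odds scoring matrix, sketched below. Whether Waltz is implemented this way is an assumption. The training peptides, pseudocount, uniform background, and threshold are illustrative choices, and the sketch leaves out the physicochemical properties that the article says were added in the later iteration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_mask(peptides, pseudocount=1.0):
    """Build one log-odds column per position from equal-length peptides.

    ASSUMPTION: a uniform background frequency (1/20) and add-one
    pseudocounts; these are generic choices, not taken from the paper.
    """
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    mask = []
    for position in range(length):
        counts = Counter(peptide[position] for peptide in peptides)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        mask.append(column)
    return mask


def score(segment, mask):
    """Sum the per-position log-odds scores for one segment."""
    return sum(column[aa] for column, aa in zip(mask, segment))


def scan(sequence, mask, threshold=0.0):
    """Return (start, segment) for every window scoring above threshold."""
    width = len(mask)
    return [
        (start, sequence[start:start + width])
        for start in range(len(sequence) - width + 1)
        if score(sequence[start:start + width], mask) > threshold
    ]


if __name__ == "__main__":
    # Toy training set, chosen for illustration only.
    training = ["NNQQNY", "GNNQQN", "VQIVYK", "STVIIE"]
    mask = build_mask(training)
    print(scan("PPPNNQQNYPPP", mask))
```

The iterative refinement described above would correspond to adding newly confirmed peptides to the training list and rebuilding the mask.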
Schymkowitz is particularly interested in identifying what are becoming known as “functional” amyloids. “It’s become clear that amyloids have some utility,” Schymkowitz says. For example, some bacteria use amyloid structures to modulate their environment in the form of biofilms. Many of these functional amyloids remain to be found, Schymkowitz believes.
Schymkowitz would like to design prediction algorithms that can distinguish between functional and disease-causing amyloids. “At the moment, we’ve basically thrown all the amyloid sequences on one big pile,” he says. “I think functional amyloids are different. Can we find more examples of these and then refine predictions toward them?”
Such abilities could be used to design new materials, Schymkowitz says. “With Waltz, you can design sequences that have particular properties in terms of solubility that are now also amyloidogenic,” he says. “You can use these to build new nanowires and new materials.”
The new work represents “a big step forward in our desire to be able to predict amyloidogenicity from primary sequence information,” Wetzel says. But sometimes rates of fibril formation appear to be driven by sequence elements that are not part of the final amyloid core, he notes. “There is a lot more to do,” he adds, “before we have the ability to accurately predict amyloidogenic proteins from examination of sequence databases.”