In the mid-1990s, as the Human Genome Project was in full swing, scientists started thinking about the protein complement of the genome, and proteomics—the identification and characterization of all of an organism’s proteins—was born. Early proteomics methods used enzymes to digest proteins into pieces that could be easily analyzed by mass spectrometry. Those methods are now mature and routinely detect peptides from thousands of proteins in a single run.
But the great strength of those methods is also their greatest weakness. What’s being analyzed is no longer the actual biological actors but the pieces left after they’ve been broken apart. Biologists and chemists are deprived of crucial information such as the masses of intact proteins and the locations of behavior-controlling modifications, such as added methyl, sugar, or phosphate groups, that occur after a protein leaves the ribosome that created it.
So-called top-down mass spectrometry and proteomics give that information back to scientists. The name “top-down” comes from the fact that the analysis starts “from the top,” with the intact protein. The approach skips the digestion step and instead puts intact proteins directly into the mass spectrometer, where the instrument breaks them into smaller fragments.
By starting with intact proteins, rather than their pieces, top-down analysis more accurately reflects the structure and properties of actual biological systems than does bottom-up proteomics. For example, knowing the locations of multiple posttranslational modifications makes it possible to study the influence of those groups on protein function.
Top-down mass spectrometry was first proposed in the 1990s by chemistry professor Fred W. McLafferty and his team at Cornell University. Since then, top-down methods have become well established for applications focused on individual proteins and complexes, such as in structural biology.
But the technology is only now reaching a point where it’s possible to start talking about full-fledged top-down analysis of all of an organism’s proteins in their various forms—true top-down proteomics. Researchers last year banded together to form the Consortium for Top Down Proteomics to continue to improve instruments and methods. And there is even talk of a top-down counterpart to the Human Proteome Project, an existing effort to catalog all human proteins.
Recent work has put top-down proteomics within striking distance of more widespread use. In 2011, chemistry professor Neil L. Kelleher of Northwestern University and coworkers improved on previous top-down studies by about 20-fold when they identified more than 1,000 human gene products that were posttranslationally modified in different ways, resulting in a total of more than 3,000 protein species (Nature 2011, DOI: 10.1038/nature10575). To detect that many protein species, they needed to fractionate and separate the proteins using four types of separations in series. Mass spectrometrist Ljiljana Paša-Tolić of Pacific Northwest National Laboratory and coworkers also recently showed that they can detect more than 1,000 bacterial protein species in a single run.
Breaking that 1,000-protein barrier was a milestone that gives other researchers hope that further improvements aren’t far behind. “We just have one more order of magnitude to make up, and then we’re in the ballpark with the bottom-up people,” says Jeffrey N. Agar, a chemistry professor currently in transition from Brandeis University to Northeastern University. “There’s no doubt that in terms of metrics—how many proteins you can identify and how many you can quantify—we’re behind the bottom-up people. But once that gap isn’t there, there’s no good argument against top-down.”
The mission of the Consortium for Top Down Proteomics is “to promote innovative research, collaboration, and education accelerating the comprehensive analysis of intact proteins,” according to its website. Its membership consists of an organizing committee and researchers who are willing to devote more than 5% of their labs’ resources to the study of intact proteins by mass spectrometry.
One of the first tasks the consortium undertook after it was established was standardizing some of the language and data-sharing methods for top-down proteomics. Some practices—such as the transfer of mass spectra between researchers—could be appropriated directly from bottom-up proteomics. Others, the consortium had to figure out for itself.
For example, through various cellular processes a single gene can code for many different protein species, but there was no term to describe such species. Earlier this year, Kelleher; chemistry professor Lloyd M. Smith of the University of Wisconsin, Madison; and their fellow consortium members proposed that each of these species be called a proteoform (Nat. Methods 2013, DOI: 10.1038/nmeth.2369). They defined proteoforms as “the different molecular forms in which the protein product of a single gene can be found, including changes due to genetic variations, alternatively spliced RNA transcripts, and posttranslational modifications.”
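The idea behind the term can be made concrete with a short sketch. Assuming a hypothetical base protein mass and standard monoisotopic mass shifts for three common modifications (the sequence-level picture is, of course, far richer than this), enumerating every combination of modifications shows how one gene yields many proteoforms, each with its own intact mass:

```python
# Illustrative sketch, not consortium software: one gene, many proteoforms.
# A proteoform is the base gene product plus a specific set of modifications;
# each combination carries a distinct intact mass. The modification mass
# shifts are standard monoisotopic values; the base mass is hypothetical.

from itertools import combinations

MODS = {  # monoisotopic mass shifts, in daltons
    "acetyl": 42.0106,
    "methyl": 14.0157,
    "phospho": 79.9663,
}

BASE_MASS = 11236.15  # hypothetical unmodified protein mass, Da


def proteoform_mass(mods):
    """Intact mass of the proteoform carrying the given modifications."""
    return BASE_MASS + sum(MODS[m] for m in mods)


# Enumerate every proteoform with 0 to 3 distinct modifications.
proteoforms = []
for n in range(len(MODS) + 1):
    for combo in combinations(sorted(MODS), n):
        proteoforms.append((combo, round(proteoform_mass(combo), 4)))

print(len(proteoforms))  # 8 distinct proteoforms from a single gene
```

With just three possible modifications the count is small, but it grows combinatorially with the number of modifiable sites, which is why a single histone gene can account for dozens of proteoforms.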
The consortium is now undertaking pilot projects to assess the current state of the field and establish mechanisms for working together. The first project involves a sample that contains just endogenous core histones, primarily human histone H4. Histones are the proteins that serve as spools for DNA in chromatin, the genetic packaging material in cells. Histone H4, the simplest histone, is known to have more than 70 proteoforms.
Each of nine participating labs was asked to analyze the sample by top-down mass spectrometry and report all the proteoforms they found. Participants were allowed to use whatever top-down methods they wanted.
The data is still being analyzed. “The purpose is to see how much different laboratories agree on what proteoforms are present,” says Nicolas L. Young, a researcher at the National High Magnetic Field Laboratory in Tallahassee, Fla., who is coordinating the project. In work completed so far, “we agree on the most abundant proteoforms for histone H4.” But participants agree less when they dig past those most abundant proteoforms. The results suggest that some labs and experimental strategies are better able than others to find less abundant species. The consortium hopes to publish the results of the study later this summer.
“Top-down proteomics is demanding,” says Joseph A. Loo, a consortium member and professor of chemistry and biochemistry at the University of California, Los Angeles, whose lab did not participate in the first consortium project. “Very few labs can do it well. A lot of labs can do it poorly.”
“It’s fair to say that there’s a really large gap, much larger than for bottom-up, between what some groups can do and others can’t,” Agar says. His lab participated in the project, and he analyzed the samples himself. “I couldn’t believe how complicated they were,” he says.
Whereas the first project is looking to see whether labs can identify all the proteoforms of particular proteins, the next project, which is still in the planning stages, will look at how comprehensively labs can analyze an entire proteome. The sample will be a bacterium with a small genome.
One of the challenges facing the consortium is that few of its members are funded to do such work. That means they either have to work on their own time or raid funding from other projects.
“It seemed like many more laboratories would be willing to volunteer to try some of these test samples, but they couldn’t justify devoting a large chunk of someone’s time in their laboratory to do it because of funding issues,” Loo says. Nevertheless, Loo hopes he will be able to arrange funding to enable his group to participate in the bacterial proteome pilot study.
The consortium’s activity may eventually lead to a more ambitious project that Kelleher has proposed. At a September 2012 meeting of the Human Proteome Organisation (HUPO) and in a paper published online the same week, he outlined his vision: using the top-down approach to catalog multiple forms of every protein in every human cell type (J. Am. Soc. Mass Spectrom. 2012, DOI: 10.1007/s13361-012-0469-9).
The goal of HUPO’s current Human Proteome Project is to identify and characterize at least one protein for each of the 20,000-plus human genes. The project is divided into chromosome-specific and disease-driven arms. Although the project is articulated from a bottom-up perspective, the participants recognize the need for top-down measurements.
William S. Hancock, a chemistry professor at Northeastern University and one of the leaders of the Human Proteome Project, uses the breast-cancer-related protein ErbB2 as an example to explain why top-down is important. That protein has 15 known alternative splice forms, only six of which have been characterized at the proteomics level. “It becomes very important to be able to do top-down,” Hancock says, because the different forms can be difficult to distinguish using peptides alone.
And Kelleher argues that it’s not enough to use the top-down approach only as a supplement to bottom-up techniques. Instead, the great majority of the effort should be top-down, he says. “If we agree that proteoforms are the actors in biology … cataloging them should be a proteome project,” he says. The bottom-up approach misses many proteoforms. Plus, because bottom-up proteomics infers protein identification from component peptides, a bottom-up project is “an indirect catalog of protein molecules in the human body,” Kelleher says.
Kelleher thinks that his top-down effort, just like the Human Genome Project, should include a substantial technology development effort done in parallel with the proteoform analysis. He expects that such a catalog could be completed by 2030.
As a pilot project, the approximately 300 cell types in human blood would be a good place to start, Kelleher says. Those cells are easily accessible. Plus, cell surface markers are already known for many of those cell types, making them easy to sort. A focus on blood cells would be useful for understanding cancers of the blood, Kelleher says.
However, funding could be a sticking point. “I don’t think it’s an easy sell,” says John R. Yates III, a proteomics expert at Scripps Research Institute, La Jolla, Calif. “I also don’t think the Human Proteome Project as it’s currently put together is an easy sell to funders.” His main criticism of the more detailed human cell-types proposal is that a catalog of proteoforms without information about their function adds little value beyond what’s already known.
Despite his criticism, Yates hopes that a top-down project that includes proteoform function will eventually happen. But “it’s got to be formulated in the right way,” he says. Kelleher adds that tying the project to cancers of the blood gives the effort the functional aspect it will need to transcend critiques like Yates’s.
The advances that have made it possible to contemplate large-scale top-down proteomics projects have come in the mass analyzers themselves, in the separation methods on the front end, and in the data analysis on the back end. But despite much progress, further improvements are still needed.
From its early days, top-down mass spectrometry has been the domain of large-magnet-based instruments, particularly Fourier transform ion cyclotron resonance, or FT-ICR, mass spectrometers. Those were the only instruments with sufficient resolving power to sort out the complex spectra of intact proteins and the fragment ions generated during top-down analyses. But those instruments were—and still are—expensive and difficult to use.
Other mass analyzers, such as the Orbitrap and quadrupole time-of-flight instruments, have improved to the point that they are becoming contenders in top-down mass spectrometry and proteomics.
“The main problem with top-down proteomics is that if it requires a $2 million mass spectrometer with a 15-tesla magnet, it’s not going to be widely used,” Yates says. “To democratize it, we need mass spectrometers that ordinary people can afford” while still getting the kind of performance they would achieve with a 15-tesla FT instrument.
“High-performance top-down proteomics needs fast, high-resolution mass analyzers,” says Julian P. Whitelegge, head of the proteomics division of the Pasarow Mass Spectrometry Laboratory at UCLA. “The future of top-down lies with the Orbitrap mass analyzer. So few people are going to use large-magnet instruments that top-down will never be widespread if we rely on those.”
Whitelegge nonetheless also points to new time-of-flight instruments as important players. “You’re starting to see resolution in these instruments that will enable you to do top-down experiments with proteins up to about 20 kilodaltons,” he says. “That covers a lot of the proteome. There’s no reason newcomers to the field can’t get a less expensive instrument that’s relatively easy to use and start doing top-down experiments on smaller proteins.”
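A rough calculation suggests why roughly 20 kilodaltons is the relevant scale. To read a multiply charged protein’s charge state and mass from its spectrum, the instrument must separate adjacent isotopologue peaks, and the resolving power that requires is set almost entirely by the protein’s mass, not its charge. A back-of-the-envelope sketch (the constants are standard values; the scenario is assumed for illustration):

```python
# Back-of-the-envelope sketch, not an instrument specification: the
# resolving power needed to separate the isotopologue peaks of an intact
# protein, which is what lets you read off its charge state and mass.

PROTON = 1.00728       # proton mass, Da
ISO_SPACING = 1.00235  # typical spacing between protein isotopologues, Da


def resolving_power_needed(protein_mass, charge):
    """R = (m/z) / delta(m/z) required to split adjacent isotope peaks."""
    mz = (protein_mass + charge * PROTON) / charge
    delta_mz = ISO_SPACING / charge
    return mz / delta_mz


# A 20 kDa protein needs roughly the same resolving power at any charge:
for z in (10, 20, 30):
    print(z, round(resolving_power_needed(20000.0, z)))
```

The answer comes out near 20,000 regardless of charge state, a figure modern time-of-flight instruments can reach, whereas isotopically resolving a 150 kDa antibody by the same criterion would demand resolving power near 150,000.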
Another big need for improvements in top-down proteomics is in protein separations by high-performance liquid chromatography (HPLC). The greater ease of separating peptides relative to proteins is one of the reasons for the continued popularity of conventional bottom-up proteomics relative to top-down proteomics.
One of the reasons that bottom-up works so well is that when you digest a protein, you create a collection of peptides that can be separated effectively by standard ion-exchange or reversed-phase chromatography, Yates says. That same level of separability may not be available at the protein level.
“Peptides are a lot easier to measure” than proteins, Loo says. “They’re a lot easier to handle by HPLC. They’re easier to separate. They’re easier to ionize. They’re easier to sequence. Everything about peptides is a lot easier than big proteins.” In trying to develop improved HPLC and top-down mass spectrometry approaches, he and others are “trying to make proteins separate and fly as well as peptides,” Loo says.
Agar agrees. “Separation science for proteins is just not there,” he says. “With multidimensional chromatography, you can detect 100,000 peptides or more. You can’t do that for proteins yet.”
Also in need of further development is data-handling software, according to several researchers C&EN interviewed. The bottom-up community has the Sequest and Mascot programs, robust pieces of software that work across mass spectrometry platforms, for querying peptide databases. However, “for top-down, the software isn’t ready for routine identifications,” Agar says. The three programs available for top-down proteomics are ProSightPC, MS-Align+, and Big Mascot.
The complexity of many proteins is what makes complete software support, including proteoform quantification, so challenging. Many proteins have multiple possible variants. Peptide-based bottom-up proteomics can easily identify one modification, but it breaks down when two or more modifications work together on a single protein.
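A toy simulation, with invented sequences and a crude cleavage rule, shows where the peptide-level view breaks down: after digestion, a sample containing one doubly modified molecule is indistinguishable from a sample containing two singly modified ones, even though their intact masses differ.

```python
# Toy demonstration (invented sequences, crude trypsin-like cleavage) of why
# bottom-up loses modification "connectivity": digestion yields identical
# peptides whether two modifications sit on one molecule or on two.

def digest(protein):
    """Cleave after lysine (K); a '*' suffix marks a modified residue."""
    peptides, current = [], []
    for residue in protein:
        current.append(residue)
        if residue.startswith("K"):
            peptides.append("".join(current))
            current = []
    if current:
        peptides.append("".join(current))
    return peptides


doubly_modified = ["S*", "A", "K", "T*", "G", "K"]
unmodified      = ["S",  "A", "K", "T",  "G", "K"]
singly_mod_1    = ["S*", "A", "K", "T",  "G", "K"]
singly_mod_2    = ["S",  "A", "K", "T*", "G", "K"]

sample_a = [doubly_modified, unmodified]  # both mods on the same molecule
sample_b = [singly_mod_1, singly_mod_2]   # mods on different molecules

peptides_a = sorted(p for mol in sample_a for p in digest(mol))
peptides_b = sorted(p for mol in sample_b for p in digest(mol))
print(peptides_a == peptides_b)  # True: identical peptide pools

# Intact-level measurement tells the samples apart: mods per molecule.
mods_a = sorted(sum(r.endswith("*") for r in mol) for mol in sample_a)
mods_b = sorted(sum(r.endswith("*") for r in mol) for mol in sample_b)
print(mods_a, mods_b)  # [0, 2] [1, 1]
```

Because the intact masses differ (one population shifted by two modifications versus two populations each shifted by one), a top-down measurement distinguishes the two samples that the peptide pools cannot.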
Some people hope that in the future proteoform analysis will be simplified to measuring the mass of intact proteins without needing to do full-fledged tandem mass spectrometry.
The intact mass will be so powerful because the molecular masses of distinct proteins and proteoforms tend to differ by at least about a dalton. With an accurate mass measurement, “mass alone can often tell you what a protein is,” Agar says. “But if mass alone isn’t enough, then mass and retention time are.” That means, Agar says, that as proteoform databases are populated, analysis times for top-down measurements will only get faster. Currently researchers have to grind through tandem mass spectrometry analyses to carry out top-down proteomics studies, “but someday we won’t have to do that,” he hopes.
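As an illustration of the kind of lookup Agar envisions, the sketch below matches an observed intact mass against a small, entirely invented proteoform database within a parts-per-million tolerance, falling back to retention time only when mass alone is ambiguous:

```python
# Hypothetical lookup sketch: identify a proteoform from its intact mass,
# within a ppm tolerance, using retention time to break ties. All database
# entries (names, masses, retention times) are invented for illustration.

DATABASE = [
    # (name, intact mass in Da, retention time in min)
    ("H4 unmodified",  11236.15, 22.1),
    ("H4 + acetyl",    11278.16, 23.4),
    ("H4 + 2x acetyl", 11320.17, 24.6),
    ("ubiquitin",       8559.62, 18.0),
]


def identify(mass, rt=None, ppm=10.0, rt_tol=0.5):
    """Return database names whose mass (and, if needed, RT) matches."""
    tol = mass * ppm / 1e6  # absolute tolerance from relative ppm window
    hits = [e for e in DATABASE if abs(e[1] - mass) <= tol]
    if len(hits) > 1 and rt is not None:
        hits = [e for e in hits if abs(e[2] - rt) <= rt_tol]
    return [e[0] for e in hits]


print(identify(11278.16))  # ['H4 + acetyl']
print(identify(8559.60))   # ['ubiquitin'] -- within the 10 ppm window
```

In a real proteome-scale database, many proteoforms would fall within a 10 ppm window, which is why Agar pairs mass with retention time; the tie-breaking step in the sketch stands in for that second dimension.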
Many researchers also expect that in the future proteomics analyses won’t need to be either-or propositions. “It will be a natural strategy to do both top-down and bottom-up,” Loo says. “We might hope that top-down could replace bottom-up, but why should we neglect bottom-up when the technology is already mature? Maybe in the future, it will be more of a 50-50 situation.”