
Proteomics

Will single-molecule protein sequencing be the next generation of proteomics?

Start-ups have technical challenges to overcome before they can find their place in the market

by Laurel Oldach
August 13, 2024 | A version of this story appeared in Volume 102, Issue 25

 

Credit: Dr. Mark J. Winter/Science Source
The features that make proteins more biochemically versatile than nucleic acids also make them much more difficult to sequence.

Whether you’re interested in why a living system works as it does or the reasons it goes awry, proteins are a critical part of the molecular picture. The proteins circulating in a person’s plasma can reveal early warning signs of a variety of diseases—and the majority of drugs for those diseases act on protein targets. Beyond medicine, protein identification can inform ecology studies, detect food fraud, shed light on how natural products are made, and let archaeologists peer into the deep past.

These applications make proteins tantalizing analytes for high-throughput biology. Proteomic assays aim to measure all of the proteins in a sample. A growing trend known as multiomics combines proteomic assays with genomic and transcriptomic studies, which assay DNA and RNA, respectively, to bring scientists insights that they couldn’t get otherwise. But multiomic studies are often limited by the throughput of current proteomic approaches.

Several companies are developing technologies to identify the sequence of amino acids in huge numbers of individual protein molecules from a sample. Those companies use a variety of chemistries, but all aim to make single-molecule protein sequencing cheap, high throughput, and user-friendly. Who will be first in the scrum to achieve from-scratch single-protein sequencing—and who, if anyone, might win the market—remains to be determined.

A difficult problem

For large-scale studies of the proteome, researchers already have some options. They can turn to mass spectrometers to identify proteins based on their masses, or use the binding of affinity reagents to measure target proteins with a variety of readouts. Both approaches are delivering greater throughput and higher sensitivity than ever before, but each has its shortcomings. Mass spectrometry is very technically demanding; affinity-targeted assays can find only targets that researchers know to look for.

Neither technique can match what researchers are accustomed to in the world of DNA sequencing, where variations between genes or RNA transcripts can be pinpointed at the single-monomer level.


Of course, proteins are encoded by DNA, and some protein changes are easily identified by looking at a gene or its transcripts. But protein sequencing has been a goal for years because of what is known as the proteoform problem. Different variants on the same protein, or proteoforms, can arise because of RNA splicing, posttranslational sequence changes like protein cleavage, or covalent modifications like phosphorylation. Such changes can dramatically affect a protein’s lifetime, location, and activity—all without appearing in the gene that encodes the protein. That’s why biochemists have hoped for decades to characterize thousands of protein sequences at a time.

Like DNA, proteins are linear polymers with a backbone that is repetitive and side chains that are not. Sequencing either type of molecule means determining the order of those side chains. But it is a truism in the field that proteins are exponentially more difficult to sequence than DNA or RNA.

For starters, proteins have a larger variety of monomers to identify: while geneticists have only four nucleotides to analyze, protein chemists must contend with 20 amino acids. Those amino acids are also chemically diverse and can be covalently modified in many ways. An amino acid’s nearest neighbors in a protein chain can affect its chemical behavior—which can alter the readouts of certain sequencing approaches.

And while researchers can create copies of DNA to amplify a signal, they cannot do the same with proteins. Cells generally contain many more copies of a protein than its corresponding RNA transcript, and proteins function in a wider range of concentrations; a eukaryotic cell might have hundreds of thousands of RNA molecules but tens of millions of proteins. For researchers aiming to understand a disease, or how a compound alters the levels of proteins in a cell, the lower-abundance proteins are often more interesting biologically than the common ones.

Because of these challenges, says Puneet Souda, a proteomics researcher turned investment analyst at Leerink Partners, “protein sequencing, like we sequence genes today . . . doesn’t really exist.” Instead, he says, “one has to find a middle ground” among existing technologies that can analyze single protein molecules in large numbers.

At the moment, companies’ approaches rely on collecting a little sequence information from each protein molecule and using that to seek out a match within a reference proteome. That technique is better described as protein fingerprinting than as sequencing. But fingerprinting still provides enough information to answer many questions about a proteome. In fact, some next-generation proteomics companies—notably including Nautilus Biotechnology—are developing affinity-based protein fingerprinting technologies as an end goal.
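To make the distinction concrete, here is a minimal Python sketch of fingerprinting. It assumes a tiny, made-up reference proteome and a partial read that starts at a protein’s N-terminus (real workflows usually digest proteins into peptides first). The names `REFERENCE`, `matches`, and `fingerprint` are hypothetical; the point is only that a few identified residues, with gaps, can narrow the candidates to one.

```python
# Hypothetical toy reference proteome: protein name -> amino acid sequence.
REFERENCE = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MKVLAAGIVGLLLAQPALS",
    "protC": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
}


def matches(observed, sequence):
    """True if a partial read fits the start of a reference sequence.

    '.' marks a position whose residue could not be identified.
    """
    if len(observed) > len(sequence):
        return False
    return all(o == "." or o == s for o, s in zip(observed, sequence))


def fingerprint(observed, reference=REFERENCE):
    """Return every reference protein consistent with the partial read."""
    return [name for name, seq in reference.items() if matches(observed, seq)]
```

With this toy reference, the read `"MK"` is consistent with two proteins, while `"MK.A"`, which adds one more identified residue after a gap, singles out `protA`.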

For companies in the field, says Talli Somekh, cofounder of start-up Erisyon, the key question is, “How can you bring the sensitivity, the fidelity, and the resolution of DNA sequencing to proteins?”

Proteins a-glow-glow

Since the 1950s, researchers have been able to use Edman sequencing to analyze purified proteins. The technique uses chemical reactions called Edman degradation to cyclize and remove only the terminal amino acid from an immobilized protein; the freed amino acid can then be analyzed with chromatography. But this approach requires a large, homogeneous protein sample, and it tends to become noisy after sequences of only a few dozen amino acids.
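One way to see why the readout degrades is a simple compounding model. Assuming a hypothetical constant per-cycle yield y, meaning each cycle succeeds on a given molecule with probability y, the fraction of molecules still in step after n cycles is y**n. The sketch below only illustrates that arithmetic and does not model any particular instrument; the 0.95 yield used in the example is an arbitrary value.

```python
import math


def in_phase_fraction(per_cycle_yield, cycles):
    """Fraction of molecules that completed every cycle, assuming each cycle
    succeeds independently with the same probability."""
    return per_cycle_yield ** cycles


def usable_cycles(per_cycle_yield, min_fraction):
    """Smallest number of cycles at which the in-phase fraction falls to or
    below min_fraction."""
    return math.ceil(math.log(min_fraction) / math.log(per_cycle_yield))
```

With a 95% per-cycle yield, for instance, `usable_cycles(0.95, 0.25)` returns 28: by the 28th cycle, fewer than a quarter of the molecules are still in step, and the signal from out-of-step molecules blurs the readout.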

Somekh’s proposed solution is a technique inspired by Edman sequencing called fluorosequencing, which was developed in Edward Marcotte’s laboratory at the University of Texas at Austin. Although both techniques use the same chemistry to remove amino acids, when it comes to detecting those monomers, fluorosequencing flips the Edman script. “We don’t care about what’s liberated. We only look at what’s left,” Somekh says.

After labeling certain amino acids in immobilized peptides with fluorescent tags, Erisyon researchers use surface fluorescence microscopy to monitor cycles of Edman degradation (Nat. Biotechnol. 2018, DOI: 10.1038/nbt.4278). A drop in fluorescence in one color channel indicates that an amino acid labeled in that color has been removed from the molecule.
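The readout logic can be sketched in a few lines of Python. This is a noise-free toy that assumes a hypothetical labeling scheme in which lysine (K) is tagged red and cysteine (C) green. Real traces also suffer from photobleaching, dye failures, and missed cleavages. `fluorescence_trace` counts fluorophores per channel after each cycle, and `decode_positions` reads the positions of labeled residues back from the drops.

```python
def fluorescence_trace(peptide, labels):
    """Fluorophore count per color channel before the first Edman cycle
    (index 0) and after each cycle (index i means i residues removed).

    labels maps an amino acid letter to a channel name; unlabeled residues
    are invisible. Noise-free toy model.
    """
    channels = sorted(set(labels.values()))
    trace = []
    for removed in range(len(peptide) + 1):
        remaining = peptide[removed:]
        trace.append(
            {ch: sum(1 for aa in remaining if labels.get(aa) == ch) for ch in channels}
        )
    return trace


def decode_positions(trace):
    """Infer 1-based positions of labeled residues from the cycle at which
    each channel's intensity dropped."""
    positions = {}
    for cycle in range(1, len(trace)):
        for ch, before in trace[cycle - 1].items():
            if trace[cycle][ch] < before:
                positions.setdefault(ch, []).append(cycle)
    return positions
```

For the peptide `"AKGCKE"` with those labels, the red channel drops after cycles 2 and 5 and the green channel after cycle 4, which recovers the positions of both lysines and the cysteine without ever identifying A, G, or E.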


Bright lights
Protein fingerprinting by fluorosequencing begins with labeling specific amino acids with fluorophores. Researchers then use Edman degradation to remove amino acids one by one from the end of an immobilized peptide, reducing the fluorescence signal in the process. Edman degradation works no matter what the amino acid side chain is—but using single-molecule imaging, researchers can determine which amino acids have been removed and in what order.
Credit: Yang H. Ku/C&EN
Source: Trends Biochem. Sci. 2019, DOI: 10.1016/j.tibs.2019.09.005.

The number of amino acids that the process can label is limited because some side chains are more reactive than others. But one need not label every residue to identify a protein within a known proteome, Somekh says. “We refer to our approach as the peptide Wheel of Fortune. As long as you know the alphabet, the dictionary, and the category, the positional information of a few letters is enough to solve the puzzle.”
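The Wheel of Fortune analogy can be made concrete with a small sketch. It assumes only K and C are labeled and uses a hypothetical three-peptide reference. Each peptide is reduced to the pattern fluorosequencing could report (labeled residues at their positions, dots elsewhere), and an observed pattern is looked up against those patterns. The example also shows the limit of the approach: two different peptides can share a pattern.

```python
# Hypothetical reference peptides; not drawn from any real proteome.
REFERENCE_PEPTIDES = {
    "pep1": "AKGCKEL",
    "pep2": "GKACLLE",
    "pep3": "SKLCKEA",
}


def label_pattern(peptide, labeled="KC", cycles=10):
    """What fluorosequencing could report within the first `cycles` cycles:
    labeled residues at their positions, '.' for unlabeled positions."""
    return "".join(aa if aa in labeled else "." for aa in peptide[:cycles])


def identify(observed_pattern, reference=REFERENCE_PEPTIDES, labeled="KC", cycles=10):
    """All reference peptides whose label pattern matches the observation."""
    return [
        name
        for name, pep in reference.items()
        if label_pattern(pep, labeled, cycles) == observed_pattern
    ]
```

Here `identify(".K.C...")` returns only `pep2`, while `".K.CK.."` matches both `pep1` and `pep3` and would need more labels or more context (the "category," in Somekh’s terms) to resolve.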

Erisyon is developing an instrument but has not announced plans to launch it. Only one company has launched a single-molecule protein sequencer so far: Quantum-Si.

Binders and clippers

Instead of using organic chemistry to remove or recognize amino acids, Quantum-Si’s system uses proteins for both those tasks. Proteases remove residues from one end of an immobilized peptide, and then six fluorescence-tagged binding proteins recognize the amino acids that now form the end of the chain.

It all takes place in one sequencing well, without sequential pipetting steps. Within a well, says CEO Jeff Hawkins, “you have a bit of a kinetic dance happening.” Each binder recognizes between one and three amino acids, but binds to each with different kinetics. A fluorescent binder might dwell on one amino acid for a few hundred milliseconds, and another for a few seconds, before diffusing away. Sooner or later a protease arrives and makes the next amino acid visible to other binders in the mix (Science 2022, DOI: 10.1126/science.abo7651).
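How binding kinetics can identify an amino acid can be sketched as follows. The sketch assumes one binder whose mean dwell time on each of three terminal residues is known, and it assumes binding events last exponentially distributed times. The values in `HYPOTHETICAL_MEAN_DWELL_S` are invented for illustration and are not Quantum-Si’s. The classifier estimates the mean pulse duration and picks the residue whose reference mean is closest on a log scale, since dwell times can differ by an order of magnitude.

```python
import math
import random

# Invented mean dwell times (seconds) of one hypothetical binder on three
# terminal residues; real values are binder- and instrument-specific.
HYPOTHETICAL_MEAN_DWELL_S = {"L": 0.3, "I": 1.0, "V": 2.5}


def simulate_pulses(mean_dwell, n, seed=0):
    """Draw n exponentially distributed binding-event durations."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_dwell) for _ in range(n)]


def classify(pulse_durations, references=HYPOTHETICAL_MEAN_DWELL_S):
    """Call the terminal residue whose reference mean dwell time is closest,
    on a log scale, to the observed mean pulse duration."""
    observed = sum(pulse_durations) / len(pulse_durations)
    return min(
        references,
        key=lambda aa: abs(math.log(references[aa]) - math.log(observed)),
    )
```

A few hundred simulated pulses from a 2.5 s binder are classified as `"V"`; with only a handful of pulses, the estimate of the mean becomes noisy, which is one reason each residue must be observed through many binding events.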


Kinetic signatures
In Quantum-Si’s protein sequencing instrument, a group of binding proteins recognizes different terminal amino acids in a peptide. Each fluorescently labeled binder binds and releases one or several of the 20 possible amino acids, each with slightly different kinetics. In the resulting signal, a fluorophore’s characteristic fluorescence lifetime can identify the binder, and the rate at which the binder associates and dissociates from the peptide identifies the terminal amino acid. At irregular intervals, peptidases remove an amino acid, revealing the next one.
Credit: Yang H. Ku/C&EN/Shutterstock/Adapted from Science
Source: Science 2022, DOI: 10.1126/science.abo7651.

Quantum-Si has a semiconductor chip that measures binders’ fluorescence signals electronically instead of optically and that collects those signals frequently enough to measure differences in the duration of binding events.
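Measuring dwell times requires sampling faster than the binding events themselves. A minimal sketch, assuming a trace already thresholded to 1 (binder present) or 0 (absent) and sampled at a fixed frame interval, turns runs of 1s into pulse durations. `pulse_durations` is a hypothetical helper for illustration, not part of any Quantum-Si software.

```python
def pulse_durations(trace, frame_interval):
    """Convert a sampled on/off trace (1 = binder present, 0 = absent) into a
    list of pulse durations, each a whole number of frames times the
    frame interval."""
    durations = []
    run = 0
    for sample in trace:
        if sample:
            run += 1
        elif run:
            durations.append(run * frame_interval)
            run = 0
    if run:  # close a pulse still in progress at the end of the trace
        durations.append(run * frame_interval)
    return durations
```

With a 10 ms frame interval, the trace `[0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1]` yields pulses of 20, 40, and 10 ms. Every duration is rounded to whole frames, and a pulse shorter than one frame may fall between samples entirely, so the sampling rate sets a floor on the kinetic differences an instrument can resolve.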

According to Quantum-Si chief commercial officer Grace Johnston, the fact that each binder can interact with several substrates “allows us to scale the platform,” because, for example, the company does not need to develop a new binder for every posttranslational modification.

The company went public in 2021 and shipped its first instrument in 2023, making it the first of the protein sequencing companies to hit the market. Hawkins says that customers are currently using the instrument to analyze individual proteins and moderately complex mixtures, and for new types of experiment such as protein bar coding. But critics tell C&EN that the 2 million wells per run that Quantum-Si offers is nowhere near the scale that would be needed to look at a whole proteome in depth. In response to that criticism, Hawkins writes in an email to C&EN, “Quantum-Si is proud to be the only provider of next-generation protein sequencing (NGPS) technology currently available on the market. While others may be in the process of developing similar technologies, we are not aware of any publicly available data demonstrating a technology that is capable of probing the whole proteome in depth.”

Through the nanopore

Inspired by the success of nanopore devices sold by Oxford Nanopore Technologies (ONT) for DNA sequencing, several companies and academic laboratories are also now working on techniques to sequence proteins via nanopore.


Tight squeeze
In most approaches to nanopore protein sequencing, an unfolded protein is threaded into a protein nanopore, and the identity of its amino acids is determined by changes in the electrical current through the nanopore. Researchers are testing many pores and many strategies for shepherding proteins through the pore in an orderly way.
Credit: Yang H. Ku/C&EN/Shutterstock

DNA sequencing purveyor ONT has an internal protein sequencing team and has funded many academic labs as well, including those of ONT founder Hagan Bayley, Jeff Nivala, and Stefan Howorka. At least half a dozen other companies are developing their own nanopore-based protein sequencing technologies. (None has yet debuted a product.)

Nanopore sequencing depends on measuring current passing between two compartments bridged by a pore. The method distinguishes between similar molecules by measuring the way those molecules modify the current as they pass through. But while negatively charged nucleic acids can be coaxed into transiting a nanopore in single file when researchers place a positive electrode on the opposite side, proteins don’t carry a consistent negative charge—and they have 3D structures that can be rigid. So threading proteins through that small space is a challenge.
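The basic measurement can be sketched as event detection on a current trace. This minimal illustration assumes an evenly sampled list of current values and a known open-pore current; the 80% threshold is an arbitrary choice. Each stretch below the threshold is reported with its start, its length in samples, and its fractional blockade (one minus the mean event current divided by the open-pore current), a common way to summarize how strongly a molecule obstructs the pore.

```python
def detect_events(current, open_pore, threshold=0.8):
    """Find stretches where the current falls below threshold * open_pore.

    Returns a list of (start_index, length_in_samples, fractional_blockade)
    tuples, where fractional_blockade = 1 - mean(event current) / open_pore.
    """
    samples = list(current) + [open_pore]  # sentinel closes a trailing event
    events = []
    start = None
    for i, value in enumerate(samples):
        blocked = value < threshold * open_pore
        if blocked and start is None:
            start = i
        elif not blocked and start is not None:
            segment = samples[start:i]
            mean = sum(segment) / len(segment)
            events.append((start, i - start, 1 - mean / open_pore))
            start = None
    return events
```

In a toy trace with a 100 pA open-pore current, a three-sample dip averaging 60 pA registers as a 0.4 blockade and a two-sample dip to 30 pA as a 0.7 blockade; molecules that obstruct the pore differently produce different blockade levels.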

Researchers in academia and industry are working on various ways to solve that problem, such as using chaperones that unfold an analyte protein, enlisting motor proteins that crank it into or through a pore, or hitching proteins to DNA that can be used as a molecular tugboat (bioRxiv 2023, DOI: 10.1101/2023.10.19.563182). Others are pursuing an approach called chop and drop: fixing proteases near the mouth of a pore, where they cleave off single amino acids one at a time for analysis. Researchers in Shuo Huang’s laboratory at Nanjing University recently debuted a nanopore that can recognize all 20 amino acids and many covalent modifications by this method (Nat. Methods 2024, DOI: 10.1038/s41592-023-02021-8).
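In a chop-and-drop scheme, each freed amino acid produces its own event, so a read becomes a series of per-residue classifications. The sketch below assumes invented reference blockade levels for six amino acids (`HYPOTHETICAL_LEVELS`; these are not measured values) and calls each event as the residue with the nearest level. It also assumes residues reach the pore in the order they were cleaved, which real experiments cannot take for granted.

```python
# Invented fractional-blockade levels for six amino acids; illustrative only.
HYPOTHETICAL_LEVELS = {"G": 0.20, "A": 0.28, "S": 0.35, "L": 0.52, "F": 0.61, "W": 0.74}


def call_residue(blockade, levels=HYPOTHETICAL_LEVELS):
    """Assign an event to the amino acid with the nearest reference level."""
    return min(levels, key=lambda aa: abs(levels[aa] - blockade))


def chop_and_drop_read(blockades, levels=HYPOTHETICAL_LEVELS):
    """Translate a series of per-residue events, in cleavage order, into a
    sequence string."""
    return "".join(call_residue(b, levels) for b in blockades)
```

Four events with blockades of 0.21, 0.60, 0.36, and 0.73 would be read as `"GFSW"`. Closely spaced reference levels, or modified residues with levels of their own, make nearest-level calling error-prone, which is why a pore must distinguish all 20 amino acids cleanly for this approach to work.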

If a peptide enters the pore intact, the next hurdle is to decode the signal it generates. At any moment, a stretch of half a dozen amino acids may be contributing to the signal—and there is no guarantee that they will proceed through the pore at a constant pace. Nivala’s group has found that unfolding proteins and reading them repeatedly enables the team to smooth out such signals and match proteins to a reference proteome in a commercially available ONT DNA sequencing device (bioRxiv 2023, DOI: 10.1101/2023.10.19.563182). While this is still fingerprinting rather than full sequencing, ONT CEO Gordon Sanghera said at this year’s J.P. Morgan Healthcare Conference that full-length sequencing of whole proteins is the firm’s “ultimate vision,” according to GenomeWeb.
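The decoding problem can be illustrated with a toy model. The sketch assumes each current level is the average of per-residue contributions from the k residues inside the pore (k = 6 by default, echoing the half-dozen figure above; the contribution values are invented). To compare a read with uneven pacing against each reference protein, it uses dynamic time warping, a generic alignment technique swapped in for illustration rather than the method Nivala’s group reported.

```python
# Invented per-residue contributions to the current level; illustrative only.
HYPOTHETICAL_CONTRIBUTION = {aa: float(i) for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def expected_signal(peptide, contribution=HYPOTHETICAL_CONTRIBUTION, k=6):
    """Toy model: one current level per step, equal to the mean contribution
    of the k residues inside the pore at that step."""
    return [
        sum(contribution[aa] for aa in peptide[i:i + k]) / k
        for i in range(len(peptide) - k + 1)
    ]


def dtw_distance(a, b):
    """Dynamic time warping distance: aligns two level series while allowing
    either one to linger, which tolerates uneven translocation speed."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(observed, reference, contribution=HYPOTHETICAL_CONTRIBUTION, k=6):
    """Name of the reference peptide whose expected signal aligns best."""
    return min(
        reference,
        key=lambda name: dtw_distance(observed, expected_signal(reference[name], contribution, k)),
    )
```

In the test case, a read that stalls on some levels still aligns to its source peptide with a distance of zero. Averaging several passes over the same molecule, as in the repeated reads described above, would further reduce the noise that this toy model leaves out.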

Reverse translation

Instead of borrowing measurement techniques from DNA, some researchers are aiming to measure proteins via DNA—but not using genomic methods. Instead, they want to build DNA barcodes that are directly informed by the protein molecules that are present.

San Diego–based biotech firm Encodia is working to convert protein sequences into DNA sequences through a reaction scheme known as reverse translation. The company, whose cofounders include Mark Chee, a cofounder of DNA sequencing firm Illumina, is currently coy about its chemistry. Company representatives will say that they use specific binders to recognize N-terminal amino acids—and that those binders also carry DNA barcodes that can be ligated together and later sequenced.
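If barcodes from successive cycles are ligated into one strand, decoding could be as simple as splitting a sequencing read into fixed-length codes. The sketch assumes, purely hypothetically, that each binder’s barcode is four nucleotides long and that barcodes are joined in the order the cycles ran. Encodia has not disclosed its actual design, so the codes in `HYPOTHETICAL_CODES` are placeholders.

```python
# Placeholder binder barcodes (4 nt each) mapped to the amino acid the binder
# recognizes; not Encodia's codes.
HYPOTHETICAL_CODES = {"ACGT": "L", "TGCA": "K", "GGAA": "F", "CCTT": "Y"}


def decode_chain(read, codes=HYPOTHETICAL_CODES, code_len=4):
    """Split a read of ligated barcodes into fixed-length codes, one per
    cycle, and translate each back to an amino acid ('?' if unrecognized).
    A trailing partial code is ignored."""
    return "".join(
        codes.get(read[i:i + code_len], "?")
        for i in range(0, len(read) - code_len + 1, code_len)
    )
```

A read of `"ACGTTGCAGGAA"` would decode to `"LKF"`. A single sequencing error would turn one code into `"?"` rather than shifting the rest, a property that depends on the fixed-length assumption made here.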


From protein to DNA
In one approach to reverse translation, a peptide’s terminal amino acid is given a DNA barcode unique to that peptide before being cleaved. Then the amino acid is recognized by an antibody with a DNA barcode of its own, which is added to the amino acid’s barcode. Later, DNA sequencing lets researchers match amino acid to peptide.
Credit: Yang H. Ku/C&EN/Shutterstock
Source: bioRxiv 2024, DOI: 10.1101/2024.05.31.596913.

Advancing that ligation reaction at the same time as the Edman-related chemistry that removes each amino acid is a tricky proposition. “We are in a DNA and protein world at the same time,” says Encodia’s chief technical officer, Nigel Beard. “We’re trying to write a DNA barcode and aggressively chomp down the backbone of a peptide.”

Other groups have developed and described ways to accomplish these competing chemical goals. In a recent preprint, researchers in H. Tom Soh’s laboratory at Stanford University reported on a method for reverse translation of protein into DNA sequences. It relies on a DNA-friendly Edman degradation scheme that allows researchers to encode terminal amino acid identity in a DNA barcode (bioRxiv 2024, DOI: 10.1101/2024.05.31.596913).
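A scheme like the one the figure summarizes lends itself to a simple bookkeeping step after sequencing: group reads by peptide barcode and tally the amino acids called for each peptide. This minimal sketch assumes each read is a fixed-length peptide barcode followed by an amino acid barcode, and it uses hypothetical codes. It recovers which amino acids belong to which peptide, not their order.

```python
from collections import Counter, defaultdict


def assign_residues(reads, codes, peptide_bc_len):
    """Group amino acid calls by peptide.

    Each read is assumed to be a peptide barcode of length peptide_bc_len
    followed by an amino acid (binder) barcode; reads whose amino acid
    barcode is not in `codes` are skipped.
    """
    by_peptide = defaultdict(Counter)
    for read in reads:
        peptide_bc, aa_bc = read[:peptide_bc_len], read[peptide_bc_len:]
        aa = codes.get(aa_bc)
        if aa is not None:
            by_peptide[peptide_bc][aa] += 1
    return by_peptide
```

With the placeholder codes `{"ACGT": "L", "TGCA": "K"}`, the reads `"AAAAACGT"`, `"AAAATGCA"`, `"CCCCACGT"`, and `"AAAAACGT"` assign two leucine calls and one lysine call to peptide `AAAA` and one leucine call to peptide `CCCC`.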

Encodia has published no research on its own approach. But if its technology works, the payoff will be the ease of amplifying and reading out the resulting DNA molecule—which would make Encodia’s technology easy to adopt in the thousands of labs that already routinely sequence DNA.
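The appeal of a DNA readout is easy to quantify under idealized assumptions. Each PCR cycle can at most double the number of copies, so N0 molecules become N0 × (1 + E)^n after n cycles at per-cycle efficiency E, where E = 1 means perfect doubling. The sketch below simply evaluates that formula; the efficiency values are illustrative, and real reactions eventually plateau. No comparable copying step exists for protein molecules.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies the copy number by
    (1 + efficiency). Ignores the plateau that real reactions reach."""
    return initial_copies * (1.0 + efficiency) ** cycles
```

A single barcoded molecule taken through 30 perfect cycles yields 2**30, or roughly a billion, copies, enough for standard sequencers to read.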

“We can utilize the last 20 years that [the industry has] spent building this whole next-generation sequencing workflow,” Beard says. “We’re not building a reader.” Instead, he compares the company’s hypothetical future product to the sample-preparation kits sold by 10x Genomics. 10x has become one of the biggest companies in sequencing not by selling sequencers but by finding a way to turn single-cell information into DNA barcodes. Encoding protein information into DNA was also key to the business model of Olink Proteomics, an affinity-based proteomics company that Thermo Fisher Scientific recently purchased for billions.

Finding customers

In a 2021 report on the industry, investment analyst Souda described next-generation proteomics as a $75 billion market opportunity, split between research and diagnostics. Proteomics companies will succeed, he says today, if they can make their technologies accessible, easy to use, and reproducible—and scalable to a huge number of proteins.


“Everyone is looking for the dreaded killer app,” says Erisyon’s Somekh, adding that it is often a prospective investor’s first question. He and colleagues often search for analytical needs that current technologies don’t meet.

To meet such needs, Erisyon is prioritizing development of applications that involve finding and quantitating proteins at low concentrations in a sample, he says. Other founders envision their technologies differentiating between modified protein isoforms or fitting into multiomics studies run by biobanks, providing proteomes for hundreds of thousands of individuals to match the transcriptomes and genomes those initiatives also measure.

Whether there is room for all of these approaches to coexist in a diverse protein sequencing marketplace, or if one or more of them will nose ahead, is yet to be determined.

People in the field seem to agree that protein sequencing is poised for its big moment. But, says Zvi Kelman, whose laboratory group at the National Institute of Standards and Technology works in the field, “The question is who will win the day—and I don’t think we know yet.”
