Protein structures solved by cryo-electron microscopy (cryo-EM) or X-ray crystallography often miss the sugar molecules known as glycans that cover large swaths of the proteins’ surfaces. The recently reported structures of the spike protein on SARS-CoV-2, the novel coronavirus that causes COVID-19, are no different. Researchers are now working to fill in some of those sugary blanks to better understand the virus’s biology and to help drugmakers develop vaccines and treatments.
“People usually go for what they can analyze” and focus mostly on the protein components of a structure, says Andrea Thorn, a structural biologist at the Julius Maximilian University of Würzburg. She is a member of the Coronavirus Structural Task Force, a group of scientists who specialize in modeling and processing crystallographic and cryo-EM data. Because structural biology measurements don’t reveal much about glycans on proteins, “we often fail to recognize what an important part of the structure they are.”
The problem, says Elisa Fadda, a computational glycobiologist at Maynooth University, is that the sugars move around too much and can’t be captured by most structural biology techniques. “You can just see little bits like stumps of trees,” she says. The only time you can see them is when they interact strongly with the protein, which reduces their flexibility and movement.
X-ray crystallography and cryo-EM might not be able to see protein glycosylation—the addition of sugars to proteins—but mass spectrometry can. Multiple teams are now using mass spec to figure out what sugars decorate the SARS-CoV-2 spike protein and where they attach.
Those sugars can play multiple roles. They stabilize proteins and help them fold up properly. But perhaps they are most important for helping the virus evade the immune system.
“Viruses use glycosylation to hide their viral proteins,” says Max Crispin, a glycobiologist at the University of Southampton who is leading a team studying SARS-CoV-2 glycosylation. So studying these proteins’ glycosylation “is quite important in terms of understanding what the immune system sees during infection.”
Glycosylation can act as camouflage because the sugars on viral proteins come from the animal or person that has been infected. Viruses commandeer the enzymatic machinery that host cells use to add sugars to their own proteins and get those enzymes to attach glycans to viral proteins.
When a virus infects a human cell, the cell’s ribosome picks up viral RNA and translates it into proteins inside a part of the cell called the endoplasmic reticulum. Cellular enzymes start adding so-called high-mannose, or oligomannose, sugars to newly synthesized proteins as they exit the ribosome. Once synthesized, the proteins travel through the Golgi apparatus to be secreted from the cell. Along the way, those added glycans go through a maturation process in which other enzymes iteratively trim back high-mannose structures and decorate them with other types of sugars to yield complex, branched structures. The more processed the sugars are, the more they look like the host’s own sugars.
Targeting this glycan processing could be one way to fight viral infections because it would help expose an invading virus to the immune system. One company, Ansun Biopharma, is developing a drug that could undo some of the glycan processing by snipping off sialic acids, which are specific sugars that get added to viral proteins’ immune camouflage.
Meanwhile, scientists are trying to characterize where sugars get added to the SARS-CoV-2 spike protein. In general, sugars get added to proteins only at particular sites where there are signal sequences. In N-linked glycosylation, sugars attach to a protein at a nitrogen atom in asparagine. The signal for enzymes to install this type of glycosylation is in the protein’s amino acid sequence: a triad of residues starting with an asparagine, ending with a serine or threonine, and containing any amino acid except proline in between. In O-linked glycosylation, sugars attach to a protein through an oxygen atom on serine or threonine.
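To make the N-linked signal sequence concrete, here is a minimal Python sketch of how such a three-residue motif could be located in a protein sequence written in one-letter amino acid codes. The function name `find_n_sequons` and the toy peptide are made up for illustration and are not taken from any of the studies described here. The pattern also excludes proline from the middle position, a commonly cited refinement of the asparagine-X-serine/threonine rule. A match marks only a possible site; it says nothing about whether a glycan is actually attached there.

```python
import re

# N-linked signal sequence: Asn (N), then any residue except Pro (P),
# then Ser (S) or Thr (T). The lookahead consumes only the N, so
# overlapping motifs such as "NNSS" are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_sequons(sequence):
    """Return 1-based positions of asparagines that begin an N-X-S/T motif."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy peptide, not a real fragment of the spike protein.
toy_peptide = "MNATKLNPSQNGTNNSS"
print(find_n_sequons(toy_peptide))  # [2, 11, 14, 15]
```

In the toy peptide, the asparagine at position 7 is skipped because a proline follows it, while the adjacent asparagines at positions 14 and 15 both qualify.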
Sequence analysis of the SARS-CoV-2 spike protein shows that it has 22 possible N-linked glycosylation sites and 4 possible O-linked glycosylation sites. But the existence of a signal sequence doesn’t guarantee that every one of those sites will carry a glycan. Mass spec analyses of the occupancy and composition of glycans on the SARS-CoV-2 spike protein have yielded different results, probably because of the variable nature of glycan synthesis and its dependence on the type of cell the proteins are made in. The conditions used to grow the cells can also affect the results.
All the mass spec data point to the SARS-CoV-2 spike protein’s being heavily glycosylated, Crispin says. “It’s covered in carbohydrates, but it’s slightly lower than in HIV,” he says. HIV is so densely glycosylated that the enzymes that process the sugars on its surface can’t easily reach them. SARS-CoV-2’s sparser glycosylation means that the sugars are more naturally processed than the ones in HIV. But it also suggests that the coronavirus’s glycan shield may not be as effective as that of HIV.
Three papers on the glycosylation of the coronavirus’s spike protein were recently posted on preprint servers and thus haven’t yet been peer reviewed. Crispin and his coworkers found that all 22 N-linked glycosylation sites are occupied most of the time (bioRxiv 2020, DOI: 10.1101/2020.03.26.010322). Another team, led by Hao Yang of Sichuan University, also found glycans at all 22 sites (bioRxiv 2020, DOI: 10.1101/2020.03.28.013276). In contrast, Parastoo Azadi and coworkers at the Complex Carbohydrate Research Center at the University of Georgia consistently found glycans at only 17 of the 22 sites (bioRxiv 2020, DOI: 10.1101/2020.04.01.020966). The identities of the glycans, especially the oligomannose content, varied in the three studies.
“All three reports are slightly different from each other,” Azadi says. “That shows the challenges in consistently producing a glycosylated spike protein for vaccine development due to the diversity in glycosylation depending upon the conditions under which the protein was produced.” To make vaccines, scientists use antigens, like the coronavirus’s spike protein, to train people’s immune systems against the virus.
One possible explanation for the differences in observed glycosylation is that the three studies looked at different versions of the spike protein. The spike protein contains three subunits, each of which has two sections. Crispin’s team looked at a version in which the subunits were all assembled together. The other two studies were performed on individual subunits. Protein structure can affect how the molecule is glycosylated, especially how the sugars get processed in the Golgi pathway of the host cell, so looking at all three subunits together or individually could explain why the studies arrived at different conclusions about glycosylation, Crispin says.
Whereas some of the studies didn’t report any O-linked glycosylation, Azadi and coworkers detected O-linked glycans at two sites. “N-glycosylation is so prevalent, so dominant in this protein that maybe the O-glycosylation wasn’t observed because it’s a minor component, or the cell line conditions used did not result in the presence of O-glycosylation,” Azadi says.
One of the occupied O-linked glycosylation sites that Azadi’s group found is near the protein’s binding domain, so it might play a role in how the spike protein binds to the human angiotensin-converting enzyme 2 (ACE2) receptor, the virus’s route into cells.
But Crispin suggests that such findings need to be interpreted with care. O-Glycosylation often occurs on “misfolded material,” such as subunits that haven’t been assembled into a full protein, he says. He points out that his group did see “tiny amounts” of O-glycosylation on its assembled spike protein, though.
Now that her team has developed a high-throughput method for analyzing SARS-CoV-2 glycosylation, Azadi plans to analyze proteins from viruses produced in different tissues and cell types. She wants to know how the glycosylation varies with tissue type and how such differences affect infection.
The heterogeneity of coronavirus proteins produced under different culture conditions will be of interest in developing treatments or vaccines. “We need to know the glycosylation of the recombinant materials we’re going to use for vaccines,” Azadi says. “If there are differences in glycosylation under different conditions, we need to know that before we produce a vaccine in large amounts.”
Meanwhile, computational biologists such as Chris Oostenbrink of the University of Natural Resources and Life Sciences, Vienna, are feeding the information about glycan location and composition into their models of the spike protein. “We looked at predictions of where glycosylation sites would be and decided to build complex glycans on all those sites,” Oostenbrink says. Now that the mass spec reports are showing him where high-mannose structures are more likely to be, he’s modifying his model to reflect that. He and his experimental collaborators are especially interested in figuring out how the glycosylation affects the binding between the spike protein and the ACE2 receptor.
Oostenbrink’s team is also preparing to simulate the dynamics of the spike protein. “Most experimental structures are static pictures. The proteins are never standing still,” he says. “By adding the dynamics to the whole picture, we will see what is possible.”
This story was originally published on April 17, 2020. It was revised on April 22, 2020, for clarity and to emphasize to readers that the studies described within are published on preprint servers and therefore not yet peer reviewed.