For many researchers, the rhythm of the workweek has changed amid the COVID-19 pandemic. When you work from home, the days can blur together, but scientists can anchor their routines to regular events. Structural biologist Andrea Thorn and the Coronavirus Structural Task Force have built their weeks around Wednesdays, because that’s the day the Protein Data Bank (PDB) publicly releases new protein structures.
Every week since March 2020, Thorn and the task force have checked the PDB for new protein structures related to SARS-CoV-2, the virus that causes COVID-19. There are always some. Since the pandemic started, researchers around the globe have deposited more than 1,000 virus-related protein structures in the PDB. The task force works as a cleanup crew, checking those structural models and, if needed, improving them. Its goal is to make sure the data are accurate enough for researchers who model coronavirus proteins to search for weak spots that therapeutics could target.
In many ways, the COVID-19 pandemic has underscored the importance of structural biology. Scientists designing potential antibody therapies and antiviral drugs rely on structural models of the virus’s proteins. But models of protein structures are just that: models based on experimental data. Errors in such models can easily have knock-on effects, hampering drug design and biological insight. It is crucial to extract the best possible model from any given data set.
Structure-based drug design is a “garbage-in, garbage-out operation,” says Robert Abel, chief computational scientist at Schrödinger. So when designing drugs, researchers at the company start from high-quality structural models built from high-resolution experimental data and careful modeling. Abel says the researchers also take time to refine those models to make sure amino acid side chains are in the right spots and that bond geometries are correct.
The task force is trying to do something similar and is making its data freely available for anyone who wants to use them. For example, the Folding@home initiative uses distributed computing to model the virus’s proteins and look for spots for drugs to target. John D. Chodera, a computational chemist at Memorial Sloan Kettering Cancer Center and a core member of Folding@home, says the initiative has switched to using the task force’s structural data rather than the PDB’s. The task force, Chodera says, is a “transformative resource.”
Thorn first had the idea for the task force in January 2020. At the time, not everyone realized that the novel coronavirus was going to change our lives, but many researchers were already rolling up their sleeves to study the new virus. Structural biologists were a big part of that push. “The community swung into action, fast,” Thorn says. “Many people stopped what they were doing and started to fight this virus in order for drug developers and vaccine specialists to have the material they needed downstream. And it was awesome.”
When Thorn looked at the protein structures circulating in January 2020, she saw that several could be improved. Perhaps, she realized, she could do that. After several conversations, Thorn assembled a team of like-minded scientists hailing from seven countries; they were all stuck at home because of the outbreak and looking for a way to help the global effort to study the coronavirus.
In March 2020, the task force started downloading protein structures of SARS-CoV-2 and SARS-CoV-1, an earlier, related coronavirus that caused the severe acute respiratory syndrome outbreak in 2002–3. The researchers systematically evaluate the structures and choose key entries to improve. On their website (insidecorona.net), they also add context about the structures and how they function by publishing blog posts, 3D models, and illustrations. Sometimes, team members let others watch them work live via the streaming service Twitch.
Thorn already had an eye for spotting errors in protein models. When she started the task force, she was a junior group leader at the Julius Maximilian University of Würzburg. Her research focused on why molecular models from structural biology do not always fit well with experimental data. To answer that question, Thorn says, you need to understand the limitations of the methods used to translate experimental data into a protein model.
Today, most structural biologists use one of a handful of technologies—X-ray crystallography, cryo-electron microscopy (cryo-EM), or nuclear magnetic resonance—to probe the structures of their proteins. But whatever the technique, the resulting data do not magically tell scientists everything they want to know about the protein. X-ray crystallography, for example, provides a 3D map of electron density. The data tell a scientist where some of the protein is but not which specific bits of the protein go where. Researchers must thread the protein’s long string of amino acids through that electron density map. They move the backbone and amino acid side chains around until they get a model that makes sense physically and chemically.
Often, Thorn explains, the raw data are not ideal. For example, crystallography struggles to capture the electron density of loops and unstructured regions of proteins, so structural biologists have to estimate how many amino acids fit into those fuzzier spots in the data. Sometimes the decisions researchers make when interpreting those maps lead to errors. Researchers might assign a particular metal to a spot in the map that looks like an ion, when chemically it would make more sense for that ion to be a different species. Other times, the steric interactions or bond isomerizations in the structure need tweaking.
Each time structural biologists make a wrong assumption when building a model, they can miss crucial biological information. That becomes a problem if you try to use the structure for something like designing small molecules that bind to or interact with the protein. And errors in one structural model can propagate into later structures, because researchers often use earlier models as a starting point when analyzing new data on a protein.
More than 1,000 SARS-CoV-2 structures are now available in the PDB, covering 18 of the virus’s 29 proteins. The task force automatically evaluates all the structures using a range of validation tools, but members don’t analyze every structure in detail. Instead, Thorn says, the team ensures there is at least one trusted structure for each protein.
If task force members spot something they want to improve, they use various tools to go through the model amino acid by amino acid, checking bond orientations, side-chain arrangements, and the positions of ions and water molecules. For example, when inspecting a cryo-EM model of the virus’s RNA-dependent RNA polymerase, the task force realized one section was off because a floppy loop toward one end of the protein had been poorly modeled in a previous structure. As Schrödinger’s Abel points out, a protein “is a dynamic object that will wiggle and jiggle.” Sometimes those wiggles and jiggles change a protein’s shape and function. In this case, fixing the model shifted where amino acids ended up, with some landing at different twists and turns in the corrected structure than in the original.
Once the task force finishes fixing a model, it adds the structure to its database on the online platform GitHub and contacts the original owners of the data. It is up to the scientists who first deposited the structure in the PDB to update their entry. Many do, but there is no mandate for them to do so, or to make the raw data behind their structural work available.
Thorn says she hopes the task force’s work demonstrates how experimental structural biologists and downstream users of these structures need to work more closely together to improve these models’ quality. “I certainly hope this is the way in which the field is moving,” she says.
After a year of effort, the task force is still going strong, and Thorn says that since the beginning of 2021, she has been working to put the organization on a more sustainable footing. For example, she hopes to hire a scientific coordinator to help manage the project. In October, Thorn moved her lab to the University of Hamburg, but she is still a junior research group leader without tenure, which means she does not have long-term job security. She continues to work from home while looking for a permanent research position in Germany.
The task force, she says, has been a way to stay useful and productive through the pandemic lockdowns, and it is important to all its members that their work is freely available. But prioritizing fast, open data sharing means team members do not yet have high-impact publications to show for their work over the past year. That may not have been wise for their careers, Thorn says, but she would still make the same decision today. “We wanted to fight this pandemic,” she says. “And we did not want to put any kind of barrier between people and data.”