Expanding the histone code

Researchers continue to find new chemical modifications on the proteins our genes are wrapped around. Why are they there?

by Celia Henry Arnaud
June 6, 2022 | A version of this story appeared in Volume 100, Issue 20
Strands of DNA wrapped around a histone.

Credit: Molekuul/Science Source


In brief

The DNA in our cells is wrapped around proteins to help compact it enough to fit. Chemical modifications on those proteins, known as histones, help regulate various processes, such as transcription. Those modifications form a code that researchers are working to crack. They are using mass spectrometry to identify how different modifications work together. But that task is complicated by new discoveries of modifications. Researchers disagree about whether those new modifications also serve a regulatory function or whether they are just by-products of metabolic processes.

Histones are among the most abundant proteins in the body. They act as spools that help compact DNA so our enormous genomes will fit in the tight space of the nucleus. Each spool is made of eight histone proteins and is encircled by just 147 of the human genome’s more than 3 billion base pairs, meaning each cell needs more than 163 million histones to hold all the base pairs.
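The arithmetic behind that estimate can be checked in a few lines. This is a minimal sketch that takes the round figures above at face value (3 billion base pairs, 147 per spool, eight histones per spool) and assumes every base pair sits on a spool, ignoring the stretches of DNA between spools. The names are illustrative.

```python
# Back-of-the-envelope check of the histone count quoted above.
GENOME_BASE_PAIRS = 3_000_000_000   # "more than 3 billion", used here as a lower bound
BASE_PAIRS_PER_SPOOL = 147          # DNA wrapped around one eight-histone spool
HISTONES_PER_SPOOL = 8

def histones_needed(genome_bp: int = GENOME_BASE_PAIRS) -> int:
    """Estimate the histone count, assuming all DNA is wrapped around spools."""
    spools = genome_bp // BASE_PAIRS_PER_SPOOL
    return spools * HISTONES_PER_SPOOL

print(f"{histones_needed():,}")  # a little over 163 million
```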

A spool and its accompanying DNA form what’s called a nucleosome. As the DNA winds around successive spools, the nucleosomes become like beads on a string. The beads and string then twist and compact into chromatin, the stuff that forms chromosomes.

Initially, biologists thought that histones were nothing more than structural supports for the DNA. But then they noticed that the proteins’ tails, which constitute nearly a third of each histone’s length, stick out of the chromatin like little flags, where enzymes can add chemical groups that can control how other proteins bind to the histones. The presence of these groups, called posttranslational modifications (PTMs), seemed to correlate with expression or silencing of the genes wrapped around the modified histones.

In 2000, C. David Allis and Brian Strahl, then at the University of Virginia, proposed that those PTMs formed a code. They were trying to understand a puzzle, Strahl says. A modification on a particular serine on one of the histone’s tails seemed to have different functions depending on the circumstances, says Strahl, now at the University of North Carolina at Chapel Hill. They couldn’t understand how a single modification could drive two functions.

When they looked beyond that particular serine’s site, they saw modifications that seemed to be present at the same time in the same nucleosome. They had a eureka moment. Instead of one modification coding for one function, maybe “it’s a cluster of modifications that uniformly work together to create a particular outcome,” Strahl says.

Allis and Strahl hypothesized that multiple modifications work together to form a language that helps regulate gene transcription and other DNA-templated processes, such as replication (Nature 2000, DOI: 10.1038/47412). This idea fit into the growing understanding that epigenetics—changes in gene expression caused by something other than changes to the underlying genes themselves—plays an important role in cellular processes. Despite some grumblings about calling histone modifications a “code,” biologists now generally accept that they play an important role in regulating gene expression as well as other chromatin functions, like DNA repair and DNA replication.

Interpreting that code remains challenging, partially because the number of modifications researchers have identified on histones continues to grow. Thanks to improvements in mass spectrometry, researchers continue to expand the histone code’s vocabulary. Some researchers suspect that the code works at a higher level than the histone proteins and must be read in the context of each nucleosome or chromatin chain. Others are increasingly finding cross talk between histone modifications and the cell’s metabolism, which has led some researchers to wonder whether certain newly found modifications store metabolites instead of acting as signals.

Multiple Choices
Mass spectrometry can help decode chemical modifications on the tails of histones—proteins that assemble into spools encircled by DNA in cells, forming a unit called a nucleosome. The three main methods researchers are using are bottom-up proteomics, in which histones are digested into peptides before MS analysis; top-down proteomics, in which the nucleosomes dissociate into histones before MS; and nucleosome-MS, in which a single nucleosome dissociates into histones and then peptides after initial analysis of several intact nucleosomes.
Credit: Adapted from Nat. Methods

One point of undisputed agreement is that epigenetics is complicated. “Just about anything you can say about it is an oversimplification,” says Tom Muir, a chemical biologist at Princeton University who studies chromatin. “It’s incredibly complicated, and everything’s interconnected. There is layer upon layer of regulatory mechanisms. One draws general, paradigm-level conclusions in this area at your peril.”

Histone basics

As researchers have delved into how histone proteins’ tails are modified, they have discovered specialized proteins that serve as writers, erasers, and readers that add, remove, and bind to modifications at specific amino acids. Given these specialized proteins’ roles in changing how genes are expressed, some scientists are trying to design drugs that target them.

Researchers have identified more than a dozen types of modifications. The most common ones are acetylation and methylation. Acetylation occurs at lysines, and methylation occurs at lysines and arginines. Methylation has an added layer of complexity in that an individual lysine can be modified with one, two, or three methyl groups. Other modifications include ubiquitylation, propionylation, butyrylation, and lactylation.

Researchers have begun to tease apart how each of these modifications functions. For example, acetylation is a dynamic modification that comes on and off relatively quickly. Acetyl groups neutralize the positive charge on lysine side chains, which weakens the tails’ attraction to the negatively charged DNA. That neutralization helps peel the histone tails away from the nucleosome surface and makes them accessible to other proteins that bind to the acetylations.

Methylation, on the other hand, is considered a stabler modification. “When the first lysine methylations were identified, they were thought to be irreversible marks,” says Michael-Christopher Keogh, chief scientific officer at EpiCypher, a company that develops designer nucleosomes as reagents for epigenetic research and drug development. Though researchers now know those methylations can be removed, the more methyl groups there are attached to an individual amino acid, the longer they remain. Many methylations are thought to silence genes because they are found in regions of the genome that are not being actively transcribed.

Other modifications are now catching researchers’ eyes. “We always talk about acetylation and methylation because those are much more abundant on chromatin than the others,” says Simone Sidoli, who studies histone modifications at the Albert Einstein College of Medicine. “That’s exactly why the others are so interesting. Probably these are the ones that really fine-tune gene expression and other functions.”

Researchers are also realizing that, as with the modifications themselves, the proteins that write, erase, and read the modifications work in highly nuanced ways that are hard to generalize. The proteins that act on histones are multidomain complexes with multiple functions. “Almost all of the reader domains have really weak binding capability,” Keogh says. “They have to have weak binding activity because the idea is that multiple modifications have to occur at the same place at the same time and then the weak binders synergize to become a strong binder. It gives you the ability to write some exceedingly complex codes.”

Sidoli points to a protein called BAF180, which has six domains that bind acetyl groups, as an example. If it can engage all its binding domains, it binds more efficiently. “I don’t deny that some of these acetylations could coexist randomly,” Sidoli says, “but it’s only when they’re all found together that this protein binds really tightly.” If that many modifications are required for tight binding, he says, it lessens the chances that they’re acting randomly.

Some of the proteins that act on histone modifications are almost like computational devices, says Andrew J. Andrews, who studies histone modifications at Fox Chase Cancer Center. “They’re able to take information on a hyperdimensional scale—like what metabolites are around, what histone PTMs already exist, and what bonding partners exist for them—and then boil that all down to a new PTM,” he says. “We have to change our thinking from simple on-off switches and inhibitors like we’ve done for years to more complicated systems.” For those who want to target diseases using these PTMs or their binding proteins, this means more nuanced pharmacology, he says.

Tools of the trade

Cracking the histone code requires tools that can reveal a clear picture of the PTMs on each of the histone proteins. Researchers quickly realized that using mass spectrometry “would be a good marriage of the technology and the biological questions,” says Benjamin A. Garcia, who uses mass spectrometry to study histone and DNA modifications at Washington University in St. Louis. He has been working on histones since he was a graduate student in the early 2000s. When he was a grad student, “it was very unclear which sites were getting modified and what this meant. There was a golden time of mass spectrometry contributing heavily to identifying new sites and new types of modifications on histones.”

Once people identified histone modifications, they needed to be able to quantify which modifications were on which amino acids. The method they used was bottom-up proteomics, in which proteins are digested into bits and then analyzed by mass spectrometry.

That approach has an advantage of sensitivity—allowing identification and quantification of infrequent modifications, for instance. But some information is lost by cutting the protein into such small bits. For example, if a histone protein has 10 modification sites, bottom-up proteomics can tell you that those modifications exist but not whether all 10 are on the same histone at the same time.
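A toy calculation shows what gets lost. The sketch below is not a real proteomics workflow: it assumes a hypothetical histone with just two modification sites that end up on separate peptides after digestion, so a bottom-up style readout keeps only the fraction modified at each site. Two very different populations of molecules then look identical.

```python
from collections import Counter

# Toy model: each tuple is one intact histone with two hypothetical sites.
# 1 means the site carries a modification, 0 means it does not.
population_a = [(1, 1)] * 50 + [(0, 0)] * 50   # marks always appear together
population_b = [(1, 0)] * 50 + [(0, 1)] * 50   # marks never appear together

def per_site_occupancy(molecules):
    """Fraction of molecules modified at each site (the bottom-up style view)."""
    n_sites = len(molecules[0])
    return tuple(sum(m[i] for m in molecules) / len(molecules) for i in range(n_sites))

def combination_counts(molecules):
    """How often each full combination occurs (the intact-protein view)."""
    return Counter(molecules)

# Both populations are 50% modified at each site...
print(per_site_occupancy(population_a), per_site_occupancy(population_b))
# ...but only population_a contains doubly modified molecules.
print(combination_counts(population_a)[(1, 1)], combination_counts(population_b)[(1, 1)])
```

The per-site numbers match exactly, while the combination counts differ. That difference is the information an intact-protein measurement is meant to preserve.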

Top-down proteomics offers a way around those challenges. Instead of analyzing digested peptides, top-down proteomics starts with intact histone proteins. This way, researchers know the total mass of the histone proteins and their modifications. Further fragmentation and analysis occur in the mass spectrometer after an initial snapshot has been taken. This approach enables researchers to characterize all the modifications present on a single histone.
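One reason the follow-up fragmentation matters can be sketched with simple mass bookkeeping. The example below is an illustration under stated assumptions, not the method of any lab mentioned here. The mass shifts are standard monoisotopic reference values rather than figures from this story, and the function name is hypothetical. It shows that one acetyl group and three methyl groups add nearly the same mass, so an intact-mass snapshot alone can struggle to tell them apart.

```python
# Monoisotopic mass added by each modification, in daltons.
# Standard reference values (an assumption of this sketch, not from the article).
MASS_SHIFT_DA = {
    "acetyl": 42.010565,  # adds C2H2O
    "methyl": 14.015650,  # adds CH2; di- and trimethylation add it two or three times
}

def total_mass_shift(mod_counts):
    """Sum the mass added to a protein by a set of modifications."""
    return sum(MASS_SHIFT_DA[name] * count for name, count in mod_counts.items())

one_acetyl = total_mass_shift({"acetyl": 1})
three_methyls = total_mass_shift({"methyl": 3})
print(f"one acetyl:    +{one_acetyl:.4f} Da")
print(f"three methyls: +{three_methyls:.4f} Da")
print(f"difference:     {three_methyls - one_acetyl:.4f} Da")
```

The two totals differ by less than 0.04 Da, which is why locating each group on a specific amino acid by fragmentation adds information beyond the total mass.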

But even this isn’t quite enough, because nucleosomes are clusters of eight histone proteins, and researchers increasingly think the combination of modifications at the nucleosome level matters for unlocking the histone code. They want to understand the complete picture of PTMs on a given nucleosome. They think nucleosome-level analysis might help them crack the highest levels of the histone code.

Neil L. Kelleher of Northwestern University is a longtime proponent of top-down proteomics. He and his colleagues are now advocating an approach called nucleosome-MS (Nuc-MS) that starts from intact nucleosomes instead of individual histone proteins. In this method, they use tandem mass spectrometry to analyze individual nucleosomes’ protein composition by successively fragmenting them and analyzing the masses at each stage (Nat. Methods 2021, DOI: 10.1038/s41592-020-01052-9).

The DNA (green) in cells is tightly wound around groups of eight histone proteins (blue spheres) to form nucleosomes.
Credit: Gunilla Elam/Science Source

First, they get the mass of intact nucleosomes, including the histone octamers and the DNA wrapped around them. Then, they select a specific nucleosome, dissociate it into its individual histone proteins, and determine the mass of those. Finally, they break the histone proteins into pieces and determine the identity and location of modifications on each of those proteins. “In a single mass spectrum, you get a view of the entire epigenetic landscape for the protein” portion of the nucleosome, Kelleher says.

Of course, part of the picture of how histones work is the DNA encircling the nucleosomes, so another goal in cracking the code is to better understand the interplay between the nucleosome proteins and the nearby genes. The current version of the Nuc-MS analysis removes the DNA when the nucleosome is dissociated into the eight histone proteins. Kelleher thinks that researchers could also see the DNA in the mass spectrometer, but it would be a tricky experiment, requiring rapid switching between positive and negative ion modes. “It would be fascinating to try to get DNA methylation data from mass spec,” Kelleher says. But he thinks a better approach would be to use Nuc-MS in parallel with DNA sequencing to tell the researchers what sequences are part of what nucleosomes.

One key question Nuc-MS may be able to help answer is symmetry’s role in nucleosomes. The eight proteins in a nucleosome are two copies each of four different proteins. Just because a modification is present on one copy of a histone protein within a nucleosome doesn’t mean that both copies are modified identically.

Princeton’s Muir is interested in this asymmetry. His lab makes designer nucleosomes by adding modifications to only one copy of a histone protein. This asymmetry can affect the function of proteins called remodelers, which can move nucleosomes to help provide access to sites in densely packed chromatin, Muir says.
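The bookkeeping behind that asymmetry can be sketched as a small data structure. This is a hypothetical representation, not software from Muir's or Kelleher's labs. The histone names H2A, H2B, H3, and H4 are the standard names for the four core histones and are not given in this story, and the mark label is only an example.

```python
from dataclasses import dataclass, field

CORE_HISTONES = ("H2A", "H2B", "H3", "H4")  # standard names; an assumption of this sketch

def _unmodified_copies():
    # Two copies of each histone type, each starting with no marks.
    return {name: (frozenset(), frozenset()) for name in CORE_HISTONES}

@dataclass
class Nucleosome:
    # Maps each histone type to the sets of marks on its two copies.
    copies: dict = field(default_factory=_unmodified_copies)

    def asymmetric_histones(self):
        """Return the histone types whose two copies carry different marks."""
        return [name for name, (first, second) in self.copies.items() if first != second]

# A designer-style nucleosome with a mark on only one copy of H3.
nuc = Nucleosome()
nuc.copies["H3"] = (frozenset({"K27me3"}), frozenset())  # illustrative mark label
print(nuc.asymmetric_histones())  # ['H3']
```

A measurement that reports only whether a mark is present somewhere in the nucleosome cannot distinguish this case from one in which both copies are marked.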

Linking to metabolism

It’s not just the top-down approach that allows mass spectrometry to help crack the histone code. “With the instruments becoming more sensitive, I think we’re seeing a second wave of discovering new modifications on histones,” Garcia says.

Yingming Zhao and his colleagues at the University of Chicago have been riding the crest of that wave. They’ve identified a dozen previously unknown types of histone modifications, especially short-chain acylations. Many of these modifications are attached to lysines that are more typically acetylated or methylated. The low abundance of these newer marks, some of which are found on only a tiny fraction of histones, and the lack of enzymes known to install them have led some people to doubt that they have a regulatory function.


Zhao thinks that these modifications occur in response to cellular conditions and are physiologically relevant. For example, he finds that lysine lactylation happens under hypoxic conditions (Nature 2019, DOI: 10.1038/s41586-019-1678-1). The production of lactic acid in cells is a known response to low-oxygen conditions, and it’s so common in cancer cells that it’s even got a name—the Warburg effect. “People know lactate is related with cancer for several decades, but people don’t know the function of lactate,” Zhao says.

The question is whether these new modifications have a regulatory function—a theory Zhao supports—or are simply reporting on the cellular environment.

“The argument to support that lactyl is a regulatory mechanism instead of a by-product is that if you disrupt this modification, it will lead to physiological changes and even disease,” he says. “We have a lot of data that demonstrate that if we disrupt the lysine lactylation pathway, it leads to changes of cellular function and contributes to diseases, such as tumor growth.” Many of those data are not yet published, Zhao says.

Others, though, have noted that many of these newly discovered modifications could come from nonenzymatically added metabolites, especially ones linked to coenzyme A (CoA), which is part of the energy-making citric acid cycle and fatty acid metabolism, among other things. The abundant acetyl and methyl groups that decorate histones come from acetyl-CoA and S-adenosylmethionine, respectively, both of which are abundant metabolites. Acetyl-CoA is reactive enough that low levels of acetylation can happen even without enzymes. Perhaps other metabolites that form thioesters with CoA are just reacting directly with histones.

Benjamin Tu, who studies the relationship between metabolism and histone modifications at the University of Texas Southwestern Medical Center, leans toward this explanation.

Because the newly identified acylations are infrequent and acetyl-CoA can react at low levels with histones even without enzymes, Tu suspects that the new acylations, many of which involve other acyl-CoA metabolites, might be the result of background nonenzymatic reactions.

Muir says one sign that a modification is biologically significant is that there are both enzymes that add and enzymes that remove the modification. “If those two things exist, then for sure that modification is up to something,” he says. “That doesn’t mean to say that it isn’t up to something if those don’t exist. But I think there’s always going to be a question mark.”

In his own lab, Muir says, he would avoid studying a modification until he knows it’s enzymatically installed. “I just feel like you’re on firmer ground in terms of physiological impacts.”

Tiziana Bonaldi, who studies histone modifications at the European Institute of Oncology, thinks that some of the novel acylations could be by-products of metabolism but still play an important role in the cell. She thinks they may be a sensor of the metabolic state. She notes that the novel acylations compete for some of the same lysines as regulatory modifications, so cells may have developed methods for removing the novel ones to clear the way for the regulatory modifications.

She points to the acetylation of enzymes in mitochondria as a potentially analogous example. The addition of the acetyl groups happens nonenzymatically, but mitochondria-specific deacetylase enzymes known as sirtuins remove the acetylations and allow the enzymes to function properly.

Tu comments on the same mitochondrial deacetylases. “It’s kind of like a detox,” he says. “Chemically it makes a lot of sense that you have reactive metabolites that are going to spuriously modify, and then sometimes you need a system to remove them.”

In the case of histones, Tu thinks that some of the acetylations on histones may be a way to store metabolites. With so many histones and so many potential sites of modification, “you quickly realize that the concentration of acetyl groups is on par with the concentration of acetyl-CoA in cells,” he says. “That’s a huge pool that can be released as a result of deacetylation.”

More work is needed to bring together information about the histone modifications and the nearby DNA that they regulate. Mass spec can show that there are myriad forms of the individual histone proteins with various modifications, Garcia says. But the methods that look at the DNA select regions according to one histone modification at a time; they can’t look collectively at multiple modifications to see which ones are occurring together. “So we don’t have an idea if this entire code is found on a single gene. I think that’s the missing piece,” he says. By combining mass spec with genomic methods, researchers could start to determine which combinations of modifications are associated with genes that are on, off, or somewhere in between. “Once we have that connection, then we can really start predicting things,” Garcia says.

