


Prehistoric viruses smuggled genes into our DNA

About 8% of our DNA comes from viruses that infected our distant ancestors. Some of this DNA codes for proteins that affect our biology today

by Benjamin Plackett, special to C&EN
April 30, 2022 | A version of this story appeared in Volume 100, Issue 15


A crystal structure of the Arc protein.

Credit: Simon Erlendsson
The Arc protein came from a virus that infected a distant ancestor of mammals. Today it facilitates communication between cells in the brain and helps memory formation.

Viruses are ancient. They were infecting animals, our ancestors included, hundreds of millions of years before the first humans ever showed up. And the legacy of those primordial infections can still be found dwelling within our genomes today.

When a specific collection of events happens at just the right time, a virus can implant its genes into a host organism’s DNA, and those bits of foreign DNA can end up being passed down to future generations. It’s an exceptionally infrequent event. But the arc of evolutionary history is long, and these rare incidents have accrued over the millennia—so much so that these viral relics account for approximately 8% of human DNA.

Scientists used to think that most of these holdover viral genes, sometimes called endogenous retroviruses, were just junk DNA that didn’t code for anything of consequence. But an abundance of recent studies shows otherwise. Some of these sequences from old viral genes code for proteins that get made in our cells. These descendants of viral proteins can be beneficial, for example by playing a role in forming memories or in the development of the placenta. Others are harmful and relate to conditions like cancer and heart disease.

Researchers are now studying how these viral genes, which sneaked into our ancestors’ genomes millions of years ago, affect our biology today.

Gene smuggling

Before endogenous retroviruses can pass from generation to generation of a host organism, a lot must go right for that genetic material. “It’s not like every time you get a cold your DNA balloons up with new DNA,” says Molly Hammell, a geneticist at Cold Spring Harbor Laboratory. “That’s not how it works.”

A subset of viruses known as retroviruses are especially well equipped to sneak their genes into a host’s DNA. These viruses reproduce by tricking host cells into producing all the proteins that the virus needs to make copies of itself. To hijack host cells, the viruses use enzymes called reverse transcriptases to convert their RNA genes into DNA. After this conversion, retroviruses deploy enzymes called integrases, which make strategic incisions at locations along an organism’s chromosomes where the viral DNA can embed itself.
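For readers who think in code, the two enzyme steps can be caricatured as string operations. This is a toy sketch, not a model of the real biochemistry: sequences are plain strings, the function names and the insertion position `site` are hypothetical, and the actual enzymes do a great deal more (second-strand synthesis and choosing where to cut, for instance).

```python
# Toy model only: base pairing used when DNA is copied from an RNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand for an RNA template.

    The template is read backward so the result is written in the
    conventional 5'-to-3' direction.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def integrate(host_dna: str, viral_dna: str, site: int) -> str:
    """Cut the host sequence at `site` and splice the viral DNA into the gap."""
    return host_dna[:site] + viral_dna + host_dna[site:]


viral_dna = reverse_transcribe("AUGC")         # hypothetical viral RNA
print(integrate("AAAATTTT", viral_dna, 4))     # hypothetical host sequence
```

The point of the sketch is only the order of operations: the RNA is first converted to DNA, and that DNA is then spliced into the host sequence, where it is copied along with everything else.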

Humans and other organisms have tools to undo these genetic sleights of hand. For example, human cells have robust quality-control mechanisms that verify the integrity of the cell’s DNA. These integrity checks eradicate most viral insertions, but a few manage to slip through.

A viral gene’s escape from a cell’s quality-control checks doesn’t guarantee the gene’s passage from host to offspring. The viral gene can’t land in any old cell. It needs to find its way into the nucleus of a reproductive cell, like a sperm cell or egg cell, to have a chance at becoming a heritable trait.

While a sexually transmitted virus might reasonably target those cells, biologists aren’t sure how respiratory viruses might infect them. “Viruses look for specific receptors to enter a certain type of host cell,” Hammell says. “Only a few viruses specifically target sperm or egg cells, but it might be that sperm or egg cells have the same surface receptors as the intended target cell.” It’s not clear, however, how or why a respiratory virus would spread from the lungs, for example, to the reproductive system.

And even if a viral gene does enter a reproductive cell’s DNA, the gene won’t be passed to subsequent generations unless that cell meets its counterpart and develops into an offspring. Every step in this process is unlikely. “That’s why it’s a rare event. But of course, if you look at the timescale of evolutionary life and the number of infections that occur, the chances accumulate with each generation,” says Jason Shepherd, a neurobiologist at the University of Utah who studies holdover viral genes involved in cognitive functioning. “That’s why our genomes are riddled with viral elements.”
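The arithmetic behind Shepherd's point can be sketched in a few lines. The per-infection probability and the infection counts below are invented for illustration and are not figures from the research, and the sketch assumes each infection is an independent event. It only shows how a tiny chance per event creeps toward near certainty as the number of events grows.

```python
import math


def chance_of_at_least_one(p_per_event: float, n_events: int) -> float:
    """Probability that an event with chance `p_per_event` occurs at least
    once in `n_events` independent tries: 1 - (1 - p)^n.

    Computed with log1p/expm1 so that very small probabilities are not
    lost to floating-point rounding.
    """
    if n_events == 0 or p_per_event == 0.0:
        return 0.0
    if p_per_event == 1.0:
        return 1.0
    return -math.expm1(n_events * math.log1p(-p_per_event))


# Hypothetical value, chosen for illustration only.
P_HERITABLE_INSERTION = 1e-9

for n_infections in (10**6, 10**9, 10**10):
    chance = chance_of_at_least_one(P_HERITABLE_INSERTION, n_infections)
    print(f"{n_infections:.0e} infections -> chance of at least one: {chance:.4f}")
```

With these made-up numbers, a one-in-a-billion event is very unlikely across a million infections but close to inevitable across ten billion.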

Helpful viral genes

Once these endogenous retroviruses successfully integrated into our ancestors’ genomes, the genes evolved over time. Of those that remain active today, some are helpful for human biology, while others are harmful. “A lot of it is down to luck,” Shepherd says.

Shepherd studies one of the helpful ones: the Arc gene.

The Arc gene is crucial to the mammalian brain’s ability to store long-lasting memories. In 2018, Shepherd and colleagues showed that the protein that the gene codes for facilitates communication between cells in the nervous system (Cell 2018, DOI: 10.1016/j.cell.2017.12.024). The Arc protein does this by producing vesicles that can travel from cell to cell, carrying substances such as membrane proteins and genetic material that would otherwise struggle to enter cells.

The researchers were tipped off about the gene’s viral origins when they were purifying Arc proteins. They had engineered bacteria to make the protein and started to notice structures that resembled capsids, the protein shells that encase viruses. It wasn’t a result they were expecting.

“We thought we were just bad at purifying the protein,” Shepherd says. “Then we wondered if the bacteria were infected with viruses, so we used different bacteria, but it kept happening.”


That’s when they realized the Arc gene itself was responsible for making the capsids. When the researchers looked for genetic sequences that are similar to Arc, they found genes from an ancient family of viruses. The team estimates that a virus probably slipped Arc into the genome of an ancestor of mammals around 397 million years ago, when amphibians separated from fish.

Scientists like Shepherd are still working on what Arc proteins do in our brains and why we need capsid-like structures to signal between cells. “I’ve kind of been sidetracked by the virus connection,” he says. “But we’re trying to figure out the biology of Arc capsids, how they get out of the cell, and exactly what they do in recipient cells.”

Arc isn’t the only example of how viral genes in our DNA have become essential parts of our biochemistry.

Just as Shepherd’s team spotted something odd about the Arc protein, biologists noticed that the protein syncytin didn’t look like known human proteins. Syncytin is produced by cells in the epithelium of the human placenta and helps fuse those cells to form a layer that separates the developing fetus from its parent. Without this syncytin-produced layer, the parent’s immune system would recognize the fetus as foreign and attack it. Biologists looked for a structural match for syncytin in public databases and found that it was most similar to retroviral proteins. Researchers have since proved that the syncytin gene, which is now crucial to placental mammal reproduction, came from an ancient retrovirus.

Some ancient viral incursions have also ended up helping us fight viral infections. University of Utah scientists searched human DNA for any viral gene fragments that could code for proteins that help a cell mount an immune response when a pathogen invades. They found thousands of examples and used gene-editing tools to remove those fragments from cells’ DNA before infecting the cells with a virus. The gene-edited cells responded more weakly than the cells with viral genes (Science 2016, DOI: 10.1126/science.aad5497).

Other researchers have found that viral genes that code for envelope proteins, which sit on the outside of viruses, get expressed by our cells to fend off viral infections. These proteins block invaders by binding to the receptors that modern viruses use to get into our cells. “It’s kind of like putting gum in the keyhole,” Hammell says.

It’s not all good news

Not all the endogenous retroviruses that sneaked into our DNA have had positive effects, however. Hammell’s research looks at links between the neurological disease amyotrophic lateral sclerosis (ALS) and viral gene artifacts in our DNA. In our cells, a protein known as TDP-43 regulates gene expression. For example, it represses the expression of some genes with viral origins. In people with ALS, TDP-43 clumps and becomes ineffective, disrupting gene regulation. “About 30% of ALS patients show evidence that these viral-like genes are trying to reactivate themselves, as if they were fully competent viruses,” Hammell says. Her lab is trying to understand if reactivation of these viral genes is connected to the damage caused by ALS.

A scheme showing how a virus binds to a host cell, fuses with the cell, releases its RNA, converts its RNA to DNA, and incorporates that DNA into the host’s DNA.
Credit: Adapted from Nat. Rev. Drug Discovery
A retrovirus has a few tricks to incorporate its genes into a host cell’s genome. After it binds to receptors on the cell’s surface and fuses with the cell, it releases its RNA. Its reverse transcriptases (red circles) convert that RNA into DNA. Viral integrases (blue circles) then incorporate that DNA into the host’s chromosomes.

Louis Flamand, a microbiologist at Laval University, also studies the nefarious side of viral genes lurking in our DNA. His lab investigates DNA smuggled into our genomes by herpesvirus 6. About 1% of people have this virus’s genes in their chromosomes. In 2015, Flamand and his colleagues looked at the DNA of almost 20,000 people aged 40–69. The researchers discovered that the participants who had herpesvirus 6 genes in their genome were three times as likely as people without these genes to have angina, a type of chest pain caused by reduced blood flow to the heart (Proc. Natl. Acad. Sci. U.S.A. 2015, DOI: 10.1073/pnas.1502741112). This connection between the viral DNA and this symptom of coronary artery disease held true even after the researchers accounted for confounding factors, such as body fat and high blood pressure.

Flamand and colleagues later reported a possible explanation for why people with the inherited herpesvirus 6 genes might be at a greater risk of heart problems. The team found that some people actively express a herpesvirus 6 gene called U90 in several organs, including the esophagus, adrenal glands, and brain. The U90 gene produces a protein known as IE1. Some of the people producing IE1 also have antibodies against the protein, which suggests that their immune systems still recognize the IE1 protein as a hallmark of a viral infection and respond accordingly (J. Virol. 2019, DOI: 10.1128/JVI.01418-19). Inflammation, a key immune response, can cause a myriad of health concerns, including cardiovascular problems. Inflammation thus possibly links the active viral genes and angina, Flamand says.

Researchers hope that by continuing to investigate the viral leftovers in our genome, they’ll learn more about our cells’ biochemistry and gain insights into diseases. It’s clear that random infections that occurred hundreds of millions of years ago still affect us in many ways today.

Benjamin Plackett is a freelance writer based in rural New South Wales, Australia.

