


DNA’s Destiny Lies In Its Solvation

Genomics: Researchers can predict whether a DNA sequence codes for mRNA or tRNA based on its interactions with water

by Erika Gebel
May 24, 2012

CORRECTION: This story was updated on May 24, 2012. The research team estimated solvation energies for 56,000 sequences coding for tRNA, not 5,600, as originally stated.

As scientists sequence more and more genomes, they need efficient tools to parse the mountains of data. For example, to uncover the genetic causes of disease, scientists often need to determine whether a DNA sequence codes for a protein. Using basic solvation properties of DNA, researchers have now developed a simple and accurate method for predicting whether a stretch of DNA codes for messenger RNA (mRNA), which is translated into a protein, or transfer RNA (tRNA), which helps with protein synthesis (J. Am. Chem. Soc., DOI: 10.1021/ja3020956).

DNA’s Destiny
Cartoons of DNA, mRNA, and tRNA. To help annotate genomes, scientists have developed a tool based on solvation energy to classify a DNA sequence as coding for mRNA or tRNA.
Credit: J. Am. Chem. Soc.

Genomes are a patchwork of biological information with sequences representing genes, regulatory elements, so-called junk DNA, and other exotic nucleic acid species. Scientists typically use existing sequence data to build predictive models that help them find the various genetic elements in a genome. But this bioinformatics approach can lead to mistakes, says B. Jayaram of the Indian Institute of Technology Delhi. “The error rates are pretty high,” he says, with most models producing false positives 50 to 60% of the time.

In a previous study, Jayaram successfully differentiated between coding and non-coding parts of genes based on chemical properties of the DNA sequence (PLoS One, DOI: 10.1371/journal.pone.0012433). As a next step, he wanted to develop a way to tell whether a sequence codes for mRNA or tRNA.

Jayaram and his colleague Garima Khandelwal used data produced by the Ascona B-DNA Consortium. This international group of scientists used a computer model to simulate the behavior of every possible four-nucleotide DNA sequence in a cell-like environment. The simulations estimated each sequence’s solvation energy, which describes its affinity for water; less favorable water-DNA interactions translate into higher solvation energy.

Using these data, the researchers estimated the solvation energies of more than 2 million DNA sequences coding for mRNA and about 56,000 sequences coding for tRNA. They found that tRNA sequences had greater solvation energies than mRNA ones: Over 99% of the genes with solvation energies greater than 1.2 kcal/mol coded for tRNA, while over 99% of genes below this threshold coded for mRNA.
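To make the idea concrete, here is a minimal Python sketch of a threshold classifier of this kind. It is an illustration under stated assumptions, not the researchers' published procedure: the per-tetranucleotide energies below are made-up placeholder values (the consortium's simulation data are not reproduced here), and averaging over overlapping four-nucleotide windows is one plausible way to combine them, not necessarily the one the team used. Only the 1.2 kcal/mol cutoff comes from the reported results.

```python
from itertools import product

# Cutoff reported in the article (kcal/mol).
THRESHOLD_KCAL_PER_MOL = 1.2


def placeholder_energy_table():
    """Build a HYPOTHETICAL energy table for all 256 tetranucleotides.

    These numbers are not real simulation data. The toy value simply
    rises with G/C count so that the example has something to classify.
    """
    table = {}
    for letters in product("ACGT", repeat=4):
        tetra = "".join(letters)
        gc_count = sum(base in "GC" for base in tetra)
        table[tetra] = 0.8 + 0.2 * gc_count
    return table


def mean_solvation_energy(sequence, table):
    """Average the table energy over every overlapping 4-nucleotide window.

    Assumption: a sliding-window mean is used to score a whole sequence.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 4] for i in range(len(seq) - 3)]
    if not windows:
        raise ValueError("sequence must contain at least four nucleotides")
    return sum(table[w] for w in windows) / len(windows)


def classify(sequence, table, threshold=THRESHOLD_KCAL_PER_MOL):
    """Label a sequence 'tRNA' if its score exceeds the threshold, else 'mRNA'."""
    score = mean_solvation_energy(sequence, table)
    return "tRNA" if score > threshold else "mRNA"


if __name__ == "__main__":
    table = placeholder_energy_table()
    for seq in ("ATATATATAT", "GGGGCCCCGG"):
        print(seq, round(mean_solvation_energy(seq, table), 2), classify(seq, table))
```

With real per-tetranucleotide energies in place of the placeholder table, the same structure would score any sequence made up of A, C, G, and T. Ambiguous bases such as N are not handled in this sketch and would raise a `KeyError`.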

Jayaram says the trend makes sense because tRNA forms a rigid three-dimensional structure with a core hidden from water, while mRNA is relatively flexible and, as a result, more exposed to water.

David Beveridge of Wesleyan University calls these results “quite interesting” and thinks Jayaram’s method will become another tool for analyzing genomes. Jayaram next wants to identify genes for other types of nucleic acids, such as microRNAs.


