Big Data

Researchers can identify a person by sequencing RNA from individual cells

Sparse, noisy single-cell sequencing data can still identify an individual

by Laurel Oldach
October 2, 2024 | A version of this story appeared in Volume 102, Issue 31

 

An individual’s genome is identifiable. But datasets of RNA sequences from single cells are like a blurry photocopy of the genome. Each cell yields only a small number of sequences, the data are noisy, and the sequences represent only the small set of genes expressed in a given cell type. That’s why “there was a lot of debate” over whether one could identify a person based on data from a single-cell RNA sequencing experiment, says Columbia University bioinformatics professor Gamze Gürsoy.

A dot plot showing several brightly colored clusters of dots, with two unlabeled axes that represent dimensions in a principal component analysis.
Credit: Kyle McDonald/Flickr
Researchers use graphs like this one to visualize single-cell RNA sequencing experiments. Each dot represents a single cell, and colorful clusters show different cell types.

Researchers currently treat single-cell sequencing as biochemically anonymous and share datasets publicly. The technique catalogs RNA from individual cells to detect expression differences between cell types; as it has become more affordable, studies with hundreds or thousands of research participants have become available online.

Gürsoy and her lab found that they could robustly link single-cell transcriptomes with known genomes using two such datasets from studies of autoimmune disorders (Cell 2024, DOI: 10.1016/j.cell.2024.09.012). The researchers used known statistical relationships between genomic regions and the amount of expression of related RNAs to infer an individual’s genome from as few as 1,000 cells from a blood sample. The method could allow someone with access to the data to uncover individuals in the autoimmune or control groups.

Gürsoy stresses that there’s no evidence that anyone could benefit from uncovering a person’s health status in this way, especially given laws against genetic discrimination. Still, she says, the research community has a responsibility to be proactive about protecting participants’ privacy. “Maybe we need more safeguards,” such as updates to the informed consent process and better protocols for data encryption, she says.

CORRECTION:

This story was updated on Oct. 3, 2024, to correct an error in the spelling of Gamze Gürsoy's surname. It is Gürsoy, not Gursöy.
