Using a set of 32 small peptides, researchers have developed a chemical coding system that could offer a simple approach to molecule-based data storage (ACS Cent. Sci. 2019, DOI: 10.1021/acscentsci.9b00210).
Conventional data storage media, including hard drives and tape, encode information by magnetizing tiny grains, and they can be bulky and expensive to maintain. In principle, molecules could archive vast amounts of data more efficiently over the long term, in a low-cost format that consumes far less energy and lasts for centuries.
Researchers have previously used the sequence of bases (A, G, C, and T) in strands of synthetic DNA to represent the digital ones and zeros of everyday computing. They have already stored books and videos this way and decoded them with DNA sequencers. But DNA synthesis is slow and expensive, and recording more information requires building fresh strands each time.
A team led by George M. Whitesides of Harvard University has now developed an alternative approach to molecular data storage that encodes information using 32 short peptides. “The virtue is that it can be done pretty rapidly and really quite inexpensively,” Whitesides says.
The system uses a robotic dispenser to transfer peptides from stock solutions onto a flat metal plate covered with 1,536 gold spots. These 1.25 mm wide spots carry linker molecules that bind to the peptides, each of which contains up to 7 amino acid residues and represents a unit of binary data, known as a bit. If a particular peptide is fixed to a spot, it represents a digital one; if it is absent, the bit’s value is zero. With 32 possible peptides and 8 bits to a byte, the system can encode 4 bytes of data onto each gold spot.
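The presence-or-absence scheme can be illustrated with a short Python sketch. It is not the team's software: the function names are hypothetical, and the mapping of bit positions to peptide numbers (most significant bit to peptide 0) is an assumption made only for illustration.

```python
NUM_PEPTIDES = 32  # one peptide per bit, so 32 bits = 4 bytes per spot


def encode_spot(chunk: bytes) -> set[int]:
    """Return the indices of the peptides to deposit on one gold spot.

    Assumption: bit i of the 4-byte chunk (counting from the most
    significant bit) corresponds to peptide i. The article does not
    specify the actual assignment.
    """
    if len(chunk) != NUM_PEPTIDES // 8:
        raise ValueError("each spot holds exactly 4 bytes")
    word = int.from_bytes(chunk, "big")
    return {i for i in range(NUM_PEPTIDES) if (word >> (NUM_PEPTIDES - 1 - i)) & 1}


def decode_spot(present: set[int]) -> bytes:
    """Rebuild the 4 bytes from the set of peptides found on a spot."""
    word = 0
    for i in present:
        word |= 1 << (NUM_PEPTIDES - 1 - i)
    return word.to_bytes(NUM_PEPTIDES // 8, "big")


if __name__ == "__main__":
    peptides = encode_spot(b"Room")
    print(sorted(peptides))
    print(decode_spot(peptides))
```

In this sketch a spot carrying no peptides at all decodes to four zero bytes, and a spot carrying all 32 decodes to four bytes of 0xFF.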
To read the data, the researchers use matrix-assisted laser desorption ionization time-of-flight mass spectrometry to dislodge traces of the peptides and linkers from a spot. Since each peptide has a different mass, the team’s software can automatically decode the resulting mass spectrum into a series of ones and zeros. Scanning each spot in turn allows the system to read out 4 bytes of data at a time and reconstruct the whole dataset. “I think it’s highly innovative; it’s really out of the box,” says Tom F. A. de Greef of the Eindhoven University of Technology, who works on DNA-based data systems.
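A rough sketch of that decoding step, in Python, might look like the following. Everything specific here is assumed for illustration: the peptide masses are invented placeholders, the peak list format, tolerance, and intensity threshold are hypothetical, and the team's actual software may work quite differently.

```python
# Hypothetical masses in daltons, one per peptide. These are placeholders,
# not the masses of the peptides used in the study.
HYPOTHETICAL_MASSES = [800.0 + 15.0 * i for i in range(32)]


def read_spot(peaks, tolerance=0.5, min_intensity=0.1) -> bytes:
    """Turn a list of (mass, relative_intensity) peaks into 4 bytes.

    A peptide counts as present (bit = 1) if some peak lies within
    `tolerance` of its expected mass and is at least `min_intensity`
    strong. Both thresholds are illustrative assumptions.
    """
    word = 0
    for i, expected in enumerate(HYPOTHETICAL_MASSES):
        present = any(
            abs(mass - expected) <= tolerance and intensity >= min_intensity
            for mass, intensity in peaks
        )
        if present:
            # Assumed ordering: peptide 0 is the most significant bit.
            word |= 1 << (31 - i)
    return word.to_bytes(4, "big")


if __name__ == "__main__":
    spectrum = [(800.2, 0.9), (815.1, 0.02), (1265.0, 0.5)]
    print(read_spot(spectrum))
```

In the example spectrum, the weak peak near 815 Da falls below the assumed intensity threshold and is treated as noise, so only the first and last peptides register as ones.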
Whitesides’s team has used the method to encode 400 kilobits of text, Richard Feynman’s famous lecture “There’s Plenty of Room at the Bottom,” and to store images such as the woodblock print entitled Under the Wave off Kanagawa. The system can write data onto the plates at a rate of 8 bits per second and read it at 20 bps, reliably recovering more than 99% of the information. Peptides dislodged from the spot can never be read again, but the team estimates that enough will remain for multiple scans.
The researchers say that their method is much faster than DNA data storage, which has an average write speed of about 0.001 bps, and it also avoids using any of the reagents needed for synthesis and sequencing. “Synthesis is just intrinsically slow,” Whitesides says.
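To put the quoted rates in perspective, the back-of-the-envelope Python sketch below works out how long a 400-kilobit message would take at each speed and how many 1,536-spot plates it would fill. It assumes 1 kilobit is 1,000 bits, that the rates are sustained averages, and that every spot is used for data; none of these details are confirmed by the researchers.

```python
import math

MESSAGE_BITS = 400_000   # 400 kilobits, assuming 1 kilobit = 1,000 bits
BITS_PER_SPOT = 32       # 32 peptides, one bit each
SPOTS_PER_PLATE = 1_536

RATES_BPS = {
    "peptide write": 8,
    "peptide read": 20,
    "DNA write (average)": 0.001,
}


def seconds_needed(bits: float, bits_per_second: float) -> float:
    """Time to move `bits` at a constant rate."""
    return bits / bits_per_second


def plates_needed(bits: int) -> int:
    """Whole plates required, assuming every spot stores data."""
    spots = math.ceil(bits / BITS_PER_SPOT)
    return math.ceil(spots / SPOTS_PER_PLATE)


if __name__ == "__main__":
    for label, rate in RATES_BPS.items():
        hours = seconds_needed(MESSAGE_BITS, rate) / 3600
        print(f"{label}: {hours:,.1f} hours")
    print("plates:", plates_needed(MESSAGE_BITS))
```

Under these assumptions, writing the message takes roughly 14 hours and reading it about 5.6 hours, whereas the quoted DNA write speed would stretch the same job to more than a decade. The message would occupy 12,500 spots, or nine plates.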
Speed isn’t everything, though. “One of the reasons people want to do molecular data storage is that you can have a petabyte [10^15 bytes] of data in a small test tube, so there’s almost no physical footprint,” de Greef says.
In contrast, the spacing of the gold spots in Whitesides’s system means that it currently stores a mere 64 bytes per square centimeter. But the researchers say they could boost this to megabytes by clustering the gold spots closer together and using a larger set of molecules. Meanwhile, an inkjet printer could write the data much faster, and incorporating cheaper molecules, such as alkanethiols, could cut costs. “Any molecule that can be immobilized and has a unique mass is a candidate for storing information with this approach,” says Milan Mrksich of Northwestern University, who was part of the research team.
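The two levers the researchers mention, tighter spot spacing and a larger molecular set, multiply together, as the Python sketch below illustrates. The current figures follow from the article (64 bytes per square centimeter at 4 bytes per spot implies about 16 spots per square centimeter); the denser scenario is a purely hypothetical example, not a projection from the team.

```python
def bytes_per_cm2(spots_per_cm2: float, molecules_per_set: int) -> float:
    """Areal density when each molecule in the set contributes one bit per spot."""
    return spots_per_cm2 * molecules_per_set / 8


if __name__ == "__main__":
    # Current system: about 16 spots per cm^2, 32 peptides per spot.
    print(bytes_per_cm2(16, 32))

    # Hypothetical scenario (numbers chosen only for illustration):
    # 10,000 spots per cm^2 and a 256-molecule set.
    print(bytes_per_cm2(10_000, 256))
```

The first call reproduces the 64 bytes per square centimeter quoted above. The hypothetical second scenario gives 320,000 bytes per square centimeter, which shows why both spot spacing and the size of the molecular set would need to improve substantially to reach megabytes.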
Whitesides thinks that the method could eventually be used for secure, long-term data archiving of patient records or financial information, for example. For now, his team is exploring variations on the technique to optimize its speed, cost, and data density.
The caption of this article was updated on May 2, 2019, to remove an incorrect affiliation.