Researchers from the Henry and Marilyn Taub Faculty of Computer Science have developed an AI-based method that accelerates DNA-based data retrieval by three orders of magnitude while significantly improving accuracy. The research team included Ph.D. student Omer Sabary, Dr. Daniella Bar-Lev, Dr. Itai Orr, Prof. Eitan Yaakobi, and Prof. Tuvi Etzion.
The research is published in the journal Nature Machine Intelligence.
DNA data storage is an emerging field that leverages DNA as a platform for storing information. DNA offers significant advantages as a storage medium, including:
- Long-term preservation: In 2013, researchers in Denmark successfully extracted DNA from a horse bone dating back 700,000 years. In 2021, an international team recovered DNA from mammoths that lived over one million years ago. By contrast, magnetic disks used in data centers have lifespans measured in years or, at best, a few decades. This highlights DNA’s potential for long-term storage.
- Energy and cost efficiency: The “cloud” that powers most of today’s computing services relies on data centers that consume approximately 3% of global electricity and emit around 2% of total carbon emissions. With the exponential growth of data, the environmental impact of current technologies is expected to increase significantly.
- Unmatched data density: DNA storage offers data density up to 100 million times greater than traditional digital storage. This means that a volume currently holding one megabyte could theoretically store up to 100 terabytes using DNA.
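The density figure in the last point can be checked with simple arithmetic. The short Python sketch below assumes decimal units (1 megabyte = 10^6 bytes, 1 terabyte = 10^12 bytes); the function name is illustrative only.

```python
# Back-of-the-envelope check of the density claim (decimal units assumed).
MEGABYTE = 10**6               # bytes
TERABYTE = 10**12              # bytes
DENSITY_FACTOR = 100_000_000   # "up to 100 million times", taken from the text

def dna_capacity_bytes(conventional_bytes: int, factor: int = DENSITY_FACTOR) -> int:
    """Bytes that could theoretically fit in the same volume using DNA."""
    return conventional_bytes * factor

capacity = dna_capacity_bytes(MEGABYTE)
print(capacity // TERABYTE, "TB")  # prints "100 TB"
```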
DNA is a molecule composed of a sequence of organic compounds called nucleotides. These nucleotides are classified into four types, represented by the letters A, C, G, and T. Unlike traditional computing, where data is encoded using only two digits (0 and 1), DNA storage is based on sequences of four letters, dramatically increasing the number of possible combinations.
To write (store) data in this technology, DNA synthesis is required: DNA molecules are created based on the sequences that encode the information. To read the stored data, DNA sequencing is necessary.
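As a rough illustration of how binary data can be expressed in a four-letter alphabet, the sketch below maps every two bits to one nucleotide. The specific mapping (00 to A, 01 to C, 10 to G, 11 to T) is an assumption chosen for clarity and is not taken from the paper; practical schemes also add error-correcting codes and avoid sequences that are hard to synthesize.

```python
# Illustrative mapping only: 2 bits per nucleotide (an assumption, not the paper's code).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Invert encode(): four nucleotides become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"Hi")
print(strand)                    # CAGACGGC
print(decode(strand).decode())   # Hi
```

With this mapping, each byte becomes exactly four nucleotides, so a strand of n letters carries 2n bits.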

Challenges in DNA data storage
Developing DNA-based storage technology presents several technological challenges:
- Both synthesis and sequencing are lengthy and error-prone processes, introducing deletion, insertion, and substitution errors
- Due to limitations of the synthesis process, multiple copies of each DNA molecule encoding the data are produced. These copies are stored together, unordered, in a storage container
- During sequencing, many copies of these molecules are retrieved; most contain errors, while some disappear entirely
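The three challenges above can be mimicked with a toy noise model. The Python sketch below is not the simulator developed at Technion; the error probabilities, the number of copies, and the function names `corrupt` and `noisy_reads` are all hypothetical values chosen for illustration.

```python
import random

BASES = "ACGT"

def corrupt(strand, p_sub, p_del, p_ins, rng):
    """Return one noisy copy of `strand` with random edit errors."""
    out = []
    for base in strand:
        roll = rng.random()
        if roll < p_del:
            pass  # deletion: the base is dropped
        elif roll < p_del + p_sub:
            # substitution: replace with one of the other three bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion: an extra base appears
    return "".join(out)

def noisy_reads(strands, copies=5, p_sub=0.02, p_del=0.02, p_ins=0.02,
                p_lost=0.1, seed=0):
    """Simulate reading back an unordered pool of noisy copies."""
    rng = random.Random(seed)
    reads = []
    for strand in strands:
        for _ in range(copies):
            if rng.random() < p_lost:
                continue  # this copy is never retrieved
            reads.append(corrupt(strand, p_sub, p_del, p_ins, rng))
    rng.shuffle(reads)  # copies are stored together with no ordering
    return reads

for read in noisy_reads(["ACGTACGTACGT", "TTGACCATGGAA"], copies=3):
    print(read)
```

The retrieval problem is then to recover the original strands from such a shuffled, error-laden pool.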

DNAformer: AI-powered data retrieval
The current research presents a comprehensive computational solution for retrieving data and correcting errors in complex DNA-based storage systems. Using advanced algorithms and encoding techniques, the researchers have demonstrated that their solution reduces data retrieval and reading time from several days to just 10 minutes.
The Technion-developed method, DNAformer, is based on a transformer model trained on simulated data (generated using a simulator that was also developed at Technion) to reconstruct accurate DNA sequences from erroneous copies. The method also includes a custom error-correction code tailored for DNA, ensuring robust data integrity.
Additionally, an extra safety margin mechanism detects particularly noisy DNA sequences (noise being unwanted signals or errors that occur during sequencing and can interfere with the accurate interpretation of the data) and applies powerful algorithmic tools to handle them efficiently. At the end of the process, the data is converted back into digital information.
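To give a feel for the reconstruction task, the sketch below shows a deliberately naive baseline: a position-by-position majority vote over the copies of one sequence. This is not DNAformer and not the authors' method. It assumes the copies have already been grouped by source molecule, and it only copes with substitution errors; copies whose length was changed by an insertion or deletion are simply discarded, which is exactly the kind of case a learned model and an error-correction code are meant to handle.

```python
from collections import Counter

def majority_consensus(reads):
    """Naive baseline: per-position majority vote over same-length reads."""
    # Keep only reads with the most common length (drops indel-damaged copies).
    length = Counter(len(r) for r in reads).most_common(1)[0][0]
    usable = [r for r in reads if len(r) == length]
    # Vote on each column independently.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*usable)
    )

# Five hypothetical noisy copies of the strand "ACGTAC".
reads = ["ACGTAC", "ACGTTC", "AGGTAC", "ACGTAC", "ACGAC"]
print(majority_consensus(reads))  # ACGTAC
```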
The new method enables the reading of 100 megabytes of data at a speed 3,200 times faster than the most accurate existing method, without any loss of accuracy. Compared to previously known fast methods, DNAformer also improves accuracy by up to 40% while significantly reducing processing time. This was demonstrated on a 3.1-megabyte dataset, which included:
- A color still image
- A 24-second audio clip of astronaut Neil Armstrong’s words on the moon
- A written text discussing DNA’s advantages as a promising data storage method
- Random data to illustrate the applicability to encrypted or compressed data
The researchers plan to develop customized versions of DNAformer tailored to different needs. They emphasize that their technology is scalable and adaptable, meaning it can be optimized for large-scale data storage applications, meeting market demands and accommodating future developments in DNA synthesis and sequencing.
More information:
Daniella Bar-Lev et al, Scalable and robust DNA-based storage via coding theory and deep learning, Nature Machine Intelligence (2025). DOI: 10.1038/s42256-025-01003-z
Citation:
DNA data storage: AI method speeds up data retrieval by 3,200 times (2025, March 21)
retrieved 22 March 2025
from https://techxplore.com/information/2025-03-dna-storage-ai-method.html
