A team of researchers has announced that it stored six files (including a short film, an entire computer operating system and an Amazon gift card) in DNA, reaching a record theoretical density of 215 petabytes per gram.
By approaching the theoretical maximum for the information stored per nucleotide, the scientists are opening up real prospects for long-term information storage. In the journal Science, researchers Yaniv Erlich, Dina Zielinski and their team describe a new technique called DNA Fountain, which allows a theoretical storage density of 215 petabytes per gram of DNA and an estimated 10^15 reads before degradation. In other words, virtually unlimited.
In the living world, DNA stores, carries and transmits information, in this case the genetic heritage (the genome). This information is written with four nucleotides: adenine (A), cytosine (C), guanine (G) and thymine (T). All of the information is therefore represented by a sequence of these four letters. But errors can occur when DNA is written and read, so any storage technique must be able to tolerate them.
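To picture how binary data can become a sequence of these four letters, one simple option is to let each nucleotide carry two bits. The sketch below assumes the mapping 00→A, 01→C, 10→G, 11→T; the function names are illustrative and are not taken from the researchers' software.

```python
# Assumed mapping for illustration: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Translate bytes into a nucleotide string, four letters per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(sequence: str) -> bytes:
    """Inverse of bytes_to_dna: translate a nucleotide string back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

With this mapping, `bytes_to_dna(b"Hi")` gives `"CAGACGGC"`, and `dna_to_bytes("CAGACGGC")` returns `b"Hi"` again. Two bits per letter is the ceiling for a four-letter alphabet, which is why density is discussed in terms of information per nucleotide.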
The new DNA Fountain technique splits the information into packets and generates redundant data alongside them. The more redundant data is added, the greater the error tolerance, since the original information can still be reconstructed. The process begins with an encoding based on "fountain codes". The researchers thereby obtain a large number of small packets, which are then translated into DNA sequences. The trick is then to screen the resulting sequences and discard any that could cause problems during reading. Since the redundant data is there as a fallback, discarding them has no impact on the information.
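The idea can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the researchers' implementation: the degree distribution, the two-byte seed, the 00→A, 01→C, 10→G, 11→T mapping and the screening thresholds (no run of more than three identical letters, 45 to 55 percent G and C) are all chosen here for illustration. Each packet, or "droplet", is the XOR of a few data segments picked from a seed, and only droplets whose DNA form passes the screen are kept.

```python
import random

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def to_dna(payload: bytes) -> str:
    """Map every two bits of the payload to one nucleotide."""
    return "".join(BASES[(byte >> shift) & 3] for byte in payload for shift in (6, 4, 2, 0))


def is_readable(seq: str, max_run: int = 3, gc_range=(0.45, 0.55)) -> bool:
    """Reject sequences with long single-letter runs or unbalanced GC content."""
    if any(base * (max_run + 1) in seq for base in BASES):
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc_range[0] <= gc <= gc_range[1]


def pick_segments(seed: int, n_segments: int) -> list:
    """Derive the segment indices of a droplet from its seed alone."""
    rng = random.Random(seed)
    degree = min(rng.choice([1, 1, 2, 2, 3]), n_segments)  # toy distribution
    return rng.sample(range(n_segments), degree)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, seg_size: int, n_droplets: int) -> list:
    """Return up to n_droplets (seed, payload) pairs whose DNA form passes the screen."""
    assert len(data) % seg_size == 0, "pad the data to a multiple of seg_size first"
    segments = [data[i:i + seg_size] for i in range(0, len(data), seg_size)]
    droplets = []
    for seed in range(2 ** 16):  # the seed is stored in two bytes
        payload = bytes(seg_size)
        for index in pick_segments(seed, len(segments)):
            payload = xor(payload, segments[index])
        if is_readable(to_dna(seed.to_bytes(2, "big") + payload)):
            droplets.append((seed, payload))
            if len(droplets) == n_droplets:
                break
    return droplets


def decode(droplets, n_segments: int):
    """Peeling decoder: returns the data, or None if too few droplets are available."""
    pending = [(set(pick_segments(seed, n_segments)), payload) for seed, payload in droplets]
    known = {}
    progress = True
    while progress and len(known) < n_segments:
        progress = False
        for indices, payload in pending:
            unknown = [i for i in indices if i not in known]
            if len(unknown) != 1:
                continue  # nothing new to learn from this droplet yet
            for i in indices:
                if i in known:
                    payload = xor(payload, known[i])  # strip already-recovered segments
            known[unknown[0]] = payload
            progress = True
    if len(known) < n_segments:
        return None
    return b"".join(known[i] for i in range(n_segments))
```

Calling `encode(data, seg_size=4, n_droplets=120)` on 32 bytes of random-looking data (a stand-in for a compressed archive) and passing the result to `decode(droplets, 8)` should normally return the original bytes, and it should usually still do so after some droplets are removed from the list. Because any sufficiently large set of droplets will do, the encoder can afford to throw away every sequence that fails the screen.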
In a 2.1 MB compressed archive, the researchers stored a minimal operating system and its interface, a French short film from 1895 (the arrival of a train at La Ciotat), a $50 Amazon gift card, a computer virus, the Pioneer plaque and an academic work by the information theorist Claude Shannon. But the most important thing was to test whether the DNA Fountain algorithm could encode binary information into genetic data without losing any of it. All the tests the researchers carried out afterwards showed that the system worked very well. They deliberately inserted errors and even randomly deleted DNA sequences, but the information could still be restored each time.
Why not build storage banks tomorrow and go all-DNA? Because there are limits, chiefly the cost. Creating these strands of information currently costs a small fortune: the researchers put the bill at around $3,500 per MB. The technique will therefore have to be refined before it can be industrialized.
DNA data storage is not only a remarkable space saver: it could also preserve knowledge with extreme robustness and longevity, unlike traditional technologies, which degrade over time. Specialists estimate that it will take at least another decade before this type of storage becomes accessible to the general public.