You Asked, We Sequenced, Because Googling 'DNA Storage' Can Get Weird.
Is DNA alive?
No. DNA is a chemical molecule that acts as a storage medium for information within living cells. It carries the instructions for producing the many proteins an organism needs to survive (and possibly thrive). Most cells cannot live without DNA (mature red blood cells and platelets are exceptions: they have no nucleus, and therefore no nuclear DNA). DNA, however, does not need a cell to persist.
Is DNA unstable?
DNA is highly stable thanks to its double-helix structure, in which hydrogen bonds hold the two strands together. DNA has been recovered from remains more than a million years old that were exposed to harsh geophysical conditions. However, moisture and oxygen damage DNA over time: water can hydrolyze the phosphodiester bonds in its backbone, and oxygen causes oxidative damage. Stored DNA therefore needs to be kept in a dry, inert environment.
How exactly can digital data go into DNA?
Digital data today is stored in binary form, as sequences of 0s and 1s. To store it in DNA, this binary data is converted into a sequence of the four DNA bases (A, C, G and T). There are different ways to do this conversion. One example is to map 00 to A, 01 to G, 10 to C and 11 to T (any other one-to-one assignment of the four bit pairs to the four bases works too). The resulting DNA sequence is then synthesized chemically and stored securely.
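Here is a minimal Python sketch of the two-bit mapping described above (00 to A, 01 to G, 10 to C, 11 to T). The names `encode` and `decode` are illustrative only. Practical DNA storage schemes also add error correction and sequence constraints, and this sketch leaves both out.

```python
# Two-bit mapping from the example in the text; any one-to-one assignment works.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a DNA string, two bits per base (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Reverse encode(): map each base back to two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # GACAGCCG  ('H' = 01001000, 'i' = 01101001)
    assert decode(strand) == b"Hi"
```

Each base carries two bits, so every byte becomes four bases. A 2-byte input therefore yields an 8-base strand.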
Does DNA data storage mean living organisms walking around with data in their bodies? Are we all turning into cyborgs?
No. In theory we could store data in human cells, but the DNA would still have to be extracted from the cells before it could be read. Grow Your Own Cloud stores data in plants; it is a series of art installations and may not be scalable for commercial purposes. The DNA used for data storage is usually chemically synthesized. At BioCompute, we are experimenting with a different approach (more on this in a later article), but data is still not going to be stored in living organisms.
How hard is it to read and write DNA?
Reading DNA is done with sequencing equipment. There are three main techniques: Sanger sequencing, Next Generation Sequencing (NGS) and nanopore sequencing. Sanger sequencing and NGS rely on chemical reactions. Nanopore sequencing instead measures changes in electric current as a DNA strand passes through a nanoscale pore. Sequencing costs have fallen sharply: reading a human genome (about 3 billion base pairs) cost roughly $1 million in 2007, about $1,000 in 2014 and about $600 in 2023.
Writing DNA involves synthesis, either chemical (phosphoramidite synthesis) or enzymatic. It is far more expensive than reading: writing 1 TB of data costs roughly $400 million. Synthesis is also slow, which makes it a bottleneck, although microarrays allow many strands to be synthesized in parallel.
The current focus is cold storage, where access time isn't critical, so read and write speed is not the top priority.
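The short Python calculation below restates the quoted writing cost in per-base terms. It assumes a decimal terabyte (10^12 bytes) and the two-bit-per-base mapping described earlier. The $400 million figure comes from the text as-is and is not independently verified here.

```python
# Writing cost quoted in the text (approximate).
WRITE_COST_USD_PER_TB = 400e6   # ~$400 million to write 1 TB
BYTES_PER_TB = 10**12           # assumption: decimal terabyte
BITS_PER_BASE = 2               # assumption: two-bit mapping (00->A, 01->G, 10->C, 11->T)

# Number of bases needed to hold 1 TB, then the implied cost of each base.
bases_per_tb = BYTES_PER_TB * 8 // BITS_PER_BASE
cost_per_base = WRITE_COST_USD_PER_TB / bases_per_tb

print(f"{bases_per_tb:.1e} bases per TB")  # 4.0e+12 bases per TB
print(f"${cost_per_base:.4f} per base")    # $0.0001 per base
```

Changing `BITS_PER_BASE` shows how a denser or sparser encoding shifts the number of bases to synthesize, and so the total cost.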
How fast are the read/write speeds in DNA? Can we make them faster?
As of today, the fastest reported read speed is 5 hours and 2 minutes for a billion base pairs, which at two bits per base corresponds to about 250 MB of data. The fastest reported write speed is 18 MB per second.
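This Python sketch converts the quoted read time into throughput. It assumes the two-bit-per-base mapping described earlier, under which a billion bases holds about 250 MB. The timing figure comes from the text as-is.

```python
# Read figure quoted in the text.
READ_SECONDS = 5 * 3600 + 2 * 60   # 5 hours 2 minutes = 18,120 s
BASES_READ = 1_000_000_000         # one billion bases
BITS_PER_BASE = 2                  # assumption: two-bit mapping, not a sequencer property

bases_per_second = BASES_READ / READ_SECONDS
bytes_per_second = bases_per_second * BITS_PER_BASE / 8

print(f"{bases_per_second:,.0f} bases/s")     # 55,188 bases/s
print(f"{bytes_per_second / 1e3:.1f} KB/s")   # 13.8 KB/s
```

Expressed in bytes per second, the read figure can be compared directly with the write speed, which the text already gives in MB/s.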
Yes, they can be made faster, and that is part of what we are working on at BioCompute.
Currently the focus is on leveraging DNA data storage for cold storage applications, where the data does not need to be accessed for extended periods of time. This implies that read and write speeds may not be the most important parameters for cold storage.
Is reading and writing DNA only applicable to data storage?
No. DNA is routinely sequenced to classify newly identified species, map family trees across generations and diagnose diseases. The Human Genome Project is a landmark use case for sequencing.
Writing DNA is used in applications such as gene cloning (to introduce or knock out traits in organisms, especially bacteria), forensics and sensing. Reading DNA is more widely used than writing, which partly explains why DNA synthesis is more expensive today than sequencing.
Because reading and writing DNA are in demand for many applications, different players in the market have an incentive to improve the accuracy and speed of both.
Is DNA storage the only way to get to biocompute?
Not really. Some groups are building logic gates based on different protein folding mechanisms. The Unconventional Computing Lab at the University of the West of England is working on using fungal networks, and the electrical impulses conducted through them, for data storage, especially for sensing applications. There are many other interesting nature-derived compute options. We decided to start with DNA for two reasons. It is the most widely studied (and hence best understood) biomolecule, and we now have the tools to control DNA in specifically engineered chemical environments.