It’s been a week since we announced BioCompute to the world. And I am so excited by folks reaching out to chat, ask questions and just cheer us on.
In this article, I break down the top eight questions we were asked over the last week about DNA, data storage and more. This primer is meant to set the stage for some interesting articles boiling in our ‘idea furnace’, jostling to be let out into the world.
1. Does DNA have life?
No. DNA is a molecule that stores information within living cells. It carries the instructions for producing the many proteins an organism needs to survive (and possibly thrive). Cells generally cannot live without DNA (mature human red blood cells and platelets are exceptions: they have no DNA at all). But DNA itself is not alive, and it does not need a cell to persist.
2. Is DNA unstable?
“Our DNA does not fade like an ancient parchment; it does not rust in the ground like the sword of a warrior long dead. It is not eroded by wind or rain, nor reduced to ruin by fire and earthquake. It is the traveller from an ancient land who lives within us all.” - Bryan Sykes, The Seven Daughters of Eve: The Science That Reveals Our Genetic Ancestry
DNA is highly stable: its two strands are wound into a double helix held together by hydrogen bonds between paired bases, and each strand has a covalently bonded sugar-phosphate backbone. We have been able to retrieve DNA from remains more than a million years old that were preserved in cold conditions such as permafrost.
However, moisture and oxygen can damage DNA: water can break (hydrolyze) the phosphodiester bonds in its backbone, and oxygen can chemically modify its bases. So DNA used for storage needs to be kept dry, in an inert environment.
3. How exactly can digital data go into DNA?
The digital data we have today is in binary form, composed of 0s and 1s. This data can be converted to a DNA sequence made of the four bases: A (adenine), C (cytosine), G (guanine) and T (thymine). There are different ways to do this conversion. One example is to map 00 to A, 01 to G, 10 to C and 11 to T (or any other assignment of the four two-bit pairs to the four bases). The resulting DNA sequence is then synthesized chemically and stored securely.
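A minimal Python sketch of the 2-bit mapping described above, going from bytes to a DNA string and back. The names `encode` and `decode` are illustrative; a real DNA storage pipeline would also add error correction, addressing and constraints such as avoiding long runs of the same base, none of which are shown here.

```python
# Sketch of the 2-bit mapping from the text: 00->A, 01->G, 10->C, 11->T.
# No error correction, addressing or sequence constraints are included.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes to a DNA string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # 8 bits per byte split evenly into 4 bases, so no padding is needed.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Convert a DNA string produced by encode() back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)  # GACAGCCG
assert decode(strand) == b"Hi"
```

Each byte becomes exactly four bases, so under this mapping a DNA strand is four times as long (in characters) as the byte string it encodes.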
4. Does DNA data storage mean living organisms walking around with data in their bodies? Are we all turning into cyborgs?
No. In theory we could store data in human cells, but the DNA would still have to be extracted from the cells before we could read it. Grow Your Own Cloud stores data in plants; it is a series of art installations and probably cannot be scaled for commercial purposes. The DNA used for data storage is usually chemically synthesized. At BioCompute, we are experimenting with a different approach (more on this in a later article), but either way, data is not going to be stored in living organisms.
5. How hard is it to read and write DNA?
Reading DNA is done using sequencing equipment. There are multiple sequencing techniques:
a) Sanger Sequencing
b) Next Generation Sequencing
c) Nanopore sequencing
While Sanger and next generation sequencing rely on chemical methods, nanopore sequencing reads DNA by measuring changes in an electrical current as a DNA strand passes through a tiny pore. DNA sequencing is done routinely in genomics labs across the world. The cost of sequencing a human genome (about 3 billion base pairs, or roughly 0.75 GB of data at 2 bits per base) decreased from an estimated $1 million in 2007 to $1,000 in 2014, and as of 2023 is approximately $600. The cost keeps falling as this equipment becomes more widely available.
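As a rough check of how much digital information a genome-sized sequence represents, here is a small Python sketch. It assumes the 2-bit-per-base mapping from question 3, decimal units, and a human genome length of about 3.1 billion base pairs; the helper name `bases_to_megabytes` is illustrative.

```python
# Assumes 2 bits per base (the mapping from question 3) and decimal units.
BITS_PER_BASE = 2


def bases_to_megabytes(n_bases: int) -> float:
    """Digital information held by n_bases at 2 bits per base, in MB (10**6 bytes)."""
    return n_bases * BITS_PER_BASE / 8 / 1e6


HUMAN_GENOME_BASES = 3_100_000_000  # approximate length of one copy of the human genome

print(bases_to_megabytes(HUMAN_GENOME_BASES))  # 775.0
print(bases_to_megabytes(1_000_000_000))       # 250.0
```

Under this assumption, one billion base pairs holds about 250 MB, and a human genome holds roughly 775 MB.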
Writing DNA means synthesizing it. The two broad approaches are:
a) phosphoramidite synthesis
b) enzymatic synthesis
Writing DNA is expensive: writing 1 TB of data into DNA through chemical synthesis costs nearly $400 million. Several approaches aim to ease this synthesis bottleneck, such as microarrays, which synthesize many different DNA strands in parallel.
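To make the $400 million figure concrete, here is a back-of-the-envelope Python sketch: cost equals the number of bases to synthesize times a price per base. The $0.0001-per-base price is an assumption, back-calculated so that 1 TB at 2 bits per base lands near the quoted figure; it is not a quoted market price. The sketch also ignores error-correction overhead and redundant copies, which would add to the real cost.

```python
BITS_PER_BASE = 2
# Assumption: price back-calculated from the ~$400 million per TB figure.
COST_PER_BASE_USD = 1e-4


def synthesis_cost_usd(n_bytes: int, cost_per_base: float = COST_PER_BASE_USD) -> float:
    """Estimated synthesis cost: bases needed for n_bytes times the price per base."""
    n_bases = n_bytes * 8 / BITS_PER_BASE
    return n_bases * cost_per_base


ONE_TB = 10**12  # decimal terabyte
print(f"${synthesis_cost_usd(ONE_TB):,.0f}")  # $400,000,000
```

Because the cost scales linearly with the number of bases, every tenfold drop in the per-base price cuts the cost of writing 1 TB tenfold.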
We will dive deeper into the different read/write techniques in an upcoming blog article. But if you are curious, feel free to search for these terms and read up on them. They are extremely interesting.
6. How fast are the read/write speeds in DNA? Can we make it faster?
As of today, the fastest reported time to read (sequence) a whole human genome, about 3 billion base pairs, is 5 hours and 2 minutes. The fastest reported write speed is 18 MB per second.
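For a sense of scale, here is a small Python sketch converting the quoted 18 MB per second write speed into the time needed to write 1 TB. It assumes a constant rate and decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); `transfer_hours` is an illustrative helper.

```python
def transfer_hours(n_bytes: float, bytes_per_second: float) -> float:
    """Time in hours to move n_bytes at a constant rate."""
    return n_bytes / bytes_per_second / 3600


WRITE_RATE = 18e6  # 18 MB/s, the write speed quoted in the text (decimal MB assumed)
print(f"{transfer_hours(1e12, WRITE_RATE):.1f} h to write 1 TB")  # 15.4 h to write 1 TB
```

Even at this rate, writing a terabyte takes most of a day, which is one reason cold storage is the natural first target.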
Yes, we can make it faster and that is a part of what we are working on at BioCompute.
Currently the focus is on leveraging DNA data storage for cold storage applications, where the data does not need to be accessed for extended periods of time. This implies that read and write speeds may not be the most important parameters for cold storage.
7. Is reading and writing DNA only applicable to data storage?
No. DNA is routinely read (sequenced) to classify newly identified species, map family trees across generations, and diagnose diseases. The Human Genome Project is an interesting use case for sequencing.
Writing DNA is used in many applications, such as gene cloning to introduce or knock out traits in organisms (especially bacteria), forensics and sensing. Reading DNA is more widely used than writing, which partly explains why DNA synthesis is more expensive today than sequencing.
Reading and writing DNA are in demand for many applications, which gives different players in the market an incentive to improve the accuracy and speed of DNA reading and writing.
8. Is DNA storage the only way to get to biocompute?
Not really. Some groups are building logic gates using protein folding mechanisms. The Unconventional Computing Lab at the University of the West of England is exploring fungal networks, and the electrical impulses they conduct, for data storage, especially for sensing applications. There are many other interesting nature-derived computing options. We decided to start with DNA because it is the most widely studied (and hence best understood) biomolecule, and because we now have the tools to control DNA in specifically engineered chemical environments.
Hit me up if you have more questions, and tell us what you would like to hear more about. Stay tuned!
Hi @anagha, thanks for this incredible article! I understood that reading and writing data in DNA is very expensive, and that you are trying to store data in DNA.
I'm not sure if this question even makes sense, but here it is:
Can we replicate it somewhere else?
Like, understand how DNA stores data (not the encoding, but the logic gates/structure/type, I don't know), then try to mimic that and store data there?
Maybe the way data is stored in DNA could also be a crucial optimizer for the software industry.