“You might think you know someone, but they can also be strangers”
- All We Imagine as Light
This heart-warming dialogue from the Cannes Grand Prix winning Indian movie (super super happy that this is a story of two Malayali women) rings true for bacteria as well. We never really know what they are capable of until we test it out, and the results of random experiments often leave us pleasantly surprised.
Here I describe one such experiment, aptly titled ‘baccam’ [1], where light is used to create a bacterial camera that captures and stores images. This is one of the first few works that built my conviction in in vivo DNA data storage.
So how do we go about creating a baccam?
Step 1: Design proteins that can be controlled using light
The Cre recombinase protein (the enzyme of the Cre-Lox system) is split into two halves, and each half has one partner of a photodimer pair attached to it. In optogenetics, a photodimer pair is two protein domains that bind each other when they are exposed to light of a particular wavelength. When the two partners come together, they pull the two halves of the recombinase together as well, and the enzyme becomes active again.
When this modified recombinase is exposed to light, it removes a small DNA sequence that is flanked by two LoxP sites (34 base pair sequences) in the genome. The removal of this DNA sequence corresponds to the encoding of a bit, say 1. So each bacterial cell can hold one bit, either a 0 or a 1.
Here is a pictorial representation of how these proteins work:
Step 2: Edit the bacterial genome
This involves two steps:
a) Use plasmids to introduce LoxP sites into the bacterial genome. LoxP sites are 34 base pairs long and can be identified by the two 13 base pair inverted repeats at the ends, with an 8 base pair spacer in the middle.
b) Introduce the light-sensitive protein designed in step 1 into the cell
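The excision that writes a bit can be pictured with a toy string model. This is only a sketch: it assumes the wild-type loxP sequence with both sites in the same orientation, the helper name `excise` is made up for illustration, and it says nothing about how the real enzyme behaves inside a cell.

```python
# Toy model of Cre-mediated excision between two loxP sites.
# Assumption: both sites are wild-type loxP in the same orientation.
ARM = "ATAACTTCGTATA"                     # 13 bp recognition arm
SPACER = "ATGTATGC"                       # 8 bp spacer that gives the site its direction
LOXP = ARM + SPACER + "TATACGAAGTTAT"     # second arm is the reverse complement of the first


def excise(genome: str) -> str:
    """Remove the DNA between the first two loxP sites, leaving one loxP behind."""
    first = genome.find(LOXP)
    second = genome.find(LOXP, first + len(LOXP)) if first != -1 else -1
    if second == -1:
        return genome  # fewer than two sites: nothing to cut out
    return genome[: first + len(LOXP)] + genome[second + len(LOXP):]


unexposed = "GGG" + LOXP + "CCCCAAAA" + LOXP + "TTT"   # reads as bit 0
exposed = excise(unexposed)                             # reads as bit 1
print(len(LOXP), len(unexposed) - len(exposed))
```

The shorter sequence left behind after excision is what the sequencer later reads as a 1.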
Step 3: Encode an image into bacterial DNA using light
Take a 96-well plate and add bacterial cells to it: this is our camera. Based on the image we want to encode, we determine which wells need to be exposed to light. If a well is exposed to blue light (465 nanometer wavelength), its cells encode a 1, and if it is not exposed, they encode a 0. Each well works like a pixel in a photograph.
We then add barcodes (known short DNA sequences) that give the address of each well, so that we can reassemble the complete image after sequencing.
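Here is a rough sketch of how an image could be turned into an exposure plan for the plate. The 8 x 12 layout with rows A to H is the standard 96-well format, but the barcode scheme (the well index written in base 4 with the letters A, C, G, T) and the function names are my own assumptions for illustration, not the barcodes used in the paper.

```python
from itertools import product

ROWS = "ABCDEFGH"   # standard 96-well plate: 8 rows (A-H) x 12 columns (1-12)


def well_barcode(index: int, length: int = 4) -> str:
    """Hypothetical barcode: the well index in base 4, written with A, C, G, T."""
    digits = []
    for _ in range(length):
        index, digit = divmod(index, 4)
        digits.append("ACGT"[digit])
    return "".join(reversed(digits))


def exposure_plan(image):
    """Map an 8x12 bitmap (strings of '0'/'1', '1' = expose to blue light) onto wells."""
    plan = {}
    for i, (r, c) in enumerate(product(range(8), range(12))):
        well = f"{ROWS[r]}{c + 1}"
        plan[well] = {"barcode": well_barcode(i), "expose": image[r][c] == "1"}
    return plan


# Striped test image: rows A, C, E, G lit, the rest dark.
image = ["1" * 12 if r % 2 == 0 else "0" * 12 for r in range(8)]
plan = exposure_plan(image)
print(plan["A1"], plan["B1"])
```

Four letters give 256 possible barcodes, which is more than enough to give each of the 96 wells its own address.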
Step 4: Retrieve the data by sequencing
In this experiment, the authors used an Illumina next-generation sequencer to read the sequences from each well and reconstruct the image.
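Reconstruction from sequencing reads could look roughly like this. It is a simplified sketch: I assume each read has already been reduced to a (barcode, is_excised) pair, and the 0.5 cutoff and the name `decode_wells` are illustrative choices rather than anything taken from the paper's pipeline.

```python
from collections import defaultdict


def decode_wells(reads, barcode_to_well, threshold=0.5):
    """Call one bit per well from (barcode, is_excised) reads.

    threshold is an assumed cutoff on the fraction of excised reads.
    Wells with no reads at all are reported as None (missing).
    """
    counts = defaultdict(lambda: [0, 0])      # well -> [excised reads, total reads]
    for barcode, is_excised in reads:
        well = barcode_to_well.get(barcode)
        if well is None:
            continue                          # unknown barcode: skip the read
        counts[well][0] += int(is_excised)
        counts[well][1] += 1

    bits = {}
    for well in barcode_to_well.values():
        excised, total = counts[well] if well in counts else (0, 0)
        bits[well] = None if total == 0 else int(excised / total >= threshold)
    return bits


barcode_to_well = {"AAAA": "A1", "AAAC": "A2", "AAAG": "A3"}
reads = [("AAAA", True), ("AAAA", True), ("AAAA", False),
         ("AAAC", False), ("TTTT", True)]
bits = decode_wells(reads, barcode_to_well)
print(bits)
```

Sorting the called bits by well address gives back the pixels in their original positions.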
Figure: Encoding and decoding of an image in the E. coli genome using blue light
How accurate is a baccam?
The encoded image was retrieved with an accuracy of 93/96 - the bits encoded in 93 of the 96 wells were retrieved accurately, while 3 of them were missing. Similar results were obtained for various images encoded in the same way on different 96 well plates.
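The 93/96 figure is a per-well comparison between what was written and what was read back. A minimal way to compute it, using a made-up plate where three wells go missing (the well names and data here are invented for the example):

```python
def retrieval_accuracy(expected, retrieved):
    """Count wells whose retrieved bit matches the encoded bit.

    A well that is absent from `retrieved` (or mapped to None) counts as missing.
    """
    correct = sum(1 for well, bit in expected.items() if retrieved.get(well) == bit)
    return correct, len(expected), correct / len(expected)


# Made-up plate: 96 wells, all encoded as 1, three of which fail to come back.
expected = {f"well{i}": 1 for i in range(96)}
lost = {"well0", "well1", "well2"}
retrieved = {well: bit for well, bit in expected.items() if well not in lost}

correct, total, fraction = retrieval_accuracy(expected, retrieved)
print(f"{correct}/{total} = {fraction:.1%}")
```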
Image retrieval was tested after putting the DNA through different environmental conditions: drying, storage at room temperature, heating to 60 °C, and exposure to UV light. The accuracy of retrieval remained more or less the same in each case. This showed me that DNA is a reliable storage medium.
Figure: Accuracy of image retrieval under different environmental conditions
What is the least amount of DNA we need to store data?
The initial concentration of DNA taken was 2.66 nanomolar. The fidelity of data retrieval held up as the sample was diluted, until the 1000x dilution, where the data retrieved fell to as low as 28/96. This means that the amount of DNA needed to implement a storage mechanism of this kind is low enough for this mechanism to be implemented in microfluidic channels.
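To get a feel for these numbers, here is a back-of-the-envelope conversion from concentration to molecules per microliter. It assumes the 2.66 nanomolar figure refers to DNA molecules in the sample; the function name is made up for the example.

```python
AVOGADRO = 6.02214076e23   # molecules per mole


def molecules_per_microliter(conc_nanomolar: float, dilution: float = 1.0) -> float:
    """Convert a nanomolar concentration (optionally diluted) to molecules per microliter."""
    molar = conc_nanomolar * 1e-9 / dilution   # mol per liter after dilution
    return molar * AVOGADRO * 1e-6             # per liter -> per microliter


for dilution in (1, 10, 100, 1000):
    print(f"{dilution:>5}x: {molecules_per_microliter(2.66, dilution):.2e} molecules/uL")
```

Under that assumption, the undiluted sample holds on the order of a billion molecules per microliter, and the 1000x dilution on the order of a million.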
Figure: Accuracy of image encoding corresponding to different dilutions
Can we encode multiple images simultaneously?
Yes, this particular experiment uses multiplexing of different wavelengths (aka colours) of light to encode different images. This means that we now edit the bacterial genome to become responsive to different wavelengths, with each kind of light and the corresponding protein changing a different part of the genome - so the same bacterial cell could hold two or more pixels belonging to different images (one pixel of each image).
Figure: Two Images Encoded Simultaneously, One using Blue Light and the Other Using Red Light
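One way to picture the multiplexing: every well carries one bit per light channel, so two images sit on the same plate as pairs of bits. The sketch below uses two channels and tiny 2 x 3 images to stay readable; the function names and data are illustrative only.

```python
def multiplex(blue_image, red_image):
    """Per well, pair the blue-channel bit with the red-channel bit."""
    return [list(zip(blue_row, red_row))
            for blue_row, red_row in zip(blue_image, red_image)]


def demultiplex(plate):
    """Split a plate of (blue, red) bit pairs back into two separate images."""
    blue = [[b for b, _ in row] for row in plate]
    red = [[r for _, r in row] for row in plate]
    return blue, red


blue_image = [[1, 0, 1],
              [0, 0, 1]]
red_image = [[0, 0, 1],
             [1, 0, 1]]

plate = multiplex(blue_image, red_image)
print(plate[0])            # first row of wells, each holding a (blue, red) pair
print(demultiplex(plate))
```

Each extra wavelength would add one more independent bit per cell, as long as the channels do not interfere with each other.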
In addition, a threshold was set to distinguish between a 0 and a 1 in case of partial removal of a particular DNA sequence by the photo sensitive protein. Some machine learning techniques were used to improve the speed of data retrieval through automation of clustering.
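I do not know the exact clustering method the authors used, so the following is only a stand-in: a plain two-cluster k-means on the fraction of edited reads per well, with the threshold placed halfway between the two cluster centers. The input fractions are made up.

```python
def two_means_threshold(fractions, iterations=20):
    """Split 1-D values into a low and a high cluster; return the midpoint of the centers."""
    low, high = min(fractions), max(fractions)
    for _ in range(iterations):
        lows = [f for f in fractions if abs(f - low) <= abs(f - high)]
        highs = [f for f in fractions if abs(f - low) > abs(f - high)]
        if not highs:
            break                     # all values look alike: only one cluster
        low, high = sum(lows) / len(lows), sum(highs) / len(highs)
    return (low + high) / 2


# Made-up per-well fractions of reads that show the excision.
fractions = [0.02, 0.05, 0.10, 0.80, 0.90, 0.95]
threshold = two_means_threshold(fractions)
bits = [int(f >= threshold) for f in fractions]
print(round(threshold, 2), bits)
```

Letting the data pick the cutoff like this avoids hand-tuning a threshold for every plate.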
How can we get E. coli to click better photos?
1. One bit per cell is a waste of the high storage capacity of DNA which we want to fully leverage. A better way to do this might be through CRISPR so that each DNA base in a cell can store some relevant information.
2. The entire process could be automated on a single hardware platform so that the human errors that come from moving the DNA sample between steps can be minimized.
3. We need to identify how many different wavelengths can potentially be multiplexed at a time without affecting the accuracy of data encoding. And we need to optimize for that to improve the speed of writing data into DNA.
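To put the first point in numbers, here is a rough comparison between one bit per cell and the theoretical ceiling of two bits per base. It assumes an E. coli genome of roughly 4.6 million base pairs and ignores the fact that a living cell needs most of its genome for itself, so treat the result as an upper bound and not a realistic target.

```python
import math

BITS_PER_BASE = math.log2(4)     # four possible bases -> 2 bits per position
ECOLI_GENOME_BP = 4_600_000      # approximate genome size, an assumption for this estimate

current_bits_per_cell = 1        # the scheme described in this post
ceiling_bits_per_cell = BITS_PER_BASE * ECOLI_GENOME_BP
ceiling_megabytes = ceiling_bits_per_cell / 8 / 1e6

print(f"ceiling: {ceiling_bits_per_cell:.0f} bits per cell (~{ceiling_megabytes:.2f} MB)")
print(f"headroom over 1 bit per cell: {ceiling_bits_per_cell / current_bits_per_cell:.1e}x")
```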
What if we could write millions of terabytes worth of data into biomolecules at the speed of light? Just the thought of that sends a shiver down my spine. There is nothing more exciting than building the future through BioCompute.
[1] Lim, Cheng Kai, et al. ‘A Biological Camera That Captures and Stores Images Directly into DNA’. Nature Communications, vol. 14, no. 1, July 2023, p. 3921. https://doi.org/10.1038/s41467-023-38876-w.
Can relate to some extent, but still wondering HOW this writing / working with DNA / putting data inside would actually happen for terabytes of data...
Really nice to be learning about this still