Using DNA To Boost Digital Data Storage And Processing

A blue strand of DNA across a black digital screen with code in the background — Credit: Shutterstock

You might be familiar with a gigabyte, one of the most popular units of measure for computer storage. A two-hour movie is 3 gigabytes on average, while your phone can probably store 256 gigabytes.

But did you know that your body also stores information in its own way?

We see this in DNA, which has the instructions needed for an organism to develop, survive, and reproduce. In computing storage terms, each cell of our body contains about 1.5 gigabytes worth of data. And with about 30 trillion cells in our bodies, we could theoretically store about 45 trillion gigabytes—also known as 45 zettabytes—which is equivalent to about one fourth of all the data in the world today.
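The arithmetic behind those figures can be checked in a few lines. This is a rough sketch using decimal units and only the numbers quoted above; the "implied world total" is simply derived from the "one fourth" comparison and is not an independent estimate.

```python
# Back-of-the-envelope check of the figures quoted above (decimal units).
GB = 10**9    # bytes in one gigabyte
ZB = 10**21   # bytes in one zettabyte

bytes_per_cell = 15 * GB // 10   # about 1.5 gigabytes per cell, as quoted
cells_in_body = 30 * 10**12      # about 30 trillion cells, as quoted

total_bytes = bytes_per_cell * cells_in_body
total_zb = total_bytes / ZB

# If that total is "about one fourth" of the world's data,
# the implied world total is four times as large.
implied_world_zb = total_zb * 4

print(f"Whole-body DNA capacity: {total_zb:.0f} ZB")
print(f"Implied world data total: {implied_world_zb:.0f} ZB")
```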

Recently, a group of researchers was able to develop a technology that allows computer storage and processing using DNA’s ability to store information by turning genetic code into binary code. This technology could have a major impact on the way we do computing and digital storage.

To explain more about this technology, SciFri guest host Sophie Bushwick is joined by two professors from North Carolina State University’s Department of Chemical and Biomolecular Engineering, Dr. Albert Keung and Dr. Orlin Velev.

Segment Guests

Albert Keung

Dr. Albert Keung is an associate professor, University Faculty Scholar & Goodnight Distinguished Scholar in the Department of Chemical and Biomolecular Engineering at NC State University in Raleigh, North Carolina.

Segment Transcript

SOPHIE BUSHWICK: This is Science Friday. I’m Sophie Bushwick. Your genetic code contains a ton of data. It has all the instructions your body needs to develop, survive, and reproduce. And DNA is an incredibly compact way to store that information. Each one of our cells contains the equivalent of about a gigabyte of DNA data. That might not seem like too much. In comparison, it takes about three gigabytes to store a two-hour movie.

But with an estimated 30 trillion cells in our bodies, all that DNA adds up to roughly 30 trillion gigabytes or 30 zettabytes of storage. That’s enough to encode roughly one fifth of all the data in the world today. In recent years, researchers have developed technologies that tap into DNA storage capabilities. By converting genetic code into binary code, they can do things like encoding a book or even all of Wikipedia in the form of DNA base pairs.

And now, researchers are going beyond storage and using DNA as the basis for computers. To explain more about this groundbreaking technology, I’m joined by two professors from North Carolina State University’s Department of Chemical and Biomolecular Engineering, Dr. Albert Keung and Dr. Orlin Velev. Welcome to Science Friday. Thank you so much for being here.

ALBERT KEUNG: Thank you, Sophie, for having us.

ORLIN VELEV: It is a pleasure to be on this very interesting discussion.

SOPHIE BUSHWICK: Thanks. And how does DNA store information?

ALBERT KEUNG: The simplest way to think about it would be, DNA is a string of letters: A, T, G, and C. And so you can have any length string of letters that you want. And you can have many of these strings. The simplest way to convert the letters into binary, or zeros and ones, would be an A could be a 00, a T could be a 01, a G could be a 10, and a C could be a 11. And so you just go letter by letter and convert that into these digits.
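The letter-by-letter mapping described here can be sketched as follows. This is a minimal illustration of the naive two-bits-per-base scheme from the interview (A=00, T=01, G=10, C=11); the function names are made up for this example, and practical DNA storage schemes typically add error correction and other constraints that are not shown.

```python
# Two-bit mapping described in the interview: A=00, T=01, G=10, C=11.
BASE_TO_BITS = {"A": "00", "T": "01", "G": "10", "C": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_bits(strand: str) -> str:
    """Go letter by letter and convert each base into two binary digits."""
    return "".join(BASE_TO_BITS[base] for base in strand.upper())


def bits_to_dna(bits: str) -> str:
    """Convert a string of 0s and 1s back into bases, two bits at a time."""
    if len(bits) % 2:
        raise ValueError("bit string must have an even length")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def text_to_dna(text: str) -> str:
    """Encode text as UTF-8 bytes, then as bases (hypothetical helper)."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return bits_to_dna(bits)


def dna_to_text(strand: str) -> str:
    """Decode bases back into UTF-8 text (hypothetical helper)."""
    bits = dna_to_bits(strand)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


print(dna_to_bits("ATGC"))
print(text_to_dna("Hi"))
```

With this mapping each base carries two bits, so one byte of data takes four bases.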

SOPHIE BUSHWICK: And what about if you want to go beyond storage and use DNA to process information as well? How does that work?

ALBERT KEUNG: So this has actually been a relatively active field for over 20 years, since Leonard Adleman first created the first computation with DNA. And there’s many different flavors of this computation. You could use enzymes that can recognize and chew up certain pieces of DNA that have certain sequences. There are types of computations that use interactions between different DNA molecules to bind or unbind each other, and execute logical operations that way. The past two decades have actually generated many creative versions of computation.

SOPHIE BUSHWICK: What about your work? What did your latest study focus on?

ALBERT KEUNG: There’s been two decades of work in DNA computation. And in parallel, but somewhat disjoint, there’s been also work on storing information in DNA. What we wanted to do was create something that was compatible with both storing and computation, basically, try to create an early full computer, something that we think could help spark the imagination of young scientists out there that might be thinking about getting into research, and engineering, and science.

And so our focus is really on, can we create something that can both store, but it’s also warm enough, kind of flexible enough, to be used dynamically for things like computation?

SOPHIE BUSHWICK: Can you give an example of a computation that it could be used for?

ALBERT KEUNG: Yeah. So one of the computations I found really fun was work over a decade ago from Princeton, where they computed a chess problem. And so this was one type of computation that we emulated and another was sudoku. These puzzles basically have kind of similar rules. So basically it’s asking where can you put different chess pieces on a chess board so that they don’t attack each other or don’t attack a certain piece.

Or in sudoku, where can you put zeros, ones, twos, threes, so that a digit shows up only once in any row or column, or every row, every column adds up to six. Things like that. So you have certain puzzles, certain board configurations that you’re searching for.
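The kind of board search described here can be illustrated with a small brute-force program. This sketch assumes the chess pieces are knights on a 3x3 board; the interview does not specify the piece or the board size, and the code is an ordinary electronic search, not the DNA-based method itself.

```python
from itertools import product

SIZE = 3  # assumed board size; the interview does not give one

# All eight L-shaped knight moves, as (row offset, column offset).
KNIGHT_MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1),
                (1, -2), (2, -1), (-1, -2), (-2, -1)]


def attacks(a, b):
    """True if a knight on square a attacks square b."""
    return (b[0] - a[0], b[1] - a[1]) in KNIGHT_MOVES


def is_peaceful(knights):
    """True if no knight in the list attacks any other knight."""
    return not any(attacks(a, b) for a in knights for b in knights)


def peaceful_boards(size=SIZE):
    """Try every occupied/empty pattern and keep the conflict-free ones."""
    squares = [(r, c) for r in range(size) for c in range(size)]
    boards = []
    for occupancy in product([0, 1], repeat=len(squares)):
        knights = [sq for sq, bit in zip(squares, occupancy) if bit]
        if is_peaceful(knights):
            boards.append(knights)
    return boards


solutions = peaceful_boards()
print(len(solutions), "conflict-free boards out of", 2 ** (SIZE * SIZE))
```

The program checks all 512 possible boards one after another. The sudoku-style rule from the interview could be tested the same way, by generating candidate grids and keeping those in which every row and column contains 0, 1, 2, and 3 exactly once.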

SOPHIE BUSHWICK: And if we want to use DNA as a computer, it has to be able, like you said, to store this information and also process it. But some of the techniques you talked about for processing information, like using enzymes to chew up the DNA, that doesn’t seem to be possible if you want to also store the DNA. So how did you get around that problem?

ALBERT KEUNG: Exactly. I think that’s kind of the disconnect that we were trying to find a solution for. So we needed a way to preserve and anchor the DNA without giving up its high density, information density, but also make it so that you can access the data and compute upon it without destroying the database. So this is where we linked up with Orlin Velev’s group, who pioneered a nanomaterial that maybe he can tell you about.

The key discovery was that DNA adhered to this material stably, but allowed enzymes to come in, make copies of the DNA into RNA. And then we could use that RNA to do computations without disturbing the original DNA.
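The idea of computing on RNA copies while leaving the stored DNA untouched can be sketched in code. This is a heavily simplified model: it treats each stored strand as the coding strand and produces the RNA copy by swapping T for U, and the example sequences and names are invented for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Return an RNA copy of a DNA coding strand (T becomes U).

    Simplification: real transcription is carried out by an enzyme
    reading the complementary template strand.
    """
    return coding_strand.upper().replace("T", "U")


# The stored "database": a tuple, so it cannot be modified in place.
archive = ("ATGCGT", "TTAGGC")  # hypothetical example sequences

# Reading the data makes fresh RNA copies; the archive itself is untouched.
rna_copies = [transcribe(strand) for strand in archive]
print(rna_copies)

# Stand-in for a destructive computation: the copies get used up...
while rna_copies:
    rna_copies.pop()

# ...but the original DNA is still there and can be copied again.
print(archive)
```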

SOPHIE BUSHWICK: Got it. So, yes, I’d love to hear more about this nanomaterial.

ORLIN VELEV: It has really been a pleasure to participate in this project, because this is a really multidisciplinary investigation. Some time ago, we got together with my colleague Albert, and we were discussing how we can basically use the innovative nanomaterials that we make in my group and we study, in order to manipulate and process DNA.

And we had just come across this new material, which we called soft dendritic colloids. It is a fibrillar material, which is made out of biopolymer and it is branched. It has this hierarchical structure. So you have a thicker branch in the middle, and then thinner and thinner branches, which come to be nanofibers all around. And nanofibers tend to be very sticky in physical perspectives.

The reason gecko legs can run on any surfaces– that is, gecko lizards can run on any surfaces with their legs, is that they have these sticky mats of nanofibers. So it turned out that our nanofibers in the new materials that we are making can be very sticky to DNA. Basically, we have a particle of fibrillar nature that can bind DNA. And in this way, we can immobilize the molecule, we can protect it in physical and mechanical sense, and we can even manipulate the whole cluster of DNA that has been collected by using magnetic particles which are also included in the structure.

So basically, while Albert has been providing the software, in a sense, we have been trying to provide a hardware that is going to allow the whole thing to be protected and manipulated.

SOPHIE BUSHWICK: When you describe the polymers as sort of like the tree branch with thinner and thinner pieces coming off it, it makes me picture the DNA sort of tangled up in a forest of trees, but on a very, very tiny scale. Is that an accurate way to think about it?

ORLIN VELEV: Well, that is an interesting analogy. Well, if you think of DNA as a, let’s say, a delicate biological object, such as a bird. A tree is an ideal way to protect a bird in the sense that it can fly in. The branches would protect it, but then, it can still go out. So basically, we have access to the inside, but we have protection from the outside when the molecule is hosted within this hierarchical structure.

And the other thing that’s important about hierarchical structures of this type is, kind of a little bit more scientifically said, they have a very high surface-to-volume ratio. So we use a small amount of material, but we create lots of surface area that is then available for DNA molecules to bind. So we do not use too much material, but we can bind lots of DNA on those particles. And as I mentioned, we can also add magnetic nanoparticles during the formation. So the whole cluster at the end is going to be magnetic.
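Why thinner fibers give more binding surface for the same amount of material can be shown with simple geometry. This sketch models a fiber as a cylinder, for which the side surface divided by the volume equals 2/r; the two radii are hypothetical values chosen only to contrast a thick branch with a nanofiber.

```python
import math


def surface_to_volume(radius_m: float, length_m: float) -> float:
    """Side surface area divided by volume for a cylinder (equals 2 / radius)."""
    side_area = 2 * math.pi * radius_m * length_m
    volume = math.pi * radius_m ** 2 * length_m
    return side_area / volume


thick_branch_radius = 10e-6   # 10 micrometers (hypothetical)
nanofiber_radius = 100e-9     # 100 nanometers (hypothetical)
length = 1e-3                 # same length for both; it cancels out

thick = surface_to_volume(thick_branch_radius, length)
thin = surface_to_volume(nanofiber_radius, length)

print(f"Nanofiber offers {thin / thick:.0f}x more surface per unit volume")
```

Because the ratio scales as 1/r, a fiber one hundred times thinner exposes about one hundred times more surface for the same volume of material.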

SOPHIE BUSHWICK: Got it. I mean, does that mean that I could sort of– I could plug a computer monitor into the DNA computer and it would theoretically run?

ALBERT KEUNG: No, that– well, actually, yes. You would need an electronic interface. And the time scales of the operations would be very slow, compared to what you’re used to.

SOPHIE BUSHWICK: How slow?

ALBERT KEUNG: On the order of probably a few hours to enter in a command and then get the result.

SOPHIE BUSHWICK: OK. So what does this whole setup look like? If I’ve got– I’ve got my computer monitor, I’m waiting on the results, but what does the DNA computer part of this setup look like?

ALBERT KEUNG: You probably will always need an electronic computer as an interface. And what it would do is basically act as an intermediary between a very high-density DNA setup and database. It would basically process whatever data that you want from it and then display it so that a human could see it. The setup would look something like microfluidics. So you have either tiny, very thin kind of capillary-like tubing or it could be microfluidics.

That’s what we used in this work. But you could also create microfluidic devices that are like little chips that have very small, narrow channels inside. And you could put different databases within these channels, and flow basically different solutions containing your enzymes, or just water, through these channels in order to make copies or execute computations, access the data that you want.

You would then flow, say, the RNA copies of the data that you want out of the nanomaterial that’s linked to the DNA. The RNA would come out of that and flow into what we call a sequencer. There are several different technologies. The one that we use is called the Oxford Nanopore. So those RNA molecules would then flow through the nanopore and give off different electrical signals as it passes through that pore, and those signals would correspond to the different letters. And you would get that readout that would get sent to your computer.

SOPHIE BUSHWICK: And I know that DNA is a very compact way of storing information. I mean, each one of our cells has about two meters of it, and then it’s compressed into just about six microns. So how much smaller could we make our computers if they use DNA for some of this data storage and processing?

ALBERT KEUNG: Yeah. So theoretically, if you do what’s called freeze drying of the DNA, meaning you basically evaporate all of the water away and you’re left only with the DNA, it can be very, very compact. You could literally store all of the world’s information, square foot.

ORLIN VELEV: If I can go back to the material aspects of this work, DNA can store lots of information, but it is also a delicate molecule. And it is easy to– it is easy to encode the material, to encode information in DNA, but not that easy to then find the right molecule and pull it out of the rest. So that’s why you also need the materials component, which is how do we protect, immobilize, move around, sort out the DNA.

So that’s what makes this research, hopefully, interesting is that it really has all those informatics aspects, and molecular, and materials, and electronics even.

SOPHIE BUSHWICK: That’s a really good point because we think about data storage– I mean, I know that it doesn’t last forever. A USB drive might only work reliably for a decade or less. So as a computer and as a storage method, how does DNA compare to other forms of data storage? How long can it preserve itself?

ALBERT KEUNG: One of the really main drivers of the DNA storage field has been not only the incredible information density, but the longevity. There’s been fossils that have been discovered that are a million years old, and people have been able to extract DNA from it. I think that– there’s caveats to that, in that the DNA is degraded, and you aren’t able to access all of the DNA in a pristine condition.

However, very simple storage methods can preserve the DNA for a million years. This is one of the key advantages of molecular storage: you could theoretically store DNA for thousands to millions of years at near room temperature or in like a household freezer without having to expend very much energy to do that. In comparison, a lot of electronic media, like you mentioned, a USB stick or a lot of the long-term storage media, like tape, magnetic tape, these are actually not very stable.

I think we often think about inorganic materials as very robust and hardy, right? But they actually have a lot of defects actually just coming out of the manufacturing plants. And a lot of these defects are engineered around in your devices. And over a few years, radiation that naturally comes from space can degrade your devices, or just wear and tear from heat, oxidation.

And so even the tape storage that’s used for long-term, archival storage of data, those often you need to copy the material every 5 to 10 years onto a new tape reel. And you have to repeat that every 5 to 10 years.
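To get a feel for what that recopying schedule means over long archival periods, here is a small calculation. It only uses the 5-to-10-year interval mentioned in the interview; the archive durations are arbitrary examples, and the function name is made up for this sketch.

```python
def migrations_needed(archive_years: int, refresh_interval_years: int) -> int:
    """Count the copies to fresh tape, one at the end of each full interval."""
    return archive_years // refresh_interval_years


for years in (100, 1000):  # arbitrary example archive durations
    fewest = migrations_needed(years, 10)  # copying every 10 years
    most = migrations_needed(years, 5)     # copying every 5 years
    print(f"{years} years on tape: roughly {fewest} to {most} recopies")
```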

SOPHIE BUSHWICK: And you’ve said that DNA could revolutionize computing. What do you see this tech being used for?

ALBERT KEUNG: So, I do not really see it as, you know, replacing your laptops or personal computing. But I think that there’s a lot of things that are important for our everyday economy and society that rely on computing in the background that we don’t know about, we don’t see. So things that are happening at data centers, these really large buildings with massive energy and land footprints that are executing computations for us.

So even when we do like a Google flight search, where is that computation happening? It’s not actually happening on your computer. It’s off somewhere else. And there’s a lot of these large-scale computations that are really important for industry, for academic research. And I think DNA could be very powerful for those types of processes, where you need to make very, very complicated and demanding calculations that require a lot of storage, but also parallelized computation.

ORLIN VELEV: Yes. If I may add a little bit of a different angle away from informatics, the ability to store and manipulate and deliver DNA and RNA can also find applications in other areas, such as drug delivery, vaccines, plant treatments. So really, kind of combining materials and informatics, in this case, may really have lots of potential future implications which are still to be understood, probably.

SOPHIE BUSHWICK: And what about the two of you? Where do you see your research heading now?

ALBERT KEUNG: In so many directions. We actually have a couple projects that are still ongoing related to just the properties of DNA as a material itself, as well as other types of materials that the Velev group has been pioneering and how that interacts with DNA, whether we can protect it for millennia at room temperature, for example. There’s a lot of fundamental questions as well, just about how these materials work at the nanoscale that we’re also interested in.

ORLIN VELEV: I can say that we have been really very inspired by what we have learned from Albert’s group, in the sense that there are different methods for manipulation of particles on the nanoscale, and sorting out in microfluidic devices. That was mentioned, but we have been interested also in external fields, electrical fields especially. So this has been, really, a very productive collaboration in terms of combining ideas from different areas and finding out interesting new applications for both DNA and nanoscience.

SOPHIE BUSHWICK: Thank you so much for joining us.

ALBERT KEUNG: Thank you so much, Sophie, for having us.

ORLIN VELEV: Thank you.

SOPHIE BUSHWICK: Those were professors from North Carolina State University’s Department of Chemical and Biomolecular Engineering, Dr. Albert Keung and Dr. Orlin Velev.

Copyright © 2024 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/

Meet the Producers and Host

About Sophie Bushwick

@sophiebushwick

Sophie Bushwick is senior news editor at New Scientist in New York, New York. Previously, she was a senior editor at Popular Science and technology editor at Scientific American.

About Andrea Valeria Diaz Tolivia

@AndreaValeriaDT

Andrea Valeria Diaz Tolivia was a radio production fellow at Science Friday. Her topics of interest include the environment, engineering projects, science policy and any science topic that could make for a great sci-fi plot.

Explore More

A DNA Map You Can Touch—Or Walk Through

Is DNA the Future of Digital Data Storage?