AI Is Coming Up With Brand New Molecules, Fueling Drug Discovery
17:27 minutes
A recent study in the journal Nature unveiled new proteins that can neutralize the deadliest of snake venoms. They’re “new” in that they aren’t found in nature—they were created in a lab, dreamed up by AI.
Using AI to discover, or design, the building blocks of drugs is a fast-growing area of research. Another team of scientists out of Philadelphia is using AI to discover new antibiotics by resurrecting long-lost molecules from extinct species like Neanderthals and woolly mammoths.
We know what you’re thinking: It sounds too sci-fi to be true.
Flora Lichtman talks with two pioneers in the field about how AI is supercharging drug discovery: Dr. César de la Fuente, bioengineer and presidential associate professor at the University of Pennsylvania in Philadelphia, and Nobel laureate Dr. David Baker, director of the Institute for Protein Design and professor at the University of Washington in Seattle.
Dr. César de la Fuente is a presidential associate professor at the University of Pennsylvania in Philadelphia, Pennsylvania.
Dr. David Baker is a Nobel Prize winner, director of the Institute for Protein Design, and professor at the University of Washington in Seattle, Washington.
FLORA LICHTMAN: This is Science Friday. I’m Flora Lichtman. Last week, we briefly touched on a recent Nature paper, where scientists unveiled new proteins that can neutralize deadly snake venom, a type of toxin found in cobras and their relatives that can be difficult to treat.
A new antivenom is a cool discovery on its own, but there is a twist to this snake tale. These new proteins were designed by AI. This is a fast-growing area of research, using AI to discover or design the building blocks of drugs. Another team at Penn is using AI to search for new potential antibiotics in the genomes of extinct species, like Neanderthals and woolly mammoths.
I know. It sounds almost too sci-fi to be on SciFri, but it is happening. Here to tell us more are two pioneers in this field. Dr. César de la Fuente, Bioengineer and Presidential Associate Professor at the University of Pennsylvania in Philadelphia, and Nobel Laureate Dr. David Baker, Director of the Institute for Protein Design and Professor at the University of Washington in Seattle. Welcome to you both.
CÉSAR DE LA FUENTE: It’s great to be here.
DAVID BAKER: Thank you.
FLORA LICHTMAN: David, you focus on designing new proteins, which I think can feel a little abstract for the nonprotein designers out there. How should we think about them? How are you interested in using them?
DAVID BAKER: Proteins in nature carry out essentially all the important functions in our bodies and in all living things. And they solve a really enormously broad range of problems, ranging from powering movement to thinking to capturing solar energy. So if you think about all the problems that we face today and you know a little bit about what proteins do in nature, you sort of are brought to the conclusion that a lot of the problems that we face could be solved by new proteins.
And we don’t want to wait hundreds of millions of years for new proteins to evolve. And so the really exciting thing about protein design now is we can make brand new proteins, and we can make them to solve a really wide range of different problems, ranging from snake bite to completely different problems, like degrading plastic or getting methane out of the atmosphere or making improved vaccines. So it’s just a really exciting time now.
FLORA LICHTMAN: Let’s talk about your antivenom proteins. Walk me through the process of how you got to them.
DAVID BAKER: We’ve been designing proteins to bind to other proteins for quite a few years. In the last three years or so, we’ve developed AI methods for doing this, which are analogous to the way that DALL-E, the image generator, works. So in DALL-E, you might say, generate an image of a cat sitting on a table.
Instead, in the case of the snake venom, what we did was or what Susana Vazquez Torres, the brilliant student who did the work, did was she took the snake venom proteins whose structures were known, and she basically told the generative AI, generate a protein which binds to this site on the venom. The generative AI then builds up a completely new protein, which kind of fits perfectly against the snake venom toxin, just like a key would fit into a lock.
Proteins in nature, they’re encoded in DNA in our genome, so each protein in our bodies is encoded by a gene. Since the proteins that Susana was making were brand new, she had to make brand new synthetic DNA that encoded them. She put them into bacteria, and the bacteria produced the proteins. And then she could determine whether they blocked the venom from killing animals.
FLORA LICHTMAN: OK, so the algorithm gives you possible proteins that would work. And how many outputs do you get? Is it like 10, 3, 1,000?
DAVID BAKER: You get many thousands. But you can’t test all of the designs that the computer suggests. And so Susana developed a way of selecting a smaller subset of designs to test, and she tested about 100 designs for each venom and, in some cases, fewer. And she was able to find, amongst those brand new proteins invented by the AI, very potent inhibitors of the toxin.
FLORA LICHTMAN: Do they always work, like right off the shelf like that?
DAVID BAKER: Well, no, because many designs don’t work at all. They’re a computer fantasy; they don’t actually work in the real world. And sometimes they work, but they don’t work well enough. They don’t stick tightly enough to the venom.
And so then what Susana did in those cases was to carry out a second round of design, where you basically tell the AI, generate new proteins that look like this one but aren’t exactly the same. And then it will explore that region of that new protein it invented. And generally, when that is done, you find much better, much tighter binders than what you do in the first round.
FLORA LICHTMAN: César, your focus is on discovering new antibiotics. Tell me your process.
CÉSAR DE LA FUENTE: Yeah, so in our case, about a decade ago, we wanted to introduce computational methods in our ability to discover new antibiotics. And typically, using traditional methods to come up with new antibiotics, it’s a really physical process.
You go around nature, and you take soil samples or water samples. And then you try to purify active compounds from all of that complex organic matter. As you can imagine, this is a process that is very reliant on trial and error experimentation.
So instead of doing that, we proposed why not take advantage of the decades worth of biological data that we have at our disposal in the form of genomes, proteomes, metagenomes that have been sequenced? And why not try to use AI and develop the right algorithms to explore all this biological data digitally to be able to accelerate our ability to discover new antibiotics?
FLORA LICHTMAN: So I want to understand this better. Are you saying that in our genomes or in the genomes of organisms there are genes that create natural antibiotics? I think I’m familiar with how we fight infection using T cells and B cells. But are we producing antibiotics as well?
CÉSAR DE LA FUENTE: Exactly. We’re producing a lot of molecules, including proteins and including small proteins called peptides, that can be used as antibiotics or as immunomodulators. So the idea here was thinking of biology as an information source.
All of biology, you can conceptualize it as a bunch of code, essentially. If you think about DNA, it can be thought of as a bunch of nucleotide code. Or if you think about proteins and peptides, they’re composed of amino acids. And so in the end, it’s all code, just like the code that we use to communicate with each other through the alphabet.
And so it’s code that can be searched using the correct algorithms. And if we can come up with those AIs, then we can systematically and very rapidly browse through all this vast amount of genetic data to try to find new molecules.
FLORA LICHTMAN: What genomes are you looking in?
CÉSAR DE LA FUENTE: So we started by exploring the human proteome for the first time. The human proteome is essentially all the proteins encoded in our genome. And there we found thousands of new antibiotics that were previously undescribed.
So we came up with this idea that perhaps we could identify similar compounds all throughout evolutionary history and including in our closest ancestors, Neanderthals and Denisovans. So we explored their genetic code, and we identified new antibiotics there, such as Neanderthalin. We had to come up with new names for all these new molecules because they had not really been described before.
FLORA LICHTMAN: Neanderthalin is a great name. I can hear the drug commercial in my mind.
CÉSAR DE LA FUENTE: Yeah, and then we don’t only do the AI work or the computational work. We also do all the experimental work. So then the computer gives us a number of sequences, which is essentially code. And then we have these chemical robots that can make these small peptides.
Basically we tell them, make this sequence of amino acids. And the robots are capable of making that particular compound in the laboratory. And then we can actually test all those molecules against real bacteria that are clinically relevant that we have access to in my lab. And then if those work and if we do toxicity studies and the toxicity profiles look good, we can go to preclinical infection models, which is what we’ve done with Neanderthalin and many, many other compounds that we found all across the tree of life.
FLORA LICHTMAN: So you are resurrecting extinct proteins.
CÉSAR DE LA FUENTE: In some cases, we are. In some cases, we don’t see any homology, meaning we can’t find them in any living organism in the biological world today. So in those particular instances, we are sort of resurrecting them, if you will, using chemistry in the lab.
And we’ve gone beyond ancient humans. We’ve also developed a new AI model that we call APEX that enables us to actually sample every single extinct organism known to science. And so this APEX model has enabled us to discover new antibiotic compounds in ancient penguins, magnolia trees that disappeared over the course of evolution, and even the woolly mammoth and giant sloths.
FLORA LICHTMAN: David and César, you both have been in this world using AI for biology, using AI to create these new proteins, to search for antibiotics. Did either of you encounter any skepticism when you were just starting out?
DAVID BAKER: Well, yes. When we first started trying to design new proteins completely from scratch with new functions, it seemed really crazy because the only proteins that humans knew about were the proteins that have come down through nature, through evolution. It’s kind of like the ancient elven ruins that have these magical properties. Natural proteins have these complicated names, and they’re very exotic.
And so the idea that you could make brand new proteins up on the computer that would actually solve hard problems seemed kind of crazy. And indeed, for a long time, we couldn’t make proteins that were very good at anything. But particularly with the latest generation of AI methods that we’ve developed, now the proteins we can make, they’re starting to look more and more interesting.
So I would say we’ve gone from a situation where I would say we were on the lunatic fringe and everyone thought it was crazy to now in the mainstream. It’s a little bit weird. Everyone’s talking about the protein design revolution, and there are companies starting up every day to try and design new proteins. So it’s really come full circle or maybe 180 degrees, I guess, I should say.
FLORA LICHTMAN: César.
CÉSAR DE LA FUENTE: Yeah. We also faced a lot of skepticism initially. I remember when I got recruited to MIT to do my postdoc, I originally proposed this idea that we could maybe create an antibiotic on the computer, and at the time, MIT was a mecca for AI research. But most people were applying AI systems to pattern recognition algorithms, things like recognizing faces and sounds.
But the idea of applying it to biology or to antibiotic discovery seemed kind of crazy at the time. The general consensus was that it was impossible, that biology was just too complex, too chaotic. There were too many variables for an algorithm to be of any use.
And perhaps because I was younger at the time, I ignored that skepticism. And I continued with my collaborators and my colleagues working on this area, and we were able to actually design an antibiotic on the computer that, when synthesized, was capable of killing some of the most dangerous pathogens in our society. And then we showed that it could reduce infections in preclinical mouse models. And so that was the beginning of convincing ourselves and others that this could be a whole new area of research, where we could do antibiotic design and antibiotic discovery using machines.
FLORA LICHTMAN: And none of these antibiotics have made it to the drugstore yet. What will it take to get them there?
CÉSAR DE LA FUENTE: Not yet. I think what we’ve been able to do so far with AI is really dramatically accelerate our ability to discover new antibiotics. So instead of having to wait for years with traditional methods, which can take longer than it takes to complete a PhD program to come up with some candidates, today, on the computer, we can discover hundreds of thousands of candidates within a few hours. So just on any given day–
FLORA LICHTMAN: Wow.
CÉSAR DE LA FUENTE: –just to give you a sense–
FLORA LICHTMAN: Hundreds of thousands of candidates in a few hours?
CÉSAR DE LA FUENTE: In a few hours. So it’s quite remarkable. So on any given day, like this morning, for example, I came into the lab. I had a cup of coffee, and by lunchtime, my team has already told me that we have thousands of new molecules to sort through. And by dinnertime, we’re going to have a lot more.
And so it’s an amazing playground for a scientist like myself that has been dreaming about really coming up with new antibiotics for so long. And now, with AI, for the last several years, we’ve been able to really help dramatically accelerate discovery.
FLORA LICHTMAN: David, you’re using AI to design good things, to help humanity. But would it be equally easy to use these algorithms to design bad stuff, like a bioweapon or something like that?
DAVID BAKER: It could be. But for better– or well, for worse, I guess I should say, nature has already perfected ways of doing bad things. If you take something like Ebola virus or the 1918 Spanish flu, whose sequence is now publicly available, it has this amazing and incredibly dangerous ability to infect a person and then spread in a population. And that involves many, many different biological functions that individually are quite a challenge to design.
So I think currently and for the foreseeable future, the methods I’m describing are going to be much more powerfully deployed to combat nature’s pandemic viruses and perhaps bioweapons because there’s plenty of stuff that’s bad already out in biology. What we don’t have are good ways to protect against viruses, for example.
FLORA LICHTMAN: David, what’s your bluest of blue sky ideas for using this approach?
DAVID BAKER: Well, I have a lot of them, and one of the fun things now with the rest of the world now using our methods to do the easier design problems, we’re really focusing on the bluer sky problems. But I’ll give you an example of one of them. What if we could design nanomachines that could circulate in our bodies and use a fuel that was present in our diet, so, for example, a triglyceride or something else that might be part of your diet?
And that nanomachine would do things like unclog arteries, untangle amyloid plaques, basically be a much more active cleaner upper than current drugs are. Current protein medicines are things like antibodies, which just bind to a target and block an interaction. But what if we could make medicines that actually actively reconstruct and fix damaged tissue and perhaps could help with some of the problems in aging?
FLORA LICHTMAN: Wow. César, where do you think this field is going to be in five years? Or where do you hope it will be?
CÉSAR DE LA FUENTE: Well, my greatest dream is hopefully some of the things that we’ve come up with can transition into the clinic and eventually help people. That’s what really drives us every single day to do the work that we do. I think one thing, perhaps, for the future is that we need better data sets to train AI models.
If we want to see a real explosion of AI success stories in biology and in chemistry, we’re going to need really good standardized, high-quality data sets. And I’ve been talking to NSF and NIH to try to convince them to maybe start funding data set generation projects, not only hypothesis-driven projects, in order to be able to train the next AI models that will continue fueling this revolution.
DAVID BAKER: Just echoing what César said about the importance of data sets, the work we’ve done on designing new proteins rests entirely on the really hard work done by generations of graduate students and postdocs and scientists solving protein structures and putting them in the Protein Data Bank. So they’re really the unsung heroes of the advances in AI for protein design and protein structure prediction: they generated the really high-quality data that the protein design methods I described, for example, were trained on.
CÉSAR DE LA FUENTE: Every crucial decision in my lab, it takes into account the recommendation from the machine but also the recommendation made by the human scientists that actually generate the data and then can inform decisions made by the algorithm.
DAVID BAKER: Yeah, that’s very true in our case. We have to decide what problem to try to solve, and that’s very much a human decision. And then the AI will generate some number of solutions, and then you have to decide which ones you’re going to test.
That’s the human decision. You have to make them in the lab. That’s a human action. And then you have to decide what to do with them once they work, and that’s a human decision as well.
CÉSAR DE LA FUENTE: It’s really a tight collaboration between humans and machines, at this point.
FLORA LICHTMAN: A tight collaboration for now, anyway.
DAVID BAKER: Yes.
CÉSAR DE LA FUENTE: For now. For now, we’re still very helpful.
DAVID BAKER: Yes. [LAUGHS]
FLORA LICHTMAN: Thank you both for taking time to talk to me today.
CÉSAR DE LA FUENTE: Thank you so much.
DAVID BAKER: Thank you.
FLORA LICHTMAN: Dr. César de la Fuente, Bioengineer and Presidential Associate Professor at the University of Pennsylvania in Philadelphia, and Nobel Laureate Dr. David Baker, Director of the Institute for Protein Design and Professor at the University of Washington in Seattle.
Copyright © 2025 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/
Rasha Aridi is a producer for Science Friday and the inaugural Outrider/Burroughs Wellcome Fund Fellow. She loves stories about weird critters, science adventures, and the intersection of science and history.
As Science Friday’s director and senior producer, Charles Bergquist channels the chaos of a live production studio into something sounding like a radio program. Favorite topics include planetary sciences, chemistry, materials, and shiny things with blinking lights.
Flora Lichtman is a host of Science Friday. In a previous life, she lived on a research ship where aperitivi were served on the top deck, hoisted there via pulley by the ship’s chef.