09/20/2024

How Are AI Chatbots Changing Scientific Publishing?

17:28 minutes

A human hand writing on a paper, with a robot arm writing alongside it.
Made with elements from Canva and Shutterstock.

Since ChatGPT was released to the public almost two years ago, generative AI chatbots have had many impacts on our society: They played a large role in the recent Hollywood strikes, energy usage is spiking because of them, and they’re having a chilling effect on various writing-related industries.

But they’re also affecting the world of research papers and scientific publishing. They do offer some benefits, like making technical research papers easier to read, which could make research more accessible to the public and also greatly aid non-English-speaking researchers.

But AI chatbots also raise a host of new issues. Researchers estimate that a significant number of papers from the last couple of years were at least partially written by AI, and others suspect that chatbots are supercharging the production of fake research papers, which has led to thousands of paper retractions across major journals in recent years. Major scientific journals are struggling with how to set guidelines for generative AI use in research papers, given that so-called AI-writing detectors are not as accurate as they were once thought to be.

So what does the future of scientific publishing look like in a world where AI chatbots are a reality? And how does that affect the level of trust that the public has in science?

Ira Flatow sits down with Dr. Jessamy Bagenal, senior executive editor at The Lancet and adjunct professor at University of North Carolina at Chapel Hill, to talk about how generative AI is changing the way scientific papers are written, how it’s fueling the fake-paper industry, and how she thinks publishers should adjust their submission guidelines in response.



Segment Guests

Jessamy Bagenal

Dr. Jessamy Bagenal is senior executive editor at The Lancet and an adjunct professor at the University of North Carolina, Chapel Hill. She’s based in London, UK.

Segment Transcript

IRA FLATOW: This is Science Friday. I’m Ira Flatow. Since its debut almost two years ago, ChatGPT, along with other generative AI chatbots, has changed how we think about the role artificial intelligence plays in all walks of life, right? Just think about it. They played a huge role in last year’s Hollywood strikes. Teachers report more students using them to write essays. And they suck up a lot of electricity, which is prompting AI companies to find cheaper sources.

But there’s another part of our society where their effects are now coming into clearer view. I’m talking about research papers and scientific publishing. According to a researcher at University College London, approximately 1% of scientific articles published in 2023 might have used generative AI, meaning chatbots helped write the research papers. This might be appealing to some, but chatbots also pose existential threats to the industry.

So how are scientific journals navigating this new environment? Here to talk about the effects these chatbots are having on scientific publishing is my guest, Dr. Jessamy Bagenal, senior executive editor at The Lancet, physician, and adjunct professor at the University of North Carolina at Chapel Hill. Welcome to Science Friday.

JESSAMY BAGENAL: Hi. Thanks so much for having me on.

IRA FLATOW: You’re welcome. OK, tell us, where does this start for you? Tell us about the first time you saw an example or an article that showed the impact that AI chatbots could have on scientific publishing.

JESSAMY BAGENAL: Well, it’s been a couple of years now. But obviously, it’s a rapidly moving field. And I’ve come at it from an editor’s point of view, but also from a clinician’s point of view in how we think about evidence, and how we think about knowledge. I followed the story very closely after its initial launch. And I think some of the things that I found most striking were those original small studies where, for example, trained researchers would look at abstracts that had been written by a researcher, and abstracts that had been generated by a generative AI large language model. And for the most part, they couldn’t tell the difference.

I think that study came about perhaps, you know, within the first six months of ChatGPT first being sort of released onto the market. And it was very clear that if experienced researchers aren’t able to tell the difference between a generative AI abstract and one that has been written by their colleagues, then this is a really big problem for us.

IRA FLATOW: Is that because they lack effective tools to detect AI-generated content?

JESSAMY BAGENAL: I mean, that’s right to an extent. We don’t have any effective tools that will reliably and sensitively pick up when generative AI has been used. But I was sitting on a panel recently– this is a huge discussion within the field. And in fact, my colleague, who’s our deputy editor, made a joke the other day that– we’d had an agenda item at a meeting for generative AI, and she said, we always know that we need at least 45 minutes to discuss anything about generative AI.

Because editors, researchers, we’re very alive to this topic in thinking about the best way that we can use it all the time. But a colleague of mine on this panel was saying that we’re in the business of text. We’re in the business of language. And now we have this amazing tool which can generate language. But there’s actually no part of our value chain that might not be disrupted by this innovation. Peer review is done, for the most part, through the written word. Articles are still written in a way that hasn’t changed for a very long time.

All of this is based on text, on language. And so there’s no part of our work that could not be disrupted by this new tool.

IRA FLATOW: Interesting. Now, when you say disruption– and I hear you speaking about this as being negative.

JESSAMY BAGENAL: No. I don’t think it’s negative.

IRA FLATOW: No, you don’t?

JESSAMY BAGENAL: I don’t think it’s just negative. But it has to be thoughtfully and sensitively implemented. And that’s challenging, because it’s a very rapidly moving field. And we’re all just getting up to speed with how people are using it and what appropriate use looks like. So for example, at The Lancet, we implemented a new tick box about six months ago where we ask authors at the submission stage, have you used generative AI in any part of this study? And if they tick yes, then we ask them how.

And then in our editorial manager, we have a little sort of red A which appears to alert editors to the fact that generative AI has been used in some way in this manuscript. And then we’re able to follow external, widespread policies on how generative AI should be acknowledged in a manuscript. But these things are changing all the time. And I think there’s a huge opportunity for generative AI to be a great positive influence on scientific publishing.

But there are also dangers. And so it has to be very carefully thought about.
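To make that disclosure workflow concrete, here is a minimal sketch of how a submission system might record the tick box, the author’s explanation, and the editor-facing flag. The `Submission` class and its field names are hypothetical illustrations, not The Lancet’s or its editorial manager software’s actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Submission:
    """Hypothetical manuscript submission record (illustration only)."""
    title: str
    used_generative_ai: bool = False          # the tick box at the submission stage
    ai_use_description: Optional[str] = None  # free text: "how was it used?"

    def editor_flags(self) -> list[str]:
        """Alerts surfaced to handling editors, analogous to the 'red A' described above."""
        if self.used_generative_ai:
            return ["AI use declared: " + (self.ai_use_description or "no detail given")]
        return []

# Example: an author declares that a chatbot was used for grammar only.
paper = Submission(
    title="Example manuscript",
    used_generative_ai=True,
    ai_use_description="Grammar and spelling assistance only",
)
print(paper.editor_flags())
```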

IRA FLATOW: I understand that. I have been reading research papers for decades, and I’m always struck by how poorly some of them are written. Not that the data is bad. And I’m thinking, hey, if you unleash AI on this, maybe you can get a better-written narrative going here. Would that be some positive way it could be used?

JESSAMY BAGENAL: Definitely, that’s one positive. And I think from an inclusion and diversity point of view, we still transmit so much knowledge through the English language, which excludes an enormous number of people because English is not their first language. We’re very lucky at The Lancet to have a very large team of internal assistant editors who edit everything into Lancet style. But that’s not the same across the scientific publishing field. In many journals, there isn’t that internal expertise.

And so actually, if you submit something which isn’t written particularly well, then of course, that will impact the likelihood of whether that editor decides to send it out or not. Because they might have problems understanding what the actual research is saying. So I think there’s an enormous opportunity there to make it more inclusive– make scientific publishing more inclusive and fairer. And I think also when we’re thinking about people who might be neurodivergent.

I’ve got lots of clinician friends who are dyslexic, and actually being able to use large language models to help them structure sentences and articles is a very efficient way for them to articulate themselves and their ideas in what most people consider a sort of socially acceptable manner.

IRA FLATOW: Yeah, because most people think of AI as cheating. But you’re not talking about that here. And you’re pointing out the positive aspects. And when you talk to scientists who do use generative AI, what do they say about using, like, ChatGPT to help write their research papers?

JESSAMY BAGENAL: I think scientists and clinicians across the world are, for the most part, doing amazing work under incredibly stressful situations. And they’re often overloaded with work. Their to-do lists are extraordinarily long. And so having ChatGPT as an efficiency tool can allow them to put together an article very quickly, or might allow them to write a cover letter in a more compelling manner.

And from our point of view, from a scientific publisher’s point of view, generative AI might be able to make our submission process easier for authors, and allow us, as editors, to interact with them in a kind of more slick and easy fashion. That type of efficiency could have real benefits to patients, and to people’s lives, and to scientific progress.

IRA FLATOW: I know that since ChatGPT came out, the major journals have provided some guidelines and policies to researchers about the use of generative AI in papers. But it also seems like a pretty messy landscape right now. I mean, are these guidelines standard across all the journals and research papers?

JESSAMY BAGENAL: Well, we obviously have external bodies which bring together a number of different journals. So for instance, the ICMJE, which sort of brings together lots of medical editors and journals. They have published guidance. And so any journals that sort of sign on to them also tend to take on some of their guidance for generative AI. And equally, organizations like COPE, which help with editorial guidance for journals, also have their own sort of set of guidelines.

So I think it’s right that there are these external benchmarking places which are releasing loose guidance. But obviously, each journal is different. Each journal has a different topic. They have different article types. They have different things that they’re trying to do with those journals. And let’s not forget that journals are actually very human endeavors. They are how we, as humans, interpret scientific progress. And how we put it into context. And what that means for either patients, or for science, and for that field.

And so I also think that it’s right that each journal should be very clear on how they want generative AI to be used. So for example, at The Lancet, we have a section which includes commentaries, correspondence, perspectives, the Art of Medicine. And this is an area of our journal which really requires human interpretation. And so we’re in the process, at the moment, of thinking about the fact that we would like to limit the use of generative AI for this section.

Because we feel passionately about human ingenuity, and putting things into context, and being able to see what’s new– not just trawling through what’s on the internet and putting together what sounds good about a particular topic, but actually expertise, experience, and vision. We are thinking about limiting the use of generative AI in that section to just using it for English grammar and spelling, so that we’re not excluding people who don’t speak English.

IRA FLATOW: Right. Interesting point. Of course, the elephant in the room here is paper mills. And I’m not talking about factories that make paper. Can you explain what those are, and why they’re such a big issue?

JESSAMY BAGENAL: Yeah. So paper mills are sort of nefarious organizations that essentially have understood the scientific publishing landscape and are gaming it, and selling authorship for manuscripts that often are filled with nonsense. And so you may all have heard of mass retractions– of different publishers having to retract articles that essentially were not based in any scientific fact, and were not really science, but often complete nonsense. And they’re a huge problem.

They’re a problem for publishers. But in the wider context, they’re a problem for science because this really breaks down the trust.

IRA FLATOW: We’re talking about phony papers here, right?

JESSAMY BAGENAL: We’re talking about phony papers. So it could literally be a manuscript about complete nonsense where the results are fabricated, the context is fabricated. And authorship is sold to academics for these papers. And so they’ve kind of got into the editorial process by perhaps having guest editors. They’ve manipulated the peer review process. And they’re an enormous problem for the scientific publishing world.

And so there’s a real question there as to, in the context of generative AI, how as editors do we make sure that what we’re reading is real?

IRA FLATOW: Yeah. And this may sound like a crazy question, but why not use– if they’re writing it with ChatGPT or AI, why not use ChatGPT or AI to find them, to weed out some of these papers?

JESSAMY BAGENAL: That’s exactly right. This is a big data problem. And I think Elsevier, which is the company that owns The Lancet, is putting an enormous amount of resources and effort into thinking about research misconduct and research integrity in the context of big data. How can we use some of these patterns across many, many different papers to be able to pick out what’s real and what’s not real?

But in the larger context, in some ways, generative AI will sort of turbocharge that. Because you’re able to very quickly put together a nonsense manuscript that looks and sounds like it should be published, but actually might be about nothing. On the other hand, the paper mill business model is based on people paying for authorship. And actually, if people at home on their own can put together this type of paper, why would they pay a paper mill to do it for them? They might just do it themselves.

So I think this is a huge problem. And one that I know a lot of people are thinking about very seriously.
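As one concrete illustration of the kind of cross-paper pattern analysis mentioned above: paper-mill output often reuses templated wording, so unusually similar text across unrelated submissions is one signal an integrity team might screen for. The sketch below is an assumed, simplified approach using off-the-shelf TF-IDF similarity; it is not Elsevier’s or The Lancet’s actual tooling, and the manuscript snippets are invented.

```python
# A minimal sketch of one possible "big data" integrity signal: flagging pairs of
# submissions whose text is suspiciously similar (templated paper-mill output).
# Illustrative assumption only; not Elsevier's or The Lancet's actual tooling.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

manuscripts = {
    "ms-001": "The role of factor A in tumor progression was assessed in 40 patients ...",
    "ms-002": "The role of factor B in tumor progression was assessed in 40 patients ...",
    "ms-003": "We surveyed 1,200 nurses about burnout during night shifts ...",
}

ids = list(manuscripts)
tfidf = TfidfVectorizer().fit_transform(manuscripts.values())
similarity = cosine_similarity(tfidf)  # pairwise cosine similarity matrix

THRESHOLD = 0.8  # arbitrary cutoff, chosen only for illustration
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        if similarity[i, j] > THRESHOLD:
            print(f"Flag for manual review: {ids[i]} / {ids[j]} (similarity {similarity[i, j]:.2f})")
```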

IRA FLATOW: This is Science Friday from WNYC Studios. And some of the potential solutions here, can you offer any?

JESSAMY BAGENAL: I mean, I think they lie in sort of big and small changes. And for the most part, they’re probably going to be pretty costly and difficult. I think a major step is to recognize that over the past decade, two decades, we’ve had a major trend towards the open science movement, which nobody can disagree with from an ethical or moral standpoint. We all want science to be accessible and available to everybody.

But in reality, what that’s meant, from a scientific publishing business model point of view, is that authors pay to get their articles published open access. And so there has been a focus on quantity over quality. And I think that that’s rapidly adjusting and changing. And many scientific publishers are changing the way that they’re thinking about that. So we need some better business models. We need some other ways of thinking about open access in the context of generative AI.

And then I think another major step is thinking about the environment within which we all work. There is a serious problem with academic environments, which often reward the quantity that an academic has published over the quality of what they’ve published.

IRA FLATOW: Publish or perish.

JESSAMY BAGENAL: Yeah, exactly. Publish or perish. So there’s an incentive there to publish, publish, publish, regardless of whether it might constitute a bit of research waste in terms of, does this question really need to be answered? Has it already been answered? But then on the other end of the scale, there is an incentive there to try and get things published which aren’t necessarily adding to human health or to scientific progress.

IRA FLATOW: Dr. Bagenal, you write about the steps necessary to take, and where you think this might be going. How well is this progressing? Or how well are we getting toward the goals that you talk about?

JESSAMY BAGENAL: I think when there’s been any huge innovation in technology, there’s always a bit of a policy gap between people trying to catch up with what’s happened and create policies and ways of working which will adapt to this huge new innovation. And that’s certainly what we’re seeing now. It’s been a couple of years since ChatGPT. And only really now, I think, are journals and editors really getting up to speed with the types of things that we might need.

But also, because large language models are incredible tools with the ability to improve all the time– and we’ve seen that with the versions that have already come out. Each time, there’s an improvement in how they are performing– we need to be very flexible and adaptable to those different changes, because we might start seeing more hallucinations. There was a very interesting paper in Nature a couple of months ago about the fact that large language models are– when they come to the end of what’s already been published, what’s already on the internet, how do they get new data?

And what happens if you use synthetic data? And actually, for the most part, it looked like those models almost completely fell apart. They stopped being able to work. So there are lots of issues that are going to become clear over the coming months that we’ll need to be very alive to and be able to adapt to. But at the moment, I think we are– certainly at The Lancet, and I know many other journals– spending an awful lot of time thinking about this.

We are implementing practical, tangible policies which are meant to be able to improve the process for authors. But also to make the content that we publish very high quality and very useful and usable for our readers.

IRA FLATOW: So you have to keep up with it. As ChatGPT gets better, you have to–

JESSAMY BAGENAL: We have to get better. Exactly. Exactly. We must.

IRA FLATOW: Yeah, very interesting stuff. We’re going to keep track of all of this. Thank you very much for taking time to be with us today, Dr. Bagenal.

JESSAMY BAGENAL: No problem. It was lovely to chat to you.

IRA FLATOW: Dr. Jessamy Bagenal, senior executive editor at The Lancet, and adjunct professor at the University of North Carolina at Chapel Hill.

Copyright © 2024 Science Friday Initiative. All rights reserved. Science Friday transcripts are produced on a tight deadline by 3Play Media. Fidelity to the original aired/published audio or video file might vary, and text might be updated or amended in the future. For the authoritative record of Science Friday’s programming, please visit the original aired/published recording. For terms of use and more information, visit our policies pages at http://www.sciencefriday.com/about/policies/

Meet the Producers and Host

About D Peterschmidt

D Peterschmidt is a producer, host of the podcast Universe of Art, and composes music for Science Friday’s podcasts. Their D&D character is a clumsy bard named Chip Chap Chopman.

About Ira Flatow

Ira Flatow is the host and executive producer of Science Friday. His green thumb has revived many an office plant at death’s door.
