### Detecting AI-Generated Scientific Writing
#### Introduction
AI companies have struggled to create tools that can reliably identify AI-generated text. However, researchers have developed a new method to estimate the use of large language models (LLMs) in scientific writing by analyzing the frequency of certain “excess words” that became more common during the LLM era (2023 and 2024). According to their findings, at least 10 percent of 2024 abstracts were processed with LLMs.
#### Research Inspiration
Researchers from Germany’s University of Tübingen and Northwestern University were inspired by studies that measured the impact of the Covid-19 pandemic by looking at excess deaths. They applied a similar approach to “excess word usage” after LLM writing tools became widely available in late 2022. Their study found that the appearance of LLMs led to a significant increase in the frequency of certain style words, which was unprecedented in both quality and quantity.
#### Methodology
##### Data Collection
The researchers analyzed 14 million paper abstracts published on [PubMed](https://pubmed.ncbi.nlm.nih.gov/) between 2010 and 2024. For each word, they tracked its relative frequency year by year and compared the frequency expected from pre-2023 trends with the frequency actually observed in 2023 and 2024.
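To make that comparison concrete, here is a minimal Python sketch of the counting step. It is not the authors' actual pipeline: it assumes the abstracts are already available as `(year, text)` pairs and uses a simple linear fit of the pre-2023 frequencies as the "expected" value, whereas the study has its own corpus handling and trend model.

```python
# Minimal sketch: track how often a word appears in abstracts per year, then
# compare a linear extrapolation of the pre-2023 trend with the frequency
# observed in a later year. Illustrative only, not the published pipeline.
from collections import defaultdict
import re

def yearly_frequency(abstracts, word):
    """Fraction of abstracts per year that contain `word` at least once."""
    hits, totals = defaultdict(int), defaultdict(int)
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for year, text in abstracts:          # abstracts: iterable of (year, text)
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {y: hits[y] / totals[y] for y in totals}

def expected_frequency(freq_by_year, target_year, fit_years=range(2010, 2023)):
    """Linear extrapolation of the pre-2023 trend to `target_year`."""
    xs = [y for y in fit_years if y in freq_by_year]
    ys = [freq_by_year[y] for y in xs]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
            sum((x - x_mean) ** 2 for x in xs)
    intercept = y_mean - slope * x_mean
    return slope * target_year + intercept

# Hypothetical usage, assuming `abstracts` has been loaded elsewhere:
# freq = yearly_frequency(abstracts, "delves")
# excess_ratio = freq[2024] / expected_frequency(freq, 2024)
```

Dividing the observed 2024 frequency by the extrapolated one gives the kind of excess ratio discussed in the findings below.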
##### Findings
The study found that certain words, which were rare before 2023, surged in popularity after LLMs were introduced. For example, the word “delves” appeared 25 times more frequently in 2024 papers than expected. Words like “showcasing” and “underscores” increased ninefold. Other common words also saw notable increases: “potential” by 4.1 percentage points, “findings” by 2.7 percentage points, and “crucial” by 2.6 percentage points.
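The paragraph above mixes two different statistics, so a toy illustration may help keep them apart: the excess ratio (observed frequency divided by expected, the "25 times" figure) and the excess gap in percentage points (observed minus expected, the "4.1 percentage points" figure). The numbers below are hypothetical, not the study's values.

```python
# Toy illustration of the two summary statistics, assuming we already have
# expected and observed per-abstract frequencies for a word.
def excess_ratio(observed: float, expected: float) -> float:
    """How many times more often the word appears than the trend predicts."""
    return observed / expected

def excess_gap_pp(observed: float, expected: float) -> float:
    """Excess usage in percentage points (observed minus expected)."""
    return 100 * (observed - expected)

# Made-up frequencies for demonstration only:
print(excess_ratio(0.0050, 0.0002))   # a 25x jump, like "delves"
print(excess_gap_pp(0.089, 0.048))    # a ~4.1 pp gap, like "potential"
```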
#### Natural Language Evolution vs. LLM Influence
While language naturally evolves, the researchers noted that such massive and sudden increases in word usage were previously only seen for words related to major world health events, like “ebola” in 2015 and “coronavirus” during the Covid-19 pandemic. In the post-LLM period, hundreds of words saw sudden increases in scientific usage without any common link to world events. These words were mostly “style words” like verbs, adjectives, and adverbs.
#### Previous Findings and New Insights
The increased prevalence of words like “delve” in scientific papers has been noted before. However, previous studies relied on comparisons with human writing samples or predefined LLM markers. In this study, the pre-2023 abstracts served as an effective control group to show how vocabulary choice has changed in the post-LLM era.
### Identifying LLM Usage
#### Marker Words
By identifying hundreds of “marker words” that became more common in the post-LLM era, the researchers could spot telltale signs of LLM use. For example, a sentence from one abstract, with marker words shown in bold:

> “A **comprehensive** grasp of the **intricate interplay** between […] and […] is **pivotal** for effective therapeutic strategies.”
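As a rough illustration of how such highlighting can be automated, the sketch below wraps any marker word found in a sentence in bold. The `MARKER_WORDS` set here is a tiny hand-picked subset for demonstration only; the study works with hundreds of markers identified through the frequency analysis.

```python
# Sketch: bold any known marker word in a piece of text. The word list is a
# small illustrative subset, not the researchers' full marker set.
import re

MARKER_WORDS = {"comprehensive", "intricate", "interplay", "pivotal",
                "delves", "showcasing", "underscores", "crucial"}

def highlight_markers(text: str) -> str:
    def mark(match: re.Match) -> str:
        word = match.group(0)
        return f"**{word}**" if word.lower() in MARKER_WORDS else word
    return re.sub(r"[A-Za-z]+", mark, text)

sentence = ("A comprehensive grasp of the intricate interplay between X and Y "
            "is pivotal for effective therapeutic strategies.")
print(highlight_markers(sentence))
```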
#### Statistical Measures
After statistically analyzing how marker words appear across individual papers, the researchers estimate that at least 10 percent of post-2022 papers in the PubMed corpus were written with some LLM assistance. The true share could be higher, since the marker-word list cannot catch LLM-assisted abstracts that happen to avoid every identified marker.
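The intuition behind that lower bound can be sketched as follows: if a marker word appears in several percentage points more abstracts than the pre-2023 trend predicts, at least that share of abstracts must have been touched by an LLM, and the excess gaps of marker words that never appear in the same abstract can be added together. The snippet below illustrates this intuition with made-up numbers; it is a simplification, not the study's actual estimator.

```python
# Simplified illustration of the lower-bound intuition, NOT the study's
# estimator. Assumes the listed marker words occur in disjoint sets of
# abstracts, so their excess gaps (in percentage points) can be summed.
def lower_bound_llm_share(excess_gaps_pp):
    """Conservative floor on the share of LLM-assisted abstracts."""
    return sum(excess_gaps_pp)

# Hypothetical excess gaps for a few non-co-occurring marker words:
gaps_pp = [4.0, 3.0, 2.0, 1.5]
print(f"At least {lower_bound_llm_share(gaps_pp):.1f}% of abstracts affected")
```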
### Conclusion
This study provides a novel method for detecting LLM usage in scientific writing by analyzing changes in word frequency. The findings suggest a significant impact of LLMs on scientific vocabulary, offering a new perspective on the influence of AI in academic research.