Title: ChatGPT "contamination": estimating the prevalence of LLMs in the scholarly literature

Authors: Andrew Gray
Abstract: The use of ChatGPT and similar Large Language Model (LLM) tools in scholarly communication and academic publishing has been widely discussed since they became easily accessible to a general audience in late 2022. This study uses keywords known to be disproportionately present in LLM-generated text to provide an overall estimate for the prevalence of LLM-assisted writing in the scholarly literature. For the publishing year 2023, it is found that several of those keywords show a distinctive and disproportionate increase in their prevalence, individually and in combination. It is estimated that at least 60,000 papers (slightly over 1% of all articles) were LLM-assisted, though this number could be extended and refined by analysis of other characteristics of the papers or by identification of further indicative keywords.
Comments: 12 pages, 6 figures
Subjects: Digital Libraries (cs.DL)
Cite as: arXiv:2403.16887 [cs.DL]
  (or arXiv:2403.16887v1 [cs.DL] for this version)
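The abstract's core idea is to compare how often an indicative keyword appears in a target year with its earlier baseline. The minimal Python sketch below illustrates that comparison. It is not the paper's actual procedure. It assumes the expected rate is simply the mean of earlier years' rates, and both function names are hypothetical.

```python
from statistics import mean


def expected_rate(baseline_rates):
    """Expected keyword rate for the target year.

    Assumption for illustration: the mean of earlier years' rates
    (a trend extrapolation could be used instead).
    """
    return mean(baseline_rates)


def excess_papers(baseline_rates, observed_rate, total_papers):
    """Estimate papers attributable to an unexplained rise in one keyword.

    baseline_rates: fraction of papers containing the keyword in earlier years
    observed_rate:  fraction of papers containing it in the target year
    total_papers:   number of papers published in the target year
    """
    # Only a rise above the baseline counts; a fall yields zero excess.
    excess_rate = max(0.0, observed_rate - expected_rate(baseline_rates))
    return excess_rate * total_papers


if __name__ == "__main__":
    # Hypothetical figures, not data from the paper.
    print(excess_papers([0.010, 0.011, 0.012], 0.020, 1_000_000))
```

Take hypothetical baseline rates of 1.0%, 1.1% and 1.2% and an observed rate of 2.0% in a year of 1,000,000 papers. With those inputs, `excess_papers` returns roughly 9,000. Summing such per-keyword estimates would double-count papers that contain more than one keyword. For that reason, keywords used in combination need separate treatment.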

Submission history

From: Andrew Gray
[v1] Mon, 25 Mar 2024 15:56:37 GMT (228kb)
