Every Breath You Don't Take: Deepfake Speech Detection Using Breath

Layton, Seth; De Andrade, Thiago; Olszewski, Daniel; Warren, Kevin; Butler, Kevin; Traynor, Patrick

(Submitted on 23 Apr 2024 (v1), last revised 26 Apr 2024 (this version, v2))

Abstract: Deepfake speech represents a real and growing threat to systems and society. Many detectors have been created to aid in defense against speech deepfakes. While these detectors implement myriad methodologies, many rely on low-level fragments of the speech generation process. We hypothesize that breath, a higher-level part of speech, is a key component of natural speech and thus improper generation in deepfake speech is a performant discriminator. To evaluate this, we create a breath detector and leverage this against a custom dataset of online news article audio to discriminate between real/deepfake speech. Additionally, we make this custom dataset publicly available to facilitate comparison for future work. Applying our simple breath detector as a deepfake speech discriminator on in-the-wild samples allows for accurate classification (perfect 1.0 AUPRC and 0.0 EER on test data) across 33.6 hours of audio. We compare our model with the state-of-the-art SSL-wav2vec model and show that this complex deep learning model completely fails to classify the same in-the-wild samples (0.72 AUPRC and 0.99 EER).
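The two metrics quoted in the abstract, AUPRC and EER, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes a hypothetical convention in which label 1 marks the positive class and a higher score means "more likely positive"; the function names are illustrative only, and tied scores are not given any special treatment.

```python
from typing import List


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Assumption: label 1 is the positive class, higher score = more positive.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            ap += true_pos / rank  # precision at each new recall step
    return ap / total_pos


def equal_error_rate(labels: List[int], scores: List[float]) -> float:
    """Error rate at the threshold where false accepts and false rejects are closest."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    best_gap, eer = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus one above the maximum.
    for t in sorted(set(scores)) + [max(scores) + 1.0]:
        far = sum(s >= t for s in neg) / len(neg)  # negatives scored as positive
        frr = sum(s < t for s in pos) / len(pos)   # positives scored as negative
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy data, not taken from the paper: positives and negatives fully separated.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.8, 0.2, 0.1]
    print(average_precision(labels, scores), equal_error_rate(labels, scores))
```

Under these assumptions, perfectly separated scores give an AUPRC of 1.0 and an EER of 0.0, while an EER well above 0.5 indicates scores that are ordered largely opposite to the labels.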

Comments: Submitted to ACM journal -- Digital Threats: Research and Practice
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2404.15143 [cs.SD] (or arXiv:2404.15143v2 [cs.SD] for this version)

Submission history

From: Seth Layton
[v1] Tue, 23 Apr 2024 15:48:51 GMT (771kb,D)
[v2] Fri, 26 Apr 2024 21:14:24 GMT (771kb,D)
