Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Eliav, Amit; Gannot, Sharon

(Submitted on 11 Mar 2024)

Abstract: We present a deep-learning approach for the task of Concurrent Speaker Detection (CSD) using a modified transformer model. Our model is designed to handle multi-microphone data but can also work in the single-microphone case. The method classifies audio segments into one of three classes: 1) no speech activity (noise only), 2) only a single speaker is active, and 3) more than one speaker is active. We incorporate a Cost-Sensitive (CS) loss and a confidence calibration into the training procedure. The approach is evaluated on three real-world databases, AMI, AliMeeting, and CHiME-5, and demonstrates an improvement over existing approaches.
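The abstract does not specify the exact form of the CS loss or of the calibration step. As an illustrative sketch only, the code below assumes a CS loss of the form "cross-entropy plus alpha times the expected misclassification cost" and uses temperature scaling (a common calibration technique, not necessarily the authors' choice). It maps a per-segment count of active speakers to the three CSD classes. The cost matrix values, `alpha`, the temperature grid, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

# Class indices for the three CSD classes described in the abstract.
NOISE_ONLY, SINGLE_SPEAKER, OVERLAP = 0, 1, 2


def csd_label(num_active_speakers: int) -> int:
    """Map the number of active speakers in a segment to a CSD class."""
    if num_active_speakers < 0:
        raise ValueError("speaker count must be non-negative")
    # 0 -> noise only, 1 -> single speaker, 2 or more -> overlap.
    return min(num_active_speakers, OVERLAP)


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; temperature > 1 softens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical cost matrix COST[true][pred] (illustrative values only):
# zero on the diagonal, and confusing a single speaker with overlap
# (in either direction) costs twice as much as other errors.
COST = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 2.0],
    [1.0, 2.0, 0.0],
]


def cost_sensitive_loss(logits, target, cost=COST, temperature=1.0, alpha=1.0):
    """Assumed CS loss: cross-entropy plus alpha * expected misclassification cost."""
    probs = softmax(logits, temperature)
    ce = -math.log(probs[target])
    expected_cost = sum(c * p for c, p in zip(cost[target], probs))
    return ce + alpha * expected_cost


def fit_temperature(val_logits, val_targets, grid=None):
    """Temperature scaling: pick the T that minimizes validation NLL (grid search)."""
    if grid is None:
        grid = [0.5 + 0.1 * i for i in range(46)]  # 0.5 .. 5.0

    def nll(t):
        return sum(-math.log(softmax(z, t)[y])
                   for z, y in zip(val_logits, val_targets))

    return min(grid, key=nll)
```

In this sketch the expected-cost term lets asymmetric errors (such as labeling an overlapped segment as single-speaker) be penalized more heavily than plain cross-entropy would. The temperature is fitted on held-out data after training, and at inference the logits are divided by it before the softmax.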

Comments:	5 pages, 6 tables, 2 figures
Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2403.06856 [eess.AS]
	(or arXiv:2403.06856v1 [eess.AS] for this version)

Submission history

From: Amit Eliav
[v1] Mon, 11 Mar 2024 16:12:08 GMT (725kb,D)