Title: Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Abstract: We present a deep-learning approach for the task of Concurrent Speaker Detection (CSD) using a modified transformer model. Our model is designed to handle multi-microphone data but can also work in the single-microphone case. The method classifies each audio segment into one of three classes: 1) no speech activity (noise only), 2) only a single speaker is active, and 3) more than one speaker is active. We incorporate a Cost-Sensitive (CS) loss and confidence calibration into the training procedure. The approach is evaluated on three real-world databases: AMI, AliMeeting, and CHiME 5, demonstrating an improvement over existing approaches.
Comments: 5 pages, 6 tables, 2 figures
Subjects: Audio and Speech Processing (eess.AS)
Cite as: arXiv:2403.06856 [eess.AS]
  (or arXiv:2403.06856v1 [eess.AS] for this version)
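
The abstract names a three-class decision and a cost-sensitive loss but does not give the loss formula. As a rough illustration only, the sketch below shows one common way such a loss can be built: cross-entropy plus the expected misclassification cost under the predicted class probabilities. The cost matrix values, the function names, and the additive form are all assumptions made for this example, not the authors' formulation, and confidence calibration is not shown.

```python
import math

# The three classes named in the abstract.
CLASSES = ("noise_only", "single_speaker", "multiple_speakers")

# Hypothetical cost matrix, COST[true][predicted]. The values are
# illustrative assumptions, not taken from the paper.
COST = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cost_sensitive_loss(logits, true_class, cost=COST):
    """Cross-entropy plus expected misclassification cost (assumed form)."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_class])
    expected_cost = sum(c * p for c, p in zip(cost[true_class], probs))
    return cross_entropy + expected_cost


def classify(logits):
    """Return the name of the class with the highest probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best]
```

With this assumed cost matrix, confusing noise with multiple speakers is penalized more heavily than confusing it with a single speaker; in practice the costs would be tuned to the application.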

Submission history

From: Amit Eliav
[v1] Mon, 11 Mar 2024 16:12:08 GMT (725kb,D)
