EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

Mun, Sung Hwan; Han, Min Hyun; Moon, Canyeong; Kim, Nam Soo



(Submitted on 11 Dec 2023)

Abstract: Recent work has sought to further improve end-to-end neural speaker diarization (EEND) systems. This letter proposes EEND-DEMUX, a novel framework that utilizes demultiplexed speaker embeddings. We focus on disentangling speaker-relevant information in the latent space and then transforming each separated latent variable into its corresponding speech activity. In the inference phase, EEND-DEMUX obtains separated speaker embeddings directly through the demultiplexing operation, without an external speaker diarization system, an embedding extractor, or a heuristic decoding technique. Furthermore, we employ a multi-head cross-attention mechanism to effectively capture the correlation between the mixture and the separated speaker embeddings. To learn robust demultiplexed speaker embeddings, we formulate three loss functions based on matching, orthogonality, and sparsity constraints. Experimental results on the LibriMix dataset show consistently improved performance in scenarios with both a fixed and a flexible number of speakers.
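
The abstract names orthogonality and sparsity constraints but does not give their formulas. As a rough illustration only, the sketch below shows one common way such penalties are written for a set of per-speaker embedding vectors: a mean squared cosine similarity between distinct embeddings, and a mean absolute value of the entries. The function names and both formulations are assumptions made for illustration, not the losses defined in the letter, and the matching loss is omitted because the abstract does not describe it.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def orthogonality_penalty(embeddings):
    """Mean squared cosine similarity over all distinct embedding pairs.

    Hypothetical formulation: 0.0 when the embeddings are mutually
    orthogonal, 1.0 when they all point in the same direction.
    """
    unit = [normalize(e) for e in embeddings]
    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += cos * cos
            pairs += 1
    return total / pairs if pairs else 0.0


def sparsity_penalty(embeddings):
    """Mean absolute value (L1) over all embedding entries.

    Hypothetical formulation: smaller when more entries are near zero.
    """
    values = [abs(x) for e in embeddings for x in e]
    return sum(values) / len(values) if values else 0.0
```

Under these assumed definitions, two embeddings such as `[1, 0]` and `[0, 1]` incur no orthogonality penalty, while two parallel embeddings incur the maximum of 1.0.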

Comments:	Submitted to IEEE Signal Processing Letters
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2312.06065 [eess.AS]
	(or arXiv:2312.06065v1 [eess.AS] for this version)

Submission history

From: Sung Hwan Mun
[v1] Mon, 11 Dec 2023 02:14:55 GMT (3089kb,D)
