Speaker attribution with voice profiles by graph-based semi-supervised learning

Wang, Jixuan; Xiao, Xiong; Wu, Jian; Ramamurthy, Ranjani; Rudzicz, Frank; Brudno, Michael

doi:10.21437/Interspeech.2020-1950

(Submitted on 6 Feb 2021)

Abstract: Speaker attribution is required in many real-world applications, such as meeting transcription, where a speaker identity is assigned to each utterance according to speaker voice profiles. In this paper, we propose to solve the speaker attribution problem using graph-based semi-supervised learning methods. A graph of speech segments is built for each session, in which segments from voice profiles are represented by labeled nodes and segments from test utterances by unlabeled nodes. Edge weights between nodes are computed from the similarities between the pretrained speaker embeddings of the speech segments. Speaker attribution then becomes a semi-supervised learning problem on graphs, to which two graph-based methods are applied: label propagation (LP) and graph neural networks (GNNs). The proposed approaches utilize the structural information of the graph to improve speaker attribution performance. Experimental results on real meeting data show that the graph-based approaches reduce speaker attribution error by up to 68% compared to a baseline speaker identification approach that processes each utterance independently.
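The graph construction and label-propagation steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes NumPy and precomputed speaker embeddings. Edge weights are taken as cosine similarities, with negative values clipped to zero, and the propagation rule is the classic clamped label-propagation iteration (Zhu and Ghahramani). The function names `build_affinity` and `label_propagation`, the clipping rule, the iteration count, and the 2-D toy embeddings are hypothetical choices made for illustration. The GNN variant is not shown.

```python
import numpy as np


def build_affinity(embeddings: np.ndarray) -> np.ndarray:
    """Dense segment graph: edge weight = cosine similarity of speaker embeddings.

    Assumption (hypothetical): negative similarities are clipped to 0 and
    self-loops removed; the paper's exact weighting or sparsification may differ.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(w, 0.0)
    return w


def label_propagation(w: np.ndarray, labels: np.ndarray,
                      num_speakers: int, n_iter: int = 100) -> np.ndarray:
    """Clamped label propagation (a generic sketch, not the paper's exact variant).

    labels[i] is a speaker index for voice-profile (labeled) nodes and -1
    for test-utterance (unlabeled) nodes. Returns a speaker index per node.
    """
    labeled = labels >= 0
    y0 = np.zeros((len(labels), num_speakers))
    y0[labeled, labels[labeled]] = 1.0          # one-hot rows for profile nodes
    degree = w.sum(axis=1, keepdims=True)
    p = w / np.where(degree > 0, degree, 1.0)   # row-normalized transition matrix
    f = y0.copy()
    for _ in range(n_iter):
        f = p @ f                               # each node averages its neighbors' scores
        f[labeled] = y0[labeled]                # re-clamp voice-profile nodes
    return f.argmax(axis=1)


# Toy session (hypothetical data): nodes 0-1 are profile segments for
# speakers 0 and 1; nodes 2-3 are unlabeled test segments.
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1, -1, -1])
pred = label_propagation(build_affinity(embeddings), labels, num_speakers=2)
print(pred)  # [0 1 0 1]
```

Re-clamping the profile nodes after every step means each test segment is scored both by its direct similarity to the voice profiles and indirectly through other test segments in the same session. This is roughly the kind of graph structure that a per-utterance baseline ignores.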

Comments:	Interspeech 2020
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
DOI:	10.21437/Interspeech.2020-1950
Cite as:	arXiv:2102.03634 [eess.AS]
	(or arXiv:2102.03634v1 [eess.AS] for this version)

Submission history

From: Jixuan Wang
[v1] Sat, 6 Feb 2021 18:35:56 GMT (225kb,D)