Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery

Glarner, Thomas; Ebbers, Janek; Häb-Umbach, Reinhold

(Submitted on 4 May 2021)

Abstract: Discovering speaker-independent acoustic units purely from spoken input is known to be a hard problem. In this work we propose an unsupervised speaker normalization technique that is applied prior to unit discovery. It is based on separating speaker-related from content-induced variations in a speech signal with an adversarial contrastive predictive coding approach. This technique requires neither transcribed speech nor speaker labels and, furthermore, can be trained in a multilingual fashion, thus achieving speaker normalization even if only little unlabeled data is available from the target language. The speaker normalization is done by mapping all utterances to a medoid style that is representative of the whole database. We demonstrate the effectiveness of the approach by conducting acoustic unit discovery with a hidden Markov model variational autoencoder, noting, however, that the proposed speaker normalization can serve as a front end to any unit discovery system. Experiments on English, Yoruba and Mboshi show improvements compared to using non-normalized input.
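
The medoid style mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how style is represented or which distance is used, so the per-utterance style vectors, the Euclidean distance, and the name `medoid_index` below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def medoid_index(style_vectors: np.ndarray) -> int:
    """Return the row index of the medoid of a set of vectors.

    The medoid is the member of the set whose summed distance to all
    other members is smallest. Euclidean distance is an assumption here.
    """
    # Pairwise differences, shape (n, n, d), then distances, shape (n, n).
    diffs = style_vectors[:, None, :] - style_vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Pick the row with the smallest total distance to every other row.
    return int(np.argmin(dists.sum(axis=1)))


# Hypothetical 2-D style embeddings, one row per utterance (toy data).
styles = np.array(
    [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [10.0, 10.0]]
)
idx = medoid_index(styles)
medoid_style = styles[idx]
```

Unlike a mean, the medoid is always one of the observed vectors, so the target style in this sketch corresponds to a style that actually occurs in the data, and a single outlier such as the last row does not pull it away from the bulk of the points.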

Comments:	Submitted to Interspeech 2021
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2105.01786 [eess.AS]
	(or arXiv:2105.01786v1 [eess.AS] for this version)

Submission history

From: Thomas Glarner
[v1] Tue, 4 May 2021 22:40:41 GMT (589kb,D)
