
Title: EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

Abstract: Speech synthesized from articulatory movements has real-world uses for patients with vocal cord disorders, in situations requiring silent speech, and in high-noise environments. In this work, we present EMA2S, an end-to-end multimodal articulatory-to-speech system that directly converts articulatory movements to speech signals. We use a neural-network-based vocoder combined with multimodal joint training that incorporates spectrogram, mel-spectrogram, and deep features. Experimental results confirm that the multimodal approach of EMA2S outperforms the baseline system on both objective and subjective evaluation metrics. Moreover, the results demonstrate that joint mel-spectrogram and deep-feature loss training effectively improves system performance.
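
The abstract describes, but does not include, the joint training objective. The following is a minimal PyTorch sketch of what a combined spectrogram, mel-spectrogram, and deep-feature loss could look like; the class name MultimodalJointLoss, the frozen feature_extractor, and the loss weights are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class MultimodalJointLoss(nn.Module):
        """Illustrative joint loss: L1 on linear spectrograms, L1 on
        mel-spectrograms, and L1 between "deep features" extracted from
        the mel-spectrograms by a frozen auxiliary network (assumption)."""

        def __init__(self, feature_extractor, w_spec=1.0, w_mel=1.0, w_deep=1.0):
            super().__init__()
            self.feature_extractor = feature_extractor
            for p in self.feature_extractor.parameters():
                p.requires_grad_(False)  # treat the extractor as fixed
            self.w_spec, self.w_mel, self.w_deep = w_spec, w_mel, w_deep
            self.l1 = nn.L1Loss()

        def forward(self, pred_spec, target_spec, pred_mel, target_mel):
            loss_spec = self.l1(pred_spec, target_spec)
            loss_mel = self.l1(pred_mel, target_mel)
            # Deep-feature (perceptual) term: compare embeddings of the
            # predicted and reference mel-spectrograms.
            pred_feat = self.feature_extractor(pred_mel)
            with torch.no_grad():
                target_feat = self.feature_extractor(target_mel)
            loss_deep = self.l1(pred_feat, target_feat)
            return (self.w_spec * loss_spec
                    + self.w_mel * loss_mel
                    + self.w_deep * loss_deep)

    if __name__ == "__main__":
        # Stand-in extractor and random tensors, just to show the call shapes:
        # (batch, 80 mel bins, frames) and (batch, 513 linear bins, frames).
        extractor = nn.Sequential(nn.Conv1d(80, 32, 3, padding=1), nn.ReLU())
        criterion = MultimodalJointLoss(extractor)
        loss = criterion(torch.randn(2, 513, 100), torch.randn(2, 513, 100),
                         torch.randn(2, 80, 100), torch.randn(2, 80, 100))
        print(loss.item())

In practice the relative weights would need tuning, and the extractor would be a pretrained network rather than the random stand-in used here.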
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
Cite as: arXiv:2102.03786 [eess.AS]
  (or arXiv:2102.03786v2 [eess.AS] for this version)

Submission history

From: Yu-Wen Chen [view email]
[v1] Sun, 7 Feb 2021 12:14:14 GMT (1397kb,D)
[v2] Wed, 9 Jun 2021 05:40:18 GMT (1395kb,D)
