Electrical Engineering and Systems Science > Audio and Speech Processing
Title: Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript
(Submitted on 1 Feb 2021 (v1), last revised 15 Jun 2021 (this version, v2))
Abstract: Recent years have witnessed significant improvement in the ability of ASR systems to recognize spoken utterances. However, recognition remains challenging for noisy and out-of-domain data, where substitution and deletion errors are prevalent in the transcribed text. These errors significantly degrade the performance of downstream tasks. In this work, we propose a BERT-style language model, referred to as PhonemeBERT, that jointly models the phoneme sequence and the ASR transcript to learn phonetic-aware representations that are robust to ASR errors. We show that PhonemeBERT can be used on downstream tasks with phoneme sequences as additional features, and also in a low-resource setup where only ASR transcripts are available for the downstream tasks, with no phoneme information. We evaluate our approach extensively by generating noisy data for three benchmark datasets - Stanford Sentiment Treebank, TREC and ATIS for sentiment, question and intent classification tasks respectively. The proposed approach beats the state-of-the-art baselines comprehensively on each dataset.
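The abstract does not spell out the joint input layout. A minimal sketch, assuming a standard BERT-style paired-sequence encoding in which the ASR transcript and the phoneme sequence are concatenated with separator tokens and distinguished by segment ids (the token names and function below are illustrative, not from the paper):

```python
def build_joint_input(asr_tokens, phoneme_tokens):
    """Concatenate an ASR transcript and its phoneme sequence into one
    BERT-style input: [CLS] asr ... [SEP] phonemes ... [SEP].

    Returns the token list and matching segment ids (0 for the ASR
    segment including [CLS] and the first [SEP], 1 for the phoneme
    segment including the final [SEP]).
    """
    tokens = ["[CLS]"] + list(asr_tokens) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    tokens += list(phoneme_tokens) + ["[SEP]"]
    segment_ids += [1] * (len(phoneme_tokens) + 1)
    return tokens, segment_ids


# Example: a noisy ASR hypothesis paired with its phoneme sequence.
tokens, segments = build_joint_input(
    ["what", "is", "the", "fair"],          # ASR output ("fare" misrecognized)
    ["W", "AH", "T", "IH", "Z", "DH", "AH", "F", "EH", "R"],
)
```

A masked-language-modelling objective over such joint inputs lets the model align phonetic evidence with (possibly erroneous) transcript tokens, which is the intuition behind the robustness claim.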
Submission history
From: Ayush Kumar
[v1] Mon, 1 Feb 2021 12:45:15 GMT (431kb,D)
[v2] Tue, 15 Jun 2021 18:19:02 GMT (1380kb,D)