Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

Li, Dongyuan; Wang, Yusong; Funakoshi, Kotaro; Okumura, Manabu

(Submitted on 30 Sep 2023)

Abstract: Speech emotion recognition (SER) has drawn increasing attention for its applications in human-machine interaction. However, existing SER methods ignore the information gap between the pre-training speech recognition task and the downstream SER task, leading to sub-optimal performance. Moreover, they require much time to fine-tune on each specific speech dataset, restricting their effectiveness in real-world scenarios with large-scale noisy data. To address these issues, we propose an active learning (AL) based fine-tuning framework for SER that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training and the downstream task. Then, AL methods are used to iteratively select a subset of the most informative and diverse samples for fine-tuning, reducing time consumption. Experiments demonstrate that using only 20% of the samples improves accuracy by 8.45 percentage points and reduces time consumption by 79%.
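
The abstract says only that AL selects "the most informative and diverse samples" and does not name an acquisition function. As a minimal sketch of what one selection round could look like, the code below assumes predictive entropy as the informativeness score and greedy farthest-point selection in feature space for diversity. Both choices, and the names `entropy`, `select_batch`, and `pool_factor`, are illustrative assumptions and not the authors' method.

```python
import numpy as np


def entropy(probs):
    """Predictive entropy per sample; higher means the model is less certain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def select_batch(probs, feats, k, pool_factor=3):
    """Pick up to k unlabeled samples for the next fine-tuning round.

    probs: (n, classes) class probabilities from the current model.
    feats: (n, d) feature vectors (e.g. pooled encoder outputs).
    Assumed strategy: shortlist the k * pool_factor most uncertain samples,
    then greedily keep the ones farthest apart in feature space.
    """
    scores = entropy(probs)
    if k <= 0 or len(scores) == 0:
        return []
    shortlist = np.argsort(-scores)[: min(len(scores), k * pool_factor)]
    target = min(k, len(shortlist))

    # Start from the most uncertain sample in the shortlist.
    chosen = [shortlist[0]]
    # dist[i] = distance from shortlisted sample i to its nearest chosen sample.
    dist = np.linalg.norm(feats[shortlist] - feats[shortlist[0]], axis=1)
    dist[0] = -1.0  # mark as already chosen so argmax never returns it again

    while len(chosen) < target:
        nxt = int(np.argmax(dist))
        chosen.append(shortlist[nxt])
        new_d = np.linalg.norm(feats[shortlist] - feats[shortlist[nxt]], axis=1)
        dist = np.minimum(dist, new_d)
        dist[nxt] = -1.0
    return [int(i) for i in chosen]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(4), size=100)  # stand-in model outputs
    feats = rng.normal(size=(100, 16))           # stand-in embeddings
    print(select_batch(probs, feats, k=20))      # indices to label next
```

In an iterative setup, the returned indices would be labeled and added to the training subset, the model fine-tuned on that subset, and the selection repeated on the remaining pool until the labeling budget is reached.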

Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2310.00283 [cs.SD]
	(or arXiv:2310.00283v1 [cs.SD] for this version)

Submission history

From: Dongyuan Li
[v1] Sat, 30 Sep 2023 07:23:29 GMT (2934kb,D)
