Title: Deep functional multiple index models with an application to SER

Abstract: Speech Emotion Recognition (SER) plays a crucial role in advancing human-computer interaction and speech processing capabilities. We introduce a novel deep-learning architecture designed specifically for the multiple-index functional model, a model for functional data. Our key innovation lies in integrating adaptive basis layers and an automated data transformation search within the deep learning framework. Simulations show that the new model performs well. The architecture allows us to extract features tailored for chunk-level SER from Mel Frequency Cepstral Coefficients (MFCCs). We demonstrate the effectiveness of our approach on the benchmark IEMOCAP database, where it performs well compared to existing methods.
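
For readers unfamiliar with the model class named in the abstract, the following is a minimal sketch of a functional multiple-index model in its usual textbook form, y = g(<X, beta_1>, ..., <X, beta_K>), where each index is the integral of a curve X(t) against a coefficient function beta_k(t). It is not the authors' architecture: the fixed Fourier basis, the trapezoidal quadrature, the one-hidden-layer link, and every function name below are assumptions chosen for illustration. In the paper's setting the curves would presumably be MFCC trajectories and the basis would be learned rather than fixed.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """Evaluate a fixed Fourier basis on a grid t in [0, 1]; shape (n_basis, len(t)).
    Assumption: a fixed basis stands in for the paper's adaptive basis layers."""
    rows = [np.ones_like(t)]
    k = 1
    while len(rows) < n_basis:
        rows.append(np.sqrt(2.0) * np.sin(2 * np.pi * k * t))
        if len(rows) < n_basis:
            rows.append(np.sqrt(2.0) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.stack(rows)

def trapezoid_weights(t):
    """Quadrature weights w such that w @ f approximates the integral of f over t."""
    w = np.zeros_like(t)
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def index_projections(X, t, coefs):
    """Compute the K indices <X_i, beta_k> for every curve.
    X: (n, T) curves sampled on t; coefs: (K, n_basis) basis coefficients of beta_k.
    Returns an (n, K) array."""
    basis = fourier_basis(t, coefs.shape[1])  # (n_basis, T)
    betas = coefs @ basis                     # (K, T): beta_k evaluated on the grid
    w = trapezoid_weights(t)                  # (T,)
    return (X * w) @ betas.T                  # (n, K)

def link(Z, W1, b1, w2, b2):
    """Hypothetical link function g: one tanh hidden layer, scalar output per curve."""
    H = np.tanh(Z @ W1 + b1)
    return H @ w2 + b2

# Usage: three constant curves, two indices.
t = np.linspace(0.0, 1.0, 201)
X = np.ones((3, t.size))
coefs = np.array([[1.0, 0.0, 0.0],   # beta_1(t) = 1
                  [0.0, 1.0, 0.0]])  # beta_2(t) = sqrt(2) * sin(2*pi*t)
Z = index_projections(X, t, coefs)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0
y_hat = link(Z, W1, b1, w2, b2)
```

In a trainable version, `coefs` and the link weights would be fitted jointly by gradient descent; that step is omitted here.
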
Comments: 5 pages, 1 figure
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applications (stat.AP)
Cite as: arXiv:2403.17562 [cs.SD]
  (or arXiv:2403.17562v1 [cs.SD] for this version)

Submission history

From: Matthieu Saumard
[v1] Tue, 26 Mar 2024 10:10:56 GMT (176kb,D)
