
Audio and Speech Processing

Authors and titles for recent submissions, skipping first 26

[ total of 108 entries: 1-25 | 2-26 | 27-51 | 52-76 | 77-101 | 102-108 ]

Thu, 6 Jun 2024 (continued, showing 25 of 41 entries)

[27]  arXiv:2406.02925 [pdf, other]
Title: SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[28]  arXiv:2406.02887 [pdf, other]
Title: USM RNN-T model weights binarization
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[29]  arXiv:2406.02859 [pdf, ps, other]
Title: ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization
Comments: Accepted by Interspeech 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30]  arXiv:2406.02652 [pdf, other]
Title: RepCNN: Micro-sized, Mighty Models for Wakeword Detection
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[31]  arXiv:2406.02649 [pdf, other]
Title: Keyword-Guided Adaptation of Automatic Speech Recognition
Comments: Accepted to InterSpeech 2024
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[32]  arXiv:2406.02608 [pdf, other]
Title: PPINtonus: Early Detection of Parkinson's Disease Using Deep-Learning Tonal Analysis
Authors: Varun Reddy
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[33]  arXiv:2406.02572 [pdf, other]
Title: Selfsupervised learning for pathological speech detection
Comments: in Intersection of Book Chapter in Machine Learning and Computational Social Sciences CRC (in progress) 2024
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[34]  arXiv:2406.02569 [pdf, other]
Title: Cluster-to-Predict Affect Contours from Speech
Comments: 8 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC)
[35]  arXiv:2406.02566 [pdf, other]
Title: Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[36]  arXiv:2406.02563 [pdf, other]
Title: A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[37]  arXiv:2406.02562 [pdf, other]
Title: Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices
Comments: Table 2 is revised
Journal-ref: ICASSP 2024 Workshop(HSCMA 2024) paper
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[38]  arXiv:2406.02561 [pdf, ps, other]
Title: Breaking Walls: Pioneering Automatic Speech Recognition for Central Kurdish: End-to-End Transformer Paradigm
Subjects: Audio and Speech Processing (eess.AS)
[39]  arXiv:2406.02560 [pdf, other]
Title: Less Peaky and More Accurate CTC Forced Alignment by Label Priors
Comments: Accepted by ICASSP 2024. Github repo: this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[40]  arXiv:2406.02555 [pdf, ps, other]
Title: PhoWhisper: Automatic Speech Recognition for Vietnamese
Comments: Accepted to ICLR 2024 Tiny Papers Track
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[41]  arXiv:2406.02554 [pdf, other]
Title: Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[42]  arXiv:2406.03407 (cross-list from cs.LG) [pdf, other]
Title: Physics and geometry informed neural operator network with application to acoustic scattering
Comments: 20 pages of main text, 9 figures
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Computational Physics (physics.comp-ph)
[43]  arXiv:2406.03344 (cross-list from cs.SD) [pdf, other]
Title: Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Comments: Code is available at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[44]  arXiv:2406.03251 (cross-list from cs.SD) [pdf, other]
Title: ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
Comments: 5 pages, 2 figures, 2 tables, accepted at Interspeech 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[45]  arXiv:2406.03247 (cross-list from cs.SD) [pdf, other]
Title: Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
Comments: Accepted by INTERSPEECH 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[46]  arXiv:2406.03240 (cross-list from cs.SD) [pdf, other]
Title: Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion strategy
Comments: Accepted by INTERSPEECH 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[47]  arXiv:2406.03237 (cross-list from cs.SD) [pdf, other]
Title: Generalized Fake Audio Detection via Deep Stable Learning
Comments: accepted by INTERSPEECH2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48]  arXiv:2406.03138 (cross-list from cs.SD) [pdf, other]
Title: A Frame-based Attention Interpretation Method for Relevant Acoustic Feature Extraction in Long Speech Depression Detection
Comments: 5 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2309.13476
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49]  arXiv:2406.03049 (cross-list from cs.CL) [pdf, other]
Title: StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
Comments: Accepted to ACL 2024 main conference, Project Page: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2406.02963 (cross-list from cs.SD) [pdf, other]
Title: Dataset-Distillation Generative Model for Speech Emotion Recognition
Comments: Accepted at Interspeech 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[51]  arXiv:2406.02951 (cross-list from cs.CV) [pdf, other]
Title: AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
