Audio and Speech Processing

Authors and titles for eess.AS in Mar 2024, skipping first 50

[ total of 213 entries: 1-25 | 26-50 | 51-75 | 76-100 | 101-125 | 126-150 | ... | 201-213 ]
[51]  arXiv:2403.14246 [pdf, other]
Title: CATSE: A Context-Aware Framework for Causal Target Sound Extraction
Comments: Submitted to EUSIPCO 2024
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[52]  arXiv:2403.14268 [pdf, ps, other]
Title: Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints
Comments: Accepted to the 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI); written in Chinese
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[53]  arXiv:2403.14817 [pdf, other]
Title: Crowdsourced Multilingual Speech Intelligibility Testing
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[54]  arXiv:2403.15336 [pdf, other]
Title: Dialogue Understandability: Why are we streaming movies with subtitles?
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
[55]  arXiv:2403.15442 [pdf, other]
Title: Advanced Artificial Intelligence Algorithms in Cochlear Implants: Review of Healthcare Strategies, Challenges, and Perspectives
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[56]  arXiv:2403.16610 [pdf, ps, other]
Title: Distributed collaborative anomalous sound detection by embedding sharing
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[57]  arXiv:2403.16973 [pdf, other]
Title: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Comments: Data, code, and model weights are available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[58]  arXiv:2403.17402 [pdf, other]
Title: Infrastructure-less Localization from Indoor Environmental Sounds Based on Spectral Decomposition and Spatial Likelihood Model
Comments: 6 pages, 6 figures, accepted to IEEE/SICE SII 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59]  arXiv:2403.17514 [pdf, other]
Title: Speaker Distance Estimation in Enclosures from Single-Channel Audio
Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[60]  arXiv:2403.17864 [pdf, other]
Title: Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification
Comments: Submitted to EUSIPCO 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[61]  arXiv:2403.18257 [pdf, other]
Title: Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation
Comments: work in progress
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62]  arXiv:2403.18560 [pdf, other]
Title: Noise-Robust Keyword Spotting through Self-supervised Pretraining
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[63]  arXiv:2403.18636 [pdf, other]
Title: A Diffusion-Based Generative Equalizer for Music Restoration
Comments: Submitted to DAFx24. Historical music restoration examples are available at: this http URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64]  arXiv:2403.18638 [pdf, other]
Title: Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection
Subjects: Audio and Speech Processing (eess.AS)
[65]  arXiv:2403.19207 [pdf, other]
Title: LV-CTC: Non-autoregressive ASR with CTC and latent variable models
Subjects: Audio and Speech Processing (eess.AS)
[66]  arXiv:2403.19217 [pdf, other]
Title: Blind Identification of Binaural Room Impulse Responses from Smart Glasses
Subjects: Audio and Speech Processing (eess.AS)
[67]  arXiv:2403.19709 [pdf, other]
Title: Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models
Comments: 5 pages, 3 figures, 5 tables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[68]  arXiv:2403.19971 [pdf, other]
Title: 3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[69]  arXiv:2403.20090 [pdf, other]
Title: Non-Exponential Reverberation Modeling Using Dark Velvet Noise
Comments: Accepted for publication in the Journal of Audio Engineering Society
Subjects: Audio and Speech Processing (eess.AS)
[70]  arXiv:2403.20184 [pdf, other]
Title: Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context
Comments: Accepted at LREC-COLING 2024
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[71]  arXiv:2403.03762 (cross-list from eess.SP) [pdf, ps, other]
Title: Room Impulse Response Estimation using Optimal Transport: Simulation-Informed Inference
Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS)
[72]  arXiv:2403.10329 (cross-list from eess.SP) [pdf, ps, other]
Title: Multi-Source Localization and Data Association for Time-Difference of Arrival Measurements
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Optimization and Control (math.OC)
[73]  arXiv:2403.00212 (cross-list from cs.CL) [pdf, other]
Title: Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74]  arXiv:2403.00274 (cross-list from cs.CV) [pdf, other]
Title: CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Comments: Accepted by CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75]  arXiv:2403.00370 (cross-list from cs.CL) [pdf, other]
Title: Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)