Audio and Speech Processing

Authors and titles for eess.AS in Mar 2024, skipping first 50

[ total of 213 entries; showing entries 51-75 ]

[51] arXiv:2403.14246 [pdf, other]: Title: CATSE: A Context-Aware Framework for Causal Target Sound Extraction

Authors: Shrishail Baligar, Mikolaj Kegler, Bryce Irvin, Marko Stamenovic, Shawn Newsam

Comments: Submitted to EUSIPCO 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[52] arXiv:2403.14268 [pdf, ps, other]: Title: Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints

Authors: PeiYing Lee, HauYun Guo, Berlin Chen

Comments: Accepted to the 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI); in Chinese

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[53] arXiv:2403.14817 [pdf, other]: Title: Crowdsourced Multilingual Speech Intelligibility Testing

Authors: Laura Lechler, Kamil Wojcicki

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[54] arXiv:2403.15336 [pdf, other]: Title: Dialogue Understandability: Why are we streaming movies with subtitles?

Authors: Helard Becerra Martinez, Alessandro Ragano, Diptasree Debnath, Asad Ullah, Crisron Rudolf Lucas, Martin Walsh, Andrew Hines

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
[55] arXiv:2403.15442 [pdf, other]: Title: Advanced Artificial Intelligence Algorithms in Cochlear Implants: Review of Healthcare Strategies, Challenges, and Perspectives

Authors: Billel Essaid, Hamza Kheddar, Noureddine Batel, Abderrahmane Lakas, Muhammad E. H. Chowdhury

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[56] arXiv:2403.16610 [pdf, ps, other]: Title: Distributed collaborative anomalous sound detection by embedding sharing

Authors: Kota Dohi, Yohei Kawaguchi

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[57] arXiv:2403.16973 [pdf, other]: Title: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Authors: Puyuan Peng, Po-Yao Huang, Abdelrahman Mohamed, David Harwath

Comments: Data, code, and model weights are available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[58] arXiv:2403.17402 [pdf, other]: Title: Infrastructure-less Localization from Indoor Environmental Sounds Based on Spectral Decomposition and Spatial Likelihood Model

Authors: Satoki Ogiso, Yoshiaki Bando, Takeshi Kurata, Takashi Okuma

Comments: 6 pages, 6 figures, accepted to IEEE/SICE SII 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59] arXiv:2403.17514 [pdf, other]: Title: Speaker Distance Estimation in Enclosures from Single-Channel Audio

Authors: Michael Neri, Archontis Politis, Daniel Krause, Marco Carli, Tuomas Virtanen

Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[60] arXiv:2403.17864 [pdf, other]: Title: Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification

Authors: Francesca Ronchini, Luca Comanducci, Fabio Antonacci

Comments: Submitted to EUSIPCO 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[61] arXiv:2403.18257 [pdf, other]: Title: Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation

Authors: Xilin Jiang, Cong Han, Nima Mesgarani

Comments: work in progress

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62] arXiv:2403.18560 [pdf, other]: Title: Noise-Robust Keyword Spotting through Self-supervised Pretraining

Authors: Jacob Mørk, Holger Severin Bovbjerg, Gergely Kiss, Zheng-Hua Tan

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[63] arXiv:2403.18636 [pdf, other]: Title: A Diffusion-Based Generative Equalizer for Music Restoration

Authors: Eloi Moliner, Maija Turunen, Filip Elvander, Vesa Välimäki

Comments: Submitted to DAFx24. Historical music restoration examples are available at: this http URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64] arXiv:2403.18638 [pdf, other]: Title: Mind the Domain Gap: a Systematic Analysis on Bioacoustic Sound Event Detection

Authors: Jinhua Liang, Ines Nolasco, Burooj Ghani, Huy Phan, Emmanouil Benetos, Dan Stowell

Subjects: Audio and Speech Processing (eess.AS)
[65] arXiv:2403.19207 [pdf, other]: Title: LV-CTC: Non-autoregressive ASR with CTC and latent variable models

Authors: Yuya Fujita, Shinji Watanabe, Xuankai Chang, Takashi Maekaku

Subjects: Audio and Speech Processing (eess.AS)
[66] arXiv:2403.19217 [pdf, other]: Title: Blind Identification of Binaural Room Impulse Responses from Smart Glasses

Authors: Thomas Deppisch, Nils Meyer-Kahlen, Sebastià V. Amengual Garí

Subjects: Audio and Speech Processing (eess.AS)
[67] arXiv:2403.19709 [pdf, other]: Title: Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

Authors: Tsendsuren Munkhdalai, Youzheng Chen, Khe Chai Sim, Fadi Biadsy, Tara Sainath, Pedro Moreno Mengibar

Comments: 5 pages, 3 figures, 5 tables

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[68] arXiv:2403.19971 [pdf, other]: Title: 3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[69] arXiv:2403.20090 [pdf, other]: Title: Non-Exponential Reverberation Modeling Using Dark Velvet Noise

Authors: Jon Fagerström, Sebastian J. Schlecht, Vesa Välimäki

Comments: Accepted for publication in the Journal of Audio Engineering Society

Subjects: Audio and Speech Processing (eess.AS)
[70] arXiv:2403.20184 [pdf, other]: Title: Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context

Authors: Tuan Nguyen, Corinne Fredouille, Alain Ghio, Mathieu Balaguer, Virginie Woisard

Comments: Accepted at LREC-COLING 2024

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[71] arXiv:2403.03762 (cross-list from eess.SP) [pdf, ps, other]: Title: Room Impulse Response Estimation using Optimal Transport: Simulation-Informed Inference

Authors: David Sundström, Anton Björkman, Andreas Jakobsson, Filip Elvander

Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS)
[72] arXiv:2403.10329 (cross-list from eess.SP) [pdf, ps, other]: Title: Multi-Source Localization and Data Association for Time-Difference of Arrival Measurements

Authors: Gabrielle Flood, Filip Elvander

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Optimization and Control (math.OC)
[73] arXiv:2403.00212 (cross-list from cs.CL) [pdf, other]: Title: Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART

Authors: Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2403.00274 (cross-list from cs.CV) [pdf, other]: Title: CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Authors: Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan

Comments: Accepted by CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75] arXiv:2403.00370 (cross-list from cs.CL) [pdf, other]: Title: Post-decoder Biasing for End-to-End Speech Recognition of Multi-turn Medical Interview

Authors: Heyang Liu, Yu Wang, Yanfeng Wang

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
