Audio and Speech Processing

Authors and titles for recent submissions, skipping first 13

[ total of 55 entries: 1-50 | 14-55 ]

Mon, 27 May 2024

[14] arXiv:2405.15093 [pdf, other]: Title: Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis

Authors: Hui Li, Hongyu Wang, Zhijin Chen, Bohan Sun, Bo Li

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS)
[15] arXiv:2405.15655 (cross-list from cs.SD) [pdf, other]: Title: HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System

Authors: Zhisheng Zhang, Pengyang Huang

Comments: Accepted by IJCNN 2024

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:2405.15338 (cross-list from cs.SD) [pdf, other]: Title: SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

Authors: Xinlei Niu, Jing Zhang, Christian Walder, Charles Patrick Martin

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2405.15216 (cross-list from cs.LG) [pdf, other]: Title: Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

Authors: Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly

Comments: under review

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:2405.15103 (cross-list from cs.SD) [pdf, other]: Title: The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation

Authors: Nick Collins

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19] arXiv:2405.15096 (cross-list from cs.SD) [pdf, other]: Title: Music Genre Classification: Training an AI model

Authors: Keoikantse Mogonediwa

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[20] arXiv:2405.15085 (cross-list from eess.SP) [pdf, other]: Title: Acoustical Features as Knee Health Biomarkers: A Critical Analysis

Authors: Christodoulos Kechris, Jerome Thevenot, Tomas Teijeiro, Vincent A. Stadelmann, Nicola A. Maffiuletti, David Atienza

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 24 May 2024

[21] arXiv:2405.13514 [pdf, other]: Title: Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

Authors: Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe

Comments: Accepted to IEEE ICASSP 2024 workshop Hands-free Speech Communication and Microphone Arrays (HSCMA 2024)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[22] arXiv:2405.13344 [pdf, other]: Title: Contextualized Automatic Speech Recognition with Dynamic Vocabulary

Authors: Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[23] arXiv:2405.13166 [pdf, other]: Title: FairLENS: Assessing Fairness in Law Enforcement Speech Recognition

Authors: Yicheng Wang, Mark Cusick, Mohamed Laila, Kate Puech, Zhengping Ji, Xia Hu, Michael Wilson, Noah Spitzer-Williams, Michael Wheeler, Yasser Ibrahim

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[24] arXiv:2405.12983 [pdf, other]: Title: Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer

Authors: Maxime Burchi, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg, Radu Timofte

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[25] arXiv:2405.14679 (cross-list from cs.SD) [pdf, other]: Title: Leveraging Electric Guitar Tones and Effects to Improve Robustness in Guitar Tablature Transcription Modeling

Authors: Hegel Pedroza, Wallace Abreu, Ryan Corey, Iran Roman

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2405.14598 (cross-list from cs.CV) [pdf, other]: Title: Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2405.14489 (cross-list from cs.SD) [pdf, other]: Title: End-to-End User-Defined Keyword Spotting using Shifted Delta Coefficients

Authors: Kesavaraj V, Anuprabha M, Anil Kumar Vuppala

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[28] arXiv:2405.14290 (cross-list from cs.SD) [pdf, other]: Title: Frequency-Domain Sound Field from the Perspective of Band-Limited Functions

Authors: Takahiro Iwami, Akira Omoto

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:2405.14161 (cross-list from cs.CL) [pdf, other]: Title: Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

Comments: 23 pages, Preprint

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[30] arXiv:2405.13762 (cross-list from cs.CV) [pdf, other]: Title: A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2405.13661 (cross-list from cs.SD) [pdf, ps, other]: Title: Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

Authors: Hong Zhang, Jie Lin, Shengxuan Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[32] arXiv:2405.13636 (cross-list from cs.SD) [pdf, other]: Title: Audio Mamba: Pretrained Audio State Space Model For Audio Tagging

Authors: Jiaju Lin, Haoxuan Hu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[33] arXiv:2405.13527 (cross-list from cs.SD) [pdf, other]: Title: End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding

Authors: Wei Zeng, Xian He, Ye Wang

Comments: 8 pages, 5 figures, accepted by IJCAI 2024 - AI, Arts & Creativity Track

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34] arXiv:2405.13477 (cross-list from cs.HC) [pdf, other]: Title: A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction

Authors: Yue Li, Florian A. Kunneman, Koen V. Hindriks

Comments: 8 pages, 16 figures, under review by RoMan 2024 conference

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:2405.13428 (cross-list from cs.SD) [pdf, other]: Title: Ambisonizer: Neural Upmixing as Spherical Harmonics Generation

Authors: Yongyi Zang, Yifan Wang, Minglun Lee

Comments: Under review

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:2405.13379 (cross-list from cs.CL) [pdf, ps, other]: Title: You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish

Authors: Ronald Cumbal, Birger Moell, Jose Lopes, Olof Engwall

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[37] arXiv:2405.13162 (cross-list from cs.SD) [pdf, ps, other]: Title: Non-autoregressive real-time Accent Conversion model with voice cloning

Authors: Vladimir Nechaev, Sergey Kosyakov

Comments: 8 pages, 6 figures, 3 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[38] arXiv:2405.13018 (cross-list from cs.CL) [pdf, other]: Title: Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings

Authors: Ahmed Adel Attia, Dorottya Demszky, Tolulope Ogunremi, Jing Liu, Carol Espy-Wilson

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Wed, 22 May 2024

[39] arXiv:2405.12609 [pdf, other]: Title: Mamba in Speech: Towards an Alternative to Self-Attention

Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40] arXiv:2405.12496 [pdf, other]: Title: A Survey of Integrating Wireless Technology into Active Noise Control

Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

Subjects: Audio and Speech Processing (eess.AS); Networking and Internet Architecture (cs.NI); Sound (cs.SD); Signal Processing (eess.SP)
[41] arXiv:2405.12957 (cross-list from cs.SD) [pdf, other]: Title: Enhancing the analysis of murine neonatal ultrasonic vocalizations: Development, evaluation, and application of different mathematical models

Authors: Rudolf Herdt, Louisa Kinzel, Johann Georg Maaß, Marvin Walther, Henning Fröhlich, Tim Schubert, Peter Maass, Christian Patrick Schaaf

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[42] arXiv:2405.12899 (cross-list from math.FA) [pdf, other]: Title: On a time-frequency blurring operator with applications in data augmentation

Authors: Simon Halvdansson

Comments: 22 pages, 4 figures

Subjects: Functional Analysis (math.FA); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2405.12847 (cross-list from cs.IR) [pdf, other]: Title: A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability

Authors: Li-Yang Tseng, Tzu-Ling Lin, Hong-Han Shuai, Jen-Wei Huang, Wen-Whei Chang

Journal-ref: Proceedings of the 24th International Society for Music Information Retrieval Conference, 174-181. Milan, Italy, November 5-9, 2023

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2405.12774 (cross-list from cs.LG) [pdf, ps, other]: Title: Blind Separation of Vibration Sources using Deep Learning and Deconvolution

Authors: Igor Makienko, Michael Grebshtein, Eli Gildish

Comments: 20 pages, 13 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[45] arXiv:2405.12666 (cross-list from cs.SD) [pdf, other]: Title: SYMPLEX: Controllable Symbolic Music Generation using Simplex Diffusion with Vocabulary Priors

Authors: Nicolas Jonason, Luca Casini, Bob L.T. Sturm

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Tue, 21 May 2024

[46] arXiv:2405.11831 [pdf, other]: Title: SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

Authors: Siavash Shams, Sukru Samet Dindar, Xilin Jiang, Nima Mesgarani

Comments: Code at this https URL

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[47] arXiv:2405.11792 [pdf, other]: Title: Source Localization by Multidimensional Steered Response Power Mapping with Sparse Bayesian Learning

Authors: Wei-Ting Lai, Lachlan Birnie, Xingyu Chen, Amy Bastine, Thushara D. Abhayapala, Prasanga N. Samarasinghe

Subjects: Audio and Speech Processing (eess.AS)
[48] arXiv:2405.11767 [pdf, other]: Title: Multi-speaker Text-to-speech Training with Speaker Anonymized Data

Authors: Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda

Comments: 5 pages. Submitted to Signal Processing Letters. Audio sample page: this https URL

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[49] arXiv:2405.11592 [pdf, other]: Title: Speech-dependent Data Augmentation for Own Voice Reconstruction with Hearable Microphones in Noisy Environments

Authors: Mattes Ohlenbusch, Christian Rollwage, Simon Doclo

Comments: 19 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS)
[50] arXiv:2405.11413 [pdf, other]: Title: Exploring speech style spaces with language models: Emotional TTS without emotion labels

Authors: Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman

Comments: Accepted at Speaker Odyssey 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[51] arXiv:2405.11093 [pdf, other]: Title: AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations

Authors: David Xu

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[52] arXiv:2405.11078 [pdf, ps, other]: Title: Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System

Authors: Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

Comments: Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Journal-ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 6665-6669

Subjects: Audio and Speech Processing (eess.AS)
[53] arXiv:2405.12221 (cross-list from cs.CV) [pdf, other]: Title: Images that Sound: Composing Images and Sounds on a Single Canvas

Authors: Ziyang Chen, Daniel Geng, Andrew Owens

Comments: Project site: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[54] arXiv:2405.12031 (cross-list from cs.SD) [pdf, other]: Title: Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification

Authors: Nian Li, Jianguo Wei

Comments: 8 pages, 2 figures, 3 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2405.11554 (cross-list from cs.SD) [pdf, other]: Title: DAC-JAX: A JAX Implementation of the Descript Audio Codec

Authors: David Braun

Comments: 5 pages, 3 figures, 2 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
