Audio and Speech Processing

Authors and titles for recent submissions, skipping first 9

[ total of 53 entries; this page shows entries 10-34 ]

Tue, 28 May 2024 (continued, showing last 12 of 13 entries)

[10] arXiv:2405.16952 [pdf, other]: Title: A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

Authors: Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui

Subjects: Audio and Speech Processing (eess.AS)
[11] arXiv:2405.16834 [pdf, other]: Title: Speech enhancement deep-learning architecture for efficient edge processing

Authors: Monisankha Pal, Arvind Ramanathan, Ted Wada, Ashutosh Pandey

Subjects: Audio and Speech Processing (eess.AS)
[12] arXiv:2405.16677 [pdf, other]: Title: Crossmodal ASR Error Correction with Discrete Speech Units

Authors: Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[13] arXiv:2405.17413 (cross-list from cs.SD) [pdf, ps, other]: Title: Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization

Authors: Navin Kamuni, Dheerendra Panwar

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[14] arXiv:2405.17100 (cross-list from cs.CR) [pdf, other]: Title: SoK: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2405.17028 (cross-list from cs.SD) [pdf, other]: Title: RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2405.16797 (cross-list from cs.SD) [pdf, ps, other]: Title: A Real-Time Voice Activity Detection Based On Lightweight Neural

Authors: Jidong Jia, Pei Zhao, Di Wang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[17] arXiv:2405.16687 (cross-list from cs.SD) [pdf, other]: Title: Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline

Authors: Xavier Riley, Simon Dixon

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:2405.16136 (cross-list from cs.AI) [pdf, other]: Title: C3LLM: Conditional Multimodal Content Generation Using Large Language Models

Authors: Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2405.16000 (cross-list from cs.SD) [pdf, other]: Title: Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Authors: Sanjay Natesan, Homayoon Beigi

Comments: 7 pages, 2 tables, 3 figures

Journal-ref: Recognition Technologies, Inc. Technical Report (2024), RTI-20240524-01

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[20] arXiv:2405.15923 (cross-list from eess.SP) [pdf, ps, other]: Title: Spiketrum: An FPGA-based Implementation of a Neuromorphic Cochlea

Authors: MHD Anas Alsakkal, Jayawan Wijekoon

Comments: Submitted to "IEEE Transactions on Circuits and Systems"

Subjects: Signal Processing (eess.SP); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2405.15863 (cross-list from cs.SD) [pdf, other]: Title: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Mon, 27 May 2024

[22] arXiv:2405.15093 [pdf, other]: Title: Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis

Authors: Hui Li, Hongyu Wang, Zhijin Chen, Bohan Sun, Bo Li

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS)
[23] arXiv:2405.15655 (cross-list from cs.SD) [pdf, other]: Title: HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System

Authors: Zhisheng Zhang, Pengyang Huang

Comments: Accepted by IJCNN 2024

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:2405.15338 (cross-list from cs.SD) [pdf, other]: Title: SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

Authors: Xinlei Niu, Jing Zhang, Christian Walder, Charles Patrick Martin

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2405.15216 (cross-list from cs.LG) [pdf, other]: Title: Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

Authors: Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly

Comments: under review

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2405.15103 (cross-list from cs.SD) [pdf, other]: Title: The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation

Authors: Nick Collins

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27] arXiv:2405.15096 (cross-list from cs.SD) [pdf, other]: Title: Music Genre Classification: Training an AI model

Authors: Keoikantse Mogonediwa

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[28] arXiv:2405.15085 (cross-list from eess.SP) [pdf, other]: Title: Acoustical Features as Knee Health Biomarkers: A Critical Analysis

Authors: Christodoulos Kechris, Jerome Thevenot, Tomas Teijeiro, Vincent A. Stadelmann, Nicola A. Maffiuletti, David Atienza

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 24 May 2024 (showing first 6 of 18 entries)

[29] arXiv:2405.13514 [pdf, other]: Title: Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

Authors: Muhammad Shakeel, Yui Sudo, Yifan Peng, Shinji Watanabe

Comments: Accepted to IEEE ICASSP 2024 workshop Hands-free Speech Communication and Microphone Arrays (HSCMA 2024)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[30] arXiv:2405.13344 [pdf, other]: Title: Contextualized Automatic Speech Recognition with Dynamic Vocabulary

Authors: Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Shinji Watanabe

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[31] arXiv:2405.13166 [pdf, other]: Title: FairLENS: Assessing Fairness in Law Enforcement Speech Recognition

Authors: Yicheng Wang, Mark Cusick, Mohamed Laila, Kate Puech, Zhengping Ji, Xia Hu, Michael Wilson, Noah Spitzer-Williams, Michael Wheeler, Yasser Ibrahim

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
[32] arXiv:2405.12983 [pdf, other]: Title: Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer

Authors: Maxime Burchi, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg, Radu Timofte

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)
[33] arXiv:2405.14679 (cross-list from cs.SD) [pdf, other]: Title: Leveraging Electric Guitar Tones and Effects to Improve Robustness in Guitar Tablature Transcription Modeling

Authors: Hegel Pedroza, Wallace Abreu, Ryan Corey, Iran Roman

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2405.14598 (cross-list from cs.CV) [pdf, other]: Title: Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Comments: 10 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)