Sound

Authors and titles for recent submissions, skipping first 20

Total of 39 entries; showing entries 21-39.

Wed, 29 May 2024 (continued, showing last 2 of 7 entries)

[21] arXiv:2405.17809 (cross-list from cs.CL) [pdf, other]: Title: TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2405.17569 (cross-list from cs.LG) [pdf, other]: Title: Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese

Authors: Marcelo Matheus Gauy, Larissa Cristina Berti, Arnaldo Cândido Jr, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman, Marcelo Finger

Comments: 5 pages, 2 figures, 1 table. Published in Artificial Intelligence in Medicine (AIME) 2023

Journal-ref: Artificial Intelligence in Medicine Proceedings 2023, pages 271-275

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 28 May 2024

[23] arXiv:2405.17413 [pdf, ps, other]: Title: Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization

Authors: Navin Kamuni, Dheerendra Panwar

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[24] arXiv:2405.17206 [pdf, other]: Title: A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings

Authors: Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, MD Saiful Islam, Ehsan Hoque

Comments: 25 pages, 5 figures, and 4 tables

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[25] arXiv:2405.17028 [pdf, other]: Title: RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2405.16797 [pdf, ps, other]: Title: A Real-Time Voice Activity Detection Based On Lightweight Neural

Authors: Jidong Jia, Pei Zhao, Di Wang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[27] arXiv:2405.16687 [pdf, other]: Title: Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline

Authors: Xavier Riley, Simon Dixon

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:2405.16000 [pdf, other]: Title: Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Authors: Sanjay Natesan, Homayoon Beigi

Comments: 7 pages, 2 tables, 3 figures

Journal-ref: Recognition Technologies, Inc. Technical Report (2024), RTI-20240524-01

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[29] arXiv:2405.15863 [pdf, other]: Title: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[30] arXiv:2405.17100 (cross-list from cs.CR) [pdf, other]: Title: Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2405.16677 (cross-list from eess.AS) [pdf, other]: Title: Crossmodal ASR Error Correction with Discrete Speech Units

Authors: Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[32] arXiv:2405.16136 (cross-list from cs.AI) [pdf, other]: Title: C3LLM: Conditional Multimodal Content Generation Using Large Language Models

Authors: Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2405.15923 (cross-list from eess.SP) [pdf, ps, other]: Title: Spiketrum: An FPGA-based Implementation of a Neuromorphic Cochlea

Authors: MHD Anas Alsakkal, Jayawan Wijekoon

Comments: To be published in "IEEE Transactions on Circuits and Systems"

Subjects: Signal Processing (eess.SP); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 27 May 2024

[34] arXiv:2405.15655 [pdf, other]: Title: HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System

Authors: Zhisheng Zhang, Pengyang Huang

Comments: Accepted by IJCNN 2024

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35] arXiv:2405.15338 [pdf, other]: Title: SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

Authors: Xinlei Niu, Jing Zhang, Christian Walder, Charles Patrick Martin

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:2405.15103 [pdf, other]: Title: The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation

Authors: Nick Collins

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[37] arXiv:2405.15096 [pdf, other]: Title: Music Genre Classification: Training an AI model

Authors: Keoikantse Mogonediwa

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38] arXiv:2405.15216 (cross-list from cs.LG) [pdf, other]: Title: Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

Authors: Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly

Comments: under review

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2405.15085 (cross-list from eess.SP) [pdf, other]: Title: Acoustical Features as Knee Health Biomarkers: A Critical Analysis

Authors: Christodoulos Kechris, Jerome Thevenot, Tomas Teijeiro, Vincent A. Stadelmann, Nicola A. Maffiuletti, David Atienza

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
