Sound

Authors and titles for recent submissions, skipping first 6

[ total of 39 entries: 1-25 | 7-31 | 32-39 ]

Fri, 31 May 2024 (continued, showing last 5 of 11 entries)

[7] arXiv:2405.19342 [pdf, other]: Title: Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants

Authors: Chloé Sekkat, Fanny Leroy, Salima Mdhaffar, Blake Perry Smith, Yannick Estève, Joseph Dureau, Alice Coucke

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:2405.20336 (cross-list from cs.CV) [pdf, other]: Title: RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

Authors: Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

Comments: Project website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2405.20064 (cross-list from eess.AS) [pdf, other]: Title: 1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[10] arXiv:2405.19497 (cross-list from eess.AS) [pdf, other]: Title: Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data

Authors: Eloi Moliner, Sebastian Braun, Hannes Gamper

Comments: Submitted to IWAENC 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[11] arXiv:2405.19426 (cross-list from cs.CL) [pdf, other]: Title: Deep Learning for Assessment of Oral Reading Fluency

Authors: Mithilesh Vaidya, Binaya Kumar Sahoo, Preeti Rao

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 30 May 2024

[12] arXiv:2405.18726 [pdf, other]: Title: Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI

Authors: Che Liu, Changde Du, Xiaoyu Chen, Huiguang He

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[13] arXiv:2405.18503 [pdf, other]: Title: SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Authors: Koichi Saito, Dongjun Kim, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong, Yuhta Takida, Yuki Mitsufuji

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14] arXiv:2405.19041 (cross-list from cs.CL) [pdf, other]: Title: BLSP-KD: Bootstrapping Language-Speech Pre-training via Knowledge Distillation

Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Jiajun Zhang

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2405.18639 (cross-list from q-bio.NC) [pdf, other]: Title: Improving Speech Decoding from ECoG with Self-Supervised Pretraining

Authors: Brian A. Yuan, Joseph G. Makin

Subjects: Neurons and Cognition (q-bio.NC); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 29 May 2024

[16] arXiv:2405.18386 [pdf, other]: Title: Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Authors: Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Comments: Code and demo are available at: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[17] arXiv:2405.18213 [pdf, other]: Title: NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Authors: Amandine Brunetto, Sascha Hornauer, Fabien Moutarde

Comments: Project Page: this https URL

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[18] arXiv:2405.18153 [pdf, other]: Title: Practical aspects for the creation of an audio dataset from field recordings with optimized labeling budget with AI-assisted strategy

Authors: Javier Naranjo-Alcazar, Jordi Grau-Haro, Ruben Ribes-Serrano, Pedro Zuccarello

Comments: Submitted to ICML 2024 Workshop on Data-Centric Machine Learning Research

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19] arXiv:2405.17615 [pdf, other]: Title: Listenable Maps for Zero-Shot Audio Classifiers

Authors: Francesco Paissan, Luca Della Libera, Mirco Ravanelli, Cem Subakan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[20] arXiv:2405.17842 (cross-list from cs.CV) [pdf, other]: Title: Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation

Authors: Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2405.17809 (cross-list from cs.CL) [pdf, other]: Title: TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2405.17569 (cross-list from cs.LG) [pdf, other]: Title: Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese

Authors: Marcelo Matheus Gauy, Larissa Cristina Berti, Arnaldo Cândido Jr, Augusto Camargo Neto, Alfredo Goldman, Anna Sara Shafferman Levin, Marcus Martins, Beatriz Raposo de Medeiros, Marcelo Queiroz, Ester Cerdeira Sabino, Flaviane Romani Fernandes Svartman, Marcelo Finger

Comments: 5 pages, 2 figures, 1 table. Published in Artificial Intelligence in Medicine (AIME) 2023

Journal-ref: Artificial Intelligence in Medicine Proceedings 2023, pages 271-275

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 28 May 2024 (showing first 9 of 11 entries)

[23] arXiv:2405.17413 [pdf, ps, other]: Title: Enhancing Music Genre Classification through Multi-Algorithm Analysis and User-Friendly Visualization

Authors: Navin Kamuni, Dheerendra Panwar

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[24] arXiv:2405.17206 [pdf, other]: Title: A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings

Authors: Tariq Adnan, Abdelrahman Abdelkader, Zipei Liu, Ekram Hossain, Sooyong Park, MD Saiful Islam, Ehsan Hoque

Comments: 25 pages, 5 figures, and 4 tables

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[25] arXiv:2405.17028 [pdf, other]: Title: RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2405.16797 [pdf, ps, other]: Title: A Real-Time Voice Activity Detection Based On Lightweight Neural

Authors: Jidong Jia, Pei Zhao, Di Wang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[27] arXiv:2405.16687 [pdf, other]: Title: Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline

Authors: Xavier Riley, Simon Dixon

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[28] arXiv:2405.16000 [pdf, other]: Title: Carnatic Raga Identification System using Rigorous Time-Delay Neural Network

Authors: Sanjay Natesan, Homayoon Beigi

Comments: 7 pages, 2 tables, 3 figures

Journal-ref: Recognition Technologies, Inc. Technical Report (2024), RTI-20240524-01

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[29] arXiv:2405.15863 [pdf, other]: Title: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[30] arXiv:2405.17100 (cross-list from cs.CR) [pdf, other]: Title: Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2405.16677 (cross-list from eess.AS) [pdf, other]: Title: Crossmodal ASR Error Correction with Discrete Speech Units

Authors: Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
