Sound

Authors and titles for cs.SD in Mar 2024

[ total of 170 entries: showing 1-25 ]

[1] arXiv:2403.00529 [pdf, other]: Title: VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

Authors: Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee

Comments: preprint

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2403.00790 [pdf, ps, other]: Title: Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations

Authors: Tofara Moyo

Comments: 3 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3] arXiv:2403.00977 [pdf, other]: Title: Scaling Up Adaptive Filter Optimizers

Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2403.01255 [pdf, other]: Title: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey

Authors: Hamza Kheddar, Mustapha Hemis, Yassine Himeur

Journal-ref: Information Fusion, Elsevier, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[5] arXiv:2403.01278 [pdf, other]: Title: Enhancing Audio Generation Diversity with Visual Information

Authors: Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2403.01700 [pdf, other]: Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

Authors: Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

Comments: Accepted by ICASSP 2024

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[7] arXiv:2403.01785 [pdf, other]: Title: What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution

Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2403.01792 [pdf, other]: Title: ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2403.01960 [pdf, other]: Title: A robust audio deepfake detection system via multi-view feature

Authors: Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10] arXiv:2403.02002 [pdf, other]: Title: Fine-Grained Quantitative Emotion Editing for Speech Generation

Authors: Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

Comments: This paper is submitted to IEEE Signal Processing Letters

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2403.02010 [pdf, other]: Title: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Authors: Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2403.02701 [pdf, other]: Title: Fighting Game Adaptive Background Music for Improved Gameplay

Authors: Ibrahim Khan, Thai Van Nguyen, Chollakorn Nimpattanavong, Ruck Thawonmas

Comments: This is an updated version of our IEEE CoG 2023 paper (this https URL). This version has revised the description of the association between the distance between the two players (PD) and the instrument's volume on page 2. arXiv admin note: substantial text overlap with arXiv:2303.15734

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[13] arXiv:2403.03395 [pdf, other]: Title: Interactive Melody Generation System for Enhancing the Creativity of Musicians

Authors: So Hirawata, Noriko Otani

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[14] arXiv:2403.03411 [pdf, other]: Title: CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

Authors: Vahid Ahmadi Kalkhorani, DeLiang Wang

Comments: 9 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2403.03510 [pdf, other]: Title: METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D

Authors: Stefan Schoder, Paul Maurerlehner

Comments: 4

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)
[16] arXiv:2403.03522 [pdf, other]: Title: Non-verbal information in spontaneous speech -- towards a new framework of analysis

Authors: Tirza Biron, Moshe Barboy, Eran Ben-Artzy, Alona Golubchik, Yanir Marmor, Smadar Szekely, Yaron Winter, David Harel

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:2403.03538 [pdf, other]: Title: RADIA -- Radio Advertisement Detection with Intelligent Analytics

Authors: Jorge Álvarez, Juan Carlos Armenteros, Camilo Torrón, Miguel Ortega-Martín, Alfonso Ardoiz, Óscar García, Ignacio Arranz, Íñigo Galdeano, Ignacio Garrido, Adrián Alonso, Fernando Bayón, Oleg Vorontsov

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[18] arXiv:2403.03947 [pdf, other]: Title: Can Audio Reveal Music Performance Difficulty? Insights from the Piano Syllabus Dataset

Authors: Pedro Ramoneda, Minhee Lee, Dasaem Jeong, J.J. Valero-Mas, Xavier Serra

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2403.04111 [pdf, ps, other]: Title: Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication

Authors: Yejin Jeon, Gary Geunbae Lee

Comments: Accepted to EACL Main 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2403.04245 [pdf, other]: Title: A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Authors: Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

Comments: the paper is accepted by CVPR2024

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21] arXiv:2403.04594 [pdf, other]: Title: A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

Authors: Xuenan Xu, Xiaohang Xu, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2403.05010 [pdf, other]: Title: RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

Authors: Peng Liu, Dongyang Dai

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23] arXiv:2403.05380 [pdf, other]: Title: Spectrogram-Based Detection of Auto-Tuned Vocals in Music Recordings

Authors: Mahyar Gohari, Paolo Bestagini, Sergio Benini, Nicola Adami

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[24] arXiv:2403.05772 [pdf, other]: Title: sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

Authors: Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li

Comments: Accepted by ICASSP 2024

Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[25] arXiv:2403.05820 [pdf, other]: Title: An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data

Authors: Yudong Yang, Rongfeng Su, Xiaokang Liu, Nan Yan, Lan Wang

Comments: ICASSP2024 Accept

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)

