Sound

Authors and titles for cs.SD in Mar 2024, skipping first 50

[ total of 170 entries: 1-25 | 26-50 | 51-75 | 76-100 | 101-125 | 126-150 | 151-170 ]

[51] arXiv:2403.10904 [pdf, other]: Title: Urban Sound Propagation: a Benchmark for 1-Step Generative Modeling of Complex Physical Systems

Authors: Martin Spitznagel, Janis Keuper

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[52] arXiv:2403.11091 [pdf, other]: Title: Multitask frame-level learning for few-shot sound event detection

Authors: Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang

Comments: 6 pages, 4 figures, conference

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[53] arXiv:2403.11706 [pdf, other]: Title: Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models

Authors: Emilian Postolache, Giorgio Mariani, Luca Cosmo, Emmanouil Benetos, Emanuele Rodolà

Comments: Accepted at ICASSP 2024

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[54] arXiv:2403.11732 [pdf, other]: Title: Hallucination in Perceptual Metric-Driven Speech Enhancement Networks

Authors: George Close, Thomas Hain, Stefan Goetze

Comments: Submitted to EUSIPCO 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2403.11778 [pdf, other]: Title: Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms

Authors: Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[56] arXiv:2403.11780 [pdf, other]: Title: Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

Authors: Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao

Comments: Accepted by NAACL 2024 (main conference)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[57] arXiv:2403.11827 [pdf, other]: Title: Sound Event Detection and Localization with Distance Estimation

Authors: Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros

Comments: This paper has been submitted for the 32nd European Signal Processing Conference EUSIPCO 2024 in Lyon

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[58] arXiv:2403.11879 [pdf, other]: Title: Unimodal Multi-Task Fusion for Emotional Mimicry Prediction

Authors: Tobias Hallmen, Fabian Deuser, Norbert Oswald, Elisabeth André

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[59] arXiv:2403.12000 [pdf, other]: Title: Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance

Authors: Victor Shepardson, Jack Armitage, Thor Magnusson

Comments: 12 pages, 6 figures. Proceedings of the 3rd Conference on AI Music Creativity (2022, September 17)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[60] arXiv:2403.12477 [pdf, other]: Title: Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation

Authors: Yuto Ishikawa, Kohei Konaka, Tomohiko Nakamura, Norihiro Takamune, Hiroshi Saruwatari

Comments: 5 pages, 3 figures, accepted at HSCMA 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61] arXiv:2403.13086 [pdf, other]: Title: Listenable Maps for Audio Classifiers

Authors: Francesco Paissan, Mirco Ravanelli, Cem Subakan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[62] arXiv:2403.13252 [pdf, other]: Title: Frequency-aware convolution for sound event detection

Authors: Tao Song

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63] arXiv:2403.13254 [pdf, other]: Title: Onset and offset weighted loss function for sound event detection

Authors: Tao Song

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64] arXiv:2403.13353 [pdf, other]: Title: Building speech corpus with diverse voice characteristics for its prompt-based representation

Authors: Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. arXiv admin note: text overlap with arXiv:2309.13509

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65] arXiv:2403.13423 [pdf, other]: Title: Advanced Long-Content Speech Recognition With Factorized Neural Transducer

Authors: Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

Comments: Accepted by TASLP 2024

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66] arXiv:2403.13720 [pdf, other]: Title: UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge

Authors: Wataru Nakata, Kazuki Yamauchi, Dong Yang, Hiroaki Hyodo, Yuki Saito

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67] arXiv:2403.14048 [pdf, ps, other]: Title: The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data

Authors: Alice Baird, Rachel Manzelli, Panagiotis Tzirakis, Chris Gagne, Haoqi Li, Sadie Allen, Sander Dieleman, Brian Kulis, Shrikanth S. Narayanan, Alan Cowen

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[68] arXiv:2403.14083 [pdf, other]: Title: emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition

Authors: Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Berrak Sisman, Bjorn W. Schuller, Carlos Busso

Comments: Submitted to IEEE Transactions on Affective Computing on February 19, 2024. arXiv admin note: text overlap with arXiv:2305.14402

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[69] arXiv:2403.14286 [pdf, other]: Title: Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization

Authors: Nikhil Raghav, Md Sahidullah

Comments: Manuscript Under Review

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[70] arXiv:2403.14290 [pdf, other]: Title: Exploring Green AI for Audio Deepfake Detection

Authors: Subhajit Saha, Md Sahidullah, Swagatam Das

Comments: This manuscript is under review in a conference

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71] arXiv:2403.14402 [pdf, other]: Title: XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

Authors: HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[72] arXiv:2403.15569 [pdf, other]: Title: Music to Dance as Language Translation using Sequence Models

Authors: André Correia, Luís A. Alexandre

Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[73] arXiv:2403.16078 [pdf, other]: Title: Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

Authors: Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen Meng

Comments: Accepted by IJCNN 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2403.16331 [pdf, other]: Title: Modeling Analog Dynamic Range Compressors using Deep Learning and State-space Models

Authors: Hanzhi Yin, Gang Cheng, Christian J. Steinmetz, Ruibin Yuan, Richard M. Stern, Roger B. Dannenberg

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[75] arXiv:2403.16464 [pdf, other]: Title: Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator

Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka

Comments: Accepted to ICASSP 2024. Project page: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)


