Sound

Authors and titles for cs.SD in Mar 2024, skipping first 50

Showing entries 51-75 of 170 total (25 per page).
[51]  arXiv:2403.10904 [pdf, other]
Title: Urban Sound Propagation: a Benchmark for 1-Step Generative Modeling of Complex Physical Systems
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[52]  arXiv:2403.11091 [pdf, other]
Title: Multitask frame-level learning for few-shot sound event detection
Comments: 6 pages, 4 figures, conference
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[53]  arXiv:2403.11706 [pdf, other]
Title: Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
Comments: Accepted at ICASSP 2024
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[54]  arXiv:2403.11732 [pdf, other]
Title: Hallucination in Perceptual Metric-Driven Speech Enhancement Networks
Comments: Submitted to EUSIPCO 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55]  arXiv:2403.11778 [pdf, other]
Title: Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[56]  arXiv:2403.11780 [pdf, other]
Title: Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Comments: Accepted by NAACL 2024 (main conference)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[57]  arXiv:2403.11827 [pdf, other]
Title: Sound Event Detection and Localization with Distance Estimation
Comments: This paper has been submitted for the 32nd European Signal Processing Conference EUSIPCO 2024 in Lyon
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[58]  arXiv:2403.11879 [pdf, other]
Title: Unimodal Multi-Task Fusion for Emotional Mimicry Prediction
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[59]  arXiv:2403.12000 [pdf, other]
Title: Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance
Comments: 12 pages, 6 figures. Proceedings of the 3rd Conference on AI Music Creativity (2022, September 17)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[60]  arXiv:2403.12477 [pdf, other]
Title: Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation
Comments: 5 pages, 3 figures, accepted at HSCMA 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61]  arXiv:2403.13086 [pdf, other]
Title: Listenable Maps for Audio Classifiers
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[62]  arXiv:2403.13252 [pdf, other]
Title: Frequency-aware convolution for sound event detection
Authors: Tao Song
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63]  arXiv:2403.13254 [pdf, other]
Title: Onset and offset weighted loss function for sound event detection
Authors: Tao Song
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64]  arXiv:2403.13353 [pdf, other]
Title: Building speech corpus with diverse voice characteristics for its prompt-based representation
Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. arXiv admin note: text overlap with arXiv:2309.13509
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65]  arXiv:2403.13423 [pdf, other]
Title: Advanced Long-Content Speech Recognition With Factorized Neural Transducer
Comments: Accepted by TASLP 2024
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1803-1815, 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66]  arXiv:2403.13720 [pdf, other]
Title: UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67]  arXiv:2403.14048 [pdf, ps, other]
Title: The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[68]  arXiv:2403.14083 [pdf, other]
Title: emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition
Comments: Submitted to IEEE Transactions on Affective Computing on February 19, 2024. arXiv admin note: text overlap with arXiv:2305.14402
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[69]  arXiv:2403.14286 [pdf, other]
Title: Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
Comments: Manuscript Under Review
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[70]  arXiv:2403.14290 [pdf, other]
Title: Exploring Green AI for Audio Deepfake Detection
Comments: This manuscript is under review in a conference
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71]  arXiv:2403.14402 [pdf, other]
Title: XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[72]  arXiv:2403.15569 [pdf, other]
Title: Music to Dance as Language Translation using Sequence Models
Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[73]  arXiv:2403.16078 [pdf, other]
Title: Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
Comments: Accepted by IJCNN 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74]  arXiv:2403.16331 [pdf, other]
Title: Modeling Analog Dynamic Range Compressors using Deep Learning and State-space Models
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[75]  arXiv:2403.16464 [pdf, other]
Title: Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
Comments: Accepted to ICASSP 2024. Project page: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)