
Sound

Authors and titles for cs.SD in Mar 2024

[ total of 170 entries; showing entries 1-25 ]
[1] arXiv:2403.00529
Title: VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
Comments: preprint
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2403.00790
Title: Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations
Authors: Tofara Moyo
Comments: 3 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3] arXiv:2403.00977
Title: Scaling Up Adaptive Filter Optimizers
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2403.01255
Title: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
Journal-ref: Information Fusion, Elsevier, 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[5] arXiv:2403.01278
Title: Enhancing Audio Generation Diversity with Visual Information
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2403.01700
Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[7] arXiv:2403.01785
Title: What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2403.01792
Title: ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2403.01960
Title: A robust audio deepfake detection system via multi-view feature
Comments: 5 pages, 2 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10] arXiv:2403.02002
Title: Fine-Grained Quantitative Emotion Editing for Speech Generation
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2403.02010
Title: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2403.02701
Title: Fighting Game Adaptive Background Music for Improved Gameplay
Comments: This is an updated version of our IEEE CoG 2023 paper (this https URL). This version revises the description, on page 2, of how the distance between the two players (PD) relates to the instrument's volume. arXiv admin note: substantial text overlap with arXiv:2303.15734
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[13] arXiv:2403.03395
Title: Interactive Melody Generation System for Enhancing the Creativity of Musicians
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[14] arXiv:2403.03411
Title: CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation
Comments: 9 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2403.03510
Title: METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D
Comments: 4
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)
[16] arXiv:2403.03522
Title: Non-verbal information in spontaneous speech -- towards a new framework of analysis
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:2403.03538
Title: RADIA -- Radio Advertisement Detection with Intelligent Analytics
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[18] arXiv:2403.03947
Title: Can Audio Reveal Music Performance Difficulty? Insights from the Piano Syllabus Dataset
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2403.04111
Title: Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Comments: Accepted to EACL Main 2024
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2403.04245
Title: A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Comments: Accepted by CVPR 2024
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21] arXiv:2403.04594
Title: A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2403.05010
Title: RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23] arXiv:2403.05380
Title: Spectrogram-Based Detection of Auto-Tuned Vocals in Music Recordings
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[24] arXiv:2403.05772
Title: sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[25] arXiv:2403.05820
Title: An Audio-textual Diffusion Model For Converting Speech Signals Into Ultrasound Tongue Imaging Data
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)