Sound

Authors and titles for cs.SD in Dec 2023

[ total of 214 entries; showing entries 1-25 ]
[1]  arXiv:2312.00091 [pdf, ps, other]
Title: Sound Terminology Describing Production and Perception of Sonification
Authors: Tim Ziemer
Comments: 16 pages, 0 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:2312.00476 [pdf, other]
Title: Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer
Authors: Bing Yang, Xiaofei Li
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3]  arXiv:2312.00834 [pdf, other]
Title: AV-RIR: Audio-Visual Room Impulse Response Estimation
Comments: Accepted to CVPR 2024
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV)
[4]  arXiv:2312.01062 [pdf, ps, other]
Title: Acoustic Signal Analysis with Deep Neural Network for Detecting Fault Diagnosis in Industrial Machines
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[5]  arXiv:2312.01092 [pdf, other]
Title: A Semi-Supervised Deep Learning Approach to Dataset Collection for Query-By-Humming Task
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6]  arXiv:2312.01479 [pdf, other]
Title: OpenVoice: Versatile Instant Voice Cloning
Comments: Technical Report
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7]  arXiv:2312.01554 [pdf, other]
Title: Building Ears for Robots: Machine Hearing in the Age of Autonomy
Authors: Xuan Zhong
Comments: 11 pages, 6 figures. The materials covered in this article were presented and discussed at the Hearing Seminar at Stanford University organized by Malcolm Slaney in October, 2023
Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[8]  arXiv:2312.01645 [pdf, ps, other]
Title: A text-dependent speaker verification application framework based on Chinese numerical string corpus
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9]  arXiv:2312.01842 [pdf, other]
Title: Exploring the Viability of Synthetic Audio Data for Audio-Based Dialogue State Tracking
Comments: Accepted in ASRU 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[10]  arXiv:2312.02229 [pdf, other]
Title: Synthetic Data Generation Techniques for Developing AI-based Speech Assessments for Parkinson's Disease (A Comparative Study)
Comments: 6, 5 Tables, 5 Figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11]  arXiv:2312.02773 [pdf, other]
Title: Integrating Plug-and-Play Data Priors with Weighted Prediction Error for Speech Dereverberation
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12]  arXiv:2312.03410 [pdf, other]
Title: Detecting Voice Cloning Attacks via Timbre Watermarking
Comments: NDSS 2024
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[13]  arXiv:2312.03455 [pdf, other]
Title: Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data
Comments: Machine Learning for Audio Workshop, NeurIPS 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[14]  arXiv:2312.03479 [pdf, other]
Title: JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live
Comments: Conference: 24th International Society for Music Information Retrieval. Late Breaking Demo. 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[15]  arXiv:2312.03491 [pdf, other]
Title: Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16]  arXiv:2312.03632 [pdf, other]
Title: Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17]  arXiv:2312.03666 [pdf, other]
Title: Towards small and accurate convolutional neural networks for acoustic biodiversity monitoring
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18]  arXiv:2312.04846 [pdf, other]
Title: Sound Source Localization for a Source inside a Structure using Ac-CycleGAN
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:2312.04919 [pdf, other]
Title: Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2312.05415 [pdf, ps, other]
Title: An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis
Comments: 8 pages, 1 figure, 4 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[21]  arXiv:2312.05640 [pdf, other]
Title: Keyword spotting -- Detecting commands in speech using deep learning
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[22]  arXiv:2312.05815 [pdf, other]
Title: Voice Activity Detection (VAD) in Noisy Environments
Authors: Joshua Ball
Comments: 7 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23]  arXiv:2312.05994 [pdf, other]
Title: mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks
Comments: Machine Learning for Audio Workshop, Neural Information Processing Systems (NeurIPS) 2023, New Orleans, LA
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[24]  arXiv:2312.06055 [pdf, other]
Title: Speaker-Text Retrieval via Contrastive Learning
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25]  arXiv:2312.06118 [pdf, other]
Title: ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)