
Sound

Authors and titles for cs.SD in Aug 2023

[ total of 219 entries: 1-25 | 26-50 | 51-75 | 76-100 | ... | 201-219 ]
[1]  arXiv:2308.00010 [pdf, ps, other]
Title: Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
Comments: 5 pages, 6 figures, 2 tables, study conducted as major project for B.E. (Computer Engineering), IOE Tribhuvan University 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2]  arXiv:2308.00015 [pdf, other]
Title: Exploring how a Generative AI interprets music
Comments: 16 pages, 12 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3]  arXiv:2308.01187 [pdf, other]
Title: Music De-limiter Networks via Sample-wise Gain Inversion
Comments: Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4]  arXiv:2308.01327 [pdf, other]
Title: Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5]  arXiv:2308.01531 [pdf, ps, other]
Title: Optimizing multi-user indoor sound communications with acoustic reconfigurable metasurfaces
Journal-ref: Nature Communications (2024)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[6]  arXiv:2308.01546 [pdf, other]
Title: MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Comments: 16 pages, 3 figures, 2 tables, demo page: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[7]  arXiv:2308.01573 [pdf, ps, other]
Title: Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Journal-ref: IEEE Open Journal of Signal Processing, vol. 5, pp. 577-587, 2024
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8]  arXiv:2308.02013 [pdf, other]
Title: Federated Representation Learning for Automatic Speech Recognition
Comments: Accepted at ISCA SPSC Symposium 3rd Symposium on Security and Privacy in Speech Communication, 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9]  arXiv:2308.02190 [pdf, other]
Title: Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition
Comments: Accepted by ACM MM 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[10]  arXiv:2308.02249 [pdf, other]
Title: Finding Tori: Self-supervised Learning for Analyzing Korean Folk Song
Comments: Accepted at 24th International Society for Music Information Retrieval Conference (ISMIR 2023)
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11]  arXiv:2308.02263 [pdf, other]
Title: Efficient Monaural Speech Enhancement using Spectrum Attention Fusion
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[12]  arXiv:2308.02560 [pdf, other]
Title: From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion
Comments: 10 pages
Journal-ref: Thirty-seventh Conference on Neural Information Processing Systems (2023)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13]  arXiv:2308.02723 [pdf, other]
Title: Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction
Comments: 7 pages, 4 figures, 2 tables, Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[14]  arXiv:2308.02867 [pdf, other]
Title: A Systematic Exploration of Joint-training for Singing Voice Synthesis
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15]  arXiv:2308.02898 [pdf, other]
Title: Elucidate Gender Fairness in Singing Voice Transcription
Comments: Camera-ready version of ACM MM2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16]  arXiv:2308.03019 [pdf, ps, other]
Title: Characterization of cough sounds using statistical analysis
Authors: Naveenkumar Vodnala (VNR Vignana Jyothi Institute of Engineering and Technology), Pratap Reddy Lankireddy (Jawaharlal Nehru Technological University Hyderabad), Padmasai Yarlagadda (VNR Vignana Jyothi Institute of Engineering and Technology)
Comments: 19 pages, 8 figures, paper submitted to journal Biomedical Signal Processing and Control which is under review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[17]  arXiv:2308.03266 [pdf, other]
Title: SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability
Comments: accepted by ICASSP2024
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[18]  arXiv:2308.03300 [pdf, other]
Title: Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection
Comments: 40th International Conference on Machine Learning (ICML 2023)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19]  arXiv:2308.03332 [pdf, ps, other]
Title: Improving Deep Attractor Network by BGRU and GMM for Speech Separation
Journal-ref: Journal of Harbin Institute of Technology (New Series), vol. 28, no. 3, pp. 90-96, 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20]  arXiv:2308.04025 [pdf, other]
Title: MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition
Comments: 12 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21]  arXiv:2308.04169 [pdf, other]
Title: Dual input neural networks for positional sound source localization
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22]  arXiv:2308.04244 [pdf, other]
Title: Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
[23]  arXiv:2308.04517 [pdf, other]
Title: Capturing Spectral and Long-term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques
Comments: the research paper is still in progress
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[24]  arXiv:2308.04666 [pdf, other]
Title: Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation
Comments: 9 pages, 4 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25]  arXiv:2308.04729 [pdf, other]
Title: JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
