Sound

Authors and titles for cs.SD in Aug 2023

[ total of 219 entries; showing entries 1-25 ]

[1] arXiv:2308.00010 [pdf, ps, other]: Title: Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

Authors: S. Rijal, R. Neupane, S. P. Mainali, S. K. Regmi, S. Maharjan

Comments: 5 pages, 6 figures, 2 tables, study conducted as major project for B.E. (Computer Engineering), IOE Tribhuvan University 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2308.00015 [pdf, other]: Title: Exploring how a Generative AI interprets music

Authors: Gabriela Barenboim, Luigi Del Debbio, Johannes Hirn, Veronica Sanz

Comments: 16 pages, 12 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2308.01187 [pdf, other]: Title: Music De-limiter Networks via Sample-wise Gain Inversion

Authors: Chang-Bin Jeon, Kyogu Lee

Comments: Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2308.01327 [pdf, other]: Title: Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification

Authors: Laurin Wagner, Mario Zusag, Theresa Bloder

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5] arXiv:2308.01531 [pdf, ps, other]: Title: Optimizing multi-user indoor sound communications with acoustic reconfigurable metasurfaces

Authors: Hongkuan Zhang, Qiyuan Wang, Mathias Fink, Guancong Ma

Journal-ref: Nature Communications (2024)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[6] arXiv:2308.01546 [pdf, other]: Title: MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

Authors: Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov

Comments: 16 pages, 3 figures, 2 tables, demo page: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[7] arXiv:2308.01573 [pdf, ps, other]: Title: Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS

Authors: Myeongjin Ko, Yong-Hoon Choi

Journal-ref: IEEE Open Journal of Signal Processing, vol. 5, pp. 577-587, 2024

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:2308.02013 [pdf, other]: Title: Federated Representation Learning for Automatic Speech Recognition

Authors: Guruprasad V Ramesh, Gopinath Chennupati, Milind Rao, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo

Comments: Accepted at ISCA SPSC Symposium 3rd Symposium on Security and Privacy in Speech Communication, 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9] arXiv:2308.02190 [pdf, other]: Title: Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition

Authors: Jiaxin Ye, Yujie Wei, Xin-Cheng Wen, Chenglong Ma, Zhizhong Huang, Kunhong Liu, Hongming Shan

Comments: Accepted by ACM MM 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[10] arXiv:2308.02249 [pdf, other]: Title: Finding Tori: Self-supervised Learning for Analyzing Korean Folk Song

Authors: Danbinaerin Han, Rafael Caro Repetto, Dasaem Jeong

Comments: Accepted at 24th International Society for Music Information Retrieval Conference (ISMIR 2023)

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11] arXiv:2308.02263 [pdf, other]: Title: Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

Authors: Jinyu Long, Jetic Gū, Binhao Bai, Zhibo Yang, Ping Wei, Junli Li

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[12] arXiv:2308.02560 [pdf, other]: Title: From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

Authors: Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez

Comments: 10 pages

Journal-ref: Thirty-seventh Conference on Neural Information Processing Systems (2023)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13] arXiv:2308.02723 [pdf, other]: Title: Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

Authors: Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov

Comments: 7 pages, 4 figures, 2 tables, Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[14] arXiv:2308.02867 [pdf, other]: Title: A Systematic Exploration of Joint-training for Singing Voice Synthesis

Authors: Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2308.02898 [pdf, other]: Title: Elucidate Gender Fairness in Singing Voice Transcription

Authors: Xiangming Gu, Wei Zeng, Ye Wang

Comments: Camera-ready version of ACM MM2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:2308.03019 [pdf, ps, other]: Title: Characterization of cough sounds using statistical analysis

Authors: Naveenkumar Vodnala (VNR Vignana Jyothi Institute of Engineering and Technology), Pratap Reddy Lankireddy (Jawaharlal Nehru Technological University Hyderabad), Padmasai Yarlagadda (VNR Vignana Jyothi Institute of Engineering and Technology)

Comments: 19 pages, 8 figures, paper submitted to journal Biomedical Signal Processing and Control which is under review

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[17] arXiv:2308.03266 [pdf, other]: Title: SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

Authors: Xian Shi, Yexin Yang, Zerui Li, Yanni Chen, Zhifu Gao, Shiliang Zhang

Comments: Accepted by ICASSP 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[18] arXiv:2308.03300 [pdf, other]: Title: Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Chuyuan Zhang

Comments: 40th International Conference on Machine Learning (ICML 2023)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19] arXiv:2308.03332 [pdf, ps, other]: Title: Improving Deep Attractor Network by BGRU and GMM for Speech Separation

Authors: Rawad Melhem, Assef Jafar, Riad Hamadeh

Journal-ref: Journal of Harbin Institute of Technology (New Series), vol. 28, no. 3, pp. 90-96, 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:2308.04025 [pdf, other]: Title: MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition

Authors: Yu Pan, Yuguang Yang, Yuheng Huang, Jixun Yao, Jingjing Yin, Yanni Hu, Heng Lu, Lei Ma, Jianjun Zhao

Comments: 12 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21] arXiv:2308.04169 [pdf, other]: Title: Dual input neural networks for positional sound source localization

Authors: Eric Grinstein, Vincent W. Neo, Patrick A. Naylor

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22] arXiv:2308.04244 [pdf, other]: Title: Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning

Authors: Xiaoyu Chen, Changde Du, Qiongyi Zhou, Huiguang He

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM)
[23] arXiv:2308.04517 [pdf, other]: Title: Capturing Spectral and Long-term Contextual Information for Speech Emotion Recognition Using Deep Learning Techniques

Authors: Samiul Islam, Md. Maksudul Haque, Abu Jobayer Md. Sadat

Comments: the research paper is still in progress

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[24] arXiv:2308.04666 [pdf, other]: Title: Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation

Authors: Zirui Ge, Xinzhou Xu, Haiyan Guo, Tingting Wang, Zhen Yang

Comments: 9 pages, 4 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2308.04729 [pdf, other]: Title: JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

Authors: Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
