Audio and Speech Processing

Authors and titles for eess.AS in Mar 2024, skipping first 75


[76] arXiv:2403.00529 (cross-list from cs.SD) [pdf, other]: Title: VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

Authors: Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee

Comments: preprint

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[77] arXiv:2403.00790 (cross-list from cs.SD) [pdf, ps, other]: Title: Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations

Authors: Tofara Moyo

Comments: 3 pages

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[78] arXiv:2403.00854 (cross-list from q-bio.NC) [pdf, other]: Title: Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning

Authors: Lauren Stumpf, Balasundaram Kadirvelu, Sigourney Waibel, A. Aldo Faisal

Comments: 17 pages, 2 tables, 4 main figures, 2 supplemental figures, prepared for journal submission

Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2403.00977 (cross-list from cs.SD) [pdf, other]: Title: Scaling Up Adaptive Filter Optimizers

Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2403.01087 (cross-list from cs.MM) [pdf, other]: Title: Towards Accurate Lip-to-Speech Synthesis in-the-Wild

Authors: Sindhu Hegde, Rudrabha Mukhopadhyay, C.V. Jawahar, Vinay Namboodiri

Comments: 8 pages of content, 1 page of references and 4 figures

Journal-ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2403.01132 (cross-list from cs.LG) [pdf, ps, other]: Title: MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems

Authors: Chu Wang, Jinhong Wu, Yanzhi Wang, Zhijian Zha, Qi Zhou

Comments: The number of figures is 16. The number of tables is 5. The number of words is 9717

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2403.01255 (cross-list from cs.SD) [pdf, other]: Title: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey

Authors: Hamza Kheddar, Mustapha Hemis, Yassine Himeur

Journal-ref: Information Fusion, Elsevier, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[83] arXiv:2403.01278 (cross-list from cs.SD) [pdf, other]: Title: Enhancing Audio Generation Diversity with Visual Information

Authors: Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2403.01699 (cross-list from cs.CL) [pdf, other]: Title: Brilla AI: AI Contestant for the National Science and Maths Quiz

Authors: George Boateng, Jonathan Abrefah Mensah, Kevin Takyi Yeboah, William Edor, Andrew Kojo Mensah-Onumah, Naafi Dasana Ibrahim, Nana Sam Yeboah

Comments: 14 pages. Accepted for the WideAIED track at the 25th International Conference on AI in Education (AIED 2024)

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2403.01700 (cross-list from cs.SD) [pdf, other]: Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

Authors: Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

Comments: Accepted by ICASSP 2024

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[86] arXiv:2403.01785 (cross-list from cs.SD) [pdf, other]: Title: What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution

Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2403.01792 (cross-list from cs.SD) [pdf, other]: Title: ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88] arXiv:2403.01960 (cross-list from cs.SD) [pdf, other]: Title: A robust audio deepfake detection system via multi-view feature

Authors: Yujie Yang, Haochen Qin, Hang Zhou, Chengcheng Wang, Tianyu Guo, Kai Han, Yunhe Wang

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2403.02002 (cross-list from cs.SD) [pdf, other]: Title: Fine-Grained Quantitative Emotion Editing for Speech Generation

Authors: Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

Comments: This paper is submitted to IEEE Signal Processing Letters

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:2403.02010 (cross-list from cs.SD) [pdf, other]: Title: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR

Authors: Zhiyun Fan, Linhao Dong, Jun Zhang, Lu Lu, Zejun Ma

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91] arXiv:2403.02687 (cross-list from cs.HC) [src]: Title: Enhanced DareFightingICE Competitions: Sound Design and AI Competitions

Authors: Ibrahim Khan, Chollakorn Nimpattanavong, Thai Van Nguyen, Kantinan Plupattanakit, Ruck Thawonmas

Comments: This paper describes a new competition platform using Unity for our competitions at the 2024 IEEE Conference on Games (CoG 2024). It was accepted for presentation at CoG 2024. However, we recently discovered a much more effective way to do this task without using Unity, leading to our decision to withdraw the paper from CoG 2024 and ArXiv

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2403.02701 (cross-list from cs.SD) [pdf, other]: Title: Fighting Game Adaptive Background Music for Improved Gameplay

Authors: Ibrahim Khan, Thai Van Nguyen, Chollakorn Nimpattanavong, Ruck Thawonmas

Comments: This is an updated version of our IEEE CoG 2023 paper (this https URL). This version has revised the description of the association between the distance between the two players (PD) and the instrument's volume on page 2. arXiv admin note: substantial text overlap with arXiv:2303.15734

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[93] arXiv:2403.02918 (cross-list from cs.RO) [pdf, other]: Title: Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction

Authors: Yue Li, Koen V Hindriks, Florian Kunneman

Comments: Accepted by ACM Technological Advances in Human-Robot Interaction. 9 pages

Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2403.02938 (cross-list from cs.CL) [pdf, other]: Title: AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models

Authors: Kazuki Kawamura, Jun Rekimoto

Journal-ref: AHs '23: Proceedings of the Augmented Humans International Conference 2023

Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2403.03095 (cross-list from cs.CV) [pdf, other]: Title: Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

Authors: Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

Comments: Accepted to ICASSP 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2403.03145 (cross-list from cs.CV) [pdf, other]: Title: Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

Authors: Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

Comments: Accepted to NeurIPS2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:2403.03224 (cross-list from physics.soc-ph) [pdf, other]: Title: Reinforcement Learning Jazz Improvisation: When Music Meets Game Theory

Authors: Vedant Tapiavala, Joshua Piesner, Sourjyamoy Barman, Feng Fu

Comments: 16 pages, 4 figures

Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2403.03395 (cross-list from cs.SD) [pdf, other]: Title: Interactive Melody Generation System for Enhancing the Creativity of Musicians

Authors: So Hirawata, Noriko Otani

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[99] arXiv:2403.03411 (cross-list from cs.SD) [pdf, other]: Title: CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation

Authors: Vahid Ahmadi Kalkhorani, DeLiang Wang

Comments: 9 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100] arXiv:2403.03510 (cross-list from cs.SD) [pdf, other]: Title: METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D

Authors: Stefan Schoder, Paul Maurerlehner

Comments: 4

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)
