Audio and Speech Processing

Authors and titles for eess.AS in Mar 2024, skipping first 75

Showing entries 76-100 of 213 (25 entries per page)
[76]  arXiv:2403.00529 (cross-list from cs.SD) [pdf, other]
Title: VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis
Comments: preprint
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[77]  arXiv:2403.00790 (cross-list from cs.SD) [pdf, ps, other]
Title: Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations
Authors: Tofara Moyo
Comments: 3 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[78]  arXiv:2403.00854 (cross-list from q-bio.NC) [pdf, other]
Title: Speaker-Independent Dysarthria Severity Classification using Self-Supervised Transformers and Multi-Task Learning
Comments: 17 pages, 2 tables, 4 main figures, 2 supplemental figures, prepared for journal submission
Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79]  arXiv:2403.00977 (cross-list from cs.SD) [pdf, other]
Title: Scaling Up Adaptive Filter Optimizers
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80]  arXiv:2403.01087 (cross-list from cs.MM) [pdf, other]
Title: Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Comments: 8 pages of content, 1 page of references and 4 figures
Journal-ref: In Proceedings of the 31st ACM International Conference on Multimedia, 2023
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81]  arXiv:2403.01132 (cross-list from cs.LG) [pdf, ps, other]
Title: MPIPN: A Multi Physics-Informed PointNet for solving parametric acoustic-structure systems
Comments: The number of figures is 16. The number of tables is 5. The number of words is 9717
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82]  arXiv:2403.01255 (cross-list from cs.SD) [pdf, other]
Title: Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
Journal-ref: Information Fusion, Elsevier, 2024
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[83]  arXiv:2403.01278 (cross-list from cs.SD) [pdf, other]
Title: Enhancing Audio Generation Diversity with Visual Information
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84]  arXiv:2403.01699 (cross-list from cs.CL) [pdf, other]
Title: Brilla AI: AI Contestant for the National Science and Maths Quiz
Comments: 14 pages. Accepted for the WideAIED track at the 25th International Conference on AI in Education (AIED 2024)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85]  arXiv:2403.01700 (cross-list from cs.SD) [pdf, other]
Title: Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer
Comments: Accepted by ICASSP 2024
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[86]  arXiv:2403.01785 (cross-list from cs.SD) [pdf, other]
Title: What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87]  arXiv:2403.01792 (cross-list from cs.SD) [pdf, other]
Title: ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88]  arXiv:2403.01960 (cross-list from cs.SD) [pdf, other]
Title: A robust audio deepfake detection system via multi-view feature
Comments: 5 pages, 2 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89]  arXiv:2403.02002 (cross-list from cs.SD) [pdf, other]
Title: Fine-Grained Quantitative Emotion Editing for Speech Generation
Comments: This paper is submitted to IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90]  arXiv:2403.02010 (cross-list from cs.SD) [pdf, other]
Title: SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91]  arXiv:2403.02687 (cross-list from cs.HC) [src]
Title: Enhanced DareFightingICE Competitions: Sound Design and AI Competitions
Comments: This paper describes a new competition platform using Unity for our competitions at the 2024 IEEE Conference on Games (CoG 2024). It was accepted for presentation at CoG 2024. However, we recently discovered a much more effective way to do this task without using Unity, leading to our decision to withdraw the paper from CoG 2024 and ArXiv
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92]  arXiv:2403.02701 (cross-list from cs.SD) [pdf, other]
Title: Fighting Game Adaptive Background Music for Improved Gameplay
Comments: This is an updated version of our IEEE CoG 2023 paper. This version revises the description of the association between the distance between the two players (PD) and the instrument's volume on page 2. arXiv admin note: substantial text overlap with arXiv:2303.15734
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[93]  arXiv:2403.02918 (cross-list from cs.RO) [pdf, other]
Title: Single-Channel Robot Ego-Speech Filtering during Human-Robot Interaction
Comments: Accepted by ACM Technological Advances in Human-Robot Interaction. 9 pages
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94]  arXiv:2403.02938 (cross-list from cs.CL) [pdf, other]
Title: AIx Speed: Playback Speed Optimization Using Listening Comprehension of Speech Recognition Models
Journal-ref: AHs '23: Proceedings of the Augmented Humans International Conference 2023
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95]  arXiv:2403.03095 (cross-list from cs.CV) [pdf, other]
Title: Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization
Comments: Accepted To ICASSP2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96]  arXiv:2403.03145 (cross-list from cs.CV) [pdf, other]
Title: Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Comments: Accepted to NeurIPS2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97]  arXiv:2403.03224 (cross-list from physics.soc-ph) [pdf, other]
Title: Reinforcement Learning Jazz Improvisation: When Music Meets Game Theory
Comments: 16 pages, 4 figures
Subjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98]  arXiv:2403.03395 (cross-list from cs.SD) [pdf, other]
Title: Interactive Melody Generation System for Enhancing the Creativity of Musicians
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[99]  arXiv:2403.03411 (cross-list from cs.SD) [pdf, other]
Title: CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation
Comments: 9 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100]  arXiv:2403.03510 (cross-list from cs.SD) [pdf, other]
Title: METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D
Comments: 4
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph)