We gratefully acknowledge support from
the Simons Foundation and member institutions.

Audio and Speech Processing

Authors and titles for recent submissions, skipping first 27

[ total of 55 entries; this page shows entries 28-37 ]

Fri, 24 May 2024 (continued, showing 10 of 18 entries)

[28]  arXiv:2405.14290 (cross-list from cs.SD) [pdf, other]
Title: Frequency-Domain Sound Field from the Perspective of Band-Limited Functions
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29]  arXiv:2405.14161 (cross-list from cs.CL) [pdf, other]
Title: Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Comments: 23 pages, Preprint
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[30]  arXiv:2405.13762 (cross-list from cs.CV) [pdf, other]
Title: A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31]  arXiv:2405.13661 (cross-list from cs.SD) [pdf, ps, other]
Title: Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[32]  arXiv:2405.13636 (cross-list from cs.SD) [pdf, other]
Title: Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
Authors: Jiaju Lin, Haoxuan Hu
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[33]  arXiv:2405.13527 (cross-list from cs.SD) [pdf, other]
Title: End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding
Authors: Wei Zeng, Xian He, Ye Wang
Comments: 8 pages, 5 figures, accepted by IJCAI 2024 - AI, Arts & Creativity Track
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34]  arXiv:2405.13477 (cross-list from cs.HC) [pdf, other]
Title: A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction
Comments: 8 pages, 16 figures, Under review by RoMan 2024 conference
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35]  arXiv:2405.13428 (cross-list from cs.SD) [pdf, other]
Title: Ambisonizer: Neural Upmixing as Spherical Harmonics Generation
Comments: Under review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36]  arXiv:2405.13379 (cross-list from cs.CL) [pdf, ps, other]
Title: You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[37]  arXiv:2405.13162 (cross-list from cs.SD) [pdf, ps, other]
Title: Non-autoregressive real-time Accent Conversion model with voice cloning
Comments: 8 pages, 6 figures, 3 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

