Audio and Speech Processing

Authors and titles for recent submissions, skipping first 27

[ total of 55 entries: 1-10 | 8-17 | 18-27 | 28-37 | 38-47 | 48-55 ]

Fri, 24 May 2024 (continued, showing 10 of 18 entries)

[28] arXiv:2405.14290 (cross-list from cs.SD)

Title: Frequency-Domain Sound Field from the Perspective of Band-Limited Functions

Authors: Takahiro Iwami, Akira Omoto

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[29] arXiv:2405.14161 (cross-list from cs.CL)

Title: Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

Comments: 23 pages, Preprint

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[30] arXiv:2405.13762 (cross-list from cs.CV)

Title: A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[31] arXiv:2405.13661 (cross-list from cs.SD)

Title: Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

Authors: Hong Zhang, Jie Lin, Shengxuan Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[32] arXiv:2405.13636 (cross-list from cs.SD)

Title: Audio Mamba: Pretrained Audio State Space Model For Audio Tagging

Authors: Jiaju Lin, Haoxuan Hu

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[33] arXiv:2405.13527 (cross-list from cs.SD)

Title: End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding

Authors: Wei Zeng, Xian He, Ye Wang

Comments: 8 pages, 5 figures, accepted by IJCAI 2024 - AI, Arts & Creativity Track

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[34] arXiv:2405.13477 (cross-list from cs.HC)

Title: A Near-Real-Time Processing Ego Speech Filtering Pipeline Designed for Speech Interruption During Human-Robot Interaction

Authors: Yue Li, Florian A. Kunneman, Koen V. Hindriks

Comments: 8 pages, 16 figures, Under review by RoMan 2024 conference

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[35] arXiv:2405.13428 (cross-list from cs.SD)

Title: Ambisonizer: Neural Upmixing as Spherical Harmonics Generation

Authors: Yongyi Zang, Yifan Wang, Minglun Lee

Comments: Under review

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[36] arXiv:2405.13379 (cross-list from cs.CL)

Title: You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish

Authors: Ronald Cumbal, Birger Moell, Jose Lopes, Olof Engwall

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[37] arXiv:2405.13162 (cross-list from cs.SD)

Title: Non-autoregressive real-time Accent Conversion model with voice cloning

Authors: Vladimir Nechaev, Sergey Kosyakov

Comments: 8 pages, 6 figures, 3 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
