Audio and Speech Processing

Authors and titles for recent submissions, skipping first 38

[ total of 55 entries: 1-25 | 14-38 | 39-55 ]

Wed, 22 May 2024

[39] arXiv:2405.12609 [pdf, other]: Title: Mamba in Speech: Towards an Alternative to Self-Attention

Authors: Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40] arXiv:2405.12496 [pdf, other]: Title: A Survey of Integrating Wireless Technology into Active Noise Control

Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

Subjects: Audio and Speech Processing (eess.AS); Networking and Internet Architecture (cs.NI); Sound (cs.SD); Signal Processing (eess.SP)
[41] arXiv:2405.12957 (cross-list from cs.SD) [pdf, other]: Title: Enhancing the analysis of murine neonatal ultrasonic vocalizations: Development, evaluation, and application of different mathematical models

Authors: Rudolf Herdt, Louisa Kinzel, Johann Georg Maaß, Marvin Walther, Henning Fröhlich, Tim Schubert, Peter Maass, Christian Patrick Schaaf

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[42] arXiv:2405.12899 (cross-list from math.FA) [pdf, other]: Title: On a time-frequency blurring operator with applications in data augmentation

Authors: Simon Halvdansson

Comments: 22 pages, 4 figures

Subjects: Functional Analysis (math.FA); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2405.12847 (cross-list from cs.IR) [pdf, other]: Title: A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability

Authors: Li-Yang Tseng, Tzu-Ling Lin, Hong-Han Shuai, Jen-Wei Huang, Wen-Whei Chang

Journal-ref: Proceedings of the 24th International Society for Music Information Retrieval Conference, 174-181. Milan, Italy, November 5-9, 2023

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2405.12774 (cross-list from cs.LG) [pdf, ps, other]: Title: Blind Separation of Vibration Sources using Deep Learning and Deconvolution

Authors: Igor Makienko, Michael Grebshtein, Eli Gildish

Comments: 20 pages, 13 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[45] arXiv:2405.12666 (cross-list from cs.SD) [pdf, other]: Title: SYMPLEX: Controllable Symbolic Music Generation using Simplex Diffusion with Vocabulary Priors

Authors: Nicolas Jonason, Luca Casini, Bob L.T. Sturm

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Tue, 21 May 2024

[46] arXiv:2405.11831 [pdf, other]: Title: SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

Authors: Siavash Shams, Sukru Samet Dindar, Xilin Jiang, Nima Mesgarani

Comments: Code at this https URL

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[47] arXiv:2405.11792 [pdf, other]: Title: Source Localization by Multidimensional Steered Response Power Mapping with Sparse Bayesian Learning

Authors: Wei-Ting Lai, Lachlan Birnie, Xingyu Chen, Amy Bastine, Thushara D. Abhayapala, Prasanga N. Samarasinghe

Subjects: Audio and Speech Processing (eess.AS)
[48] arXiv:2405.11767 [pdf, other]: Title: Multi-speaker Text-to-speech Training with Speaker Anonymized Data

Authors: Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda

Comments: 5 pages. Submitted to Signal Processing Letters. Audio sample page: this https URL

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[49] arXiv:2405.11592 [pdf, other]: Title: Speech-dependent Data Augmentation for Own Voice Reconstruction with Hearable Microphones in Noisy Environments

Authors: Mattes Ohlenbusch, Christian Rollwage, Simon Doclo

Comments: 19 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS)
[50] arXiv:2405.11413 [pdf, other]: Title: Exploring speech style spaces with language models: Emotional TTS without emotion labels

Authors: Shreeram Suresh Chandra, Zongyang Du, Berrak Sisman

Comments: Accepted at Speaker Odyssey 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[51] arXiv:2405.11093 [pdf, other]: Title: AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations

Authors: David Xu

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[52] arXiv:2405.11078 [pdf, ps, other]: Title: Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System

Authors: Vimal Manohar, Szu-Jui Chen, Zhiqi Wang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

Comments: Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Journal-ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 6665-6669

Subjects: Audio and Speech Processing (eess.AS)
[53] arXiv:2405.12221 (cross-list from cs.CV) [pdf, other]: Title: Images that Sound: Composing Images and Sounds on a Single Canvas

Authors: Ziyang Chen, Daniel Geng, Andrew Owens

Comments: Project site: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[54] arXiv:2405.12031 (cross-list from cs.SD) [pdf, other]: Title: Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification

Authors: Nian Li, Jianguo Wei

Comments: 8 pages, 2 figures, 3 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2405.11554 (cross-list from cs.SD) [pdf, other]: Title: DAC-JAX: A JAX Implementation of the Descript Audio Codec

Authors: David Braun

Comments: 5 pages, 3 figures, 2 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
