Sound

Authors and titles for recent submissions, skipping first 13

[ total of 83 entries: 1-25 | 14-38 | 39-63 | 64-83 ]

Fri, 7 Jun 2024 (continued, showing last 2 of 15 entries)

[14] arXiv:2406.03637 (cross-list from eess.AS) [pdf, other]: Title: Style Mixture of Experts for Expressive Text-To-Speech Synthesis

Authors: Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[15] arXiv:2405.19334 (cross-list from cs.AI) [pdf, other]: Title: LLMs Meet Multimodal Generation and Editing: A Survey

Authors: Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen

Comments: 51 Pages with 16 Figures, 12 Tables, and 534 References. GitHub Repository at: this https URL

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD)

Thu, 6 Jun 2024 (showing first 23 of 24 entries)

[16] arXiv:2406.03344 [pdf, other]: Title: Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Authors: Mehmet Hamza Erol, Arda Senocak, Jiu Feng, Joon Son Chung

Comments: Code is available at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[17] arXiv:2406.03251 [pdf, other]: Title: ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

Authors: Theo Mariotte, Anthony Larcher, Silvio Montresor, Jean-Hugh Thomas

Comments: 5 pages, 2 figures, 2 tables, accepted at Interspeech 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18] arXiv:2406.03247 [pdf, other]: Title: Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

Authors: Xiaopeng Wang, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Yuankun Xie, Yukun Liu, Jianhua Tao, Xuefei Liu, Yongwei Li, Xin Qi, Yi Lu, Shuchen Shi

Comments: Accepted by INTERSPEECH 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2406.03240 [pdf, other]: Title: Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion strategy

Authors: Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonnan Cheng, Long Ye, Jianhua Tao

Comments: Accepted by INTERSPEECH 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[20] arXiv:2406.03237 [pdf, other]: Title: Generalized Fake Audio Detection via Deep Stable Learning

Authors: Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, Shuchen Shi

Comments: Accepted by INTERSPEECH 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2406.03138 [pdf, other]: Title: A Frame-based Attention Interpretation Method for Relevant Acoustic Feature Extraction in Long Speech Depression Detection

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Comments: 5 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2309.13476

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2406.02963 [pdf, other]: Title: Dataset-Distillation Generative Model for Speech Emotion Recognition

Authors: Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H.M Wong, Dianwen Ng, Hung-yi Lee, Nancy F. Chen, Eng Siong Chng

Comments: Accepted at Interspeech 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2406.02940 [pdf, other]: Title: Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

Authors: Haohan Guo, Fenglong Xie, Dongchao Yang, Hui Lu, Xixin Wu, Helen Meng

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24] arXiv:2406.02897 [pdf, other]: Title: LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Authors: Trung Dang, David Aponte, Dung Tran, Kazuhito Koishida

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2406.02565 [pdf, other]: Title: Sequence-to-sequence models in peer-to-peer learning: A practical application

Authors: Robert Šajina, Ivo Ipšić

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Multiagent Systems (cs.MA); Audio and Speech Processing (eess.AS)
[26] arXiv:2406.03460 (cross-list from eess.AS) [pdf, other]: Title: The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement

Authors: Danilo de Oliveira, Simon Welker, Julius Richter, Timo Gerkmann

Comments: Accepted at Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[27] arXiv:2406.03407 (cross-list from cs.LG) [pdf, other]: Title: Physics and geometry informed neural operator network with application to acoustic scattering

Authors: Siddharth Nair, Timothy F. Walsh, Greg Pickrell, Fabio Semperlotti

Comments: 20 pages of main text, 9 figures

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Computational Physics (physics.comp-ph)
[28] arXiv:2406.03274 (cross-list from eess.AS) [pdf, other]: Title: Enhancing CTC-based speech recognition with diverse modeling units

Authors: Shiyi Han, Zhihong Lei, Mingbin Xu, Xingyu Na, Zhen Huang

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[29] arXiv:2406.03120 (cross-list from eess.AS) [pdf, other]: Title: RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification

Authors: Jacob Bitterman, Daniel Levi, Hilel Hagai Diamandi, Sharon Gannot, Tal Rosenwein

Comments: Accepted to Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[30] arXiv:2406.03049 (cross-list from cs.CL) [pdf, other]: Title: StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

Authors: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

Comments: Accepted to ACL 2024 main conference, Project Page: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2406.02951 (cross-list from cs.CV) [pdf, other]: Title: AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection

Authors: Trevine Oorloff, Surya Koppisetti, Nicolò Bonettini, Divyaraj Solanki, Ben Colman, Yaser Yacoob, Ali Shahriyari, Gaurav Bharaj

Comments: Accepted to CVPR 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[32] arXiv:2406.02950 (cross-list from eess.AS) [pdf, other]: Title: 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Authors: Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe

Comments: submitted to IEEE/ACM Transactions on Audio Speech and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[33] arXiv:2406.02925 (cross-list from eess.AS) [pdf, other]: Title: SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation

Authors: Hsuan Su, Hua Farn, Shang-Tse Chen, Hung-yi Lee

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[34] arXiv:2406.02887 (cross-list from eess.AS) [pdf, other]: Title: USM RNN-T model weights binarization

Authors: Oleg Rybakov, Dmitriy Serdyuk, Chengjian Zheng

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[35] arXiv:2406.02859 (cross-list from eess.AS) [pdf, ps, other]: Title: ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

Authors: Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

Comments: Accepted by Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36] arXiv:2406.02733 (cross-list from cs.CL) [pdf, other]: Title: Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

Authors: Min-Jae Hwang, Ilia Kulikov, Benjamin Peloquin, Hongyu Gong, Peng-Jen Chen, Ann Lee

Comments: Accepted to ACL 2024 (findings)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[37] arXiv:2406.02649 (cross-list from eess.AS) [pdf, other]: Title: Keyword-Guided Adaptation of Automatic Speech Recognition

Authors: Aviv Shamsian, Aviv Navon, Neta Glazer, Gill Hetz, Joseph Keshet

Comments: Accepted to InterSpeech 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[38] arXiv:2406.02572 (cross-list from eess.AS) [pdf, other]: Title: Selfsupervised learning for pathological speech detection

Authors: Shakeel Ahmad Sheikh

Comments: in Intersection of Book Chapter in Machine Learning and Computational Social Sciences CRC (in progress) 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
