Audio and Speech Processing

Authors and titles for recent submissions, skipping first 21

[ total of 130 entries: 1-50 | 22-71 | 72-121 | 122-130 ]

Mon, 10 Jun 2024 (continued, showing last 6 of 27 entries)

[22] arXiv:2406.04673 (cross-list from cs.CV) [pdf, other]: Title: MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

Authors: Sanjoy Chowdhury, Sayan Nag, K J Joseph, Balaji Vasan Srinivasan, Dinesh Manocha

Comments: Accepted at CVPR 2024 as Highlight paper. Webpage: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[23] arXiv:2406.04595 (cross-list from cs.SD) [pdf, other]: Title: Pitch-Aware RNN-T for Mandarin Chinese Mispronunciation Detection and Diagnosis

Authors: Xintong Wang, Mingqian Shi, Ye Wang

Comments: Accepted at Interspeech 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[24] arXiv:2406.04589 (cross-list from cs.SD) [pdf, other]: Title: MUSE: Flexible Voiceprint Receptive Fields and Multi-Path Fusion Enhanced Taylor Transformer for U-Net-based Speech Enhancement

Authors: Zizhen Lin, Xiaoting Chen, Junyu Wang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2406.04541 (cross-list from cs.CL) [pdf, other]: Title: Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation

Authors: Keqi Deng, Philip C. Woodland

Comments: Accepted by ACL 2024 Main Conference

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[26] arXiv:2406.04512 (cross-list from cs.CL) [pdf, other]: Title: To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation

Authors: Abdul Waheed, Karima Kadaoui, Muhammad Abdul-Mageed

Comments: Accepted at ACL'24 main

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2406.04350 (cross-list from cs.SD) [pdf, other]: Title: Prompt-guided Precise Audio Editing with Diffusion Models

Authors: Manjie Xu, Chenxing Li, Duzhen Zhang, Dan Su, Wei Liang, Dong Yu

Comments: Accepted by ICML 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Fri, 7 Jun 2024

[28] arXiv:2406.04281 [pdf, other]: Title: Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda

Comments: Accepted to Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS)
[29] arXiv:2406.04269 [pdf, other]: Title: Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Authors: Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian

Comments: 5 pages, 3 figures, 4 tables, Accepted by Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30] arXiv:2406.04212 [pdf, ps, other]: Title: Sound Event Bounding Boxes

Authors: Janek Ebbers, Francois G. Germain, Gordon Wichern, Jonathan Le Roux

Comments: Accepted for publication at Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[31] arXiv:2406.04123 [pdf, other]: Title: Helsinki Speech Challenge 2024

Authors: Martin Ludvigsen, Elli Karvonen, Markus Juvonen, Samuli Siltanen

Subjects: Audio and Speech Processing (eess.AS)
[32] arXiv:2406.03899 [pdf, other]: Title: PLDNet: PLD-Guided Lightweight Deep Network Boosted by Efficient Attention for Handheld Dual-Microphone Speech Enhancement

Authors: Nan Zhou, Youhai Jiang, Jialin Tan, Chongmin Qi

Comments: Accepted at Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[33] arXiv:2406.03657 [pdf, other]: Title: UrBAN: Urban Beehive Acoustics and PheNotyping Dataset

Authors: Mahsa Abdollahi, Yi Zhu, Heitor R. Guimarães, Nico Coallier, Ségolène Maucourt, Pierre Giovenazzo, Tiago H. Falk

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[34] arXiv:2406.03637 [pdf, other]: Title: Style Mixture of Experts for Expressive Text-To-Speech Synthesis

Authors: Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[35] arXiv:2406.03514 [pdf, other]: Title: NeuRO: An Application for Code-Switched Autism Detection in Children

Authors: Mohd Mujtaba Akhtar, Girish, Orchid Chetia Phukan, Muskaan Singh

Comments: Accepted to INTERSPEECH 24 Show & Tell Demonstrations

Subjects: Audio and Speech Processing (eess.AS)
[36] arXiv:2406.04140 (cross-list from cs.SD) [pdf, other]: Title: STraDa: A Singer Traits Dataset

Authors: Yuexuan Kong, Viet-Anh Tran, Romain Hennequin

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[37] arXiv:2406.03882 (cross-list from cs.CL) [pdf, other]: Title: Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

Authors: Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

Comments: Accepted by Interspeech 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:2406.03872 (cross-list from cs.CL) [pdf, other]: Title: BLSP-Emo: Towards Empathetic Large Speech-Language Models

Authors: Chen Wang, Minpeng Liao, Zhongqiang Huang, Junhong Wu, Chengqing Zong, Jiajun Zhang

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:2406.03822 (cross-list from cs.SD) [pdf, other]: Title: SilentCipher: Deep Audio Watermarking

Authors: Mayank Kumar Singh, Naoya Takahashi, Weihsiang Liao, Yuki Mitsufuji

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[40] arXiv:2406.03814 (cross-list from cs.CL) [pdf, other]: Title: Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

Authors: Jiaming Zhou, Shiwan Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:2406.03714 (cross-list from cs.SD) [pdf, other]: Title: Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

Authors: Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li

Comments: Accepted by Interspeech 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:2406.03706 (cross-list from cs.SD) [pdf, other]: Title: Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

Authors: Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

Comments: Accepted by Interspeech 2024

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[43] arXiv:2406.03512 (cross-list from cs.SD) [pdf, other]: Title: Harder or Different? Understanding Generalization of Audio Deepfake Detection

Authors: Nicolas M. Müller, Nicholas Evans, Hemlata Tak, Philip Sperl, Konstantin Böttinger

Journal-ref: Interspeech 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[44] arXiv:2406.03510 (cross-list from cs.SD) [pdf, other]: Title: Speech-based Clinical Depression Screening: An Empirical Study

Authors: Yangbin Chen, Chenyang Xu, Chunfeng Liang, Yanbao Tao, Chuan Shi

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

Thu, 6 Jun 2024 (showing first 27 of 41 entries)

[45] arXiv:2406.03460 [pdf, other]: Title: The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement

Authors: Danilo de Oliveira, Simon Welker, Julius Richter, Timo Gerkmann

Comments: Accepted at Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[46] arXiv:2406.03274 [pdf, other]: Title: Enhancing CTC-based speech recognition with diverse modeling units

Authors: Shiyi Han, Zhihong Lei, Mingbin Xu, Xingyu Na, Zhen Huang

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[47] arXiv:2406.03272 [pdf, other]: Title: Multi-Microphone Speech Emotion Recognition using the Hierarchical Token-semantic Audio Transformer Architecture

Authors: Ohad Cohen, Gershon Hazan, Sharon Gannot

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[48] arXiv:2406.03228 [pdf, other]: Title: Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement

Authors: Wang Dai, Xiaofei Li, Archontis Politis, Tuomas Virtanen

Comments: Accepted by EUSIPCO 2024

Subjects: Audio and Speech Processing (eess.AS)
[49] arXiv:2406.03205 [pdf, other]: Title: CoLLAB: A Collaborative Approach for Multilingual Abuse Detection

Authors: Orchid Chetia Phukan, Yashasvi Chaurasia, Arun Balaji Buduru, Rajesh Sharma

Subjects: Audio and Speech Processing (eess.AS)
[50] arXiv:2406.03155 [pdf, other]: Title: Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment

Authors: Christoph Boeddeker, Tobias Cord-Landwehr, Reinhold Haeb-Umbach

Comments: Accepted for Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS)
[51] arXiv:2406.03120 [pdf, other]: Title: RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification

Authors: Jacob Bitterman, Daniel Levi, Hilel Hagai Diamandi, Sharon Gannot, Tal Rosenwein

Comments: Accepted to Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[52] arXiv:2406.03111 [pdf, other]: Title: Singing Voice Graph Modeling for SingFake Detection

Authors: Xuanjun Chen, Haibin Wu, Jyh-Shing Roger Jang, Hung-yi Lee

Comments: Accepted by Interspeech 2024; Codebase available at: this https URL

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[53] arXiv:2406.02950 [pdf, other]: Title: 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Authors: Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe

Comments: submitted to IEEE/ACM Transactions on Audio Speech and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[54] arXiv:2406.02925 [pdf, other]: Title: SYN2REAL: Leveraging Task Arithmetic for Mitigating Synthetic-Real Discrepancies in ASR Domain Adaptation

Authors: Hsuan Su, Hua Farn, Shang-Tse Chen, Hung-yi Lee

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[55] arXiv:2406.02887 [pdf, other]: Title: USM RNN-T model weights binarization

Authors: Oleg Rybakov, Dmitriy Serdyuk, Chengjian Zheng

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56] arXiv:2406.02859 [pdf, ps, other]: Title: ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

Authors: Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

Comments: Accepted by Interspeech 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57] arXiv:2406.02652 [pdf, other]: Title: RepCNN: Micro-sized, Mighty Models for Wakeword Detection

Authors: Arnav Kundu, Prateeth Nayak, Hywel Richards, Priyanka Padmanabhan, Devang Naik

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[58] arXiv:2406.02649 [pdf, other]: Title: Keyword-Guided Adaptation of Automatic Speech Recognition

Authors: Aviv Shamsian, Aviv Navon, Neta Glazer, Gill Hetz, Joseph Keshet

Comments: Accepted to InterSpeech 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[59] arXiv:2406.02608 [pdf, other]: Title: PPINtonus: Early Detection of Parkinson's Disease Using Deep-Learning Tonal Analysis

Authors: Varun Reddy

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[60] arXiv:2406.02572 [pdf, other]: Title: Selfsupervised learning for pathological speech detection

Authors: Shakeel Ahmad Sheikh

Comments: in Intersection of Book Chapter in Machine Learning and Computational Social Sciences CRC (in progress) 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[61] arXiv:2406.02569 [pdf, other]: Title: Cluster-to-Predict Affect Contours from Speech

Authors: Gökhan Kuşçu, Engin Erzin

Comments: 8 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC)
[62] arXiv:2406.02566 [pdf, other]: Title: Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition

Authors: Ognjen Kundacina, Vladimir Vincan, Dragisa Miskovic

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[63] arXiv:2406.02563 [pdf, other]: Title: A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system

Authors: Sunil Kumar Kopparapu, Ashish Panda

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[64] arXiv:2406.02562 [pdf, other]: Title: Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices

Authors: Gwantae Kim, Bokyeung Lee, Donghyeon Kim, Hanseok Ko

Comments: Table 2 is revised

Journal-ref: ICASSP 2024 Workshop(HSCMA 2024) paper

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[65] arXiv:2406.02561 [pdf, ps, other]: Title: Breaking Walls: Pioneering Automatic Speech Recognition for Central Kurdish: End-to-End Transformer Paradigm

Authors: Abdulhady Abas Abdullah, Hadi Veisi, Tarik Rashid

Subjects: Audio and Speech Processing (eess.AS)
[66] arXiv:2406.02560 [pdf, other]: Title: Less Peaky and More Accurate CTC Forced Alignment by Label Priors

Authors: Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, Sanjeev Khudanpur

Comments: Accepted by ICASSP 2024. Github repo: this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[67] arXiv:2406.02555 [pdf, ps, other]: Title: PhoWhisper: Automatic Speech Recognition for Vietnamese

Authors: Thanh-Thien Le, Linh The Nguyen, Dat Quoc Nguyen

Comments: Accepted to ICLR 2024 Tiny Papers Track

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[68] arXiv:2406.02554 [pdf, other]: Title: Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
[69] arXiv:2406.03407 (cross-list from cs.LG) [pdf, other]: Title: Physics and geometry informed neural operator network with application to acoustic scattering

Authors: Siddharth Nair, Timothy F. Walsh, Greg Pickrell, Fabio Semperlotti

Comments: 20 pages of main text, 9 figures

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Computational Physics (physics.comp-ph)
[70] arXiv:2406.03344 (cross-list from cs.SD) [pdf, other]: Title: Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Authors: Mehmet Hamza Erol, Arda Senocak, Jiu Feng, Joon Son Chung

Comments: Code is available at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[71] arXiv:2406.03251 (cross-list from cs.SD) [pdf, other]: Title: ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

Authors: Theo Mariotte, Anthony Larcher, Silvio Montresor, Jean-Hugh Thomas

Comments: 5 pages, 2 figures, 2 tables, accepted at Interspeech 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
