Audio and Speech Processing

Authors and titles for eess.AS in Mar 2024

[ total of 213 entries; entries 1-50 shown ]

[1] arXiv:2403.00293 [pdf, other]: Title: Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification

Authors: Mufan Sang, John H.L. Hansen

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[2] arXiv:2403.00379 [pdf, other]: Title: The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

Authors: Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[3] arXiv:2403.00887 [pdf, other]: Title: SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

Authors: Aron R, Indra Sigicharla, Chirag Periwal, Mohanaprasad K, Nithya Darisini P S, Sourabh Tiwari, Shivani Arora

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[4] arXiv:2403.01130 [pdf, other]: Title: Arbitrary Discrete Fourier Analysis and Its Application in Replayed Speech Detection

Authors: Shih-Kuang Lee

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[5] arXiv:2403.01355 [pdf, ps, other]: Title: a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification

Authors: Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, Itshak Lapidot

Comments: 8 pages, submitted to Speaker Odyssey 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[6] arXiv:2403.01369 [pdf, other]: Title: A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Authors: Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Comments: 8 pages; Shorter form accepted in ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[7] arXiv:2403.01494 [pdf, other]: Title: PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

Authors: Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian

Comments: Accepted to ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[8] arXiv:2403.01670 [pdf, other]: Title: 6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human

Authors: Masahiro Yasuda, Shoichiro Saito, Akira Nakayama, Noboru Harada

Comments: ICASSP2024 accepted

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2403.02167 [pdf, other]: Title: Speech emotion recognition from voice messages recorded in the wild

Authors: Lucía Gómez-Zaragozá, Óscar Valls, Rocío del Amor, María José Castro-Bleda, Valery Naranjo, Mariano Alcañiz Raya, Javier Marín-Morales

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[10] arXiv:2403.02288 [pdf, other]: Title: PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings

Authors: Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin

Comments: submitted to Speaker Odyssey 2024

Subjects: Audio and Speech Processing (eess.AS)
[11] arXiv:2403.02371 [pdf, other]: Title: NeuroVoz: a Castillian Spanish corpus of parkinsonian speech

Authors: Janaína Mendes-Laureano, Jorge A. Gómez-García, Alejandro Guerrero-López, Elisa Luque-Buzo, Julián D. Arias-Londoño, Francisco J. Grandas-Pérez, Juan I. Godino-Llorente

Comments: Preprint version

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[12] arXiv:2403.03100 [pdf, other]: Title: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[13] arXiv:2403.03611 [pdf, ps, other]: Title: Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Authors: Dang Thoai Phan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14] arXiv:2403.04433 [pdf, ps, other]: Title: On the Use of Autoregressive Methods for Audio Inpainting

Authors: Ondřej Mokrý, Pavel Rajmic

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[15] arXiv:2403.04743 [pdf, other]: Title: Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism

Authors: Xiaoyu Tang, Yixin Lin, Ting Dang, Yuanfang Zhang, Jintao Cheng

Subjects: Audio and Speech Processing (eess.AS)
[16] arXiv:2403.04800 [pdf, other]: Title: (Un)paired signal-to-signal translation with 1D conditional GANs

Authors: Eric Easthope

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[17] arXiv:2403.04804 [pdf, other]: Title: AttentionStitch: How Attention Solves the Speech Editing Problem

Authors: Antonios Alexos, Pierre Baldi

Comments: Accepted in Machine Learning for Audio workshop in NeurIPS 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[18] arXiv:2403.05187 [pdf, other]: Title: Robust Semantic Communications for Speech Transmission

Authors: Zhenzi Weng, Zhijin Qin

Subjects: Audio and Speech Processing (eess.AS)
[19] arXiv:2403.05393 [pdf, other]: Title: Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks

Authors: Vikas Tokala, Eric Grinstein, Mike Brookes, Simon Doclo, Jesper Jensen, Patrick A. Naylor

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS)
[20] arXiv:2403.05791 [pdf, other]: Title: Asynchronous Microphone Array Calibration using Hybrid TDOA Information

Authors: Chengjie Zhang, Jiang Wang, He Kong

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21] arXiv:2403.05887 [pdf, other]: Title: Aligning Speech to Languages to Enhance Code-switching Speech Recognition

Authors: Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

Comments: Manuscript submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS)
[22] arXiv:2403.06847 [pdf, other]: Title: SonoTraceLab -- A Raytracing-Based Acoustic Modelling System for Simulating Echolocation Behavior of Bats

Authors: Wouter Jansen, Jan Steckel

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2403.06856 [pdf, other]: Title: Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Authors: Amit Eliav, Sharon Gannot

Comments: 5 pages, 6 tables, 2 figures

Subjects: Audio and Speech Processing (eess.AS)
[24] arXiv:2403.07579 [pdf, other]: Title: On HRTF Notch Frequency Prediction Using Anthropometric Features and Neural Networks

Authors: Lior Arbel, Ishwarya Ananthabhotla, Zamir Ben-Hur, David Lou Alon, Boaz Rafaely

Subjects: Audio and Speech Processing (eess.AS)
[25] arXiv:2403.07661 [pdf, other]: Title: Gender-ambiguous voice generation through feminine speaking style transfer in male voices

Authors: Maria Koutsogiannaki, Shafel Mc Dowall, Ioannis Agiomyrgiannakis

Comments: submitted to Interspeech

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[26] arXiv:2403.07767 [pdf, ps, other]: Title: Beyond the Labels: Unveiling Text-Dependency in Paralinguistic Speech Recognition Datasets

Authors: Jan Pešán, Santosh Kesiraju, Lukáš Burget, Jan "Honza" Černocký

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
[27] arXiv:2403.07937 [pdf, other]: Title: Speech Robust Bench: A Robustness Benchmark For Speech Recognition

Authors: Muhammad A. Shah, David Solans Noguero, Mikko A. Heikkila, Nicolas Kourtellis

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[28] arXiv:2403.07947 [pdf, ps, other]: Title: The evaluation of a code-switched Sepedi-English automatic speech recognition system

Authors: Amanda Phaladi, Thipe Modipa

Comments: 13 pages, 2 figures, 2nd International Conference on NLP & AI (NLPAI 2024)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
[29] arXiv:2403.08654 [pdf, other]: Title: An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

Comments: Under review at IEEE Transactions on Audio, Speech, and Language Processing (2024)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30] arXiv:2403.09524 [pdf, other]: Title: Physics-Informed Neural Network for Volumetric Sound field Reconstruction of Speech Signals

Authors: Marco Olivieri, Xenofon Karakonstantis, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti, Efren Fernandez-Grande

Subjects: Audio and Speech Processing (eess.AS)
[31] arXiv:2403.09527 [pdf, other]: Title: WavCraft: Audio Editing and Generation with Large Language Models

Authors: Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D. Plumbley, Huy Phan, Emmanouil Benetos

Subjects: Audio and Speech Processing (eess.AS)
[32] arXiv:2403.09789 [pdf, other]: Title: Audiosockets: A Python socket package for Real-Time Audio Processing

Authors: Nicolas Shu, David V. Anderson

Comments: 4 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[33] arXiv:2403.10271 [pdf, other]: Title: SuperME: Supervised and Mixture-to-Mixture Co-Learning for Speech Enhancement and Robust ASR

Authors: Zhong-Qiu Wang

Comments: in submission

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[34] arXiv:2403.10420 [pdf, other]: Title: Neural Networks Hear You Loud And Clear: Hearing Loss Compensation Using Deep Neural Networks

Authors: Peter Leer, Jesper Jensen, Laurel Carney, Zheng-Hua Tan, Jan Østergaard, Lars Bramsløw

Subjects: Audio and Speech Processing (eess.AS)
[35] arXiv:2403.10428 [pdf, other]: Title: How to train your ears: Auditory-model emulation for large-dynamic-range inputs and mild-to-severe hearing losses

Authors: Peter Leer, Jesper Jensen, Zheng-Hua Tan, Jan Østergaard, Lars Bramsløw

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. This version is the authors' version and may vary from the final publication in details

Subjects: Audio and Speech Processing (eess.AS)
[36] arXiv:2403.10548 [pdf, other]: Title: Two-sided Acoustic Metascreen for Broadband and Individual Reflection and Transmission Control

Authors: Ao Chen, Xin Zhang

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
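[37] arXiv:2403.10565 [pdf, other]: Title: PTSD-MDNN : Fusion tardive de réseaux de neurones profonds multimodaux pour la détection du trouble de stress post-traumatique [PTSD-MDNN: Late fusion of multimodal deep neural networks for the detection of post-traumatic stress disorder]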

Authors: Long Nguyen-Phuoc, Renald Gaboriau, Dimitri Delacroix, Laurent Navarro

Comments: In French. GRETSI 2023

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
[38] arXiv:2403.10756 [pdf, other]: Title: Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

Authors: Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

Comments: Submitted to EUSIPCO2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39] arXiv:2403.10937 [pdf, other]: Title: Initial Decoding with Minimally Augmented Language Model for Improved Lattice Rescoring in Low Resource ASR

Authors: Savitha Murthy, Dinkar Sitaram

Comments: 14 pages, 7 figures, Accepted in Sadhana Journal

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
[40] arXiv:2403.11037 [pdf, other]: Title: Fine-Grained Engine Fault Sound Event Detection Using Multimodal Signals

Authors: Dennis Fedorishin, Livio Forte III, Philip Schneider, Srirangaraj Setlur, Venu Govindaraju

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[41] arXiv:2403.11508 [pdf, other]: Title: Discriminative Neighborhood Smoothing for Generative Anomalous Sound Detection

Authors: Takuya Fujimura, Keisuke Imoto, Tomoki Toda

Comments: Submitted to EUSIPCO 2024

Subjects: Audio and Speech Processing (eess.AS)
[42] arXiv:2403.11578 [pdf, other]: Title: AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition

Authors: SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

Subjects: Audio and Speech Processing (eess.AS)
[43] arXiv:2403.12182 [pdf, other]: Title: Latent CLAP Loss for Better Foley Sound Synthesis

Authors: Tornike Karchkhadze, Hassan Salami Kavaki, Mohammad Rasool Izadi, Bryce Irvin, Mikolaj Kegler, Ari Hertz, Shuo Zhang, Marko Stamenovic

Subjects: Audio and Speech Processing (eess.AS)
[44] arXiv:2403.12258 [pdf, other]: Title: A Multi-loudspeaker Binaural Room Impulse Response Dataset with High-Resolution Translational and Rotational Head Coordinates in a Listening Room

Authors: Yue Qiao, Ryan Miguel Gonzales, Edgar Choueiri

Comments: Submitted to Frontiers in Signal Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[45] arXiv:2403.12630 [pdf, other]: Title: Reproducing the Acoustic Velocity Vectors in a Circular Listening Area

Authors: Jiarui Wang, Thushara Abhayapala, Jihui Aimee Zhang, Prasanga Samarasinghe

Comments: Submitted to EUSIPCO 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46] arXiv:2403.13332 [pdf, other]: Title: TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

Authors: Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

Comments: Accepted by ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47] arXiv:2403.13356 [pdf, other]: Title: KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario

Authors: Huali Zhou, Yuke Lin, Dong Liu, Ming Li

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[48] arXiv:2403.13465 [pdf, other]: Title: BanglaNum -- A Public Dataset for Bengali Digit Recognition from Speech

Authors: Mir Sayeed Mohammad, Azizul Zahid, Md Asif Iqbal

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[49] arXiv:2403.13643 [pdf, ps, other]: Title: Vibration Sensitivity of one-port and two-port MEMS microphones

Authors: Francis Doyon-D'Amour, Carly Stalder, Timothy Hodges, Michel Stephan, Lixiue Wu, Triantafillos Koukoulas, Stephane Leahy, Raphael St-Gelais

Comments: 8 pages, 14 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50] arXiv:2403.14179 [pdf, ps, other]: Title: AdaProj: Adaptively Scaled Angular Margin Subspace Projections for Anomalous Sound Detection with Auxiliary Classification Tasks

Authors: Kevin Wilkinghoff

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
