Audio and Speech Processing

Authors and titles for eess.AS in Mar 2024

Total of 213 entries; showing entries 1-25.

[1] arXiv:2403.00293 [pdf, other]: Title: Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification

Authors: Mufan Sang, John H.L. Hansen

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[2] arXiv:2403.00379 [pdf, other]: Title: The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

Authors: Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[3] arXiv:2403.00887 [pdf, other]: Title: SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

Authors: Aron R, Indra Sigicharla, Chirag Periwal, Mohanaprasad K, Nithya Darisini P S, Sourabh Tiwari, Shivani Arora

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[4] arXiv:2403.01130 [pdf, other]: Title: Arbitrary Discrete Fourier Analysis and Its Application in Replayed Speech Detection

Authors: Shih-Kuang Lee

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[5] arXiv:2403.01355 [pdf, ps, other]: Title: a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification

Authors: Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, Itshak Lapidot

Comments: 8 pages, submitted to Speaker Odyssey 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[6] arXiv:2403.01369 [pdf, other]: Title: A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

Authors: Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

Comments: 8 pages; Shorter form accepted in ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[7] arXiv:2403.01494 [pdf, other]: Title: PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

Authors: Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian

Comments: Accepted to ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[8] arXiv:2403.01670 [pdf, other]: Title: 6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human

Authors: Masahiro Yasuda, Shoichiro Saito, Akira Nakayama, Noboru Harada

Comments: ICASSP2024 accepted

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2403.02167 [pdf, other]: Title: Speech emotion recognition from voice messages recorded in the wild

Authors: Lucía Gómez-Zaragozá, Óscar Valls, Rocío del Amor, María José Castro-Bleda, Valery Naranjo, Mariano Alcañiz Raya, Javier Marín-Morales

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[10] arXiv:2403.02288 [pdf, other]: Title: PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings

Authors: Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin

Comments: submitted to Speaker Odyssey 2024

Subjects: Audio and Speech Processing (eess.AS)
[11] arXiv:2403.02371 [pdf, other]: Title: NeuroVoz: a Castillian Spanish corpus of parkinsonian speech

Authors: Janaína Mendes-Laureano, Jorge A. Gómez-García, Alejandro Guerrero-López, Elisa Luque-Buzo, Julián D. Arias-Londoño, Francisco J. Grandas-Pérez, Juan I. Godino-Llorente

Comments: Preprint version

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[12] arXiv:2403.03100 [pdf, other]: Title: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[13] arXiv:2403.03611 [pdf, ps, other]: Title: Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Authors: Dang Thoai Phan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14] arXiv:2403.04433 [pdf, ps, other]: Title: On the Use of Autoregressive Methods for Audio Inpainting

Authors: Ondřej Mokrý, Pavel Rajmic

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[15] arXiv:2403.04743 [pdf, other]: Title: Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism

Authors: Xiaoyu Tang, Yixin Lin, Ting Dang, Yuanfang Zhang, Jintao Cheng

Subjects: Audio and Speech Processing (eess.AS)
[16] arXiv:2403.04800 [pdf, other]: Title: (Un)paired signal-to-signal translation with 1D conditional GANs

Authors: Eric Easthope

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[17] arXiv:2403.04804 [pdf, other]: Title: AttentionStitch: How Attention Solves the Speech Editing Problem

Authors: Antonios Alexos, Pierre Baldi

Comments: Accepted to the Machine Learning for Audio workshop at NeurIPS 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
[18] arXiv:2403.05187 [pdf, other]: Title: Robust Semantic Communications for Speech Transmission

Authors: Zhenzi Weng, Zhijin Qin

Subjects: Audio and Speech Processing (eess.AS)
[19] arXiv:2403.05393 [pdf, other]: Title: Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks

Authors: Vikas Tokala, Eric Grinstein, Mike Brookes, Simon Doclo, Jesper Jensen, Patrick A. Naylor

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS)
[20] arXiv:2403.05791 [pdf, other]: Title: Asynchronous Microphone Array Calibration using Hybrid TDOA Information

Authors: Chengjie Zhang, Jiang Wang, He Kong

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21] arXiv:2403.05887 [pdf, other]: Title: Aligning Speech to Languages to Enhance Code-switching Speech Recognition

Authors: Hexin Liu, Xiangyu Zhang, Leibny Paola Garcia, Andy W. H. Khong, Eng Siong Chng, Shinji Watanabe

Comments: Manuscript submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Audio and Speech Processing (eess.AS)
[22] arXiv:2403.06847 [pdf, other]: Title: SonoTraceLab -- A Raytracing-Based Acoustic Modelling System for Simulating Echolocation Behavior of Bats

Authors: Wouter Jansen, Jan Steckel

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2403.06856 [pdf, other]: Title: Concurrent Speaker Detection: A multi-microphone Transformer-Based Approach

Authors: Amit Eliav, Sharon Gannot

Comments: 5 pages, 6 tables, 2 figures

Subjects: Audio and Speech Processing (eess.AS)
[24] arXiv:2403.07579 [pdf, other]: Title: On HRTF Notch Frequency Prediction Using Anthropometric Features and Neural Networks

Authors: Lior Arbel, Ishwarya Ananthabhotla, Zamir Ben-Hur, David Lou Alon, Boaz Rafaely

Subjects: Audio and Speech Processing (eess.AS)
[25] arXiv:2403.07661 [pdf, other]: Title: Gender-ambiguous voice generation through feminine speaking style transfer in male voices

Authors: Maria Koutsogiannaki, Shafel Mc Dowall, Ioannis Agiomyrgiannakis

Comments: submitted to Interspeech

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
