Audio and Speech Processing

Authors and titles for eess.AS in Dec 2023

[ total of 233 entries: 1-50 | 51-100 | 101-150 | 151-200 | 201-233 ]
[1]  arXiv:2312.00174 [pdf, other]
Title: Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices
Comments: 5 pages, 2 figures, 2 tables, presented at the 15th ITG Conference on Speech Communications, September 2023, Aachen
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
[2]  arXiv:2312.00231 [pdf, other]
Title: Learning domain-invariant classifiers for infant cry sounds
Subjects: Audio and Speech Processing (eess.AS)
[3]  arXiv:2312.00249 [pdf, other]
Title: Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities
Subjects: Audio and Speech Processing (eess.AS)
[4]  arXiv:2312.00698 [pdf, other]
Title: SPIRE-SIES: A Spontaneous Indian English Speech Corpus
Comments: 6 pages, 7 plots, 3 tables, Accepted at O-COCOSDA 2023
Subjects: Audio and Speech Processing (eess.AS)
[5]  arXiv:2312.01744 [pdf, other]
Title: SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement
Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023
Subjects: Audio and Speech Processing (eess.AS)
[6]  arXiv:2312.01808 [pdf, ps, other]
Title: Head Orientation Estimation with Distributed Microphones Using Speech Radiation Patterns
Comments: 6 pages, submitted to 57th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7]  arXiv:2312.02581 [pdf, ps, other]
Title: Auralization based on multi-perspective ambisonic room impulse responses
Comments: 18 pages, published in Acta Acustica (Open Access), datasets are available via this https URL and this https URL
Journal-ref: Acta Acustica, Volume 4, Number 6, Article Number 25, 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2312.02683 [pdf, other]
Title: Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler
Comments: Accepted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[9]  arXiv:2312.03034 [pdf, other]
Title: Distributed Speech Dereverberation Using Weighted Prediction Error
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[10]  arXiv:2312.03129 [pdf, other]
Title: Leveraging Laryngograph Data for Robust Voicing Detection in Speech
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[11]  arXiv:2312.03324 [pdf, ps, other]
Title: Lightweight Speaker Verification Using Transformation Module with Feature Partition and Fusion
Comments: 12 pages, 5 figures, 6 tables; accepted for publication in IEEE-ACM TASLP
Subjects: Audio and Speech Processing (eess.AS)
[12]  arXiv:2312.03620 [pdf, other]
Title: Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. Open Access: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13]  arXiv:2312.03668 [pdf, other]
Title: An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Comments: 6 pages, 2 figures, 3 tables, The model is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[14]  arXiv:2312.03694 [pdf, other]
Title: Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
Comments: The code is available at: this https URL
Subjects: Audio and Speech Processing (eess.AS)
[15]  arXiv:2312.04131 [pdf, other]
Title: Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16]  arXiv:2312.04324 [pdf, other]
Title: DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17]  arXiv:2312.04370 [pdf, other]
Title: Investigating the Design Space of Diffusion Models for Speech Enhancement
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[18]  arXiv:2312.05173 [pdf, other]
Title: Binaural multichannel blind speaker separation with a causal low-latency and low-complexity approach
Comments: Accepted for publication at IEEE ICASSP 2024 OJSP track
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19]  arXiv:2312.06065 [pdf, other]
Title: EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[20]  arXiv:2312.06270 [pdf, other]
Title: Testing Speech Emotion Recognition Machine Learning Models
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21]  arXiv:2312.06907 [pdf, other]
Title: w2v-SELD: A Sound Event Localization and Detection Framework for Self-Supervised Spatial Audio Pre-Training
Comments: 17 pages, 5 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22]  arXiv:2312.07513 [pdf, other]
Title: NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23]  arXiv:2312.08089 [pdf, other]
Title: Audio Deepfake Detection with Self-Supervised WavLM and Multi-Fusion Attentive Classifier
Comments: Accepted to ICASSP 2024. 5 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS)
[24]  arXiv:2312.08132 [pdf, ps, other]
Title: Ultra Low Complexity Deep Learning Based Noise Suppression
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
[25]  arXiv:2312.08496 [pdf, ps, other]
Title: Metrological support of acoustic measuring installations mid-frequency devices
Comments: 9 pages, 1 figure
Journal-ref: Environmental control systems. 2023. Issue. 2 (40). pp. 117-126
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[26]  arXiv:2312.08553 [pdf, other]
Title: USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Comments: Accepted by ICASSP 2024. Preprint
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[27]  arXiv:2312.08603 [pdf, other]
Title: NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification
Comments: Accepted by ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[28]  arXiv:2312.08610 [pdf, other]
Title: A computationally efficient semi-blind source separation based approach for nonlinear echo cancellation based on an element-wise iterative source steering
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[29]  arXiv:2312.08622 [pdf, other]
Title: Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification
Comments: Submitted to 2024 ICASSP
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[30]  arXiv:2312.08641 [pdf, other]
Title: Towards Automatic Data Augmentation for Disordered Speech Recognition
Comments: To appear at IEEE ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[31]  arXiv:2312.08821 [pdf, other]
Title: Reconstruction of Sound Field through Diffusion Models
Comments: Accepted for publication at ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[32]  arXiv:2312.08856 [pdf, other]
Title: Attention-Guided Adaptation for Code-Switching Speech Recognition
Comments: Accepted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[33]  arXiv:2312.08908 [pdf, other]
Title: Multi-Microphone Noise Data Augmentation for DNN-based Own Voice Reconstruction for Hearables in Noisy Environments
Comments: ICASSP 2024 (c) 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Audio and Speech Processing (eess.AS)
[34]  arXiv:2312.08998 [pdf, ps, other]
Title: Design, construction and evaluation of emotional multimodal pathological speech database
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[35]  arXiv:2312.09034 [pdf, other]
Title: Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection
Comments: ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[36]  arXiv:2312.09100 [pdf, other]
Title: FastInject: Injecting Unpaired Text Data into CTC-based ASR training
Comments: Accepted by ICASSP2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[37]  arXiv:2312.09572 [pdf, other]
Title: IR-UWB Radar-Based Contactless Silent Speech Recognition of Vowels, Consonants, Words, and Phrases
Comments: Submitted to IEEE Access
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[38]  arXiv:2312.09578 [pdf, ps, other]
Title: Self-Supervised Learning for Anomalous Sound Detection
Comments: Accepted for presentation at IEEE ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[39]  arXiv:2312.09620 [pdf, other]
Title: A Deep Representation Learning-based Speech Enhancement Method Using Complex Convolution Recurrent Variational Autoencoder
Comments: Accepted by ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS)
[40]  arXiv:2312.09645 [pdf, other]
Title: Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech
Comments: Presented at SACAIR 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[41]  arXiv:2312.09663 [pdf, other]
Title: Toward Deep Drum Source Separation
Comments: 9 pages, 2 figures, 3 tables. Published in Pattern Recognition Letters, 2024
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[42]  arXiv:2312.09747 [pdf, other]
Title: SELM: Speech Enhancement Using Discrete Tokens and Language Models
Comments: Accepted by ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[43]  arXiv:2312.09760 [pdf, other]
Title: U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias
Comments: Accepted by ASRU2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[44]  arXiv:2312.09768 [pdf, other]
Title: Decoding Envelope and Frequency-Following EEG Responses to Continuous Speech Using Deep Neural Networks
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[45]  arXiv:2312.09952 [pdf, other]
Title: Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction
Comments: Accepted by ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46]  arXiv:2312.10087 [pdf, ps, other]
Title: Revisiting the Entropy Semiring for Neural Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[47]  arXiv:2312.10088 [pdf, ps, other]
Title: On Robustness to Missing Video for Audiovisual Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[48]  arXiv:2312.10687 [pdf, other]
Title: MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Comments: Accepted at AAAI2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[49]  arXiv:2312.10741 [pdf, other]
Title: StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Comments: Accepted by AAAI 2024
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[50]  arXiv:2312.10756 [pdf, other]
Title: Attention-Driven Multichannel Speech Enhancement in Moving Sound Source Scenarios
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)