Audio and Speech Processing

Authors and titles for eess.AS in Feb 2021

[ total of 208 entries: 1-208 ]
[1]  arXiv:2102.00154 [pdf, ps, other]
Title: Semi-supervised Sound Event Detection using Random Augmentation and Consistency Regularization
Authors: Xiaofei Li
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2]  arXiv:2102.00184 [pdf, other]
Title: Adversarially learning disentangled speech representations for robust multi-factor voice conversion
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[3]  arXiv:2102.00196 [pdf, ps, other]
Title: Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures
Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4485-4489
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[4]  arXiv:2102.00270 [pdf, other]
Title: Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks
Comments: 8 pages, 4 figures, IEEE Spoken Language Technology Workshop
Subjects: Audio and Speech Processing (eess.AS)
[5]  arXiv:2102.00306 [pdf, other]
Title: End-to-End Language Identification using Multi-Head Self-Attention and 1D Convolutional Neural Networks
Comments: 5 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS)
[6]  arXiv:2102.00804 [pdf, other]
Title: Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript
Comments: Accepted to Interspeech 2021 conference
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[7]  arXiv:2102.00850 [pdf, other]
Title: On Scaling Contrastive Representations for Low-Resource Speech Recognition
Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[8]  arXiv:2102.01106 [pdf, other]
Title: Universal Neural Vocoding with Parallel WaveNet
Comments: 5 pages, 2 figures. Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[9]  arXiv:2102.01326 [pdf, other]
Title: Multimodal Attention Fusion for Target Speaker Extraction
Comments: 7 pages, 5 figures
Journal-ref: in IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 778-784
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[10]  arXiv:2102.01363 [pdf, other]
Title: The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[11]  arXiv:2102.01380 [pdf, ps, other]
Title: Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition
Comments: 5 pages, ICASSP 2021
Journal-ref: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[12]  arXiv:2102.01746 [pdf, other]
Title: Inference of the Selective Auditory Attention using Sequential LMMSE Estimation
Comments: 12 pages, 13 figures
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[13]  arXiv:2102.01931 [pdf, other]
Title: A Global-local Attention Framework for Weakly Labelled Audio Tagging
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14]  arXiv:2102.02599 [pdf, other]
Title: VSEGAN: Visual Speech Enhancement Generative Adversarial Network
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[15]  arXiv:2102.02909 [pdf, ps, other]
Title: Infant Cry Classification with Graph Convolutional Networks
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[16]  arXiv:2102.02998 [pdf, other]
Title: Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS)
[17]  arXiv:2102.03109 [pdf, other]
Title: Estimation of Microphone Clusters in Acoustic Sensor Networks using Unsupervised Federated Learning
Comments: Accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[18]  arXiv:2102.03166 [pdf, other]
Title: Lexical and syntactic gemination in Italian consonants -- Does a geminate Italian consonant consist of a repeated or a strengthened consonant?
Comments: Under revision at The Journal of the Acoustical Society of America
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19]  arXiv:2102.03216 [pdf, ps, other]
Title: Intermediate Loss Regularization for CTC-based Speech Recognition
Comments: Accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[20]  arXiv:2102.03468 [pdf, other]
Title: Sound Event Detection in Urban Audio With Single and Multi-Rate PCEN
Comments: 5 pages, 2 figures, 1 table, accepted for publication in IEEE ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[21]  arXiv:2102.03634 [pdf, other]
Title: Speaker attribution with voice profiles by graph-based semi-supervised learning
Comments: Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[22]  arXiv:2102.03649 [pdf, other]
Title: The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23]  arXiv:2102.03762 [pdf, other]
Title: Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism
Comments: Accepted for ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[24]  arXiv:2102.03786 [pdf, other]
Title: EMA2S: An End-to-End Multimodal Articulatory-to-Speech System
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[25]  arXiv:2102.03951 [pdf, other]
Title: End-to-End Multi-Channel Transformer for Speech Recognition
Comments: Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[26]  arXiv:2102.04029 [pdf, ps, other]
Title: Non-linear frequency warping using constant-Q transformation for speech emotion recognition
Comments: Accepted for publication in 2021 IEEE International Conference on Computer Communication and Informatics (IEEE ICCCI 2021)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[27]  arXiv:2102.04144 [pdf, ps, other]
Title: Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement
Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[28]  arXiv:2102.04629 [pdf, other]
Title: Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform
Comments: 5 pages, 2 figures, submitted to a journal
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[29]  arXiv:2102.04696 [pdf, other]
Title: Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation
Comments: Accepted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[30]  arXiv:2102.04697 [pdf, other]
Title: Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
Comments: Accepted by ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[31]  arXiv:2102.05109 [pdf, other]
Title: CDPAM: Contrastive learning for perceptual audio similarity
Comments: Dataset, code and sound examples can be found at this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[32]  arXiv:2102.05245 [pdf, other]
Title: Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet
Comments: Accepted for ICASSP 2021, 5 pages
Subjects: Audio and Speech Processing (eess.AS)
[33]  arXiv:2102.05259 [pdf, other]
Title: VACE-WPE: Virtual Acoustic Channel Expansion Based On Neural Networks for Weighted Prediction Error-Based Speech Dereverberation
Comments: 13 pages, 12 figures, 10 tables
Subjects: Audio and Speech Processing (eess.AS)
[34]  arXiv:2102.05889 [pdf, other]
Title: ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech
Journal-ref: IEEE Transactions on Biometrics, Behavior, and Identity Science 2021
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[35]  arXiv:2102.06200 [pdf, other]
Title: Efficient neural networks for real-time modeling of analog dynamic range compression
Comments: Updated and will appear at 152nd AES Convention (note title change)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36]  arXiv:2102.06237 [pdf, other]
Title: An Investigation of End-to-End Models for Robust Speech Recognition
Comments: Accepted to appear at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[37]  arXiv:2102.06306 [pdf, other]
Title: DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[38]  arXiv:2102.06322 [pdf, other]
Title: Joint Dereverberation and Separation with Iterative Source Steering
Comments: 5 pages, 2 figures, accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[39]  arXiv:2102.06332 [pdf, ps, other]
Title: Data Augmentation with Signal Companding for Detection of Logical Access Attacks
Comments: 5 pages, Accepted for publication in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021
Subjects: Audio and Speech Processing (eess.AS)
[40]  arXiv:2102.06454 [pdf, other]
Title: Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[41]  arXiv:2102.06610 [pdf, other]
Title: Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
Comments: 5 pages, 2 figures, ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[42]  arXiv:2102.06744 [pdf, ps, other]
Title: Hybrid phonetic-neural model for correction in speech recognition systems
Comments: 13 pages, 3 figures, presented in COMIA 2020 (this http URL)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[43]  arXiv:2102.06816 [pdf, other]
Title: Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-training and Its Application to Children's ASR
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[44]  arXiv:2102.07047 [pdf, other]
Title: Adversarial defense for automatic speaker verification by cascaded self-supervised learning models
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[45]  arXiv:2102.07054 [pdf, other]
Title: Inverted Vocal Tract Variables and Facial Action Units to Quantify Neuromotor Coordination in Schizophrenia
Comments: Conference
Subjects: Audio and Speech Processing (eess.AS)
[46]  arXiv:2102.07330 [pdf, other]
Title: A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement
Comments: Accepted to IEEE ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS)
[47]  arXiv:2102.07390 [pdf, other]
Title: Representation Learning For Speech Recognition Using Feedback Based Relevance Weighting
Comments: arXiv admin note: substantial text overlap with arXiv:2011.00721, arXiv:2011.02136, arXiv:2001.07067
Journal-ref: IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2021
Subjects: Audio and Speech Processing (eess.AS)
[48]  arXiv:2102.07445 [pdf, other]
Title: On training targets for noise-robust voice activity detection
Journal-ref: 29th European Signal Processing Conference (EUSIPCO), 2021, Dublin, Ireland
Subjects: Audio and Speech Processing (eess.AS)
[49]  arXiv:2102.07786 [pdf, other]
Title: PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
Comments: 5 pages, accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[50]  arXiv:2102.07955 [pdf, other]
Title: Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition
Comments: Submitted to Computer Speech & Language
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[51]  arXiv:2102.07961 [pdf, other]
Title: Semi-Supervised Singing Voice Separation with Noisy Self-Training
Comments: Accepted at 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)
Subjects: Audio and Speech Processing (eess.AS)
[52]  arXiv:2102.08075 [pdf, other]
Title: Axial Residual Networks for CycleGAN-based Voice Conversion
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53]  arXiv:2102.08328 [pdf, other]
Title: Context-Aware Prosody Correction for Text-Based Speech Editing
Comments: To appear in proceedings of ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[54]  arXiv:2102.08706 [pdf, other]
Title: Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder
Comments: ICASSP 2021. (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[55]  arXiv:2102.09106 [pdf, other]
Title: Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition
Comments: To be published in IEEE ICASSP
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56]  arXiv:2102.09168 [pdf, other]
Title: Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57]  arXiv:2102.09660 [pdf, other]
Title: Generative Speech Coding with Predictive Variance Regularization
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58]  arXiv:2102.09666 [pdf, other]
Title: Dynamic curriculum learning via data parameters for noise robust keyword spotting
Comments: Accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[59]  arXiv:2102.09838 [pdf, other]
Title: A Robust Maximum Likelihood Distortionless Response Beamformer based on a Complex Generalized Gaussian Distribution
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[60]  arXiv:2102.09853 [pdf, ps, other]
Title: Direction of Arrival Estimation of Noisy Speech Using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals
Comments: 5 pages, 6 figures. Accepted to EUSIPCO 2021
Subjects: Audio and Speech Processing (eess.AS)
[61]  arXiv:2102.09918 [pdf, other]
Title: End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62]  arXiv:2102.09928 [pdf, other]
Title: Do End-to-End Speech Recognition Models Care About Context?
Comments: Published in the proceedings of INTERSPEECH 2020, pp. 4352-4356
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[63]  arXiv:2102.09939 [pdf, ps, other]
Title: ABSP System for The Third DIHARD Challenge
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[64]  arXiv:2102.09959 [pdf, other]
Title: Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast
Comments: 5 pages, 3 figures, Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[65]  arXiv:2102.10345 [pdf, other]
Title: Model architectures to extrapolate emotional expressions in DNN-based text-to-speech
Comments: This is the author's final draft. Accepted by Speech Communication. Please refer to the journal if you want
Subjects: Audio and Speech Processing (eess.AS)
[66]  arXiv:2102.10376 [pdf, other]
Title: The Use of Voice Source Features for Sung Speech Recognition
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[67]  arXiv:2102.10449 [pdf, other]
Title: WARP-Q: Quality Prediction For Generative Neural Speech Codecs
Comments: Accepted for presentation at IEEE ICASSP 2021. Source code and data can be found on this https URL
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[68]  arXiv:2102.10815 [pdf, other]
Title: LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation
Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2012.01684
Subjects: Audio and Speech Processing (eess.AS)
[69]  arXiv:2102.11265 [pdf, other]
Title: Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies
Comments: new version has an updated title
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[70]  arXiv:2102.11480 [pdf, ps, other]
Title: Evolutionary optimization of contexts for phonetic correction in speech recognition systems
Comments: 13 pages, 4 figures. This article is a translation of the paper "Optimización evolutiva de contextos para la corrección fonética en sistemas de reconocimiento del habla" presented in COMIA 2019
Journal-ref: Research in Computing Science Issue 148(8), 2019, pp. 293-306. ISSN 1870-4069
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[71]  arXiv:2102.11525 [pdf, other]
Title: End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend
Comments: 5 pages, 1 figure, accepted by ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[72]  arXiv:2102.11594 [pdf, other]
Title: Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[73]  arXiv:2102.11634 [pdf, other]
Title: Dual-Path Modeling for Long Recording Speech Separation in Meetings
Comments: Accepted by ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[74]  arXiv:2102.11906 [pdf, other]
Title: Handling Background Noise in Neural Speech Generation
Comments: 5 pages, 3 figures, presented at the Asilomar Conference on Signals, Systems, and Computers 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75]  arXiv:2102.12078 [pdf, other]
Title: Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks
Comments: Preprint
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[76]  arXiv:2102.12394 [pdf, other]
Title: SEP-28k: A Dataset for Stuttering Event Detection From Podcasts With People Who Stutter
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[77]  arXiv:2102.12397 [pdf, other]
Title: Thoughts on the potential to compensate a hearing loss in noise
Comments: 26 pages, 22 figures, related code this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[78]  arXiv:2102.12624 [pdf, other]
Title: Meta-Learning for improving rare word recognition in end-to-end ASR
Comments: Revised version to be published in the proceedings of ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79]  arXiv:2102.12829 [pdf, other]
Title: Automatic Classification of OSA related Snoring Signals from Nocturnal Audio Recordings
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[80]  arXiv:2102.13334 [pdf, ps, other]
Title: Integration of deep learning with expectation maximization for spatial cue based speech separation in reverberant conditions
Subjects: Audio and Speech Processing (eess.AS)
[81]  arXiv:2102.13397 [pdf, other]
Title: Underwater Acoustic Communication Receiver Using Deep Belief Network
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[82]  arXiv:2102.13468 [pdf, other]
Title: The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates
Comments: 5 pages
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[83]  arXiv:2102.04832 (cross-list from eess.SP) [pdf, other]
Title: Fast and Accurate Amplitude Demodulation of Wideband Signals
Comments: Accepted for publication in IEEE Transactions on Signal Processing
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84]  arXiv:2102.06269 (cross-list from eess.IV) [pdf, other]
Title: Disentanglement for audio-visual emotion recognition using multitask setup
Comments: Accepted for ICASSP 2021, 5 pages
Subjects: Image and Video Processing (eess.IV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85]  arXiv:2102.06393 (cross-list from eess.SP) [pdf, other]
Title: Mind the beat: detecting audio onsets from EEG recordings of music listening
Comments: To be published in ICASSP 2021; 4 figures, 5 pages (4 pages of content + 1 page of references)
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86]  arXiv:2102.07896 (cross-list from eess.SP) [pdf, other]
Title: A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images
Comments: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Data
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[87]  arXiv:2102.07990 (cross-list from eess.SP) [pdf, other]
Title: Through-the-Wall Radar under Electromagnetic Complex Wall: A Deep Learning Approach
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88]  arXiv:2102.00151 (cross-list from cs.SD) [pdf, other]
Title: Expressive Neural Voice Cloning
Comments: 12 pages, 2 figures, 2 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[89]  arXiv:2102.00201 (cross-list from cs.SD) [pdf, other]
Title: Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging
Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[90]  arXiv:2102.00247 (cross-list from cs.CL) [pdf, other]
Title: Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[91]  arXiv:2102.00291 (cross-list from cs.SD) [pdf, other]
Title: Speech Recognition by Simply Fine-tuning BERT
Comments: Accepted to ICASSP 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[92]  arXiv:2102.00313 (cross-list from cs.SD) [pdf, other]
Title: Cortical Features for Defense Against Adversarial Audio Attacks
Comments: Co-author legal name changed
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[93]  arXiv:2102.00382 (cross-list from cs.SD) [pdf, other]
Title: Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks
Comments: ICASSP 2021 camera-ready version. Copyrights belong to IEEE
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[94]  arXiv:2102.00429 (cross-list from cs.SD) [pdf, other]
Title: High Fidelity Speech Regeneration with Application to Speech Enhancement
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[95]  arXiv:2102.00550 (cross-list from cs.SD) [pdf, other]
Title: Boosting the Predictive Accuracy of Singer Identification Using Discrete Wavelet Transform For Feature Extraction
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[96]  arXiv:2102.00616 (cross-list from cs.SD) [pdf, ps, other]
Title: Neural Network architectures to classify emotions in Indian Classical Music
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[97]  arXiv:2102.01013 (cross-list from cs.CL) [pdf, other]
Title: End2End Acoustic to Semantic Transduction
Comments: Accepted at IEEE ICASSP 2021
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98]  arXiv:2102.01133 (cross-list from cs.SD) [pdf, other]
Title: Deep Music Information Dynamics
Authors: Shlomo Dubnov
Journal-ref: The 2020 Joint Conference on AI Music Creativity, October 19-23, 2020, Royal Institute of Technology (KTH), Stockholm, Sweden
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[99]  arXiv:2102.01243 (cross-list from cs.SD) [pdf, other]
Title: PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing. Code at this https URL
Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3292-3306, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[100]  arXiv:2102.01547 (cross-list from cs.SD) [pdf, other]
Title: WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit
Comments: 5 pages, 2 figures, 4 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[101]  arXiv:2102.01640 (cross-list from cs.SD) [pdf, other]
Title: SPEAK WITH YOUR HANDS: Using Continuous Hand Gestures to Control Articulatory Speech Synthesizer
Comments: 2 pages, 1 figure
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[102]  arXiv:2102.01692 (cross-list from cs.SD) [pdf, ps, other]
Title: Generación de voces artificiales infantiles en castellano con acento costarricense (Generation of artificial children's voices in Spanish with a Costa Rican accent)
Comments: 12 pages, in Spanish
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[103]  arXiv:2102.01813 (cross-list from cs.SD) [pdf, other]
Title: Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation
Comments: Accepted by ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[104]  arXiv:2102.01927 (cross-list from cs.SD) [pdf, ps, other]
Title: Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance
Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2006.15253
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105]  arXiv:2102.01930 (cross-list from cs.SD) [pdf, other]
Title: General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[106]  arXiv:2102.01991 (cross-list from cs.SD) [pdf, other]
Title: Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram
Comments: 5 pages, 2 figures, 4 tables, accepted by ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[107]  arXiv:2102.01993 (cross-list from cs.SD) [pdf, other]
Title: Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses
Comments: 5 pages, 4 figures, 2 tables, accepted by ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[108]  arXiv:2102.02028 (cross-list from cs.SD) [pdf, other]
Title: Music source separation conditioned on 3D point clouds
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[109]  arXiv:2102.02074 (cross-list from cs.SD) [pdf, ps, other]
Title: Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[110]  arXiv:2102.02270 (cross-list from cs.CL) [pdf, other]
Title: Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111]  arXiv:2102.02282 (cross-list from cs.SD) [pdf, other]
Title: Downbeat Tracking with Tempo-Invariant Convolutional Neural Networks
Comments: 7 pages, 5 figures, Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020
Journal-ref: Proceedings of the 21st International Society for Music Information Retrieval Conference (2020) 216-222
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[112]  arXiv:2102.02417 (cross-list from cs.SD) [pdf, other]
Title: Audio Adversarial Examples: Attacks Using Vocal Masks
Comments: 9 pages, 1 figure, 2 tables. Submitted to COLING2020
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[113]  arXiv:2102.02640 (cross-list from cs.SD) [pdf, ps, other]
Title: Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach
Comments: 6 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[114]  arXiv:2102.02964 (cross-list from cs.SD) [pdf, ps, other]
Title: Diversity-Robust Acoustic Feature Signatures Based on Multiscale Fractal Dimension for Similarity Search of Environmental Sounds
Comments: 15 pages, 14 figures
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[115]  arXiv:2102.03049 (cross-list from cs.SD) [pdf, ps, other]
Title: Benchmarking of eight recurrent neural network variants for breath phase and adventitious sound detection on a self-developed open-access lung sound database-HF_Lung_V1
Comments: 48 pages, 8 figures. Accepted by PLoS One
Journal-ref: PLoS ONE, 2021, 16(7): e0254134
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[116]  arXiv:2102.03055 (cross-list from cs.SD) [pdf, other]
Title: Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR
Comments: Accepted at IEEE SLT 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[117]  arXiv:2102.03170 (cross-list from cs.SD) [pdf, other]
Title: White-box Audio VST Effect Programming
Comments: The latest version of the system is to appear at EvoMUSART 2021 as a full paper. Audio samples of the latest system can be listened to at this https URL
Journal-ref: 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[118]  arXiv:2102.03207 (cross-list from cs.SD) [pdf, other]
Title: Real-time Denoising and Dereverberation with Tiny Recurrent U-Net
Comments: 5 pages, 2 figures, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). arXiv admin note: text overlap with arXiv:2006.00687
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[119]  arXiv:2102.03229 (cross-list from cs.SD) [pdf, other]
Title: Multi-Task Self-Supervised Pre-Training for Music Classification
Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[120]  arXiv:2102.03424 (cross-list from cs.CV) [pdf, other]
Title: Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Comments: Accepted to ICASSP 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[121]  arXiv:2102.03662 (cross-list from cs.CL) [pdf, other]
Title: A bandit approach to curriculum generation for automatic speech recognition
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122]  arXiv:2102.03868 (cross-list from cs.SD) [pdf, other]
Title: U-vectors: Generating clusterable speaker embedding from unlabeled data
Comments: 18 pages, 7 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[123]  arXiv:2102.03957 (cross-list from cs.SD) [pdf, other]
Title: Extracting the Auditory Attention in a Dual-Speaker Scenario from EEG using a Joint CNN-LSTM Model
Comments: 18 pages, 6 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[124]  arXiv:2102.04040 (cross-list from cs.SD) [pdf, ps, other]
Title: LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Comments: Accepted to ICASSP 21
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[125]  arXiv:2102.04051 (cross-list from cs.HC) [pdf, other]
Title: HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception
Comments: 5 pages, 6 figures, to be published in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126]  arXiv:2102.04056 (cross-list from cs.SD) [pdf, other]
Title: Speaker and Direction Inferred Dual-channel Speech Separation
Comments: Accepted by ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127]  arXiv:2102.04062 (cross-list from cs.SD) [pdf, ps, other]
Title: An Update on a Progressively Expanded Database for Automated Lung Sound Analysis
Comments: Under review, 14 pages, 5 figures, 3 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[128]  arXiv:2102.04198 (cross-list from cs.SD) [pdf, other]
Title: ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network
Comments: 5 pages, 3 figures, accepted by ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129]  arXiv:2102.04254 (cross-list from cs.CE) [pdf, other]
Title: A Data-Driven Approach to Violin Making
Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130]  arXiv:2102.04429 (cross-list from cs.SD) [pdf, other]
Title: Federated Acoustic Modeling For Automatic Speech Recognition
Comments: Accepted by ICASSP 2021
Subjects: Sound (cs.SD); Distributed, Parallel, and Cluster Computing (cs.DC); Audio and Speech Processing (eess.AS)
[131]  arXiv:2102.04488 (cross-list from cs.CL) [pdf, other]
Title: Wake Word Detection with Streaming Transformers
Comments: Accepted at IEEE ICASSP 2021. 5 pages, 3 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132]  arXiv:2102.04588 (cross-list from cs.SD) [pdf, other]
Title: A comparative study of two-dimensional vocal tract acoustic modeling based on Finite-Difference Time-Domain methods
Comments: 4 pages, 3 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[133]  arXiv:2102.04680 (cross-list from cs.SD) [pdf, other]
Title: TräumerAI: Dreaming Music with StyleGAN
Comments: presented in NeurIPS Workshop 2020: Machine Learning for Creativity and Design
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[134]  arXiv:2102.04740 (cross-list from stat.ME) [pdf, other]
Title: Principal components variable importance reconstruction (PC-VIR): Exploring predictive importance in multicollinear acoustic speech data
Comments: 10 pages, 3 figures, GitHub repository
Subjects: Methodology (stat.ME); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135]  arXiv:2102.04880 (cross-list from cs.SD) [pdf, ps, other]
Title: Diagnosis of COVID-19 and Non-COVID-19 Patients by Classifying Only a Single Cough Sound
Authors: Masoud Maleki
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Optimization and Control (math.OC)
[136]  arXiv:2102.04932 (cross-list from cs.LG) [pdf, other]
Title: Sparsification via Compressed Sensing for Automatic Speech Recognition
Authors: Kai Zhen (1 and 2), Hieu Duy Nguyen (2), Feng-Ju Chang (2), Athanasios Mouchtaris (2), Ariya Rastrow (2). ((1) Indiana University Bloomington, (2) Alexa Machine Learning, Amazon, USA)
Comments: 5 pages, accepted for publication in the 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021), June 6-12, 2021, Toronto, ON, Canada
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137]  arXiv:2102.04945 (cross-list from cs.SD) [pdf, other]
Title: On permutation invariant training for speech source separation
Comments: In proceedings of ICASSP2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[138]  arXiv:2102.04997 (cross-list from cs.LG) [pdf, other]
Title: Deep Neural Network based Cough Detection using Bed-mounted Accelerometer Measurements
Comments: Accepted to ICASSP 2021. Copyright information is shown on the first page
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[139]  arXiv:2102.05151 (cross-list from cs.SD) [pdf, other]
Title: Enhancing Audio Augmentation Methods with Consistency Learning
Comments: Accepted to 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[140]  arXiv:2102.05225 (cross-list from cs.SD) [pdf, other]
Title: Exploring Automatic COVID-19 Diagnosis via voice and symptoms from Crowdsourced Data
Comments: 5 pages, 3 figures, 2 tables, Accepted for publication at ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141]  arXiv:2102.05630 (cross-list from cs.SD) [pdf, other]
Title: Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[142]  arXiv:2102.05749 (cross-list from cs.SD) [pdf, ps, other]
Title: Self-Supervised VQ-VAE for One-Shot Music Style Transfer
Comments: ICASSP 2021. Website: this https URL
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (2021) 96-100
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[143]  arXiv:2102.05872 (cross-list from cs.SD) [pdf, ps, other]
Title: Onoma-to-wave: Environmental sound synthesis from onomatopoeic words
Comments: Accepted to APSIPA Transactions on Signal and Information Processing
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144]  arXiv:2102.05894 (cross-list from cs.SD) [pdf, ps, other]
Title: CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions
Comments: Published in Applied Soft Computing journal
Journal-ref: Applied Soft Computing, Elsevier, 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[145]  arXiv:2102.06003 (cross-list from cs.SD) [pdf, ps, other]
Title: Language Independent Emotion Quantification using Non linear Modelling of Speech
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[146]  arXiv:2102.06034 (cross-list from cs.SD) [pdf, other]
Title: Speech enhancement with mixture-of-deep-experts with clean clustering pre-training
Comments: arXiv admin note: text overlap with arXiv:1703.09302
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[147]  arXiv:2102.06038 (cross-list from cs.SD) [pdf, ps, other]
Title: A Fractal Approach to Characterize Emotions in Audio and Visual Domain: A Study on Cross-Modal Interaction
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[148]  arXiv:2102.06142 (cross-list from cs.SD) [pdf, other]
Title: Multichannel-based learning for audio object extraction
Comments: In proceedings of ICASSP2021. Appendix added
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 206-210
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149]  arXiv:2102.06283 (cross-list from cs.CL) [pdf, other]
Title: Speech-language Pre-training for End-to-end Spoken Language Understanding
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150]  arXiv:2102.06291 (cross-list from cs.SD) [pdf, other]
Title: A Multi-View Approach To Audio-Visual Speaker Verification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[151]  arXiv:2102.06357 (cross-list from cs.SD) [pdf, other]
Title: Contrastive Unsupervised Learning for Speech Emotion Recognition
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[152]  arXiv:2102.06380 (cross-list from cs.CL) [pdf, ps, other]
Title: Neural Inverse Text Normalization
Comments: 5 pages, accepted to ICASSP 2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[153]  arXiv:2102.06431 (cross-list from cs.SD) [pdf, other]
Title: VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[154]  arXiv:2102.06455 (cross-list from cs.SD) [pdf, other]
Title: Deep Sound Field Reconstruction in Real Rooms: Introducing the ISOBEL Sound Field Dataset
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[155]  arXiv:2102.06467 (cross-list from cs.SD) [pdf, other]
Title: Content-Aware Speaker Embeddings for Speaker Diarisation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[156]  arXiv:2102.06657 (cross-list from cs.CV) [pdf, other]
Title: End-to-end Audio-visual Speech Recognition with Conformers
Comments: Accepted to ICASSP 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[157]  arXiv:2102.06750 (cross-list from cs.CL) [pdf, other]
Title: Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding
Comments: Proc. IEEE ICASSP 2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[158]  arXiv:2102.06930 (cross-list from cs.SD) [pdf, other]
Title: Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms
Comments: 5 pages, 4 figures, 6 tables, to be published in the Proc. of the 46th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) @ Toronto, Ontario, Canada
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[159]  arXiv:2102.06934 (cross-list from cs.SD) [pdf, other]
Title: Multi-Channel Speech Enhancement using Graph Neural Networks
Journal-ref: Proc. ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160]  arXiv:2102.07133 (cross-list from cs.SD) [pdf, other]
Title: Parametric Optimization of Violin Top Plates using Machine Learning
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[161]  arXiv:2102.07259 (cross-list from cs.SD) [pdf, other]
Title: Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition
Comments: Submitted to IEEE/ACM Trans. on Audio, Speech, and Language Processing
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[162]  arXiv:2102.07307 (cross-list from cs.SD) [pdf, other]
Title: I-vector Based Within Speaker Voice Quality Identification on connected speech
Comments: s
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[163]  arXiv:2102.07594 (cross-list from cs.CL) [pdf, other]
Title: Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT
Comments: 14 pages, 7 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[164]  arXiv:2102.07982 (cross-list from cs.SD) [pdf, other]
Title: Voice Gender Scoring and Independent Acoustic Characterization of Perceived Masculinity and Femininity
Comments: 24 pages, 7 figures, journal
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165]  arXiv:2102.08015 (cross-list from cs.SD) [pdf, ps, other]
Title: Improving speech recognition models with small samples for air traffic control systems
Comments: This work has been accepted by Neurocomputing for publication
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[166]  arXiv:2102.08074 (cross-list from cs.SD) [pdf, other]
Title: Semi Supervised Learning For Few-shot Audio Classification By Episodic Triplet Mining
Comments: 5 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[167]  arXiv:2102.08183 (cross-list from cs.SD) [pdf, other]
Title: Comparison of semi-supervised deep learning algorithms for audio classification
Comments: 9 pages, 5 figures, 5 tables. This is version 3 of the paper; it contains minor fixes compared to the EURASIP version (version 2 of the paper)
Journal-ref: EURASIP Journal on Audio, Speech, and Music Processing, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[168]  arXiv:2102.08359 (cross-list from cs.SD) [pdf, other]
Title: End-2-End COVID-19 Detection from Breath & Cough Audio
Comments: 5 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[169]  arXiv:2102.08535 (cross-list from cs.CL) [pdf, ps, other]
Title: ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems
Comments: An improved work based on our previous Interspeech 2020 paper (this https URL)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170]  arXiv:2102.08551 (cross-list from cs.SD) [pdf, other]
Title: Weighted Recursive Least Square Filter and Neural Network based Residual Echo Suppression for the AEC-Challenge
Comments: 5 pages, 2 figures, accepted by ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[171]  arXiv:2102.08575 (cross-list from cs.SD) [pdf, ps, other]
Title: End-to-end lyrics Recognition with Voice to Singing Style Transfer
Comments: accepted at ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[172]  arXiv:2102.08833 (cross-list from cs.SD) [pdf, other]
Title: DESED-FL and URBAN-FL: Federated Learning Datasets for Sound Event Detection
Comments: To be published in EUSIPCO 2021
Subjects: Sound (cs.SD); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[173]  arXiv:2102.09202 (cross-list from cs.SD) [pdf, other]
Title: Low Resource Audio-to-Lyrics Alignment From Polyphonic Music Recordings
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174]  arXiv:2102.09281 (cross-list from cs.LG) [pdf, other]
Title: DINO: A Conditional Energy-Based GAN for Domain Translation
Comments: Accepted to ICLR 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175]  arXiv:2102.09607 (cross-list from cs.LG) [pdf, ps, other]
Title: Modelling Paralinguistic Properties in Conversational Speech to Detect Bipolar Disorder and Borderline Personality Disorder
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176]  arXiv:2102.09680 (cross-list from cs.CL) [pdf, other]
Title: Fixing Errors of the Google Voice Recognizer through Phonetic Distance Metrics
Comments: 13 pages, 4 figures. This article is a translation of the paper "Corrección de errores del reconocedor de voz de Google usando métricas de distancia fonética" presented in COMIA 2018
Journal-ref: Research in Computing Science 148(1), 2019, pp. 57-70. ISSN 1870-4069
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[177]  arXiv:2102.09737 (cross-list from cs.CV) [pdf, other]
Title: One Shot Audio to Animated Video Generation
Comments: arXiv admin note: substantial text overlap with arXiv:2012.07842, arXiv:2012.07304
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[178]  arXiv:2102.09763 (cross-list from cs.SD) [pdf, other]
Title: Frequency-Temporal Attention Network for Singing Melody Extraction
Comments: This paper has been accepted by ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179]  arXiv:2102.09794 (cross-list from cs.SD) [pdf, other]
Title: Hierarchical Recurrent Neural Networks for Conditional Melody Generation with Long-term Structure
Journal-ref: Proc. of the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[180]  arXiv:2102.09817 (cross-list from cs.SD) [pdf, ps, other]
Title: Unit selection synthesis based data augmentation for fixed phrase speaker verification
Comments: Accepted to ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[181]  arXiv:2102.09828 (cross-list from cs.SD) [pdf, other]
Title: AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge
Comments: Accepted to ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[182]  arXiv:2102.09914 (cross-list from cs.CL) [pdf, other]
Title: Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input
Comments: 4 pages
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[183]  arXiv:2102.09966 (cross-list from cs.SD) [pdf, ps, other]
Title: CatNet: music source separation system with mix-audio augmentation
Comments: 5 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184]  arXiv:2102.09971 (cross-list from cs.SD) [pdf, other]
Title: Speech enhancement with weakly labelled data from AudioSet
Comments: 5 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185]  arXiv:2102.09978 (cross-list from cs.SD) [pdf, other]
Title: TransMask: A Compact and Fast Speech Separation Model Based on Transformer
Comments: Accepted in ICASSP2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186]  arXiv:2102.10233 (cross-list from cs.SD) [pdf, other]
Title: The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
Comments: Accepted by ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187]  arXiv:2102.10236 (cross-list from cs.SD) [pdf, other]
Title: Singer Identification Using Deep Timbre Feature Learning with KNN-Net
Comments: Published as a conference paper at ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188]  arXiv:2102.10322 (cross-list from cs.SD) [pdf, other]
Title: Learnable MFCCs for Speaker Verification
Comments: Accepted to ISCAS 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[189]  arXiv:2102.10331 (cross-list from q-bio.NC) [pdf, other]
Title: Separating Stimulus-Induced and Background Components of Dynamic Functional Connectivity in Naturalistic fMRI
Comments: Main paper: 10 pages, 8 figures. Supplemental file: 3 pages
Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV); Signal Processing (eess.SP); Applications (stat.AP)
[190]  arXiv:2102.10515 (cross-list from cs.SD) [pdf, other]
Title: Anomaly Detection in Audio with Concept Drift using Adaptive Huffman Coding
Comments: 22 pages, 8 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191]  arXiv:2102.10905 (cross-list from cs.CL) [pdf, other]
Title: Joint Intent Detection And Slot Filling Based on Continual Learning Model
Comments: Accepted to ICASSP 2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[192]  arXiv:2102.11058 (cross-list from cs.SD) [pdf, other]
Title: Anyone GAN Sing
Comments: 5 pages, 8 figures
Journal-ref: International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol.7, Issue 5, page no. 25-29, May-2020
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[193]  arXiv:2102.11114 (cross-list from cs.CL) [pdf, other]
Title: Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model
Comments: Accepted in 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194]  arXiv:2102.11420 (cross-list from cs.SD) [pdf, other]
Title: Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion
Comments: For demo, see this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195]  arXiv:2102.11457 (cross-list from cs.SD) [pdf, other]
Title: Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196]  arXiv:2102.11474 (cross-list from cs.SD) [pdf, other]
Title: Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197]  arXiv:2102.11488 (cross-list from cs.SD) [pdf, other]
Title: Senone-aware Adversarial Multi-task Training for Unsupervised Child to Adult Speech Adaptation
Comments: accepted for presentation at ICASSP-2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[198]  arXiv:2102.11531 (cross-list from cs.SD) [pdf, other]
Title: Memory-efficient Speech Recognition on Smart Devices
Journal-ref: ICASSP 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[199]  arXiv:2102.11588 (cross-list from cs.SD) [pdf, other]
Title: Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain
Comments: 4 pages, 6 figures, ICASSP 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[200]  arXiv:2102.11771 (cross-list from cs.SD) [pdf, ps, other]
Title: Improving Deep Learning Sound Events Classifiers using Gram Matrix Feature-wise Correlations
Comments: To appear at ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[201]  arXiv:2102.12111 (cross-list from cs.SD) [pdf, other]
Title: Deep Learning Approach for Singer Voice Classification of Vietnamese Popular Music
Comments: Published in SoICT 2019: Proceedings of the Tenth International Symposium on Information and Communication Technology
Journal-ref: SoICT 2019: Proceedings of the Tenth International Symposium on Information and Communication Technology
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[202]  arXiv:2102.12289 (cross-list from cs.SD) [pdf, other]
Title: Automatic Feature Extraction for Heartbeat Anomaly Detection
Comments: 7 pages, 2 figures, Presented at PharML 2020 Workshop - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), see this https URL, source-code: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[203]  arXiv:2102.12564 (cross-list from cs.SD) [pdf, other]
Title: Triplet loss based embeddings for forensic speaker identification in Spanish
Comments: Long Paper: Neural Computing and Applications, Special Issue on LatinX in AI Research (2021). 11 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[204]  arXiv:2102.12664 (cross-list from cs.CL) [pdf, other]
Title: MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition
Comments: To appear at ICASSP 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[205]  arXiv:2102.12841 (cross-list from cs.SD) [pdf, other]
Title: MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames
Comments: Accepted to ICASSP 2021. Project page: this http URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[206]  arXiv:2102.13314 (cross-list from cs.LG) [pdf, other]
Title: Efficient Client Contribution Evaluation for Horizontal Federated Learning
Comments: Accepted to ICASSP 2021
Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[207]  arXiv:2102.13479 (cross-list from cs.SD) [pdf, other]
Title: Towards Explaining Expressive Qualities in Piano Recordings: Transfer of Explanatory Features via Acoustic Domain Adaptation
Comments: 5 pages, 3 figures; accepted for IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[208]  arXiv:2102.13552 (cross-list from cs.SD) [pdf, other]
Title: The NPU System for the 2020 Personalized Voice Trigger Challenge
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)