Audio and Speech Processing

Authors and titles for eess.AS in Feb 2021

[ total of 208 entries: 1-50 | 51-100 | 101-150 | 151-200 | 201-208 ]
[1]  arXiv:2102.00154 [pdf, ps, other]
Title: Semi-supervised Sound Event Detection using Random Augmentation and Consistency Regularization
Authors: Xiaofei Li
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2]  arXiv:2102.00184 [pdf, other]
Title: Adversarially learning disentangled speech representations for robust multi-factor voice conversion
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[3]  arXiv:2102.00196 [pdf, ps, other]
Title: Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures
Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4485-4489
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[4]  arXiv:2102.00270 [pdf, other]
Title: Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks
Comments: 8 pages, 4 figures, IEEE spoken language and technology workshop
Subjects: Audio and Speech Processing (eess.AS)
[5]  arXiv:2102.00306 [pdf, other]
Title: End-to-End Language Identification using Multi-Head Self-Attention and 1D Convolutional Neural Networks
Comments: 5 pages, 1 figure
Subjects: Audio and Speech Processing (eess.AS)
[6]  arXiv:2102.00804 [pdf, other]
Title: Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript
Comments: Accepted to Interspeech 2021 conference
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[7]  arXiv:2102.00850 [pdf, other]
Title: On Scaling Contrastive Representations for Low-Resource Speech Recognition
Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[8]  arXiv:2102.01106 [pdf, other]
Title: Universal Neural Vocoding with Parallel WaveNet
Comments: 5 pages, 2 figures. Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[9]  arXiv:2102.01326 [pdf, other]
Title: Multimodal Attention Fusion for Target Speaker Extraction
Comments: 7 pages, 5 figures
Journal-ref: in IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 778-784
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[10]  arXiv:2102.01363 [pdf, other]
Title: The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[11]  arXiv:2102.01380 [pdf, ps, other]
Title: Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition
Comments: 5 pages, ICASSP 2021
Journal-ref: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[12]  arXiv:2102.01746 [pdf, other]
Title: Inference of the Selective Auditory Attention using Sequential LMMSE Estimation
Comments: 12 pages, 13 figures
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[13]  arXiv:2102.01931 [pdf, other]
Title: A Global-local Attention Framework for Weakly Labelled Audio Tagging
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14]  arXiv:2102.02599 [pdf, other]
Title: VSEGAN: Visual Speech Enhancement Generative Adversarial Network
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[15]  arXiv:2102.02909 [pdf, ps, other]
Title: Infant Cry Classification with Graph Convolutional Networks
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[16]  arXiv:2102.02998 [pdf, other]
Title: Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS)
[17]  arXiv:2102.03109 [pdf, other]
Title: Estimation of Microphone Clusters in Acoustic Sensor Networks using Unsupervised Federated Learning
Comments: Accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[18]  arXiv:2102.03166 [pdf, other]
Title: Lexical and syntactic gemination in Italian consonants -- Does a geminate Italian consonant consist of a repeated or a strengthened consonant?
Comments: Under revision at The Journal of the Acoustical Society of America
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19]  arXiv:2102.03216 [pdf, ps, other]
Title: Intermediate Loss Regularization for CTC-based Speech Recognition
Comments: Accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[20]  arXiv:2102.03468 [pdf, other]
Title: Sound Event Detection in Urban Audio With Single and Multi-Rate PCEN
Comments: 5 pages, 2 figures, 1 table, accepted for publication in IEEE ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[21]  arXiv:2102.03634 [pdf, other]
Title: Speaker attribution with voice profiles by graph-based semi-supervised learning
Comments: Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[22]  arXiv:2102.03649 [pdf, other]
Title: The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23]  arXiv:2102.03762 [pdf, other]
Title: Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism
Comments: Accepted for ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[24]  arXiv:2102.03786 [pdf, other]
Title: EMA2S: An End-to-End Multimodal Articulatory-to-Speech System
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[25]  arXiv:2102.03951 [pdf, other]
Title: End-to-End Multi-Channel Transformer for Speech Recognition
Comments: Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[26]  arXiv:2102.04029 [pdf, ps, other]
Title: Non-linear frequency warping using constant-Q transformation for speech emotion recognition
Comments: Accepted for publication in 2021 IEEE International Conference on Computer Communication and Informatics (IEEE ICCCI 2021)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[27]  arXiv:2102.04144 [pdf, ps, other]
Title: Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement
Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[28]  arXiv:2102.04629 [pdf, other]
Title: Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform
Comments: 5 pages, 2 figures, Journal submitted
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[29]  arXiv:2102.04696 [pdf, other]
Title: Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation
Comments: Accepted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[30]  arXiv:2102.04697 [pdf, other]
Title: Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
Comments: Accepted by ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[31]  arXiv:2102.05109 [pdf, other]
Title: CDPAM: Contrastive learning for perceptual audio similarity
Comments: Dataset, code and sound examples can be found at this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[32]  arXiv:2102.05245 [pdf, other]
Title: Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet
Comments: Accepted for ICASSP 2021, 5 pages
Subjects: Audio and Speech Processing (eess.AS)
[33]  arXiv:2102.05259 [pdf, other]
Title: VACE-WPE: Virtual Acoustic Channel Expansion Based On Neural Networks for Weighted Prediction Error-Based Speech Dereverberation
Comments: 13 pages, 12 figures, 10 tables
Subjects: Audio and Speech Processing (eess.AS)
[34]  arXiv:2102.05889 [pdf, other]
Title: ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech
Journal-ref: IEEE Transactions on Biometrics, Behavior, and Identity Science 2021
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[35]  arXiv:2102.06200 [pdf, other]
Title: Efficient neural networks for real-time modeling of analog dynamic range compression
Comments: Updated and will appear at 152nd AES Convention (note title change)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36]  arXiv:2102.06237 [pdf, other]
Title: An Investigation of End-to-End Models for Robust Speech Recognition
Comments: Accepted to appear at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[37]  arXiv:2102.06306 [pdf, other]
Title: DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals
Comments: Accepted in ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[38]  arXiv:2102.06322 [pdf, other]
Title: Joint Dereverberation and Separation with Iterative Source Steering
Comments: 5 pages, 2 figures, accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[39]  arXiv:2102.06332 [pdf, ps, other]
Title: Data Augmentation with Signal Companding for Detection of Logical Access Attacks
Comments: 5 pages, Accepted for publication in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021
Subjects: Audio and Speech Processing (eess.AS)
[40]  arXiv:2102.06454 [pdf, other]
Title: Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[41]  arXiv:2102.06610 [pdf, other]
Title: Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
Comments: 5 pages, 2 figures, ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[42]  arXiv:2102.06744 [pdf, ps, other]
Title: Hybrid phonetic-neural model for correction in speech recognition systems
Comments: 13 pages, 3 figures, presented in COMIA 2020 (this http URL)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[43]  arXiv:2102.06816 [pdf, other]
Title: Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-training and Its Application to Children's ASR
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[44]  arXiv:2102.07047 [pdf, other]
Title: Adversarial defense for automatic speaker verification by cascaded self-supervised learning models
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[45]  arXiv:2102.07054 [pdf, other]
Title: Inverted Vocal Tract Variables and Facial Action Units to Quantify Neuromotor Coordination in Schizophrenia
Comments: Conference
Subjects: Audio and Speech Processing (eess.AS)
[46]  arXiv:2102.07330 [pdf, other]
Title: A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement
Comments: Accepted IEEE ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS)
[47]  arXiv:2102.07390 [pdf, other]
Title: Representation Learning For Speech Recognition Using Feedback Based Relevance Weighting
Comments: arXiv admin note: substantial text overlap with arXiv:2011.00721, arXiv:2011.02136, arXiv:2001.07067
Journal-ref: IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2021
Subjects: Audio and Speech Processing (eess.AS)
[48]  arXiv:2102.07445 [pdf, other]
Title: On training targets for noise-robust voice activity detection
Journal-ref: 29th European Signal Processing Conference (EUSIPCO), 2021, Dublin, Ireland
Subjects: Audio and Speech Processing (eess.AS)
[49]  arXiv:2102.07786 [pdf, other]
Title: PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
Comments: 5 pages, accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[50]  arXiv:2102.07955 [pdf, other]
Title: Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition
Comments: Submitted to Computer Speech & Language
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)