Audio and Speech Processing
Authors and titles for eess.AS in Feb 2021
Showing entries 1-50 of 208.
- [1] arXiv:2102.00154 [pdf, ps, other]
  Title: Semi-supervised Sound Event Detection using Random Augmentation and Consistency Regularization
  Authors: Xiaofei Li
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [2] arXiv:2102.00184 [pdf, other]
  Title: Adversarially learning disentangled speech representations for robust multi-factor voice conversion
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [3] arXiv:2102.00196 [pdf, ps, other]
  Title: Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures
  Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
  Journal-ref: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4485-4489
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
- [4] arXiv:2102.00270 [pdf, other]
  Title: Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks
  Comments: 8 pages, 4 figures, IEEE spoken language and technology workshop
  Subjects: Audio and Speech Processing (eess.AS)
- [5] arXiv:2102.00306 [pdf, other]
  Title: End-to-End Language Identification using Multi-Head Self-Attention and 1D Convolutional Neural Networks
  Comments: 5 pages, 1 figure
  Subjects: Audio and Speech Processing (eess.AS)
- [6] arXiv:2102.00804 [pdf, other]
  Title: Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript
  Comments: Accepted to Interspeech 2021 conference
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
- [7] arXiv:2102.00850 [pdf, other]
  Title: On Scaling Contrastive Representations for Low-Resource Speech Recognition
  Authors: Lasse Borgholt, Tycho Max Sylvester Tax, Jakob Drachmann Havtorn, Lars Maaløe, Christian Igel
  Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [8] arXiv:2102.01106 [pdf, other]
  Title: Universal Neural Vocoding with Parallel WaveNet
  Authors: Yunlong Jiao, Adam Gabrys, Georgi Tinchev, Bartosz Putrycz, Daniel Korzekwa, Viacheslav Klimkov
  Comments: 5 pages, 2 figures. Accepted to ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [9] arXiv:2102.01326 [pdf, other]
  Title: Multimodal Attention Fusion for Target Speaker Extraction
  Authors: Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki
  Comments: 7 pages, 5 figures
  Journal-ref: in IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 778-784
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [10] arXiv:2102.01363 [pdf, other]
  Title: The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
  Authors: Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [11] arXiv:2102.01380 [pdf, ps, other]
  Title: Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition
  Authors: Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong
  Comments: 5 pages, ICASSP 2021
  Journal-ref: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [12] arXiv:2102.01746 [pdf, other]
  Title: Inference of the Selective Auditory Attention using Sequential LMMSE Estimation
  Comments: 12 pages, 13 figures
  Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
- [13] arXiv:2102.01931 [pdf, other]
  Title: A Global-local Attention Framework for Weakly Labelled Audio Tagging
  Comments: Accepted to ICASSP2021
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [14] arXiv:2102.02599 [pdf, other]
  Title: VSEGAN: Visual Speech Enhancement Generative Adversarial Network
  Comments: Accepted by ICASSP 2022
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
- [15] arXiv:2102.02909 [pdf, ps, other]
  Title: Infant Cry Classification with Graph Convolutional Networks
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [16] arXiv:2102.02998 [pdf, other]
  Title: Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output
  Comments: Submitted to Interspeech 2022
  Subjects: Audio and Speech Processing (eess.AS)
- [17] arXiv:2102.03109 [pdf, other]
  Title: Estimation of Microphone Clusters in Acoustic Sensor Networks using Unsupervised Federated Learning
  Comments: Accepted at ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [18] arXiv:2102.03166 [pdf, other]
  Title: Lexical and syntactic gemination in Italian consonants -- Does a geminate Italian consonant consist of a repeated or a strengthened consonant?
  Authors: Maria Gabriella Di Benedetto, Stefanie Shattuck-Hufnagel, Luca De Nardis, Sara Budoni, Javier Arango, Ian Chan, Alec DeCaprio
  Comments: Under revision at The Journal of the Acoustical Society of America
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [19] arXiv:2102.03216 [pdf, ps, other]
  Title: Intermediate Loss Regularization for CTC-based Speech Recognition
  Comments: Accepted at ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [20] arXiv:2102.03468 [pdf, other]
  Title: Sound Event Detection in Urban Audio With Single and Multi-Rate PCEN
  Comments: 5 pages, 2 figures, 1 table, accepted for publication in IEEE ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [21] arXiv:2102.03634 [pdf, other]
  Title: Speaker attribution with voice profiles by graph-based semi-supervised learning
  Comments: Interspeech 2020
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [22] arXiv:2102.03649 [pdf, other]
  Title: The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [23] arXiv:2102.03762 [pdf, other]
  Title: Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism
  Comments: Accepted for ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [24] arXiv:2102.03786 [pdf, other]
  Title: EMA2S: An End-to-End Multimodal Articulatory-to-Speech System
  Authors: Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
- [25] arXiv:2102.03951 [pdf, other]
  Title: End-to-End Multi-Channel Transformer for Speech Recognition
  Comments: Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [26] arXiv:2102.04029 [pdf, ps, other]
  Title: Non-linear frequency warping using constant-Q transformation for speech emotion recognition
  Comments: Accepted for publication in 2021 IEEE International Conference on Computer Communication and Informatics (IEEE ICCCI 2021)
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
- [27] arXiv:2102.04144 [pdf, ps, other]
  Title: Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement
  Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
- [28] arXiv:2102.04629 [pdf, other]
  Title: Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform
  Comments: 5 pages, 2 figures, Journal submitted
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [29] arXiv:2102.04696 [pdf, other]
  Title: Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation
  Comments: Accepted to IEEE Signal Processing Letters
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
- [30] arXiv:2102.04697 [pdf, other]
  Title: Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
  Comments: Accepted by ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
- [31] arXiv:2102.05109 [pdf, other]
  Title: CDPAM: Contrastive learning for perceptual audio similarity
  Comments: Dataset, code and sound examples can be found at this https URL
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [32] arXiv:2102.05245 [pdf, other]
  Title: Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet
  Comments: Accepted for ICASSP 2021, 5 pages
  Subjects: Audio and Speech Processing (eess.AS)
- [33] arXiv:2102.05259 [pdf, other]
  Title: VACE-WPE: Virtual Acoustic Channel Expansion Based On Neural Networks for Weighted Prediction Error-Based Speech Dereverberation
  Comments: 13 pages, 12 figures, 10 tables
  Subjects: Audio and Speech Processing (eess.AS)
- [34] arXiv:2102.05889 [pdf, other]
  Title: ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech
  Authors: Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi Kinnunen, Ville Vestman, Massimiliano Todisco, Héctor Delgado, Md Sahidullah, Junichi Yamagishi, Kong Aik Lee
  Journal-ref: IEEE Transactions on Biometrics, Behavior, and Identity Science 2021
  Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
- [35] arXiv:2102.06200 [pdf, other]
  Title: Efficient neural networks for real-time modeling of analog dynamic range compression
  Comments: Updated and will appear at 152nd AES Convention (note title change)
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [36] arXiv:2102.06237 [pdf, other]
  Title: An Investigation of End-to-End Models for Robust Speech Recognition
  Comments: Accepted to appear at ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [37] arXiv:2102.06306 [pdf, other]
  Title: DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals
  Comments: Accepted in ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
- [38] arXiv:2102.06322 [pdf, other]
  Title: Joint Dereverberation and Separation with Iterative Source Steering
  Comments: 5 pages, 2 figures, accepted at ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
- [39] arXiv:2102.06332 [pdf, ps, other]
  Title: Data Augmentation with Signal Companding for Detection of Logical Access Attacks
  Comments: 5 pages, Accepted for publication in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021
  Subjects: Audio and Speech Processing (eess.AS)
- [40] arXiv:2102.06454 [pdf, other]
  Title: Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier
  Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [41] arXiv:2102.06610 [pdf, other]
  Title: Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders
  Comments: 5 pages, 2 figures, ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
- [42] arXiv:2102.06744 [pdf, ps, other]
  Title: Hybrid phonetic-neural model for correction in speech recognition systems
  Comments: 13 pages, 3 figures, presented in COMIA 2020 (this http URL)
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
- [43] arXiv:2102.06816 [pdf, other]
  Title: Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-training and Its Application to Children's ASR
  Comments: Accepted to ICASSP2021
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
- [44] arXiv:2102.07047 [pdf, other]
  Title: Adversarial defense for automatic speaker verification by cascaded self-supervised learning models
  Comments: Accepted to ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
- [45] arXiv:2102.07054 [pdf, other]
  Title: Inverted Vocal Tract Variables and Facial Action Units to Quantify Neuromotor Coordination in Schizophrenia
  Comments: Conference
  Subjects: Audio and Speech Processing (eess.AS)
- [46] arXiv:2102.07330 [pdf, other]
  Title: A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement
  Comments: Accepted IEEE ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS)
- [47] arXiv:2102.07390 [pdf, other]
  Title: Representation Learning For Speech Recognition Using Feedback Based Relevance Weighting
  Comments: arXiv admin note: substantial text overlap with arXiv:2011.00721, arXiv:2011.02136, arXiv:2001.07067
  Journal-ref: IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2021
  Subjects: Audio and Speech Processing (eess.AS)
- [48] arXiv:2102.07445 [pdf, other]
  Title: On training targets for noise-robust voice activity detection
  Journal-ref: 29th European Signal Processing Conference (EUSIPCO), 2021, Dublin, Ireland
  Subjects: Audio and Speech Processing (eess.AS)
- [49] arXiv:2102.07786 [pdf, other]
  Title: PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
  Authors: Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
  Comments: 5 pages, accepted to ICASSP 2021
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
- [50] arXiv:2102.07955 [pdf, other]
  Title: Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition
  Comments: Submitted to Computer Speech & Language
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)