Audio and Speech Processing

Authors and titles for eess.AS in Feb 2021

[ total of 208 entries: 1-208 ]

[1] arXiv:2102.00154 [pdf, ps, other]: Title: Semi-supervised Sound Event Detection using Random Augmentation and Consistency Regularization

Authors: Xiaofei Li

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2] arXiv:2102.00184 [pdf, other]: Title: Adversarially learning disentangled speech representations for robust multi-factor voice conversion

Authors: Jie Wang, Jingbei Li, Xintao Zhao, Zhiyong Wu, Shiyin Kang, Helen Meng

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[3] arXiv:2102.00196 [pdf, ps, other]: Title: Directional Sparse Filtering using Weighted Lehmer Mean for Blind Separation of Unbalanced Speech Mixtures

Authors: Karn Watcharasupat, Anh H. T. Nguyen, Ching-Hui Ooi, Andy W. H. Khong

Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4485-4489

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[4] arXiv:2102.00270 [pdf, other]: Title: Enhancing the Intelligibility of Cleft Lip and Palate Speech using Cycle-consistent Adversarial Networks

Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S R Mahadeva Prasanna

Comments: 8 pages, 4 figures, IEEE spoken language and technology workshop

Subjects: Audio and Speech Processing (eess.AS)
[5] arXiv:2102.00306 [pdf, other]: Title: End-to-End Language Identification using Multi-Head Self-Attention and 1D Convolutional Neural Networks

Authors: Krishna D N, Ankita Patil

Comments: 5 pages, 1 figure

Subjects: Audio and Speech Processing (eess.AS)
[6] arXiv:2102.00804 [pdf, other]: Title: Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript

Authors: Mukuntha Narayanan Sundararaman, Ayush Kumar, Jithendra Vepa

Comments: Accepted to Interspeech 2021 conference

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[7] arXiv:2102.00850 [pdf, other]: Title: On Scaling Contrastive Representations for Low-Resource Speech Recognition

Authors: Lasse Borgholt, Tycho Max Sylvester Tax, Jakob Drachmann Havtorn, Lars Maaløe, Christian Igel

Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[8] arXiv:2102.01106 [pdf, other]: Title: Universal Neural Vocoding with Parallel WaveNet

Authors: Yunlong Jiao, Adam Gabrys, Georgi Tinchev, Bartosz Putrycz, Daniel Korzekwa, Viacheslav Klimkov

Comments: 5 pages, 2 figures. Accepted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[9] arXiv:2102.01326 [pdf, other]: Title: Multimodal Attention Fusion for Target Speaker Extraction

Authors: Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki

Comments: 7 pages, 5 figures

Journal-ref: in IEEE Spoken Language Technology Workshop (SLT), 2021, pp. 778-784

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[10] arXiv:2102.01363 [pdf, other]: Title: The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

Authors: Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[11] arXiv:2102.01380 [pdf, ps, other]: Title: Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

Authors: Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong

Comments: 5 pages, ICASSP 2021

Journal-ref: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Canada

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[12] arXiv:2102.01746 [pdf, other]: Title: Inference of the Selective Auditory Attention using Sequential LMMSE Estimation

Authors: Ivine Kuruvila, Kubilay Can Demir, Eghart Fischer, Ulrich Hoppe

Comments: 12 pages, 13 figures

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[13] arXiv:2102.01931 [pdf, other]: Title: A Global-local Attention Framework for Weakly Labelled Audio Tagging

Authors: Helin Wang, Yuexian Zou, Wenwu Wang

Comments: Accepted to ICASSP2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14] arXiv:2102.02599 [pdf, other]: Title: VSEGAN: Visual Speech Enhancement Generative Adversarial Network

Authors: Xinmeng Xu, Yang Wang, Dongxiang Xu, Yiyuan Peng, Cong Zhang, Jie Jia, Binbin Chen

Comments: Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[15] arXiv:2102.02909 [pdf, ps, other]: Title: Infant Cry Classification with Graph Convolutional Networks

Authors: Chunyan Ji, Ming Chen, Bin Li, Yi Pan

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[16] arXiv:2102.02998 [pdf, other]: Title: Beam-Guided TasNet: An Iterative Speech Separation Framework with Multi-Channel Output

Authors: Hangting Chen, Yang Yi, Dang Feng, Pengyuan Zhang

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS)
[17] arXiv:2102.03109 [pdf, other]: Title: Estimation of Microphone Clusters in Acoustic Sensor Networks using Unsupervised Federated Learning

Authors: Alexandru Nelus, Rene Glitza, Rainer Martin

Comments: Accepted at ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[18] arXiv:2102.03166 [pdf, other]: Title: Lexical and syntactic gemination in Italian consonants -- Does a geminate Italian consonant consist of a repeated or a strengthened consonant?

Authors: Maria Gabriella Di Benedetto, Stefanie Shattuck-Hufnagel, Luca De Nardis, Sara Budoni, Javier Arango, Ian Chan, Alec DeCaprio

Comments: Under revision at The Journal of the Acoustical Society of America

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19] arXiv:2102.03216 [pdf, ps, other]: Title: Intermediate Loss Regularization for CTC-based Speech Recognition

Authors: Jaesong Lee, Shinji Watanabe

Comments: Accepted at ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[20] arXiv:2102.03468 [pdf, other]: Title: Sound Event Detection in Urban Audio With Single and Multi-Rate PCEN

Authors: Christopher Ick, Brian McFee

Comments: 5 pages, 2 figures, 1 table, accepted for publication in IEEE ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[21] arXiv:2102.03634 [pdf, other]: Title: Speaker attribution with voice profiles by graph-based semi-supervised learning

Authors: Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

Comments: Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[22] arXiv:2102.03649 [pdf, other]: Title: The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge

Authors: Weiqing Wang, Qingjian Lin, Danwei Cai, Lin Yang, Ming Li

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2102.03762 [pdf, other]: Title: Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

Authors: Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

Comments: Accepted for ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[24] arXiv:2102.03786 [pdf, other]: Title: EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

Authors: Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[25] arXiv:2102.03951 [pdf, other]: Title: End-to-End Multi-Channel Transformer for Speech Recognition

Authors: Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann

Comments: Accepted by 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[26] arXiv:2102.04029 [pdf, ps, other]: Title: Non-linear frequency warping using constant-Q transformation for speech emotion recognition

Authors: Premjeet Singh, Goutam Saha, Md Sahidullah

Comments: Accepted for publication in 2021 IEEE International Conference on Computer Communication and Informatics (IEEE ICCCI 2021)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[27] arXiv:2102.04144 [pdf, ps, other]: Title: Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

Authors: Mostafa Sadeghi, Xavier Alameda-Pineda

Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[28] arXiv:2102.04629 [pdf, other]: Title: Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform

Authors: Qinglong Li, Fei Gao, Haixin Guan, Kaichi Ma

Comments: 5 pages, 2 figures, Journal submitted

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[29] arXiv:2102.04696 [pdf, other]: Title: Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation

Authors: Rintaro Ikeshita, Tomohiro Nakatani

Comments: Accepted to IEEE Signal Processing Letters

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[30] arXiv:2102.04697 [pdf, other]: Title: Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers

Authors: Shucong Zhang, Cong-Thanh Do, Rama Doddipatla, Erfan Loweimi, Peter Bell, Steve Renals

Comments: Accepted by ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[31] arXiv:2102.05109 [pdf, other]: Title: CDPAM: Contrastive learning for perceptual audio similarity

Authors: Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein

Comments: Dataset, code and sound examples can be found at this https URL

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[32] arXiv:2102.05245 [pdf, other]: Title: Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet

Authors: Jean-Marc Valin, Srikanth Tenneti, Karim Helwani, Umut Isik, Arvindh Krishnaswamy

Comments: Accepted for ICASSP 2021, 5 pages

Subjects: Audio and Speech Processing (eess.AS)
[33] arXiv:2102.05259 [pdf, other]: Title: VACE-WPE: Virtual Acoustic Channel Expansion Based On Neural Networks for Weighted Prediction Error-Based Speech Dereverberation

Authors: Joon-Young Yang, Joon-Hyuk Chang

Comments: 13 pages, 12 figures, 10 tables

Subjects: Audio and Speech Processing (eess.AS)
[34] arXiv:2102.05889 [pdf, other]: Title: ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech

Authors: Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi Kinnunen, Ville Vestman, Massimiliano Todisco, Héctor Delgado, Md Sahidullah, Junichi Yamagishi, Kong Aik Lee

Journal-ref: IEEE Transactions on Biometrics, Behavior, and Identity Science 2021

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[35] arXiv:2102.06200 [pdf, other]: Title: Efficient neural networks for real-time modeling of analog dynamic range compression

Authors: Christian J. Steinmetz, Joshua D. Reiss

Comments: Updated and will appear at 152nd AES Convention (note title change)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36] arXiv:2102.06237 [pdf, other]: Title: An Investigation of End-to-End Models for Robust Speech Recognition

Authors: Archiki Prasad, Preethi Jyothi, Rajbabu Velmurugan

Comments: Accepted to appear at ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[37] arXiv:2102.06306 [pdf, other]: Title: DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals

Authors: Satwinder Singh, Ruili Wang, Yuanhang Qiu

Comments: Accepted in ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[38] arXiv:2102.06322 [pdf, other]: Title: Joint Dereverberation and Separation with Iterative Source Steering

Authors: Taishi Nakashima, Robin Scheibler, Masahito Togami, Nobutaka Ono

Comments: 5 pages, 2 figures, accepted at ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[39] arXiv:2102.06332 [pdf, ps, other]: Title: Data Augmentation with Signal Companding for Detection of Logical Access Attacks

Authors: Rohan Kumar Das, Jichen Yang, Haizhou Li

Comments: 5 pages, Accepted for publication in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2021

Subjects: Audio and Speech Processing (eess.AS)
[40] arXiv:2102.06454 [pdf, other]: Title: Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier

Authors: Guillaume Carbajal, Julius Richter, Timo Gerkmann

Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[41] arXiv:2102.06610 [pdf, other]: Title: Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders

Authors: Jonah Casebeer, Vinjai Vale, Umut Isik, Jean-Marc Valin, Ritwik Giri, Arvindh Krishnaswamy

Comments: 5 pages, 2 figures, ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[42] arXiv:2102.06744 [pdf, ps, other]: Title: Hybrid phonetic-neural model for correction in speech recognition systems

Authors: Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino

Comments: 13 pages, 3 figures, presented in COMIA 2020 (this http URL)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[43] arXiv:2102.06816 [pdf, other]: Title: Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-training and Its Application to Children's ASR

Authors: Ruchao Fan, Amber Afshan, Abeer Alwan

Comments: Accepted to ICASSP2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[44] arXiv:2102.07047 [pdf, other]: Title: Adversarial defense for automatic speaker verification by cascaded self-supervised learning models

Authors: Haibin Wu, Xu Li, Andy T. Liu, Zhiyong Wu, Helen Meng, Hung-yi Lee

Comments: Accepted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[45] arXiv:2102.07054 [pdf, other]: Title: Inverted Vocal Tract Variables and Facial Action Units to Quantify Neuromotor Coordination in Schizophrenia

Authors: Yashish Maduwantha H.P.E.R.S, Chris Kitchen, Deanna L. Kelly, Carol Espy-Wilson

Comments: Conference

Subjects: Audio and Speech Processing (eess.AS)
[46] arXiv:2102.07330 [pdf, other]: Title: A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement

Authors: Tyler Vuong, Yangyang Xia, Richard M. Stern

Comments: Accepted IEEE ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS)
[47] arXiv:2102.07390 [pdf, other]: Title: Representation Learning For Speech Recognition Using Feedback Based Relevance Weighting

Authors: Purvi Agrawal, Sriram Ganapathy

Comments: arXiv admin note: substantial text overlap with arXiv:2011.00721, arXiv:2011.02136, arXiv:2001.07067

Journal-ref: IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2021

Subjects: Audio and Speech Processing (eess.AS)
[48] arXiv:2102.07445 [pdf, other]: Title: On training targets for noise-robust voice activity detection

Authors: Sebastian Braun, Ivan Tashev

Journal-ref: 29th European Signal Processing Conference (EUSIPCO), 2021, Dublin, Ireland

Subjects: Audio and Speech Processing (eess.AS)
[49] arXiv:2102.07786 [pdf, other]: Title: PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components

Authors: Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Comments: 5 pages, accepted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[50] arXiv:2102.07955 [pdf, other]: Title: Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Authors: Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Comments: Submitted to Computer Speech & Language

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[51] arXiv:2102.07961 [pdf, other]: Title: Semi-Supervised Singing Voice Separation with Noisy Self-Training

Authors: Zhepei Wang, Ritwik Giri, Umut Isik, Jean-Marc Valin, Arvindh Krishnaswamy

Comments: Accepted at 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

Subjects: Audio and Speech Processing (eess.AS)
[52] arXiv:2102.08075 [pdf, other]: Title: Axial Residual Networks for CycleGAN-based Voice Conversion

Authors: Jaeseong You, Gyuhyeon Nam, Dalhyun Kim, Gyeongsu Chae

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53] arXiv:2102.08328 [pdf, other]: Title: Context-Aware Prosody Correction for Text-Based Speech Editing

Authors: Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Comments: To appear in proceedings of ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[54] arXiv:2102.08706 [pdf, other]: Title: Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

Authors: Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann

Comments: ICASSP 2021. (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[55] arXiv:2102.09106 [pdf, other]: Title: Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition

Authors: Gary Yeung, Ruchao Fan, Abeer Alwan

Comments: To be published in IEEE ICASSP

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56] arXiv:2102.09168 [pdf, other]: Title: Gaussian Kernelized Self-Attention for Long Sequence Data and Its Application to CTC-based Speech Recognition

Authors: Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

Comments: Accepted to ICASSP2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57] arXiv:2102.09660 [pdf, other]: Title: Generative Speech Coding with Predictive Variance Regularization

Authors: W. Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Hengchin Yeh

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58] arXiv:2102.09666 [pdf, other]: Title: Dynamic curriculum learning via data parameters for noise robust keyword spotting

Authors: Takuya Higuchi, Shreyas Saxena, Mehrez Souden, Tien Dung Tran, Masood Delfarah, Chandra Dhir

Comments: Accepted at ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[59] arXiv:2102.09838 [pdf, other]: Title: A Robust Maximum Likelihood Distortionless Response Beamformer based on a Complex Generalized Gaussian Distribution

Authors: Weixin Meng, Chengshi Zheng, Xiaodong Li

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[60] arXiv:2102.09853 [pdf, ps, other]: Title: Direction of Arrival Estimation of Noisy Speech Using Convolutional Recurrent Neural Networks with Higher-Order Ambisonics Signals

Authors: Nils Poschadel, Robert Hupke, Stephan Preihs, Jürgen Peissig

Comments: 5 pages, 6 figures. Accepted to EUSIPCO 2021

Subjects: Audio and Speech Processing (eess.AS)
[61] arXiv:2102.09918 [pdf, other]: Title: End-to-End Neural Systems for Automatic Children Speech Recognition: An Empirical Study

Authors: Prashanth Gurunath Shivakumar, Shrikanth Narayanan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62] arXiv:2102.09928 [pdf, other]: Title: Do End-to-End Speech Recognition Models Care About Context?

Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel

Comments: Published in the proceedings of INTERSPEECH 2020, pp. 4352-4356

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[63] arXiv:2102.09939 [pdf, ps, other]: Title: ABSP System for The Third DIHARD Challenge

Authors: A Kishore Kumar, Shefali Waldekar, Goutam Saha, Md Sahidullah

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[64] arXiv:2102.09959 [pdf, other]: Title: Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast

Authors: Satvik Venkatesh, David Moffat, Alexis Kirke, Gözel Shakeri, Stephen Brewster, Jörg Fachner, Helen Odell-Miller, Alex Street, Nicolas Farina, Sube Banerjee, Eduardo Reck Miranda

Comments: 5 pages, 3 figures, Accepted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[65] arXiv:2102.10345 [pdf, other]: Title: Model architectures to extrapolate emotional expressions in DNN-based text-to-speech

Authors: Katsuki Inoue, Sunao Hara, Masanobu Abe, Nobukatsu Hojo, Yusuke Ijima

Comments: This is the author's final draft. Accepted by Speech Communication. Please refer to the journal if you want

Subjects: Audio and Speech Processing (eess.AS)
[66] arXiv:2102.10376 [pdf, other]: Title: The Use of Voice Source Features for Sung Speech Recognition

Authors: Gerardo Roa Dabike, Jon Barker

Comments: Accepted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
[67] arXiv:2102.10449 [pdf, other]: Title: WARP-Q: Quality Prediction For Generative Neural Speech Codecs

Authors: Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

Comments: Accepted for presentation at IEEE ICASSP 2021. Source code and data can be found on this https URL

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[68] arXiv:2102.10815 [pdf, other]: Title: LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation

Authors: Zhen Zeng, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2012.01684

Subjects: Audio and Speech Processing (eess.AS)
[69] arXiv:2102.11265 [pdf, other]: Title: Automated Evaluation Of Psychotherapy Skills Using Speech And Language Technologies

Authors: Nikolaos Flemotomos, Victor R. Martinez, Zhuohao Chen, Karan Singla, Victor Ardulov, Raghuveer Peri, Derek D. Caperton, James Gibson, Michael J. Tanana, Panayiotis Georgiou, Jake Van Epps, Sarah P. Lord, Tad Hirsch, Zac E. Imel, David C. Atkins, Shrikanth Narayanan

Comments: new version has an updated title

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[70] arXiv:2102.11480 [pdf, ps, other]: Title: Evolutionary optimization of contexts for phonetic correction in speech recognition systems

Authors: Rafael Viana-Cámara, Diego Campos-Sobrino, Mario Campos-Soberanis

Comments: 13 pages, 4 figures. This article is a translation of the paper "Optimización evolutiva de contextos para la corrección fonética en sistemas de reconocimiento del habla" presented in COMIA 2019

Journal-ref: Research in Computing Science Issue 148(8), 2019, pp. 293-306. ISSN 1870-4069

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[71] arXiv:2102.11525 [pdf, other]: Title: End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend

Authors: Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Tsubasa Ochiai, Naoyuki Kamo, Reinhold Haeb-Umbach, Yanmin Qian

Comments: 5 pages, 1 figure, accepted by ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[72] arXiv:2102.11594 [pdf, other]: Title: Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition

Authors: Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Accepted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[73] arXiv:2102.11634 [pdf, other]: Title: Dual-Path Modeling for Long Recording Speech Separation in Meetings

Authors: Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian

Comments: Accepted by ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[74] arXiv:2102.11906 [pdf, other]: Title: Handling Background Noise in Neural Speech Generation

Authors: Tom Denton, Alejandro Luebs, Felicia S. C. Lim, Andrew Storus, Hengchin Yeh, W. Bastiaan Kleijn, Jan Skoglund

Comments: 5 pages, 3 figures, presented at the Asilomar Conference on Signals, Systems, and Computers 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75] arXiv:2102.12078 [pdf, other]: Title: Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks

Authors: Ju Lin, Adriaan J. van Wijngaarden, Kuang-Ching Wang, Melissa C. Smith

Comments: Preprint

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[76] arXiv:2102.12394 [pdf, other]: Title: SEP-28k: A Dataset for Stuttering Event Detection From Podcasts With People Who Stutter

Authors: Colin Lea, Vikramjit Mitra, Aparna Joshi, Sachin Kajarekar, Jeffrey P. Bigham

Comments: Accepted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[77] arXiv:2102.12397 [pdf, other]: Title: Thoughts on the potential to compensate a hearing loss in noise

Authors: Marc René Schädler

Comments: 26 pages, 22 figures, related code this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[78] arXiv:2102.12624 [pdf, other]: Title: Meta-Learning for improving rare word recognition in end-to-end ASR

Authors: Florian Lux, Ngoc Thang Vu

Comments: Revised version to be published in the proceedings of ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79] arXiv:2102.12829 [pdf, other]: Title: Automatic Classification of OSA related Snoring Signals from Nocturnal Audio Recordings

Authors: Arun Sebastian, Peter A. Cistulli, Gary Cohen, Philip de Chazal

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[80] arXiv:2102.13334 [pdf, ps, other]: Title: Integration of deep learning with expectation maximization for spatial cue based speech separation in reverberant conditions

Authors: Sania Gul, Muhammad Salman Khan, Syed Waqar Shah

Subjects: Audio and Speech Processing (eess.AS)
[81] arXiv:2102.13397 [pdf, other]: Title: Underwater Acoustic Communication Receiver Using Deep Belief Network

Authors: Abigail Lee-Leon, Chau Yuen, Dorien Herremans

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[82] arXiv:2102.13468 [pdf, other]: Title: The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates

Authors: Björn W. Schuller, Anton Batliner, Christian Bergler, Cecilia Mascolo, Jing Han, Iulia Lefter, Heysem Kaya, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Maurice Gerczuk, Panagiotis Tzirakis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Leon J. M. Rothkrantz, Joeri Zwerts, Jelle Treep, Casper Kaandorp

Comments: 5 pages

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[83] arXiv:2102.04832 (cross-list from eess.SP) [pdf, other]: Title: Fast and Accurate Amplitude Demodulation of Wideband Signals

Authors: Mantas Gabrielaitis

Comments: Accepted for publication in IEEE Transactions on Signal Processing

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2102.06269 (cross-list from eess.IV) [pdf, other]: Title: Disentanglement for audio-visual emotion recognition using multitask setup

Authors: Raghuveer Peri, Srinivas Parthasarathy, Charles Bradshaw, Shiva Sundaram

Comments: Accepted for ICASSP 2021, 5 pages

Subjects: Image and Video Processing (eess.IV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2102.06393 (cross-list from eess.SP) [pdf, other]: Title: Mind the beat: detecting audio onsets from EEG recordings of music listening

Authors: Ashvala Vinay, Alexander Lerch, Grace Leslie

Comments: To be published in ICASSP 2021. 4 figures, 5 pages (4 pages of content + 1 page of references)

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2102.07896 (cross-list from eess.SP) [pdf, other]: Title: A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images

Authors: Yongwan Lim, Asterios Toutios, Yannick Bliesener, Ye Tian, Sajan Goud Lingala, Colin Vaz, Tanner Sorensen, Miran Oh, Sarah Harper, Weiyi Chen, Yoonjeong Lee, Johannes Töger, Mairym Lloréns Montesserin, Caitlin Smith, Bianca Godinez, Louis Goldstein, Dani Byrd, Krishna S. Nayak, Shrikanth S. Narayanan

Comments: 27 pages, 6 figures, 5 tables, submitted to Nature Scientific Data

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[87] arXiv:2102.07990 (cross-list from eess.SP) [pdf, other]: Title: Through-the-Wall Radar under Electromagnetic Complex Wall: A Deep Learning Approach

Authors: Fardin Ghorbani, Hossein Soleimani

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88] arXiv:2102.00151 (cross-list from cs.SD) [pdf, other]: Title: Expressive Neural Voice Cloning

Authors: Paarth Neekhara, Shehzeen Hussain, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

Comments: 12 pages, 2 figures, 2 tables

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[89] arXiv:2102.00201 (cross-list from cs.SD) [pdf, other]: Title: Melon Playlist Dataset: a public dataset for audio-based playlist generation and music tagging

Authors: Andres Ferraro, Yuntae Kim, Soohyeon Lee, Biho Kim, Namjun Jo, Semi Lim, Suyon Lim, Jungtaek Jang, Sehwan Kim, Xavier Serra, Dmitry Bogdanov

Comments: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[90] arXiv:2102.00247 (cross-list from cs.CL) [pdf, other]: Title: Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet

Authors: Shilun Lin, Fenglong Xie, Li Meng, Xinhui Li, Li Lu

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[91] arXiv:2102.00291 (cross-list from cs.SD) [pdf, other]: Title: Speech Recognition by Simply Fine-tuning BERT

Authors: Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, Tomoki Toda

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[92] arXiv:2102.00313 (cross-list from cs.SD) [pdf, other]: Title: Cortical Features for Defense Against Adversarial Audio Attacks

Authors: Ilya Kavalerov, Ruijie Zheng, Wojciech Czaja, Rama Chellappa

Comments: Co-author legal name changed

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[93] arXiv:2102.00382 (cross-list from cs.SD) [pdf, other]: Title: Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks

Authors: Ruchit Agrawal, Daniel Wolff, Simon Dixon

Comments: ICASSP 2021 camera-ready version. Copyrights belong to IEEE

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[94] arXiv:2102.00429 (cross-list from cs.SD) [pdf, other]: Title: High Fidelity Speech Regeneration with Application to Speech Enhancement

Authors: Adam Polyak, Lior Wolf, Yossi Adi, Ori Kabeli, Yaniv Taigman

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[95] arXiv:2102.00550 (cross-list from cs.SD) [pdf, other]: Title: Boosting the Predictive Accuracy of Singer Identification Using Discrete Wavelet Transform For Feature Extraction

Authors: Victoire Djimna Noyum, Younous Perieukeu Mofenjou, Cyrille Feudjio, Alkan Göktug, Ernest Fokoué

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[96] arXiv:2102.00616 (cross-list from cs.SD) [pdf, ps, other]: Title: Neural Network architectures to classify emotions in Indian Classical Music

Authors: Uddalok Sarkar, Sayan Nag, Medha Basu, Archi Banerjee, Shankha Sanyal, Ranjan Sengupta, Dipak Ghosh

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[97] arXiv:2102.01013 (cross-list from cs.CL) [pdf, other]: Title: End2End Acoustic to Semantic Transduction

Authors: Valentin Pelloin, Nathalie Camelin, Antoine Laurent, Renato De Mori, Antoine Caubrière, Yannick Estève, Sylvain Meignier

Comments: Accepted at IEEE ICASSP 2021

Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2102.01133 (cross-list from cs.SD) [pdf, other]: Title: Deep Music Information Dynamics

Authors: Shlomo Dubnov

Journal-ref: The 2020 Joint Conference on AI Music Creativity, October 19-23, 2020, Royal Institute of Technology (KTH), Stockholm, Sweden

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[99] arXiv:2102.01243 (cross-list from cs.SD) [pdf, other]: Title: PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation

Authors: Yuan Gong, Yu-An Chung, James Glass

Comments: Published in IEEE/ACM Transactions on Audio Speech and Language Processing. Code at this https URL

Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3292-3306, 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[100] arXiv:2102.01547 (cross-list from cs.SD) [pdf, other]: Title: WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit

Authors: Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, Xin Lei

Comments: 5 pages, 2 figures, 4 tables

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[101] arXiv:2102.01640 (cross-list from cs.SD) [pdf, other]: Title: SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer

Authors: Pramit Saha, Debasish Ray Mohapatra, Sidney Fels

Comments: 2 pages, 1 figure

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[102] arXiv:2102.01692 (cross-list from cs.SD) [pdf, ps, other]: Title: Generacion de voces artificiales infantiles en castellano con acento costarricense

Authors: Ana Lilia Alvarez-Blanco, Eugenia Cordoba-Warner, Marvin Coto-Jimenez, Vivian Fallas-Lopez, Maribel Morales Rodriguez

Comments: 12 pages, in Spanish

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[103] arXiv:2102.01813 (cross-list from cs.SD) [pdf, other]: Title: Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

Authors: Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[104] arXiv:2102.01927 (cross-list from cs.SD) [pdf, ps, other]: Title: Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance

Authors: Keisuke Imoto, Sakiko Mishima, Yumi Arai, Reishi Kondo

Comments: Accepted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2006.15253

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105] arXiv:2102.01930 (cross-list from cs.SD) [pdf, other]: Title: General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework

Authors: Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[106] arXiv:2102.01991 (cross-list from cs.SD) [pdf, other]: Title: Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

Authors: Shengkui Zhao, Hao Wang, Trung Hieu Nguyen, Bin Ma

Comments: 5 pages, 2 figures, 4 tables, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[107] arXiv:2102.01993 (cross-list from cs.SD) [pdf, other]: Title: Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Authors: Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

Comments: 5 pages, 4 figures, 2 tables, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[108] arXiv:2102.02028 (cross-list from cs.SD) [pdf, other]: Title: Music source separation conditioned on 3D point clouds

Authors: Francesc Lluís, Vasileios Chatziioannou, Alex Hofmann

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[109] arXiv:2102.02074 (cross-list from cs.SD) [pdf, ps, other]: Title: Data Generation Using Pass-phrase-dependent Deep Auto-encoders for Text-Dependent Speaker Verification

Authors: Achintya Kumar Sarkar, Md Sahidullah, Zheng-Hua Tan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[110] arXiv:2102.02270 (cross-list from cs.CL) [pdf, other]: Title: Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Authors: Prashanth Gurunath Shivakumar, Panayiotis Georgiou, Shrikanth Narayanan

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111] arXiv:2102.02282 (cross-list from cs.SD) [pdf, other]: Title: Downbeat Tracking with Tempo-Invariant Convolutional Neural Networks

Authors: Bruno Di Giorgi, Matthias Mauch, Mark Levy

Comments: 7 pages, 5 figures, Proceedings of the 21st International Society for Music Information Retrieval Conference, ISMIR 2020

Journal-ref: Proceedings of the 21st International Society for Music Information Retrieval Conference (2020) 216-222

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[112] arXiv:2102.02417 (cross-list from cs.SD) [pdf, other]: Title: Audio Adversarial Examples: Attacks Using Vocal Masks

Authors: Kai Yuan Tay, Lynnette Ng, Wei Han Chua, Lucerne Loke, Danqi Ye, Melissa Chua

Comments: 9 pages, 1 figure, 2 tables. Submitted to COLING2020

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[113] arXiv:2102.02640 (cross-list from cs.SD) [pdf, ps, other]: Title: Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach

Authors: Gang Min, Xiongwei Zhang, Xia Zou, Xiangyang Liu

Comments: 6 pages

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[114] arXiv:2102.02964 (cross-list from cs.SD) [pdf, ps, other]: Title: Diversity-Robust Acoustic Feature Signatures Based on Multiscale Fractal Dimension for Similarity Search of Environmental Sounds

Authors: Motohiro Sunouchi, Masaharu Yoshioka

Comments: 15 pages, 14 figures

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[115] arXiv:2102.03049 (cross-list from cs.SD) [pdf, ps, other]: Title: Benchmarking of eight recurrent neural network variants for breath phase and adventitious sound detection on a self-developed open-access lung sound database-HF_Lung_V1

Authors: Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Chao-Jung Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Jack Hsiao, Chung-Wei Chen, Li-Chin Chen, Yen-Chun Lai, Bi-Fang Hsu, Nian-Jhen Lin, Wan-Lin Tsai, Yi-Lin Wu, Tzu-Ling Tseng, Ching-Ting Tseng, Yi-Tsun Chen, Feipei Lai

Comments: 48 pages, 8 figures. Accepted by PLoS One

Journal-ref: PLoS ONE, 2021, 16(7): e0254134

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[116] arXiv:2102.03055 (cross-list from cs.SD) [pdf, other]: Title: Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR

Authors: Ruizhi Li, Gregory Sell, Hynek Hermansky

Comments: Accepted at IEEE SLT 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[117] arXiv:2102.03170 (cross-list from cs.SD) [pdf, other]: Title: White-box Audio VST Effect Programming

Authors: Christopher Mitcheltree, Hideki Koike

Comments: The latest version of the system is to appear at EvoMUSART 2021 as a full paper. Audio samples of the latest system can be listened to at this https URL

Journal-ref: 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[118] arXiv:2102.03207 (cross-list from cs.SD) [pdf, other]: Title: Real-time Denoising and Dereverberation with Tiny Recurrent U-Net

Authors: Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee

Comments: 5 pages, 2 figures, 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). arXiv admin note: text overlap with arXiv:2006.00687

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[119] arXiv:2102.03229 (cross-list from cs.SD) [pdf, other]: Title: Multi-Task Self-Supervised Pre-Training for Music Classification

Authors: Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

Comments: Copyright 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[120] arXiv:2102.03424 (cross-list from cs.CV) [pdf, other]: Title: Learning Audio-Visual Correlations from Variational Cross-Modal Generation

Authors: Ye Zhu, Yu Wu, Hugo Latapie, Yi Yang, Yan Yan

Comments: Accepted to ICASSP 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[121] arXiv:2102.03662 (cross-list from cs.CL) [pdf, other]: Title: A bandit approach to curriculum generation for automatic speech recognition

Authors: Anastasia Kuznetsova, Anurag Kumar, Francis M. Tyers

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2102.03868 (cross-list from cs.SD) [pdf, other]: Title: U-vectors: Generating clusterable speaker embedding from unlabeled data

Authors: M. F. Mridha, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md. Abdul Hamid, Md. Rashedul Islam, Yutaka Watanobe

Comments: 18 pages, 7 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[123] arXiv:2102.03957 (cross-list from cs.SD) [pdf, other]: Title: Extracting the Auditory Attention in a Dual-Speaker Scenario from EEG using a Joint CNN-LSTM Model

Authors: Ivine Kuruvila, Jan Muncke, Eghart Fischer, Ulrich Hoppe

Comments: 18 pages, 6 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[124] arXiv:2102.04040 (cross-list from cs.SD) [pdf, ps, other]: Title: LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Authors: Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu

Comments: Accepted to ICASSP 21

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[125] arXiv:2102.04051 (cross-list from cs.HC) [pdf, other]: Title: HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception

Authors: Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

Comments: 5 pages, 6 figures, to be published in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126] arXiv:2102.04056 (cross-list from cs.SD) [pdf, other]: Title: Speaker and Direction Inferred Dual-channel Speech Separation

Authors: Chenxing Li, Jiaming Xu, Nima Mesgarani, Bo Xu

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127] arXiv:2102.04062 (cross-list from cs.SD) [pdf, ps, other]: Title: An Update on a Progressively Expanded Database for Automated Lung Sound Analysis

Authors: Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Jack Hsiao, Chung-Wei Chen, Feipei Lai

Comments: Under review, 14 pages, 5 figures, 3 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[128] arXiv:2102.04198 (cross-list from cs.SD) [pdf, other]: Title: ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network

Authors: Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li

Comments: 5 pages, 3 figures, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129] arXiv:2102.04254 (cross-list from cs.CE) [pdf, other]: Title: A Data-Driven Approach to Violin Making

Authors: Sebastian Gonzalez, Davide Salvi, Daniel Baeza, Fabio Antonacci, Augusto Sarti

Subjects: Computational Engineering, Finance, and Science (cs.CE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2102.04429 (cross-list from cs.SD) [pdf, other]: Title: Federated Acoustic Modeling For Automatic Speech Recognition

Authors: Xiaodong Cui, Songtao Lu, Brian Kingsbury

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Distributed, Parallel, and Cluster Computing (cs.DC); Audio and Speech Processing (eess.AS)
[131] arXiv:2102.04488 (cross-list from cs.CL) [pdf, other]: Title: Wake Word Detection with Streaming Transformers

Authors: Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Comments: Accepted at IEEE ICASSP 2021. 5 pages, 3 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132] arXiv:2102.04588 (cross-list from cs.SD) [pdf, other]: Title: A comparative study of two-dimensional vocal tract acoustic modeling based on Finite-Difference Time-Domain methods

Authors: Debasish Ray Mohapatra, Victor Zappi, Sidney Fels

Comments: 4 pages, 3 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[133] arXiv:2102.04680 (cross-list from cs.SD) [pdf, other]: Title: TräumerAI: Dreaming Music with StyleGAN

Authors: Dasaem Jeong, Seungheon Doh, Taegyun Kwon

Comments: presented in NeurIPS Workshop 2020: Machine Learning for Creativity and Design

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[134] arXiv:2102.04740 (cross-list from stat.ME) [pdf, other]: Title: Principal components variable importance reconstruction (PC-VIR): Exploring predictive importance in multicollinear acoustic speech data

Authors: Christopher Carignan, Ander Egurtzegi

Comments: 10 pages, 3 figures, GitHub repository

Subjects: Methodology (stat.ME); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2102.04880 (cross-list from cs.SD) [pdf, ps, other]: Title: Diagnosis of COVID-19 and Non-COVID-19 Patients by Classifying Only a Single Cough Sound

Authors: Masoud Maleki

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Optimization and Control (math.OC)
[136] arXiv:2102.04932 (cross-list from cs.LG) [pdf, other]: Title: Sparsification via Compressed Sensing for Automatic Speech Recognition

Authors: Kai Zhen (1 and 2), Hieu Duy Nguyen (2), Feng-Ju Chang (2), Athanasios Mouchtaris (2), Ariya Rastrow (2). ((1) Indiana University Bloomington, (2) Alexa Machine Learning, Amazon, USA)

Comments: 5 pages, accepted for publication in (ICASSP 2021) 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing. June 6-12, 2021. Location: Toronto, ON, Canada

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2102.04945 (cross-list from cs.SD) [pdf, other]: Title: On permutation invariant training for speech source separation

Authors: Xiaoyu Liu, Jordi Pons

Comments: In proceedings of ICASSP2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[138] arXiv:2102.04997 (cross-list from cs.LG) [pdf, other]: Title: Deep Neural Network based Cough Detection using Bed-mounted Accelerometer Measurements

Authors: Madhurananda Pahar, Igor Miranda, Andreas Diacon, Thomas Niesler

Comments: Accepted at ICASSP 2021. Copyright information is shown on the first page

Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[139] arXiv:2102.05151 (cross-list from cs.SD) [pdf, other]: Title: Enhancing Audio Augmentation Methods with Consistency Learning

Authors: Turab Iqbal, Karim Helwani, Arvindh Krishnaswamy, Wenwu Wang

Comments: Accepted to 46th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[140] arXiv:2102.05225 (cross-list from cs.SD) [pdf, other]: Title: Exploring Automatic COVID-19 Diagnosis via voice and symptoms from Crowdsourced Data

Authors: Jing Han, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Cecilia Mascolo

Comments: 5 pages, 3 figures, 2 tables, Accepted for publication at ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2102.05630 (cross-list from cs.SD) [pdf, other]: Title: Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning

Authors: Giuseppe Ruggiero, Enrico Zovato, Luigi Di Caro, Vincent Pollet

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[142] arXiv:2102.05749 (cross-list from cs.SD) [pdf, ps, other]: Title: Self-Supervised VQ-VAE for One-Shot Music Style Transfer

Authors: Ondřej Cífka, Alexey Ozerov, Umut Şimşekli, Gaël Richard

Comments: ICASSP 2021. Website: this https URL

Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (2021) 96-100

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[143] arXiv:2102.05872 (cross-list from cs.SD) [pdf, ps, other]: Title: Onoma-to-wave: Environmental sound synthesis from onomatopoeic words

Authors: Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryosuke Yamanishi, Takahiro Fukumori, Yoichi Yamashita

Comments: Accepted to APSIPA Transactions on Signal and Information Processing

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2102.05894 (cross-list from cs.SD) [pdf, ps, other]: Title: CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions

Authors: Ali Bou Nassif, Ismail Shahin, Shibani Hamsa, Nawel Nemmour, Keikichi Hirose

Comments: Published in Applied Soft Computing journal

Journal-ref: Applied Soft Computing, Elsevier, 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[145] arXiv:2102.06003 (cross-list from cs.SD) [pdf, ps, other]: Title: Language Independent Emotion Quantification using Non linear Modelling of Speech

Authors: Uddalok Sarkar, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[146] arXiv:2102.06034 (cross-list from cs.SD) [pdf, other]: Title: Speech enhancement with mixture-of-deep-experts with clean clustering pre-training

Authors: Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot

Comments: arXiv admin note: text overlap with arXiv:1703.09302

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[147] arXiv:2102.06038 (cross-list from cs.SD) [pdf, ps, other]: Title: A Fractal Approach to Characterize Emotions in Audio and Visual Domain: A Study on Cross-Modal Interaction

Authors: Sayan Nag, Uddalok Sarkar, Shankha Sanyal, Archi Banerjee, Souparno Roy, Samir Karmakar, Ranjan Sengupta, Dipak Ghosh

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[148] arXiv:2102.06142 (cross-list from cs.SD) [pdf, other]: Title: Multichannel-based learning for audio object extraction

Authors: Daniel Arteaga, Jordi Pons

Comments: In proceedings of ICASSP2021. Appendix added

Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 206-210

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2102.06283 (cross-list from cs.CL) [pdf, other]: Title: Speech-language Pre-training for End-to-end Spoken Language Understanding

Authors: Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150] arXiv:2102.06291 (cross-list from cs.SD) [pdf, other]: Title: A Multi-View Approach To Audio-Visual Speaker Verification

Authors: Leda Sarı, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[151] arXiv:2102.06357 (cross-list from cs.SD) [pdf, other]: Title: Contrastive Unsupervised Learning for Speech Emotion Recognition

Authors: Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[152] arXiv:2102.06380 (cross-list from cs.CL) [pdf, ps, other]: Title: Neural Inverse Text Normalization

Authors: Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

Comments: 5 pages, accepted to ICASSP 2021

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[153] arXiv:2102.06431 (cross-list from cs.SD) [pdf, other]: Title: VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

Authors: Peng Liu, Yuewen Cao, Songxiang Liu, Na Hu, Guangzhi Li, Chao Weng, Dan Su

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[154] arXiv:2102.06455 (cross-list from cs.SD) [pdf, other]: Title: Deep Sound Field Reconstruction in Real Rooms: Introducing the ISOBEL Sound Field Dataset

Authors: Miklas Strøm Kristoffersen, Martin Bo Møller, Pablo Martínez-Nuevo, Jan Østergaard

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[155] arXiv:2102.06467 (cross-list from cs.SD) [pdf, other]: Title: Content-Aware Speaker Embeddings for Speaker Diarisation

Authors: G. Sun, D. Liu, C. Zhang, P. C. Woodland

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[156] arXiv:2102.06657 (cross-list from cs.CV) [pdf, other]: Title: End-to-end Audio-visual Speech Recognition with Conformers

Authors: Pingchuan Ma, Stavros Petridis, Maja Pantic

Comments: Accepted to ICASSP 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[157] arXiv:2102.06750 (cross-list from cs.CL) [pdf, other]: Title: Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

Authors: Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Comments: Proc. IEEE ICASSP 2021

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[158] arXiv:2102.06930 (cross-list from cs.SD) [pdf, other]: Title: Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms

Authors: Kleanthis Avramidis, Agelos Kratimenos, Christos Garoufis, Athanasia Zlatintsi, Petros Maragos

Comments: 5 pages, 4 figures, 6 tables, to be published in the Proc. of the 46th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) @ Toronto, Ontario, Canada

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[159] arXiv:2102.06934 (cross-list from cs.SD) [pdf, other]: Title: Multi-Channel Speech Enhancement using Graph Neural Networks

Authors: Panagiotis Tzirakis, Anurag Kumar, Jacob Donley

Journal-ref: Proc. ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2102.07133 (cross-list from cs.SD) [pdf, other]: Title: Parametric Optimization of Violin Top Plates using Machine Learning

Authors: Davide Salvi, Sebastian Gonzalez, Fabio Antonacci, Augusto Sarti

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[161] arXiv:2102.07259 (cross-list from cs.SD) [pdf, other]: Title: Thank you for Attention: A survey on Attention-based Artificial Neural Networks for Automatic Speech Recognition

Authors: Priyabrata Karmakar, Shyh Wei Teng, Guojun Lu

Comments: Submitted to IEEE/ACM Trans. on Audio, Speech, and Language Processing

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[162] arXiv:2102.07307 (cross-list from cs.SD) [pdf, other]: Title: I-vector Based Within Speaker Voice Quality Identification on connected speech

Authors: Chuyao Feng, Eva van Leer, Mackenzie Lee Curtis, David V. Anderson

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[163] arXiv:2102.07594 (cross-list from cs.CL) [pdf, other]: Title: Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT

Authors: Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang

Comments: 14 pages, 7 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[164] arXiv:2102.07982 (cross-list from cs.SD) [pdf, other]: Title: Voice Gender Scoring and Independent Acoustic Characterization of Perceived Masculinity and Femininity

Authors: Fuling Chen, Roberto Togneri, Murray Maybery, Diana Tan

Comments: 24 pages, 7 figures, journal

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2102.08015 (cross-list from cs.SD) [pdf, ps, other]: Title: Improving speech recognition models with small samples for air traffic control systems

Authors: Yi Lin, Qin Li, Bo Yang, Zhen Yan, Huachun Tan, Zhengmao Chen

Comments: This work has been accepted by Neurocomputing for publication

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[166] arXiv:2102.08074 (cross-list from cs.SD) [pdf, other]: Title: Semi Supervised Learning For Few-shot Audio Classification By Episodic Triplet Mining

Authors: Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu

Comments: 5 pages

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[167] arXiv:2102.08183 (cross-list from cs.SD) [pdf, other]: Title: Comparison of semi-supervised deep learning algorithms for audio classification

Authors: Léo Cances, Etienne Labbé, Thomas Pellegrini

Comments: 9 pages, 5 figures, 5 tables. This is the version 3 of the paper. Contains minor fixes compared to the EURASIP one (which is the version 2 of the paper)

Journal-ref: EURASIP Journal on Audio, Speech, and Music Processing, 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[168] arXiv:2102.08359 (cross-list from cs.SD) [pdf, other]: Title: End-2-End COVID-19 Detection from Breath & Cough Audio

Authors: Harry Coppock, Alexander Gaskell, Panagiotis Tzirakis, Alice Baird, Lyn Jones, Björn W. Schuller

Comments: 5 pages

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[169] arXiv:2102.08535 (cross-list from cs.CL) [pdf, ps, other]: Title: ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems

Authors: Yi Lin, Bo Yang, Linchao Li, Dongyue Guo, Jianwei Zhang, Hu Chen, Yi Zhang

Comments: An improved work based on our previous Interspeech 2020 paper (this https URL)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2102.08551 (cross-list from cs.SD) [pdf, other]: Title: Weighted Recursive Least Square Filter and Neural Network based Residual Echo Suppression for the AEC-Challenge

Authors: Ziteng Wang, Yueyue Na, Zhang Liu, Biao Tian, Qiang Fu

Comments: 5 pages, 2 figures, accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[171] arXiv:2102.08575 (cross-list from cs.SD) [pdf, ps, other]: Title: End-to-end lyrics Recognition with Voice to Singing Style Transfer

Authors: Sakya Basak, Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

Comments: accepted at ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[172] arXiv:2102.08833 (cross-list from cs.SD) [pdf, other]: Title: DESED-FL and URBAN-FL: Federated Learning Datasets for Sound Event Detection

Authors: David S. Johnson, Wolfgang Lorenz, Michael Taenzer, Stylianos Mimilakis, Sascha Grollmisch, Jakob Abeßer, Hanna Lukashevich

Comments: To be published in EUSIPCO 2021

Subjects: Sound (cs.SD); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[173] arXiv:2102.09202 (cross-list from cs.SD) [pdf, other]: Title: Low Resource Audio-to-Lyrics Alignment From Polyphonic Music Recordings

Authors: Emir Demirel, Sven Ahlbäck, Simon Dixon

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2102.09281 (cross-list from cs.LG) [pdf, other]: Title: DINO: A Conditional Energy-Based GAN for Domain Translation

Authors: Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

Comments: Accepted to ICLR 2021

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175] arXiv:2102.09607 (cross-list from cs.LG) [pdf, ps, other]: Title: Modelling Paralinguistic Properties in Conversational Speech to Detect Bipolar Disorder and Borderline Personality Disorder

Authors: Bo Wang, Yue Wu, Nemanja Vaci, Maria Liakata, Terry Lyons, Kate E A Saunders

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176] arXiv:2102.09680 (cross-list from cs.CL) [pdf, other]: Title: Fixing Errors of the Google Voice Recognizer through Phonetic Distance Metrics

Authors: Diego Campos-Sobrino, Mario Campos-Soberanis, Iván Martínez-Chin, Víctor Uc-Cetina

Comments: 13 pages, 4 figures. This article is a translation of the paper "Corrección de errores del reconocedor de voz de Google usando métricas de distancia fonética" presented in COMIA 2018

Journal-ref: Research in Computing Science 148(1), 2019, pp. 57-70. ISSN 1870-4069

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[177] arXiv:2102.09737 (cross-list from cs.CV) [pdf, other]: Title: One Shot Audio to Animated Video Generation

Authors: Neeraj Kumar, Srishti Goel, Ankur Narang, Brejesh Lall, Mujtaba Hasan, Pranshu Agarwal, Dipankar Sarkar

Comments: arXiv admin note: substantial text overlap with arXiv:2012.07842, arXiv:2012.07304

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[178] arXiv:2102.09763 (cross-list from cs.SD) [pdf, other]: Title: Frequency-Temporal Attention Network for Singing Melody Extraction

Authors: Shuai Yu, Xiaoheng Sun, Yi Yu, Wei Li

Comments: This paper has been accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2102.09794 (cross-list from cs.SD) [pdf, other]: Title: Hierarchical Recurrent Neural Networks for Conditional Melody Generation with Long-term Structure

Authors: Zixun Guo, Makris Dimos, Herremans Dorien

Journal-ref: Proc. of the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[180] arXiv:2102.09817 (cross-list from cs.SD) [pdf, ps, other]: Title: Unit selection synthesis based data augmentation for fixed phrase speaker verification

Authors: Houjun Huang, Xu Xiang, Fei Zhao, Shuai Wang, Yanmin Qian

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[181] arXiv:2102.09828 (cross-list from cs.SD) [pdf, other]: Title: AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

Authors: Houjun Huang, Xu Xiang, Yexin Yang, Rao Ma, Yanmin Qian

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[182] arXiv:2102.09914 (cross-list from cs.CL) [pdf, other]: Title: Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input

Authors: Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier

Comments: 4 pages

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[183] arXiv:2102.09966 (cross-list from cs.SD) [pdf, ps, other]: Title: CatNet: music source separation system with mix-audio augmentation

Authors: Xuchen Song, Qiuqiang Kong, Xingjian Du, Yuxuan Wang

Comments: 5 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184] arXiv:2102.09971 (cross-list from cs.SD) [pdf, other]: Title: Speech enhancement with weakly labelled data from AudioSet

Authors: Qiuqiang Kong, Haohe Liu, Xingjian Du, Li Chen, Rui Xia, Yuxuan Wang

Comments: 5 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2102.09978 (cross-list from cs.SD) [pdf, other]: Title: TransMask: A Compact and Fast Speech Separation Model Based on Transformer

Authors: Zining Zhang, Bingsheng He, Zhenjie Zhang

Comments: Accepted in ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186] arXiv:2102.10233 (cross-list from cs.SD) [pdf, other]: Title: The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods

Authors: Xian Shi, Fan Yu, Yizhou Lu, Yuhao Liang, Qiangze Feng, Daliang Wang, Yanmin Qian, Lei Xie

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2102.10236 (cross-list from cs.SD) [pdf, other]: Title: Singer Identification Using Deep Timbre Feature Learning with KNN-Net

Authors: Xulong Zhang, Jiale Qian, Yi Yu, Yifu Sun, Wei Li

Comments: Published as a conference paper at ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2102.10322 (cross-list from cs.SD) [pdf, other]: Title: Learnable MFCCs for Speaker Verification

Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Comments: Accepted to ISCAS 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[189] arXiv:2102.10331 (cross-list from q-bio.NC) [pdf, other]: Title: Separating Stimulus-Induced and Background Components of Dynamic Functional Connectivity in Naturalistic fMRI

Authors: Chee-Ming Ting, Jeremy I. Skipper, Steven L. Small, Hernando Ombao

Comments: Main paper: 10 pages, 8 figures. Supplemental file: 3 pages

Subjects: Neurons and Cognition (q-bio.NC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV); Signal Processing (eess.SP); Applications (stat.AP)
[190] arXiv:2102.10515 (cross-list from cs.SD) [pdf, other]: Title: Anomaly Detection in Audio with Concept Drift using Adaptive Huffman Coding

Authors: Pratibha Kumari, Mukesh Saini

Comments: 22 pages, 8 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2102.10905 (cross-list from cs.CL) [pdf, other]: Title: Joint Intent Detection And Slot Filling Based on Continual Learning Model

Authors: Yanfei Hui, Jianzong Wang, Ning Cheng, Fengying Yu, Tianbo Wu, Jing Xiao

Comments: Accepted to ICASSP 2021

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[192] arXiv:2102.11058 (cross-list from cs.SD) [pdf, other]: Title: Anyone GAN Sing

Authors: Shreeviknesh Sankaran, Sukavanan Nanjundan, G. Paavai Anand

Comments: 5 pages, 8 figures

Journal-ref: International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN: 2349-5162, Vol.7, Issue 5, page no. 25-29, May-2020

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[193] arXiv:2102.11114 (cross-list from cs.CL) [pdf, other]: Title: Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Authors: Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

Comments: Accepted in 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[194] arXiv:2102.11420 (cross-list from cs.SD) [pdf, other]: Title: Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion

Authors: Samuel J. Broughton, Md Asif Jalal, Roger K. Moore

Comments: For demo, see this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2102.11457 (cross-list from cs.SD) [pdf, other]: Title: Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning

Authors: Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Zeyu Xie, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196] arXiv:2102.11474 (cross-list from cs.SD) [pdf, other]: Title: Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events

Authors: Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197] arXiv:2102.11488 (cross-list from cs.SD) [pdf, other]: Title: Senone-aware Adversarial Multi-task Training for Unsupervised Child to Adult Speech Adaptation

Authors: Richeng Duan, Nancy F. Chen

Comments: accepted for presentation at ICASSP-2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[198] arXiv:2102.11531 (cross-list from cs.SD) [pdf, other]: Title: Memory-efficient Speech Recognition on Smart Devices

Authors: Ganesh Venkatesh, Alagappan Valliappan, Jay Mahadeokar, Yuan Shangguan, Christian Fuegen, Michael L. Seltzer, Vikas Chandra

Journal-ref: ICASSP 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[199] arXiv:2102.11588 (cross-list from cs.SD) [pdf, other]: Title: Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain

Authors: Julio Wissing, Benedikt Boenninghoff, Dorothea Kolossa, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Christopher Schymura

Comments: 4 pages, 6 figures, ICASSP 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[200] arXiv:2102.11771 (cross-list from cs.SD) [pdf, ps, other]: Title: Improving Deep Learning Sound Events Classifiers using Gram Matrix Feature-wise Correlations

Authors: Antonio Joia Neto, Andre G C Pacheco, Diogo C Luvizon

Comments: To appear at ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[201] arXiv:2102.12111 (cross-list from cs.SD) [pdf, other]: Title: Deep Learning Approach for Singer Voice Classification of Vietnamese Popular Music

Authors: Toan Pham Van, Ngoc N. Tran, Ta Minh Thanh

Comments: Published in SoICT 2019: Proceedings of the Tenth International Symposium on Information and Communication Technology

Journal-ref: SoICT 2019: Proceedings of the Tenth International Symposium on Information and Communication Technology

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[202] arXiv:2102.12289 (cross-list from cs.SD) [pdf, other]: Title: Automatic Feature Extraction for Heartbeat Anomaly Detection

Authors: Robert-George Colt, Csongor-Huba Várady, Riccardo Volpi, Luigi Malagò

Comments: 7 pages, 2 figures, Presented at PharML 2020 Workshop - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), see this https URL, source-code: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[203] arXiv:2102.12564 (cross-list from cs.SD) [pdf, other]: Title: Triplet loss based embeddings for forensic speaker identification in Spanish

Authors: Emmanuel Maqueda, Javier Alvarez-Jimenez, Carlos Mena, Ivan Meza

Comments: Long Paper: Neural Computing and Applications, Special Issue on LatinX in AI Research (2021). 11 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[204] arXiv:2102.12664 (cross-list from cs.CL) [pdf, other]: Title: MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition

Authors: Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu

Comments: To appear at ICASSP 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[205] arXiv:2102.12841 (cross-list from cs.SD) [pdf, other]: Title: MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames

Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

Comments: Accepted to ICASSP 2021. Project page: this http URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[206] arXiv:2102.13314 (cross-list from cs.LG) [pdf, other]: Title: Efficient Client Contribution Evaluation for Horizontal Federated Learning

Authors: Jie Zhao, Xinghua Zhu, Jianzong Wang, Jing Xiao

Comments: Accepted to ICASSP 2021

Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[207] arXiv:2102.13479 (cross-list from cs.SD) [pdf, other]: Title: Towards Explaining Expressive Qualities in Piano Recordings: Transfer of Explanatory Features via Acoustic Domain Adaptation

Authors: Shreyan Chowdhury, Gerhard Widmer

Comments: 5 pages, 3 figures; accepted for IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[208] arXiv:2102.13552 (cross-list from cs.SD) [pdf, other]: Title: The NPU System for the 2020 Personalized Voice Trigger Challenge

Authors: Jingyong Hou, Li Zhang, Yihui Fu, Qing Wang, Zhanheng Yang, Qijie Shao, Lei Xie

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
