Audio and Speech Processing

Authors and titles for eess.AS in May 2021, skipping first 50

[51] arXiv:2105.14826 [pdf, other]: Title: PF-Net: Personalized Filter for Speaker Recognition from Raw Waveform

Authors: Wencheng Li, Zhenhua Tan, Jingyu Ning, Zhenche Xia, Danke Wu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[52] arXiv:2105.15162 [pdf, other]: Title: Automatic audiovisual synchronisation for ultrasound tongue imaging

Authors: Aciel Eshky, Joanne Cleland, Manuel Sam Ribeiro, Eleanor Sugden, Korin Richmond, Steve Renals

Comments: 18 pages, 10 figures. Manuscript accepted at Speech Communication

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[53] arXiv:2105.02225 (cross-list from eess.IV) [pdf, other]: Title: Model reduction in acoustic inversion by artificial neural network

Authors: Janne Koponen, Timo Lähivaara, Jari Kaipio, Marko Vauhkonen

Subjects: Image and Video Processing (eess.IV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Computational Physics (physics.comp-ph); Quantitative Methods (q-bio.QM)
[54] arXiv:2105.02471 (cross-list from eess.SP) [pdf, other]: Title: Signal Analysis via the Stochastic Geometry of Spectrogram Level Sets

Authors: Subhroshekhar Ghosh, Meixia Lin, Dongfang Sun

Journal-ref: IEEE Transactions on Signal Processing, Vol. 70, 2022

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Probability (math.PR); Statistics Theory (math.ST)
[55] arXiv:2105.02820 (cross-list from eess.SP) [pdf, other]: Title: Simulating the DFT Algorithm for Audio Processing

Authors: Omkar Deshpande, Kharanshu Solanki, Sree Pujitha Suribhatla, Sanya Zaveri, Luv Ghodasara

Comments: 9 pages, 16 figures (including plots)

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[56] arXiv:2105.03809 (cross-list from eess.SP) [pdf, other]: Title: Superresolution photoacoustic tomography using random speckle illumination and second order moments

Authors: Osman Asif Malik, Venkatalakshmi Vyjayanthi Narumanchi, Stephen Becker, Todd W. Murray

Comments: 5 pages, 5 figures; accepted to WASPAA 2021

Journal-ref: 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021, pp. 141-145

Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)
[57] arXiv:2105.06700 (cross-list from eess.SP) [pdf, other]: Title: Nonuniform Sampling Rate Conversion: An Efficient Approach

Authors: Pablo Martínez-Nuevo

Comments: 10 pages, 10 figures, journal

Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[58] arXiv:2105.00173 (cross-list from cs.SD) [pdf, ps, other]: Title: Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis Tool for Singers

Authors: Daniel Szelogowski

Comments: 26 pages, 10 figures, 6 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[59] arXiv:2105.00202 (cross-list from cs.LG) [pdf, other]: Title: One-shot learning for acoustic identification of bird species in non-stationary environments

Authors: Michelangelo Acconcjaioco, Stavros Ntalampiras

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:2105.00335 (cross-list from cs.SD) [pdf, other]: Title: Audio Transformers: Transformer Architectures For Large Scale Audio Understanding. Adieu Convolutions

Authors: Prateek Verma, Jonathan Berger

Comments: 5 pages, 4 figures; Under review WASPAA 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[61] arXiv:2105.00573 (cross-list from cs.CL) [pdf, other]: Title: Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

Authors: Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe

Comments: NAACL 2021. All code and models are released as part of the ESPnet toolkit: this https URL

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[62] arXiv:2105.00609 (cross-list from cs.SD) [pdf, other]: Title: AvaTr: One-Shot Speaker Extraction with Transformers

Authors: Shell Xu Hu, Md Rifat Arefin, Viet-Nhat Nguyen, Alish Dipani, Xaq Pitkow, Andreas Savas Tolias

Comments: 6 pages, 4 main figures, 2 supplemental figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63] arXiv:2105.00641 (cross-list from cs.MM) [pdf, other]: Title: Naturalistic audio-visual volumetric sequences dataset of sounding actions for six degree-of-freedom interaction

Authors: Hanne Stenzel, Davide Berghi, Marco Volino, Philip J.B. Jackson

Comments: for dataset visit cvssp.org/data/navvs; accepted as poster in IEEE VR 2021

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64] arXiv:2105.00708 (cross-list from cs.SD) [pdf, other]: Title: Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

Authors: Yan-Bo Lin, Yu-Chiang Frank Wang

Comments: AAAI'21

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[65] arXiv:2105.00812 (cross-list from cs.CL) [pdf, ps, other]: Title: Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency

Authors: Jinchuan Tian, Rongzhi Gu, Helin Wang, Yuexian Zou

Comments: 5 pages, 3 figures, submit to Interspeech2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66] arXiv:2105.00899 (cross-list from cs.LG) [pdf, other]: Title: Fully Learnable Deep Wavelet Transform for Unsupervised Monitoring of High-Frequency Time Series

Authors: Gabriel Michau, Gaetan Frusque, Olga Fink

Comments: 16 pages, 7 figures, 3 tables

Journal-ref: PNAS 2022 Vol. 119 | No. 8

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67] arXiv:2105.00933 (cross-list from cs.SD) [src]: Title: Deep Neural Network for Musical Instrument Recognition using MFCCs

Authors: Saranga Kingkor Mahanta, Abdullah Faiz Ur Rahman Khilji, Partha Pakray

Comments: Was suggested to upload on a later date

Journal-ref: Computacion y Sistemas, Vol 25, No 2 (2021): 25(2) 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[68] arXiv:2105.00982 (cross-list from cs.CL) [pdf, other]: Title: On the limit of English conversational speech recognition

Authors: Zoltán Tüske, George Saon, Brian Kingsbury

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69] arXiv:2105.01051 (cross-list from cs.CL) [pdf, ps, other]: Title: SUPERB: Speech processing Universal PERformance Benchmark

Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

Comments: To appear in Interspeech 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:2105.01254 (cross-list from cs.SD) [pdf, other]: Title: Streaming end-to-end speech recognition with jointly trained neural feature enhancement

Authors: Chanwoo Kim, Abhinav Garg, Dhananjaya Gowda, Seongkyu Mun, Changwoo Han

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71] arXiv:2105.01531 (cross-list from cs.SD) [pdf, other]: Title: VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding

Authors: Javier Nistal, Cyran Aouameur, Stefan Lattner, Gaël Richard

Comments: 5 pages, 1 figure, 1 table; accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Journal-ref: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72] arXiv:2105.01570 (cross-list from q-bio.NC) [pdf, other]: Title: Simple and Cheap Setup for Timing Tapping Responses Synchronized to Auditory Stimuli

Authors: Martin Miguel, Pablo Riera, Diego Fernandez Slezak

Subjects: Neurons and Cognition (q-bio.NC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2105.01836 (cross-list from cs.SD) [pdf, ps, other]: Title: Acoustic Scene Classification Using Multichannel Observation with Partially Missing Channels

Authors: Keisuke Imoto

Comments: Accepted to EUSIPCO2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2105.01897 (cross-list from cs.SD) [pdf, other]: Title: Improved feature extraction for CRNN-based multiple sound source localization

Authors: Pierre-Amaury Grumiaux, Srdan Kitic, Laurent Girin, Alexandre Guérin

Comments: 5 pages, 2 figures. Accepted to EUSIPCO 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75] arXiv:2105.02096 (cross-list from cs.SD) [pdf, other]: Title: End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

Authors: Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey

Comments: 5 pages, 2 figures, ICASSP 2021

Journal-ref: ICASSP 2021, SPE-54.1

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
