Electrical Engineering and Systems Science > Audio and Speech Processing
Title: Point Cloud Audio Processing
(Submitted on 6 May 2021 (v1), last revised 29 Jul 2021 (this version, v2))
Abstract: Most audio processing pipelines involve transformations that act on fixed-dimensional input representations of audio. For example, when using the Short-Time Fourier Transform (STFT), the DFT size specifies a fixed dimension for the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prohibits repurposing learned models on audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of input representation or sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as the DFT size or the sampling rate. Additionally, we observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on a trained model's performance.
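The idea of treating a spectrogram as a set of points can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stft_point_cloud` and `subsample`, the choice of (time, frequency, magnitude) as point features, the Hann window, and the 50% hop are all assumptions made for illustration. It only shows that STFTs with different DFT sizes yield point sets of different sizes that live in the same 3-dimensional feature space, and that such a set can be randomly subsampled.

```python
import numpy as np

def stft_point_cloud(signal, sample_rate, dft_size, hop=None):
    """Return an (N, 3) array of points (time in s, frequency in Hz, magnitude).

    Hypothetical helper: each STFT bin becomes one point whose coordinates
    are physical units, so the point format does not depend on dft_size.
    """
    hop = hop or dft_size // 2                      # assumed 50% overlap
    window = np.hanning(dft_size)
    freqs = np.fft.rfftfreq(dft_size, d=1.0 / sample_rate)  # bin centres in Hz
    n_frames = 1 + (len(signal) - dft_size) // hop
    points = []
    for i in range(n_frames):
        start = i * hop
        frame = signal[start:start + dft_size] * window
        mag = np.abs(np.fft.rfft(frame))
        # Time stamp of the frame centre, repeated for every frequency bin.
        t = np.full_like(freqs, (start + dft_size / 2) / sample_rate)
        points.append(np.stack([t, freqs, mag], axis=1))
    return np.concatenate(points, axis=0)

def subsample(cloud, n_points, seed=0):
    """Keep a random subset of points (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(cloud), size=n_points, replace=False)
    return cloud[idx]

# One second of a 440 Hz tone, analysed with two different DFT sizes.
sr = 16000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
cloud_small = stft_point_cloud(x, sr, dft_size=256)
cloud_large = stft_point_cloud(x, sr, dft_size=1024)
sparse = subsample(cloud_large, n_points=2000)

# Frequency coordinate of the strongest point in each cloud.
peak_small = cloud_small[np.argmax(cloud_small[:, 2]), 1]
peak_large = cloud_large[np.argmax(cloud_large[:, 2]), 1]
```

The two clouds contain different numbers of points, yet every point has the same three coordinates, so a model that consumes unordered sets of 3-dimensional points could in principle accept either one, or the subsampled version, without architectural changes.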
Submission history
From: Krishna Subramani
[v1] Thu, 6 May 2021 07:04:59 GMT (301kb,D)
[v2] Thu, 29 Jul 2021 06:32:18 GMT (297kb,D)