Title: Point Cloud Audio Processing

Abstract: Most audio processing pipelines involve transformations that act on fixed-dimensional input representations of audio. For example, when using the Short-Time Fourier Transform (STFT), the DFT size fixes the dimension of the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prevents reusing learned models on audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of input representation or sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as the DFT size or the sampling rate. Additionally, we observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on a trained model's performance.
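
The idea of treating a spectrogram as a collection of points can be illustrated with a minimal NumPy sketch. This is not the authors' code: the helper names (`stft_magnitude`, `spectrogram_to_points`, `subsample`, `make_cloud`) and the choice of (time in seconds, frequency in Hz, magnitude) as the per-point features are assumptions made here for illustration. The sketch shows that two STFTs with different DFT sizes and sampling rates yield point sets whose points share the same three-dimensional feature layout, so only the number of points changes.

```python
import numpy as np


def stft_magnitude(signal, n_fft, hop):
    """Magnitude STFT with a Hann window; shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def spectrogram_to_points(mag, sample_rate, n_fft, hop):
    """Flatten a spectrogram into rows of (time_s, freq_hz, magnitude).

    Assumption: coordinates are expressed in physical units, so a point
    means the same thing whatever DFT size or sampling rate produced it.
    """
    n_frames, _ = mag.shape
    times = np.arange(n_frames) * hop / sample_rate
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    tt, ff = np.meshgrid(times, freqs, indexing="ij")
    return np.stack([tt.ravel(), ff.ravel(), mag.ravel()], axis=1)


def subsample(points, n_keep, rng):
    """Keep a random subset of points (sampling without replacement)."""
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]


def make_cloud(sample_rate, n_fft, tone_hz=1000.0, duration_s=0.5):
    """Point set for a pure tone analysed at the given rate and DFT size."""
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    signal = np.sin(2 * np.pi * tone_hz * t)
    hop = n_fft // 4
    mag = stft_magnitude(signal, n_fft, hop)
    return spectrogram_to_points(mag, sample_rate, n_fft, hop)


# Same tone, two different representations.
cloud_a = make_cloud(sample_rate=16000, n_fft=512)
cloud_b = make_cloud(sample_rate=44100, n_fft=2048)

# Both are (n_points, 3) arrays; only n_points differs.
print(cloud_a.shape, cloud_b.shape)

# A set-based model could also be fed a random subset of the points.
rng = np.random.default_rng(0)
small_cloud = subsample(cloud_a, 1024, rng)
print(small_cloud.shape)
```

A model that consumes unordered sets of such points does not depend on the grid that produced them, which is the property the abstract refers to. The architecture of such a model is outside the scope of this sketch.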
Comments: Accepted at WASPAA 2021, Code: this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
DOI: 10.1109/WASPAA52581.2021.9632668
Cite as: arXiv:2105.02469 [eess.AS]
  (or arXiv:2105.02469v2 [eess.AS] for this version)

Submission history

From: Krishna Subramani
[v1] Thu, 6 May 2021 07:04:59 GMT (301kb,D)
[v2] Thu, 29 Jul 2021 06:32:18 GMT (297kb,D)
