Point Cloud Audio Processing

Subramani, Krishna; Smaragdis, Paris

doi:10.1109/WASPAA52581.2021.9632668

Electrical Engineering and Systems Science > Audio and Speech Processing

Title: Point Cloud Audio Processing

Authors: Krishna Subramani, Paris Smaragdis

(Submitted on 6 May 2021 (v1), last revised 29 Jul 2021 (this version, v2))

Abstract: Most audio processing pipelines involve transformations that act on fixed-dimensional input representations of audio. For example, when using the Short-Time Fourier Transform (STFT), the DFT size specifies a fixed dimension for the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prevents learned models from being repurposed for audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of the input representation or the sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as DFT size or the sampling rate. Additionally, we observe that these methods result in smaller models and allow us to significantly subsample the input representation with minimal effect on a trained model's performance.
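The idea of treating a spectrum as a collection of points rather than a fixed-length vector can be illustrated with a minimal NumPy sketch. It is not the authors' model or feature pipeline, which the abstract does not specify. It assumes each point is a (frequency in Hz, magnitude) pair taken from one Hann-windowed frame; the helper names `frame_to_points` and `subsample` are hypothetical.

```python
import numpy as np


def frame_to_points(frame, sample_rate):
    """Turn one audio frame into a set of (frequency_hz, magnitude) points.

    Assumption for illustration: the point features are the bin centre
    frequency in Hz and the window-normalised magnitude.
    """
    n = len(frame)
    window = np.hanning(n)
    spectrum = np.fft.rfft(frame * window)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)  # bin centres in Hz
    mags = np.abs(spectrum) / window.sum()           # remove frame-length gain
    return np.stack([freqs, mags], axis=1)           # shape (n // 2 + 1, 2)


def subsample(points, k, rng):
    """Keep k randomly chosen points; a set has no fixed size or order."""
    idx = rng.choice(len(points), size=k, replace=False)
    return points[idx]


sample_rate = 16000
t = np.arange(2048) / sample_rate
signal = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone

# Two DFT sizes give different numbers of points in the same feature space.
small = frame_to_points(signal[:512], sample_rate)   # 257 points
large = frame_to_points(signal[:2048], sample_rate)  # 1025 points

# A subsampled cloud is still a valid input for a set-based model.
sparse = subsample(large, 64, np.random.default_rng(0))

for name, pts in [("small", small), ("large", large), ("sparse", sparse)]:
    peak = pts[np.argmax(pts[:, 1])]
    print(name, pts.shape, "strongest point:", peak)
```

In this sketch, changing the DFT size changes how many points there are, but not where the spectral peak sits in (frequency, magnitude) space. A model that consumes an unordered, variable-size set of such points therefore does not need a fixed input dimension.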

Comments:	Accepted at WASPAA 2021, Code: this https URL
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
DOI:	10.1109/WASPAA52581.2021.9632668
Cite as:	arXiv:2105.02469 [eess.AS]
	(or arXiv:2105.02469v2 [eess.AS] for this version)

Submission history

From: Krishna Subramani
[v1] Thu, 6 May 2021 07:04:59 GMT (301kb,D)
[v2] Thu, 29 Jul 2021 06:32:18 GMT (297kb,D)
