A Global-local Attention Framework for Weakly Labelled Audio Tagging

Wang, Helin; Zou, Yuexian; Wang, Wenwu

(Submitted on 3 Feb 2021)

Abstract: Weakly labelled audio tagging aims to predict the classes of sound events within an audio clip, where the onset and offset times of the sound events are not provided. Previous works have used the multiple instance learning (MIL) framework, and exploited the information of the whole audio clip by MIL pooling functions. However, the detailed information of sound events such as their durations may not be considered under this framework. To address this issue, we propose a novel two-stream framework for audio tagging by exploiting the global and local information of sound events. The global stream aims to analyze the whole audio clip in order to capture the local clips that need to be attended using a class-wise selection module. These clips are then fed to the local stream to exploit the detailed information for a better decision. Experimental results on the AudioSet show that our proposed method can significantly improve the performance of audio tagging under different baseline network architectures.
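
The global-to-local flow described in the abstract can be illustrated with a small, non-authoritative sketch. It is not the authors' implementation: in the paper both streams are neural networks, whereas here the frame-level class probabilities are assumed to be given as a plain list of lists. The function names `mil_pool` and `select_local_windows`, and the parameters `top_k` and `half_width`, are hypothetical and chosen only for illustration. The sketch shows (1) MIL pooling of frame-level scores into clip-level scores, and (2) a simple class-wise selection that, for each of the highest-scoring classes, picks a window of frames around that class's peak frame. Such windows are the kind of local clips that a second stream could then examine in more detail.

```python
from typing import List, Tuple


def mil_pool(frame_probs: List[List[float]], mode: str = "max") -> List[float]:
    """Collapse frame-level probabilities [T][C] into clip-level scores [C]."""
    num_classes = len(frame_probs[0])
    # Gather the time series of each class into its own column.
    columns = [[frame[c] for frame in frame_probs] for c in range(num_classes)]
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


def select_local_windows(
    frame_probs: List[List[float]], top_k: int, half_width: int
) -> List[Tuple[int, int, int]]:
    """Return (class_index, start, end) for the top_k classes by max-pooled score.

    Each window is centred on the class's peak frame and clipped to the clip
    boundaries; `end` is exclusive. This is an assumed, simplified stand-in
    for a learned class-wise selection module.
    """
    clip_probs = mil_pool(frame_probs, "max")
    ranked = sorted(
        range(len(clip_probs)), key=lambda c: clip_probs[c], reverse=True
    )[:top_k]
    num_frames = len(frame_probs)
    windows = []
    for c in ranked:
        peak = max(range(num_frames), key=lambda t: frame_probs[t][c])
        start = max(0, peak - half_width)
        end = min(num_frames, peak + half_width + 1)
        windows.append((c, start, end))
    return windows


if __name__ == "__main__":
    # Toy input: 5 frames, 3 classes (values are made up for illustration).
    probs = [
        [0.1, 0.0, 0.2],
        [0.2, 0.1, 0.9],
        [0.8, 0.1, 0.3],
        [0.7, 0.0, 0.1],
        [0.1, 0.2, 0.1],
    ]
    print(mil_pool(probs, "max"))
    print(select_local_windows(probs, top_k=2, half_width=1))
```

Max pooling keeps only the single strongest frame per class, which is why duration information is lost at the clip level; returning an explicit window per class is one simple way to hand that temporal detail to a later stage.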

Comments:	Accepted to ICASSP2021
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2102.01931 [eess.AS]
	(or arXiv:2102.01931v1 [eess.AS] for this version)

Submission history

From: Helin Wang
[v1] Wed, 3 Feb 2021 08:13:47 GMT (2854kb,D)
