Exploring Missing Modality in Multimodal Egocentric Datasets

Ramazanova, Merey; Pardo, Alejandro; Alwassel, Humam; Ghanem, Bernard

(Submitted on 21 Jan 2024 (v1), last revised 17 Apr 2024 (this version, v2))

Abstract: Multimodal video understanding is crucial for analyzing egocentric videos, where integrating multiple sensory signals significantly enhances action recognition and moment localization. In practice, however, modalities are often incomplete because of privacy concerns, efficiency demands, or hardware malfunctions. Our study examines the impact of missing modalities on egocentric action recognition, particularly in transformer-based models. We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent, a strategy that proves effective on the Ego4D, Epic-Kitchens, and Epic-Sounds datasets. Our method mitigates the performance loss, reducing the drop from $\sim 30\%$ to only $\sim 10\%$ when half of the test set is modal-incomplete. Through extensive experimentation, we demonstrate the adaptability of MMT to different training scenarios and its superiority over current methods in handling missing modalities. Our research contributes a comprehensive analysis and an innovative approach, opening avenues for more resilient multimodal systems in real-world settings.
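
The abstract names the Missing Modality Token but does not spell out how it is applied. The sketch below illustrates one plausible reading, under the assumption that a per-modality placeholder vector stands in for the tokens of an absent modality before the sequences are fused. The function name `build_fused_sequence`, the dictionary layout, and the fixed placeholder values are hypothetical; in a real model the placeholder would presumably be a learned parameter rather than a constant.

```python
from typing import Dict, List, Optional

Vector = List[float]


def build_fused_sequence(
    inputs: Dict[str, Optional[List[Vector]]],
    placeholders: Dict[str, Vector],
    lengths: Dict[str, int],
) -> List[Vector]:
    """Concatenate per-modality token sequences into one fused sequence.

    A modality whose entry is None (or absent) is treated as missing, and
    its slot is filled with copies of that modality's placeholder token.
    """
    fused: List[Vector] = []
    for name in sorted(lengths):  # fixed modality order keeps positions stable
        tokens = inputs.get(name)
        if tokens is None:
            # Assumption: one placeholder vector repeated to the expected length.
            tokens = [list(placeholders[name]) for _ in range(lengths[name])]
        if len(tokens) != lengths[name]:
            raise ValueError(
                f"{name}: expected {lengths[name]} tokens, got {len(tokens)}"
            )
        fused.extend(tokens)
    return fused


# Toy setup: 2 audio tokens and 3 video tokens, each of dimension 2.
lengths = {"audio": 2, "video": 3}
placeholders = {"audio": [0.5, -0.5], "video": [0.1, 0.2]}  # stand-ins for learned vectors
video_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Audio is unavailable for this sample, so its slot is filled with placeholders.
missing_audio = build_fused_sequence(
    {"audio": None, "video": video_tokens}, placeholders, lengths
)
```

Because the fused sequence keeps the same length and modality order whether or not audio is present, a downstream transformer would see a consistent input shape in both cases.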

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2401.11470 [cs.CV]
	(or arXiv:2401.11470v2 [cs.CV] for this version)

Submission history

From: Merey Ramazanova
[v1] Sun, 21 Jan 2024 11:55:42 GMT (455kb,D)
[v2] Wed, 17 Apr 2024 13:25:38 GMT (565kb,D)
