Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning

Xiao, Xiongye; Liu, Gengshuo; Gupta, Gaurav; Cao, Defu; Li, Shixuan; Li, Yaxing; Fang, Tianqing; Cheng, Mingxi; Bogdan, Paul

(Submitted on 15 Apr 2024 (v1), last revised 22 Apr 2024 (this version, v2))

Abstract: Integrating and processing information from various sources or modalities is critical for obtaining a comprehensive and accurate perception of the real world in autonomous systems and cyber-physical systems. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which builds on the information bottleneck principle. Unlike most traditional fusion models, which incorporate all modalities identically in neural networks, our model designates a prime modality and regards the remaining modalities as detectors in the information pathway, serving to distill the flow of information. The model constructs an effective and compact information flow by balancing two objectives: minimizing the mutual information between the latent state and the input modal state, and maximizing the mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of multimodal representation learning. Experimental evaluations on the MUStARD, CMU-MOSI, and CMU-MOSEI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks. Remarkably, on the CMU-MOSI dataset, ITHP surpasses human-level performance in the multimodal sentiment binary classification task across all evaluation metrics (i.e., Binary Accuracy, F1 Score, Mean Absolute Error, and Pearson Correlation).
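
The balance described in the abstract has the shape of an information bottleneck Lagrangian. As a hedged sketch for a single stage, using notation that is assumed here rather than taken from the abstract ($X_0$ for the prime modality, $X_1$ for one detector modality, $B$ for the latent state, and $\beta > 0$ for a trade-off weight), the objective could be written as:

```latex
\min_{p(B \mid X_0)} \; I(X_0; B) \;-\; \beta \, I(B; X_1)
```

The first term penalizes information the latent state retains about its input (compression), while the second rewards information it carries about the detector modality (relevance). How ITHP chains such stages across more than two modalities is not stated in the abstract, so this single-stage form should be read only as an illustration of the trade-off.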

Comments:	Accepted by ICLR 2024. Camera Ready Version
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2404.09403 [cs.LG]
	(or arXiv:2404.09403v2 [cs.LG] for this version)

Submission history

From: Xiongye Xiao
[v1] Mon, 15 Apr 2024 01:34:44 GMT (14817kb,D)
[v2] Mon, 22 Apr 2024 20:50:53 GMT (14817kb,D)
