Cross-Modal Alignment Learning of Vision-Language Conceptual Systems

Kim, Taehyeong; Song, Hyeonseop; Zhang, Byoung-Tak

Computer Science > Computer Vision and Pattern Recognition

(Submitted on 31 Jul 2022)

Abstract: Human infants learn the names of objects and develop their own conceptual systems without explicit supervision. In this study, we propose methods for learning aligned vision-language conceptual systems inspired by infants' word learning mechanisms. The proposed model learns the associations of visual objects and words online and gradually constructs cross-modal relational graph networks. We also propose an aligned cross-modal representation learning method that learns semantic representations of visual objects and words in a self-supervised manner based on the cross-modal relational graph networks. It allows entities of different modalities that share the same conceptual meaning to have similar semantic representation vectors. We evaluate our method quantitatively and qualitatively, including on object-to-word mapping and zero-shot learning tasks, showing that the proposed model significantly outperforms the baselines and that each conceptual system is topologically aligned.

Comments:	19 pages, 4 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2208.01744 [cs.CV]
	(or arXiv:2208.01744v1 [cs.CV] for this version)

Submission history

From: Taehyeong Kim [view email]
[v1] Sun, 31 Jul 2022 08:39:53 GMT (1212kb,D)
