Learning to Visually Connect Actions and their Effects

Peh, Eric; Parmar, Paritosh; Fernando, Basura

(Submitted on 19 Jan 2024 (v1), last revised 26 Apr 2024 (this version, v2))

Abstract: In this work, we introduce the novel concept of visually Connecting Actions and Their Effects (CATE) in video understanding. CATE can have applications in areas like task planning and learning from demonstration. We identify and explore two different aspects of the concept of CATE: Action Selection and Effect-Affinity Assessment, where video understanding models connect actions and effects at semantic and fine-grained levels, respectively. We observe that different formulations produce representations capturing intuitive action properties. We also design various baseline models for Action Selection and Effect-Affinity Assessment. Despite the intuitive nature of the task, we observe that models struggle, and humans outperform them by a large margin. The study aims to establish a foundation for future efforts, showcasing the flexibility and versatility of connecting actions and effects in video understanding, with the hope of inspiring advanced formulations and models.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2401.10805 [cs.CV]
	(or arXiv:2401.10805v2 [cs.CV] for this version)

Submission history

From: Paritosh Parmar
[v1] Fri, 19 Jan 2024 16:48:49 GMT (3827kb,D)
[v2] Fri, 26 Apr 2024 17:59:51 GMT (4643kb,D)
