Refine and Represent: Region-to-Object Representation Learning

Gokul, Akash; Kallidromitis, Konstantinos; Li, Shufan; Kato, Yusuke; Kozuka, Kazuki; Darrell, Trevor; Reed, Colorado J

(Submitted on 25 Aug 2022 (v1), last revised 20 Dec 2022 (this version, v2))

Abstract: Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives. In this paper, we present Region-to-Object Representation Learning (R2O), which unifies region-based and object-centric pretraining. R2O operates by training an encoder to dynamically refine region-based segments into object-centric masks and to jointly learn representations of the contents within each mask. R2O uses a "region refinement module" to group small image regions, generated using a region-level prior, into larger regions that tend to correspond to objects by clustering region-level features. As pretraining progresses, R2O follows a region-to-object curriculum that encourages learning region-level features early on and gradually progresses to training object-centric representations. Representations learned using R2O lead to state-of-the-art performance in semantic segmentation on PASCAL VOC (+0.7 mIoU) and Cityscapes (+0.4 mIoU) and in instance segmentation on MS COCO (+0.3 mask AP). Further, after pretraining on ImageNet, R2O pretrained models surpass the existing state of the art in unsupervised object segmentation on the Caltech-UCSD Birds 200-2011 dataset (+2.9 mIoU) without any further training. We provide the code/models from this work at this https URL
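
As a rough illustration of the region refinement idea described in the abstract, the following sketch groups small regions into larger ones by clustering region-level features. It is not the authors' implementation. The abstract states only that region-level features are clustered and that a region-to-object curriculum is followed, so the mean pooling, the plain k-means routine, the linear schedule, and all function and parameter names (`region_features`, `refine_regions`, `num_clusters`, `k_start`, `k_end`) are assumptions made here for illustration.

```python
import numpy as np


def region_features(feats, regions):
    """Mean-pool per-pixel features (H, W, C) inside each region id of `regions` (H, W).

    Assumption: region-level features are obtained by simple mean pooling.
    """
    ids = np.unique(regions)
    pooled = np.stack([feats[regions == i].mean(axis=0) for i in ids])
    return ids, pooled


def kmeans(x, k, iters=10, seed=0):
    """Plain k-means on the rows of x; returns one cluster label per row.

    Assumption: k-means stands in for whatever clustering the paper uses.
    """
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def refine_regions(feats, regions, k):
    """Merge small regions into at most k larger ones by clustering their features.

    `regions` must hold non-negative integer region ids.
    """
    ids, pooled = region_features(feats, regions)
    labels = kmeans(pooled, min(k, len(ids)))
    lookup = np.zeros(ids.max() + 1, dtype=int)
    lookup[ids] = labels
    return lookup[regions]  # (H, W) map of refined mask ids


def num_clusters(step, total_steps, k_start=8, k_end=2):
    """Hypothetical linear curriculum: many small regions early, fewer large ones later."""
    frac = step / max(total_steps - 1, 1)
    return int(round(k_start + frac * (k_end - k_start)))
```

In this sketch, calling `refine_regions` with `k = num_clusters(step, total_steps)` at each training step would yield progressively coarser masks, which is one simple way to read the region-to-object curriculum. The schedule actually used in the paper may differ.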

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2208.11821 [cs.CV]
	(or arXiv:2208.11821v2 [cs.CV] for this version)

Submission history

From: Shufan Li
[v1] Thu, 25 Aug 2022 01:44:28 GMT (3189kb,D)
[v2] Tue, 20 Dec 2022 23:36:52 GMT (5258kb,D)
