Category: Computer Science, Computer Vision and Pattern Recognition
Title: Refine and Represent: Region-to-Object Representation Learning
(Submitted on 25 Aug 2022 (v1), last revised 20 Dec 2022 (this version, v2))
Abstract: Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives. In this paper, we present Region-to-Object Representation Learning (R2O), which unifies region-based and object-centric pretraining. R2O trains an encoder to dynamically refine region-based segments into object-centric masks and then jointly learns representations of the contents within each mask. R2O uses a "region refinement module" that clusters region-level features to group small image regions, generated using a region-level prior, into larger regions that tend to correspond to objects. As pretraining progresses, R2O follows a region-to-object curriculum that encourages learning region-level features early on and gradually shifts to training object-centric representations. Representations learned using R2O achieve state-of-the-art performance in semantic segmentation on PASCAL VOC (+0.7 mIoU) and Cityscapes (+0.4 mIoU) and in instance segmentation on MS COCO (+0.3 mask AP). Further, after pretraining on ImageNet, R2O pretrained models surpass the existing state of the art in unsupervised object segmentation on the Caltech-UCSD Birds 200-2011 dataset (+2.9 mIoU) without any further training. We provide the code/models from this work at this https URL.
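The abstract describes the region refinement module and the region-to-object curriculum only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' released code. Per-pixel features are mean-pooled inside each prior region, the pooled region features are grouped with plain k-means, and the number of clusters `k` shrinks linearly over training to mimic a region-to-object curriculum. The function names, the mean pooling, the k-means stand-in, and the schedule endpoints (`k_start=128`, `k_end=4`) are all hypothetical choices made for illustration.

```python
import numpy as np


def pool_region_features(features, region_ids, num_regions):
    """Mean-pool per-pixel features inside each prior region (mean pooling is an assumption)."""
    # features: (N, D) pixel embeddings; region_ids: (N,) ints in [0, num_regions)
    sums = np.zeros((num_regions, features.shape[1]))
    np.add.at(sums, region_ids, features)
    counts = np.bincount(region_ids, minlength=num_regions)[:, None]
    return sums / np.maximum(counts, 1)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; a hypothetical stand-in for the paper's clustering step."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center: (len(x), k)
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return assign


def num_clusters(step, total_steps, k_start=128, k_end=4):
    """Hypothetical curriculum: linearly shrink k from k_start to k_end over training."""
    frac = min(step / total_steps, 1.0)
    return int(round(k_start + frac * (k_end - k_start)))


def refine_regions(features, region_ids, num_regions, k):
    """Group small prior regions into at most k larger masks by clustering region features."""
    region_feats = pool_region_features(features, region_ids, num_regions)
    k = min(k, num_regions)  # cannot form more clusters than there are regions
    region_to_mask = kmeans(region_feats, k)
    return region_to_mask[region_ids]  # refined mask id for every pixel
```

Early in training, a large `k` keeps the masks close to the original small regions; as `k` decreases, regions with similar features merge into larger masks that are more likely to cover whole objects.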
Submission history
From: Shufan Li
[v1] Thu, 25 Aug 2022 01:44:28 GMT (3189kb,D)
[v2] Tue, 20 Dec 2022 23:36:52 GMT (5258kb,D)