ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation

Tran, Minh; Bounsavy, Winston; Vo, Khoa; Nguyen, Anh; Nguyen, Tri; Le, Ngan

(Submitted on 18 Mar 2024 (v1), last revised 17 Apr 2024 (this version, v4))

Abstract: Amodal Instance Segmentation (AIS) is a challenging task because it requires predicting both the visible and occluded parts of objects in images. Existing AIS methods rely on a bidirectional approach that includes both the transition from amodal features to visible features (amodal-to-visible) and the transition from visible features to amodal features (visible-to-amodal). We observe that using amodal features through the amodal-to-visible transition can confuse the visible features, because the amodal features carry extra information about occluded/hidden segments that are not present in the visible region. Consequently, the quality of the visible features is compromised during the subsequent visible-to-amodal transition. To tackle this issue, we introduce ShapeFormer, a decoupled Transformer-based model with a visible-to-amodal transition. It makes the relationship between the output segmentations explicit and avoids the need for amodal-to-visible transitions. ShapeFormer comprises three key modules: (i) a Visible-Occluding Mask Head for predicting visible segmentation with occlusion awareness, (ii) a Shape-Prior Amodal Mask Head for predicting amodal and occluded masks, and (iii) a Category-Specific Shape Prior Retriever for providing shape prior knowledge. Comprehensive experiments and extensive ablation studies across various AIS benchmarks demonstrate the effectiveness of ShapeFormer. The code is available at: this https URL
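
The abstract refers to three kinds of output masks (visible, occluded, and amodal). As a minimal illustration of how these masks relate in AIS generally, the sketch below composes an amodal mask as the union of a visible mask and an occluded mask on a toy grid. This is not ShapeFormer's code; the function names `compose_amodal` and `occlusion_rate` and the toy masks are assumptions made up for this example.

```python
import numpy as np


def compose_amodal(visible: np.ndarray, occluded: np.ndarray) -> np.ndarray:
    """Return the amodal mask as the element-wise union of the visible
    and occluded masks (hypothetical helper, not from the paper)."""
    return np.logical_or(visible, occluded)


def occlusion_rate(visible: np.ndarray, amodal: np.ndarray) -> float:
    """Fraction of the amodal mask that is hidden from view
    (hypothetical helper, not from the paper)."""
    amodal_area = amodal.sum()
    if amodal_area == 0:
        return 0.0
    visible_area = np.logical_and(visible, amodal).sum()
    return float(1.0 - visible_area / amodal_area)


# Toy 4x6 image: the object covers rows 1-2, columns 1-4.
# An occluder hides columns 3-4, so only columns 1-2 are visible.
visible = np.zeros((4, 6), dtype=bool)
visible[1:3, 1:3] = True

occluded = np.zeros((4, 6), dtype=bool)
occluded[1:3, 3:5] = True

amodal = compose_amodal(visible, occluded)
rate = occlusion_rate(visible, amodal)

print(amodal.astype(int))
print(f"occlusion rate: {rate:.2f}")
```

In this toy case the visible and occluded masks are disjoint and together cover the full object extent, which is the relationship a visible-to-amodal pipeline has to recover from the visible evidence alone.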

Comments:	Accepted to IJCNN2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2403.11376 [cs.CV]
	(or arXiv:2403.11376v4 [cs.CV] for this version)

Submission history

From: Minh Tran
[v1] Mon, 18 Mar 2024 00:03:48 GMT (45445kb,D)
[v2] Fri, 22 Mar 2024 14:25:14 GMT (9851kb,D)
[v3] Sat, 13 Apr 2024 20:42:17 GMT (10427kb,D)
[v4] Wed, 17 Apr 2024 16:46:02 GMT (10427kb,D)
