AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

Wang, Xinzhou; Wang, Yikai; Ye, Junliang; Wang, Zhengyi; Sun, Fuchun; Liu, Pengkun; Wang, Ling; Sun, Kai; Wang, Xintong; He, Bin

(Submitted on 6 Dec 2023 (this version), latest version 28 Mar 2024 (v3))

Abstract: Text-to-3D model adaptations have advanced static 3D model quality, but sequential 3D model generation, particularly for animatable objects with large motions, is still scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects while adhering to the object motions extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which simplifies the generation dimension from 4D to 3D by denoising over different frames in the time-varying camera spaces while conducting the distillation process in a single canonical space shared per video. Concretely, CSD ensures that score gradients back-propagate to the canonical space through differentiable warping, hence guaranteeing time-consistent generation and maintaining morphological plausibility across different poses. By lifting the 3D generator to 4D with warping functions, AnimatableDreamer offers a novel perspective on non-rigid 3D model generation and reconstruction. In addition, with inductive knowledge from a multi-view consistent diffusion model, CSD regularizes reconstruction from novel views, thus cyclically enhancing the generation process. Extensive experiments demonstrate the capability of our method in generating high-flexibility text-guided 3D models from a monocular video, while also showing improved reconstruction performance over typical non-rigid reconstruction methods.
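The central idea in the abstract, that per-frame gradients computed in camera space flow back through a differentiable warp into one shared canonical model, can be illustrated with a toy sketch. This is not the authors' implementation: the rigid 2D warp, the function names (`warp`, `warp_vjp`, `canonical_step`), and the use of a simple point residual in place of a diffusion score are all assumptions made for illustration only.

```python
import math

def warp(point, angle, shift):
    """Hypothetical rigid 2D warp: canonical space -> one frame's camera space."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1])

def warp_vjp(residual, angle):
    """Pull a camera-space gradient back to canonical space.

    For a rigid warp the Jacobian is the rotation R, so the pullback is R^T @ residual.
    """
    rx, ry = residual
    c, s = math.cos(angle), math.sin(angle)
    return (c * rx + s * ry, -s * rx + c * ry)

def canonical_step(canonical, frames, targets, lr=0.1):
    """One update of the shared canonical points from all frames.

    `frames` is a list of (angle, shift) warps, one per video frame.
    `targets[t][i]` is where point i should appear in frame t; the residual
    (warped - target) is only a stand-in for a per-frame score signal.
    """
    grads = [(0.0, 0.0)] * len(canonical)
    for (angle, shift), target in zip(frames, targets):
        for i, p in enumerate(canonical):
            wx, wy = warp(p, angle, shift)
            residual = (wx - target[i][0], wy - target[i][1])
            gx, gy = warp_vjp(residual, angle)  # back-propagate through the warp
            grads[i] = (grads[i][0] + gx, grads[i][1] + gy)
    n = len(frames)
    # Every frame contributes to the same canonical parameters.
    return [(p[0] - lr * g[0] / n, p[1] - lr * g[1] / n)
            for p, g in zip(canonical, grads)]
```

Because all frames update the same canonical points, the result is consistent across time by construction, which is the property the abstract attributes to conducting distillation in a shared canonical space. A real system would replace the rigid warp with a learned articulated deformation and the residual with a diffusion-model score.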

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
ACM classes:	I.4.5
Cite as:	arXiv:2312.03795 [cs.CV]
	(or arXiv:2312.03795v1 [cs.CV] for this version)

Submission history

From: Xinzhou Wang [view email]
[v1] Wed, 6 Dec 2023 14:13:54 GMT (23273kb,D)
[v2] Wed, 20 Dec 2023 07:52:24 GMT (23273kb,D)
[v3] Thu, 28 Mar 2024 09:40:08 GMT (29743kb,D)
