Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

Zhangyang Qi1,2*, Yunhan Yang1*, Mengchen Zhang2, Long Xing3, Xiaoyang Wu1, Tong Wu2,3, Dahua Lin2,3, Xihui Liu1, Jiaqi Wang2✉, Hengshuang Zhao1

* Equal Contribution, ✉ Corresponding Authors

1 The University of Hong Kong, 2 Shanghai AI Laboratory, 3 The Chinese University of Hong Kong

Abstract and Pipeline

Recent advances in 3D AIGC show promise for creating 3D objects directly from text and images, cutting costs in animation and product design. However, detailed customization of 3D assets remains challenging compared to their 2D counterparts.

To address this, we propose Tailor3D, a novel pipeline that creates customized 3D assets from editable dual-side images using feed-forward reconstruction methods. The approach mimics a tailor's work, applying local object changes and style transfers in four steps:

1) Use image editing methods to edit the front-view image. The front-view image can be provided or generated from text.
2) Use multi-view diffusion techniques (e.g., Zero-1-to-3) to generate the back view of the object.
3) Use image editing methods to edit the back-view image.
4) Use our proposed Dual-sided LRM (large reconstruction model) to seamlessly combine front and back images and get the customized 3D asset.
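The four steps above can be read as a simple data flow. The sketch below is only an illustration of that flow, not the authors' code: the class name `Tailor3DPipelineSketch` and the three callables (`edit_image`, `generate_back_view`, `reconstruct`) are hypothetical placeholders for an image editor, a multi-view diffusion model, and the Dual-sided LRM. It also assumes the back view is generated from the already edited front view.

```python
from dataclasses import dataclass
from typing import Any, Callable

Image = Any  # placeholder type; a real system would use an image array or tensor
Asset = Any  # placeholder type for the reconstructed 3D asset


@dataclass
class Tailor3DPipelineSketch:
    """Hypothetical orchestration of the four steps; every callable is an assumption."""
    edit_image: Callable[[Image, str], Image]     # stands in for a 2D image editing method
    generate_back_view: Callable[[Image], Image]  # stands in for multi-view diffusion
    reconstruct: Callable[[Image, Image], Asset]  # stands in for the Dual-sided LRM

    def run(self, front: Image, front_prompt: str, back_prompt: str) -> Asset:
        edited_front = self.edit_image(front, front_prompt)  # step 1: edit front view
        back = self.generate_back_view(edited_front)         # step 2: generate back view
        edited_back = self.edit_image(back, back_prompt)     # step 3: edit back view
        return self.reconstruct(edited_front, edited_back)   # step 4: fuse into a 3D asset
```

Because each stage is injected as a callable, any editor or diffusion model can be swapped in without touching the orchestration, which matches the interactive, step-by-step use described here.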

Each step takes only a few seconds, allowing users to interactively obtain the 3D objects they desire. Experimental results show Tailor3D's effectiveness in 3D generative fill and style transfer, providing an efficient solution for 3D asset editing.


Method Architecture of Dual-sided LRM

Explanation Overview

At this point the front and back images have already been edited. Three steps then produce the final 3D object.

Step 1: We use the same image encoder (DINO-v2) to get the front-view and back-view image features.
Step 2: The two image features are processed separately using the LoRA Triplane Transformer but share the same front-view camera extrinsics.
Step 3: After obtaining two triplanes, we 'tailor' the two triplane features through rotation and Viewpoint Cross-Attention to obtain the 3D object.
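To make Step 3 more concrete, here is a minimal NumPy sketch of aligning a back-view triplane with a front-view one and fusing them with cross-attention. It is an illustration under stated assumptions, not the Dual-sided LRM itself: the plane order (XY, XZ, YZ), the vertical z axis, the 180-degree rotation, the single-head attention without learned projections, and the function names `rotate_triplane_180`, `cross_attention`, and `fuse_triplanes` are all hypothetical choices made for this example.

```python
import numpy as np


def rotate_triplane_180(tri: np.ndarray) -> np.ndarray:
    """Rotate a triplane 180 degrees about the z axis (assumed vertical).

    tri has shape (3, C, H, W) with planes ordered (XY, XZ, YZ); for each plane
    the first spatial axis follows the first letter, the second the second letter.
    The rotation maps (x, y, z) to (-x, -y, z).
    """
    xy, xz, yz = tri
    xy_r = xy[:, ::-1, ::-1]  # both x and y are negated
    xz_r = xz[:, ::-1, :]     # x is negated, z is unchanged
    yz_r = yz[:, ::-1, :]     # y is negated, z is unchanged
    return np.stack([xy_r, xz_r, yz_r])


def cross_attention(queries: np.ndarray, keys_values: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ keys_values


def fuse_triplanes(front: np.ndarray, back: np.ndarray) -> np.ndarray:
    """Align the back triplane to the front frame, then let front tokens attend to it."""
    back_aligned = rotate_triplane_180(back)
    n_planes, c, h, w = front.shape
    q = front.transpose(0, 2, 3, 1).reshape(-1, c)          # (3*H*W, C) tokens
    kv = back_aligned.transpose(0, 2, 3, 1).reshape(-1, c)  # (3*H*W, C) tokens
    fused = q + cross_attention(q, kv)                      # residual connection
    return fused.reshape(n_planes, h, w, c).transpose(0, 3, 1, 2)
```

The rotation is needed because both triplanes are predicted with the front-view camera extrinsics, so the back-view triplane has to be turned into the front-view frame before the two can be compared position by position.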

3D Generative Geometry/Pattern Fill


3D Style Transfer and Fusion


Citation


@misc{qi2024tailor3dcustomized3dassets,
  title={Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images},
  author={Zhangyang Qi and Yunhan Yang and Mengchen Zhang and Long Xing and Xiaoyang Wu and Tong Wu and Dahua Lin and Xihui Liu and Jiaqi Wang and Hengshuang Zhao},
  year={2024},
  eprint={2407.06191},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.06191},
}