DreamComposer: Controllable 3D Object Generation
via Multi-View Conditions

CVPR 2024


Yunhan Yang1*, Yukun Huang1*, Xiaoyang Wu1, Yuan-Chen Guo3,4, Song-Hai Zhang4, Hengshuang Zhao1, Tong He2, Xihui Liu1

*Equal Contribution    1The University of Hong Kong    2Shanghai AI Laboratory     3VAST     4Tsinghua University

Abstract



Utilizing pre-trained 2D large-scale generative models, recent works can generate high-quality novel views from a single in-the-wild image. However, lacking information from multiple views, these works struggle to generate controllable novel views. In this paper, we present DreamComposer, a flexible and scalable framework that enhances existing view-aware diffusion models by injecting multi-view conditions. Specifically, DreamComposer first uses a view-aware 3D lifting module to obtain 3D representations of an object from multiple views. Then, it renders the latent features of the target view from the 3D representations with the multi-view feature fusion module. Finally, the target-view features extracted from the multi-view inputs are injected into a pre-trained diffusion model. Experiments show that DreamComposer is compatible with state-of-the-art diffusion models for zero-shot novel view synthesis, further enhancing them to generate high-fidelity novel-view images under multi-view conditions, ready for controllable 3D object reconstruction and various other applications.


Demo Video



Method


Given multiple input images from different views, DreamComposer extracts their 2D latent features and uses a 3D lifting module to produce tri-plane 3D representations. Then, the multi-view condition rendered from the 3D representations is injected into the pre-trained diffusion model to provide target-view auxiliary information.
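
To make the pipeline concrete, below is a minimal PyTorch sketch of the stages described above: lifting per-view 2D latents to tri-planes, fusing them into a target-view feature map, and injecting that condition into the diffusion model. All module names, shapes, and the simple average-pooling, concatenate-and-project, and feature-addition schemes are illustrative assumptions for this sketch, not the actual DreamComposer implementation.

# A minimal sketch of the pipeline described above; names and shapes are
# illustrative assumptions, not the authors' actual implementation.
import torch
import torch.nn as nn

class ViewAware3DLifting(nn.Module):
    """Lifts per-view 2D latent features into a shared tri-plane representation."""
    def __init__(self, latent_dim=64):
        super().__init__()
        # Hypothetical: one conv head per axis-aligned plane (xy, xz, yz).
        self.to_planes = nn.ModuleList(
            [nn.Conv2d(latent_dim, latent_dim, 3, padding=1) for _ in range(3)]
        )

    def forward(self, view_latents):
        # view_latents: (num_views, C, H, W). Average over views, then
        # project to three feature planes. Averaging makes the module
        # agnostic to the number of input views.
        fused = view_latents.mean(dim=0, keepdim=True)
        return [head(fused) for head in self.to_planes]

class MultiViewFeatureFusion(nn.Module):
    """Renders a target-view feature map from the tri-plane representation."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.proj = nn.Conv2d(3 * latent_dim, latent_dim, 1)

    def forward(self, planes, target_pose):
        # Placeholder "rendering": concatenate the planes and project.
        # A real renderer would sample the planes along camera rays
        # defined by target_pose.
        stacked = torch.cat(planes, dim=1)
        return self.proj(stacked)

def inject_condition(unet_features, target_view_features, scale=1.0):
    # Conditions the pre-trained diffusion UNet by adding the rendered
    # target-view features to its intermediate activations (one simple
    # injection scheme; the paper's exact mechanism may differ).
    return unet_features + scale * target_view_features

if __name__ == "__main__":
    num_views, latent_dim, res = 4, 64, 32
    view_latents = torch.randn(num_views, latent_dim, res, res)  # from a 2D encoder
    target_pose = torch.eye(4)                                   # dummy camera pose

    lifting = ViewAware3DLifting(latent_dim)
    fusion = MultiViewFeatureFusion(latent_dim)

    planes = lifting(view_latents)
    cond = fusion(planes, target_pose)
    unet_feats = torch.randn(1, latent_dim, res, res)            # stand-in UNet activations
    print(inject_condition(unet_feats, cond).shape)              # torch.Size([1, 64, 32, 32])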


Results on the Google Scanned Objects dataset


Qualitative comparison of 3D reconstruction results between SyncDreamer ("SyncD.") and our DC-SyncDreamer ("SyncD. + Ours") on the Google Scanned Objects dataset, where the additional back-view input images (highlighted in the figure) are generated by Zero-1-to-3.


Results on the Objaverse dataset


Qualitative comparison of 3D reconstruction results between SyncDreamer ("SyncD.") and DC-SyncDreamer ("SyncD. + Ours") on the Objaverse dataset.


Ablation Analysis on the Number of Input Views


DreamComposer can handle an arbitrary number of input views, and its controllability strengthens as the number of input views increases.


Application: Controllable 3D Editing


We present the results of (a) Personalized Editing with InstructPix2Pix; (b) Drag Editing with DragGAN and DragDiffusion; and (c) Color Editing.


Application: 3D Character Modeling


We present the novel view synthesis (NVS) and reconstruction results for complex 3D characters generated from a few multi-view 2D paintings.


Citation


@article{yang2023dreamcomposer,
    title={DreamComposer: Controllable 3D Object Generation via Multi-View Conditions},
    author={Yang, Yunhan and Huang, Yukun and Wu, Xiaoyang and Guo, Yuan-Chen and Zhang, Song-Hai and Zhao, Hengshuang and He, Tong and Liu, Xihui},
    journal={arXiv preprint arXiv:2312.03611},
    year={2023}
}