arXiv (Cornell University)
Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model
April 2024 • Xiaolong Li, Jiawei Mo, Ying Wang, Chethan M. Parameshwara, Xiaohan Fei, Ashwin Swaminathan, Chris Taylor, Zhuowen Tu, Paolo Favaro, Stefano Soatto
In this paper, we propose an effective two-stage approach named Grounded-Dreamer to generate 3D assets that accurately follow complex, compositional text prompts while achieving high fidelity, by using a pre-trained multi-view diffusion model. Multi-view diffusion models such as MVDream have been shown to generate high-fidelity 3D assets using score distillation sampling (SDS). However, when applied naively, these methods often fail to comprehend compositional text prompts and may entirely omit certain subjects …