Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2404.18065
In this paper, we propose an effective two-stage approach named Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts while achieving high fidelity by using a pre-trained multi-view diffusion model. Multi-view diffusion models, such as MVDream, have been shown to generate high-fidelity 3D assets using score distillation sampling (SDS). However, applied naively, these methods often fail to comprehend compositional text prompts and may entirely omit certain subjects or parts. To address this issue, we first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline. We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation, without the need to re-train the multi-view diffusion model or craft a high-quality compositional 3D dataset. We further propose a hybrid optimization strategy to encourage synergy between the SDS loss and the sparse RGB reference images. Our method consistently outperforms previous state-of-the-art (SOTA) methods in generating compositional 3D assets, excelling in both quality and accuracy, and enabling diverse 3D outputs from the same text prompt.
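To make the hybrid optimization idea above concrete, below is a minimal, hypothetical PyTorch-style sketch of one objective that mixes an SDS-style guidance term from a multi-view diffusion prior with a reconstruction loss against the sparse 4-view RGB references. The function and weight names (hybrid_loss, lambda_sds, lambda_rgb) and the toy tensors are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a hybrid SDS + sparse-view RGB objective (not the authors' code).
import torch
import torch.nn.functional as F

def hybrid_loss(rendered_views, reference_views, sds_grad,
                lambda_sds=1.0, lambda_rgb=10.0):
    """Mix SDS guidance with supervision from sparse text-guided reference views.

    rendered_views  : (V, 3, H, W) views rendered from the current 3D representation
    reference_views : (V, 3, H, W) text-guided 4-view RGB references (the "bottleneck")
    sds_grad        : (V, 3, H, W) gradient supplied by the multi-view diffusion prior
    """
    # Reconstruction term: anchors the 3D asset to the compositional reference views.
    rgb_term = F.mse_loss(rendered_views, reference_views)

    # SDS term: multiplying by a detached gradient makes backprop through this scalar
    # deliver exactly `sds_grad` to `rendered_views` (the standard SDS trick).
    sds_term = (rendered_views * sds_grad.detach()).sum() / rendered_views.shape[0]

    return lambda_sds * sds_term + lambda_rgb * rgb_term

# Toy usage with random tensors, just to show the shapes involved.
views = torch.rand(4, 3, 64, 64, requires_grad=True)
refs = torch.rand(4, 3, 64, 64)
grad = torch.randn(4, 3, 64, 64)
hybrid_loss(views, refs, grad).backward()
```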
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2404.18065
- PDF: https://arxiv.org/pdf/2404.18065
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4396819684
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4396819684 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2404.18065 (Digital Object Identifier)
- Title: Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024
- Publication date: 2024-04-28 (full publication date if available)
- Authors: Xiaolong Li, Jiawei Mo, Ying Wang, Chethan M. Parameshwara, Xiaohan Fei, Ashwin Swaminathan, Chris Taylor, Zhuowen Tu, Paolo Favaro, Stefano Soatto (list of authors in order)
- Landing page: https://arxiv.org/abs/2404.18065 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2404.18065 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2404.18065 (direct OA link when available)
- Concepts: Diffusion, Computer science, Natural language processing, Physics, Thermodynamics (top concepts/fields/topics attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
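The fields above mirror what the public OpenAlex REST API returns for this work. As a rough illustration (not part of the record itself), the same payload could be fetched as sketched below; only the work ID comes from the metadata above, and the accessed fields follow the standard OpenAlex response layout shown in the full payload table.

```python
import requests

# OpenAlex ID taken from the metadata above.
WORK_URL = "https://api.openalex.org/works/W4396819684"

resp = requests.get(WORK_URL, timeout=30)
resp.raise_for_status()
work = resp.json()

print(work["display_name"])            # work title
print(work["publication_date"])        # e.g. 2024-04-28
print(work["open_access"]["oa_url"])   # direct open-access link
```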
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4396819684 |
| doi | https://doi.org/10.48550/arxiv.2404.18065 |
| ids.doi | https://doi.org/10.48550/arxiv.2404.18065 |
| ids.openalex | https://openalex.org/W4396819684 |
| fwci | |
| type | preprint |
| title | Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T14339 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.814300000667572 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Image Processing and 3D Reconstruction |
| topics[1].id | https://openalex.org/T10719 |
| topics[1].field.id | https://openalex.org/fields/22 |
| topics[1].field.display_name | Engineering |
| topics[1].score | 0.7095000147819519 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2206 |
| topics[1].subfield.display_name | Computational Mechanics |
| topics[1].display_name | 3D Shape Modeling and Analysis |
| topics[2].id | https://openalex.org/T10824 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.6653000116348267 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Image Retrieval and Classification Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C69357855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6120791435241699 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q163214 |
| concepts[0].display_name | Diffusion |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5238237977027893 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C204321447 |
| concepts[2].level | 1 |
| concepts[2].score | 0.32265186309814453 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[2].display_name | Natural language processing |
| concepts[3].id | https://openalex.org/C121332964 |
| concepts[3].level | 0 |
| concepts[3].score | 0.07975718379020691 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[3].display_name | Physics |
| concepts[4].id | https://openalex.org/C97355855 |
| concepts[4].level | 1 |
| concepts[4].score | 0.0532512366771698 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11473 |
| concepts[4].display_name | Thermodynamics |
| keywords[0].id | https://openalex.org/keywords/diffusion |
| keywords[0].score | 0.6120791435241699 |
| keywords[0].display_name | Diffusion |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5238237977027893 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/natural-language-processing |
| keywords[2].score | 0.32265186309814453 |
| keywords[2].display_name | Natural language processing |
| keywords[3].id | https://openalex.org/keywords/physics |
| keywords[3].score | 0.07975718379020691 |
| keywords[3].display_name | Physics |
| keywords[4].id | https://openalex.org/keywords/thermodynamics |
| keywords[4].score | 0.0532512366771698 |
| keywords[4].display_name | Thermodynamics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2404.18065 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2404.18065 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2404.18065 |
| locations[1].id | doi:10.48550/arxiv.2404.18065 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2404.18065 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100371548 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-6111-9000 |
| authorships[0].author.display_name | Xiaolong Li |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Li, Xiaolong |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5087799311 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1043-5353 |
| authorships[1].author.display_name | Jiawei Mo |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Mo, Jiawei |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100713924 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-7063-0070 |
| authorships[2].author.display_name | Ying Wang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Ying |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5075055783 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Chethan M. Parameshwara |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Parameshwara, Chethan |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5087030407 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-1030-2286 |
| authorships[4].author.display_name | Xiaohan Fei |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Fei, Xiaohan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5084355679 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-4279-369X |
| authorships[5].author.display_name | Ashwin Swaminathan |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Swaminathan, Ashwin |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5067224766 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-7867-9533 |
| authorships[6].author.display_name | Chris Taylor |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Taylor, CJ |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5001760915 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-1900-2124 |
| authorships[7].author.display_name | Zhuowen Tu |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Tu, Zhuowen |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5070940574 |
| authorships[8].author.orcid | https://orcid.org/0000-0003-3546-8247 |
| authorships[8].author.display_name | Paolo Favaro |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Favaro, Paolo |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5038328783 |
| authorships[9].author.orcid | https://orcid.org/0000-0003-2902-6362 |
| authorships[9].author.display_name | Stefano Soatto |
| authorships[9].author_position | last |
| authorships[9].raw_author_name | Soatto, Stefano |
| authorships[9].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2404.18065 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T14339 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.814300000667572 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Image Processing and 3D Reconstruction |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W4395014643 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2404.18065 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2404.18065 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2404.18065 |
| primary_location.id | pmh:oai:arXiv.org:2404.18065 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2404.18065 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2404.18065 |
| publication_date | 2024-04-28 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word-to-position index of the abstract; the full abstract text is reproduced at the top of this page) |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 10 |
| citation_normalized_percentile | |
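OpenAlex ships the abstract as the abstract_inverted_index field in the payload above, mapping each word to its positions in the text. As a small illustrative sketch, this is how the plain-text abstract can be rebuilt from that index; the helper name is made up for this example.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild the abstract text from an OpenAlex-style inverted index."""
    # Collect (position, word) pairs, then sort by position and re-join.
    pairs = [(pos, word) for word, positions in inverted_index.items() for pos in positions]
    return " ".join(word for _, word in sorted(pairs))

# Tiny slice of the index for this work (first positions only):
sample = {"In": [0], "this": [1], "paper,": [2], "we": [3], "propose": [4]}
print(abstract_from_inverted_index(sample))  # -> "In this paper, we propose"
```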