Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2411.16503
Diffusion models have achieved impressive success in generating photorealistic images, but challenges remain in ensuring precise semantic alignment with input prompts. Optimizing the initial noisy latent offers a more efficient alternative to modifying model architectures or engineering prompts for improving semantic alignment. A recent approach, InitNo, refines the initial noisy latent by leveraging attention maps; however, these maps capture only limited information, and the effectiveness of InitNo is highly dependent on the initial starting point, as it tends to converge on a local optimum near this point. To this end, this paper proposes leveraging the language comprehension capabilities of large vision-language models (LVLMs) to guide the optimization of the initial noisy latent, and introduces the Noise Diffusion process, which updates the noisy latent to generate semantically faithful images while preserving distribution consistency. Furthermore, we provide a theoretical analysis of the condition under which the update improves semantic faithfulness. Experimental results demonstrate the effectiveness and adaptability of our framework, consistently enhancing semantic alignment across various diffusion models. The code is available at https://github.com/Bomingmiao/NoiseDiffusion.
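The abstract does not state the update rule, so the following is only an illustrative sketch of the general idea of updating a Gaussian latent without leaving its prior distribution. It is not the paper's algorithm. It assumes a variance-preserving mix with weights sqrt(1 - gamma) and sqrt(gamma), and a hypothetical `toy_score` function standing in for "generate an image and let an LVLM rate how well it matches the prompt". The names `perturb`, `refine`, `toy_score`, and `gamma` are all made up for this example.

```python
import math
import random
from typing import Callable, List

Latent = List[float]


def perturb(latent: Latent, gamma: float, rng: random.Random) -> Latent:
    """Blend the latent with fresh Gaussian noise.

    With weights sqrt(1 - gamma) and sqrt(gamma), a latent whose entries are
    i.i.d. N(0, 1) yields an output whose entries are again N(0, 1).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    keep, fresh = math.sqrt(1.0 - gamma), math.sqrt(gamma)
    return [keep * z + fresh * rng.gauss(0.0, 1.0) for z in latent]


def refine(
    latent: Latent,
    score: Callable[[Latent], float],
    steps: int,
    gamma: float,
    rng: random.Random,
) -> Latent:
    """Keep a proposal only when the (hypothetical) scorer prefers it."""
    best, best_score = list(latent), score(latent)
    for _ in range(steps):
        candidate = perturb(best, gamma, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


def toy_score(latent: Latent) -> float:
    # Assumption: placeholder for an LVLM-based faithfulness score of the
    # image generated from this latent. Here it just rewards a mean near 0.5.
    return -abs(sum(latent) / len(latent) - 0.5)


rng = random.Random(0)
start = [rng.gauss(0.0, 1.0) for _ in range(64)]
result = refine(start, toy_score, steps=50, gamma=0.1, rng=rng)
print(toy_score(start), toy_score(result))
```

Each individual proposal keeps the standard normal marginal, but the accept-if-better step is a toy simplification: repeated selection can still bias the latent, which is presumably why the paper analyzes the condition under which an update helps.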
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2411.16503, https://arxiv.org/pdf/2411.16503
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404987888
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4404987888 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2411.16503 (Digital Object Identifier)
- Title: Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-11-25 (full publication date if available)
- Authors: Boming Miao, Chunxiao Li, Xiaoxiao Wang, Andi Zhang, Rui Sun, Zizhe Wang, Yao Zhu (list of authors in order)
- Landing page: https://arxiv.org/abs/2411.16503 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2411.16503 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2411.16503 (direct OA link when available)
- Concepts: Computer science, Image (mathematics), Noise (video), Diffusion, Artificial intelligence, Natural language processing, Computer vision, Information retrieval, Physics, Thermodynamics (top concepts, i.e. fields or topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4404987888 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2411.16503 |
| ids.doi | https://doi.org/10.48550/arxiv.2411.16503 |
| ids.openalex | https://openalex.org/W4404987888 |
| fwci | |
| type | preprint |
| title | Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10824 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9092000126838684 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Image Retrieval and Classification Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.5502951145172119 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C115961682 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5484512448310852 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[1].display_name | Image (mathematics) |
| concepts[2].id | https://openalex.org/C99498987 |
| concepts[2].level | 3 |
| concepts[2].score | 0.5209696888923645 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2210247 |
| concepts[2].display_name | Noise (video) |
| concepts[3].id | https://openalex.org/C69357855 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5111272931098938 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q163214 |
| concepts[3].display_name | Diffusion |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.4473608136177063 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C204321447 |
| concepts[5].level | 1 |
| concepts[5].score | 0.41010576486587524 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[5].display_name | Natural language processing |
| concepts[6].id | https://openalex.org/C31972630 |
| concepts[6].level | 1 |
| concepts[6].score | 0.35579872131347656 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[6].display_name | Computer vision |
| concepts[7].id | https://openalex.org/C23123220 |
| concepts[7].level | 1 |
| concepts[7].score | 0.32849642634391785 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[7].display_name | Information retrieval |
| concepts[8].id | https://openalex.org/C121332964 |
| concepts[8].level | 0 |
| concepts[8].score | 0.10917580127716064 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[8].display_name | Physics |
| concepts[9].id | https://openalex.org/C97355855 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11473 |
| concepts[9].display_name | Thermodynamics |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.5502951145172119 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/image |
| keywords[1].score | 0.5484512448310852 |
| keywords[1].display_name | Image (mathematics) |
| keywords[2].id | https://openalex.org/keywords/noise |
| keywords[2].score | 0.5209696888923645 |
| keywords[2].display_name | Noise (video) |
| keywords[3].id | https://openalex.org/keywords/diffusion |
| keywords[3].score | 0.5111272931098938 |
| keywords[3].display_name | Diffusion |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.4473608136177063 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/natural-language-processing |
| keywords[5].score | 0.41010576486587524 |
| keywords[5].display_name | Natural language processing |
| keywords[6].id | https://openalex.org/keywords/computer-vision |
| keywords[6].score | 0.35579872131347656 |
| keywords[6].display_name | Computer vision |
| keywords[7].id | https://openalex.org/keywords/information-retrieval |
| keywords[7].score | 0.32849642634391785 |
| keywords[7].display_name | Information retrieval |
| keywords[8].id | https://openalex.org/keywords/physics |
| keywords[8].score | 0.10917580127716064 |
| keywords[8].display_name | Physics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2411.16503 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2411.16503 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2411.16503 |
| locations[1].id | doi:10.48550/arxiv.2411.16503 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2411.16503 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5011135352 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-8223-3056 |
| authorships[0].author.display_name | Boming Miao |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Miao, Boming |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100459277 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9959-5029 |
| authorships[1].author.display_name | Chunxiao Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Chunxiao |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100355063 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-5851-3525 |
| authorships[2].author.display_name | Xiaoxiao Wang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Wang, Xiaoxiao |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5077911588 |
| authorships[3].author.orcid | https://orcid.org/0009-0007-4855-5442 |
| authorships[3].author.display_name | Andi Zhang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhang, Andi |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5055899152 |
| authorships[4].author.orcid | https://orcid.org/0000-0001-6353-2643 |
| authorships[4].author.display_name | Rui Sun |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Sun, Rui |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5074368451 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-5716-3607 |
| authorships[5].author.display_name | Zizhe Wang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Wang, Zizhe |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5101567070 |
| authorships[6].author.orcid | https://orcid.org/0009-0000-6731-4475 |
| authorships[6].author.display_name | Yao Zhu |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Zhu, Yao |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2411.16503 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10824 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9092000126838684 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Image Retrieval and Classification Techniques |
| related_works | https://openalex.org/W2755342338, https://openalex.org/W2779427294, https://openalex.org/W2775347418, https://openalex.org/W2625805835, https://openalex.org/W2079911747, https://openalex.org/W3116076068, https://openalex.org/W3003936178, https://openalex.org/W2145652935, https://openalex.org/W2563206327, https://openalex.org/W2069885731 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2411.16503 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2411.16503 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2411.16503 |
| primary_location.id | pmh:oai:arXiv.org:2411.16503 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2411.16503 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2411.16503 |
| publication_date | 2024-11-25 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.A | 42 |
| abstract_inverted_index.a | 27, 81, 135 |
| abstract_inverted_index.To | 87 |
| abstract_inverted_index.as | 75 |
| abstract_inverted_index.at | 170 |
| abstract_inverted_index.by | 51 |
| abstract_inverted_index.in | 6, 13 |
| abstract_inverted_index.is | 67, 168 |
| abstract_inverted_index.it | 76 |
| abstract_inverted_index.of | 65, 98, 107, 138, 155 |
| abstract_inverted_index.on | 70, 80 |
| abstract_inverted_index.or | 35 |
| abstract_inverted_index.to | 31, 78, 103, 123 |
| abstract_inverted_index.we | 133 |
| abstract_inverted_index.The | 166 |
| abstract_inverted_index.and | 62, 112, 153 |
| abstract_inverted_index.but | 10 |
| abstract_inverted_index.for | 38 |
| abstract_inverted_index.our | 156 |
| abstract_inverted_index.the | 22, 47, 63, 71, 94, 105, 108, 114, 120, 139, 143, 151 |
| abstract_inverted_index.code | 167 |
| abstract_inverted_index.end, | 89 |
| abstract_inverted_index.have | 2 |
| abstract_inverted_index.maps | 57 |
| abstract_inverted_index.more | 28 |
| abstract_inverted_index.near | 84 |
| abstract_inverted_index.only | 59 |
| abstract_inverted_index.this | 85, 88, 90 |
| abstract_inverted_index.with | 18 |
| abstract_inverted_index.Noise | 115 |
| abstract_inverted_index.guide | 104 |
| abstract_inverted_index.input | 19 |
| abstract_inverted_index.large | 99 |
| abstract_inverted_index.local | 82 |
| abstract_inverted_index.maps; | 54 |
| abstract_inverted_index.model | 33 |
| abstract_inverted_index.noisy | 24, 49, 110, 121 |
| abstract_inverted_index.paper | 91 |
| abstract_inverted_index.tends | 77 |
| abstract_inverted_index.these | 56 |
| abstract_inverted_index.under | 141 |
| abstract_inverted_index.which | 118, 142 |
| abstract_inverted_index.while | 128 |
| abstract_inverted_index.InitNo | 66 |
| abstract_inverted_index.across | 162 |
| abstract_inverted_index.highly | 68 |
| abstract_inverted_index.images | 127 |
| abstract_inverted_index.latent | 25, 50, 122 |
| abstract_inverted_index.latest | 43 |
| abstract_inverted_index.models | 1, 101 |
| abstract_inverted_index.offers | 26 |
| abstract_inverted_index.point, | 74 |
| abstract_inverted_index.point. | 86 |
| abstract_inverted_index.prompt | 36 |
| abstract_inverted_index.remain | 12 |
| abstract_inverted_index.update | 144 |
| abstract_inverted_index.(LVLMs) | 102 |
| abstract_inverted_index.InitNo, | 45 |
| abstract_inverted_index.capture | 58 |
| abstract_inverted_index.images, | 9 |
| abstract_inverted_index.initial | 23, 48, 72, 109 |
| abstract_inverted_index.latent, | 111 |
| abstract_inverted_index.limited | 60 |
| abstract_inverted_index.models. | 165 |
| abstract_inverted_index.optimum | 83 |
| abstract_inverted_index.precise | 15 |
| abstract_inverted_index.provide | 134 |
| abstract_inverted_index.refines | 46 |
| abstract_inverted_index.results | 149 |
| abstract_inverted_index.success | 5 |
| abstract_inverted_index.updates | 119 |
| abstract_inverted_index.various | 163 |
| abstract_inverted_index.achieved | 3 |
| abstract_inverted_index.analysis | 137 |
| abstract_inverted_index.converge | 79 |
| abstract_inverted_index.ensuring | 14 |
| abstract_inverted_index.faithful | 126 |
| abstract_inverted_index.generate | 124 |
| abstract_inverted_index.however, | 55 |
| abstract_inverted_index.improves | 145 |
| abstract_inverted_index.language | 95 |
| abstract_inverted_index.process, | 117 |
| abstract_inverted_index.prompts. | 20 |
| abstract_inverted_index.proposes | 92 |
| abstract_inverted_index.semantic | 16, 40, 146, 160 |
| abstract_inverted_index.starting | 73 |
| abstract_inverted_index.Diffusion | 0, 116 |
| abstract_inverted_index.alignment | 17, 161 |
| abstract_inverted_index.approach, | 44 |
| abstract_inverted_index.attention | 53 |
| abstract_inverted_index.available | 169 |
| abstract_inverted_index.condition | 140 |
| abstract_inverted_index.dependent | 69 |
| abstract_inverted_index.diffusion | 164 |
| abstract_inverted_index.efficient | 29 |
| abstract_inverted_index.enhancing | 159 |
| abstract_inverted_index.improving | 39 |
| abstract_inverted_index.modifying | 32 |
| abstract_inverted_index.Optimizing | 21 |
| abstract_inverted_index.alignment. | 41 |
| abstract_inverted_index.challenges | 11 |
| abstract_inverted_index.framework, | 157 |
| abstract_inverted_index.generating | 7 |
| abstract_inverted_index.impressive | 4 |
| abstract_inverted_index.introduces | 113 |
| abstract_inverted_index.leveraging | 52, 93 |
| abstract_inverted_index.preserving | 129 |
| abstract_inverted_index.alternative | 30 |
| abstract_inverted_index.demonstrate | 150 |
| abstract_inverted_index.engineering | 37 |
| abstract_inverted_index.theoretical | 136 |
| abstract_inverted_index.Experimental | 148 |
| abstract_inverted_index.Furthermore, | 132 |
| abstract_inverted_index.adaptability | 154 |
| abstract_inverted_index.capabilities | 97 |
| abstract_inverted_index.consistency. | 131 |
| abstract_inverted_index.consistently | 158 |
| abstract_inverted_index.distribution | 130 |
| abstract_inverted_index.information, | 61 |
| abstract_inverted_index.optimization | 106 |
| abstract_inverted_index.semantically | 125 |
| abstract_inverted_index.architectures | 34 |
| abstract_inverted_index.comprehension | 96 |
| abstract_inverted_index.effectiveness | 64, 152 |
| abstract_inverted_index.faithfulness. | 147 |
| abstract_inverted_index.photorealistic | 8 |
| abstract_inverted_index.vision-language | 100 |
| abstract_inverted_index.https://github.com/Bomingmiao/NoiseDiffusion. | 171 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile |
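
The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of how such an index can be turned back into running text is shown below. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is trimmed to the first six positions from the table (the full rows for `Diffusion` and `models` also list later positions, which are left out here).

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Place every token at each of its positions, then join in order."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for position in positions:
            by_position[position] = token
    return " ".join(by_position[i] for i in sorted(by_position))


# Trimmed sample: only positions 0 to 5 of the index shown in the table.
sample_index = {
    "Diffusion": [0],
    "models": [1],
    "have": [2],
    "achieved": [3],
    "impressive": [4],
    "success": [5],
}

print(rebuild_abstract(sample_index))
# prints "Diffusion models have achieved impressive success"
```

Tokens keep their punctuation (for example `images,` and `models.` are separate keys in the table), so joining with single spaces is enough to recover the sentence boundaries.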