Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2404.07389
Text-to-image diffusion models have shown great success in generating high-quality text-guided images. Yet, these models may still fail to semantically align generated images with the provided text prompts, leading to problems like incorrect attribute binding and/or catastrophic object neglect. Given the pervasive object-oriented structure underlying text prompts, we introduce a novel object-conditioned Energy-Based Attention Map Alignment (EBAMA) method to address the aforementioned problems. We show that an object-centric attribute binding loss naturally emerges by approximately maximizing the log-likelihood of a $z$-parameterized energy-based model with the help of the negative sampling technique. We further propose an object-centric intensity regularizer to prevent excessive shifts of objects' attention towards their attributes. Extensive qualitative and quantitative experiments, including human evaluation, on several challenging benchmarks demonstrate the superior performance of our method over previous strong counterparts. With better aligned attention maps, our approach shows great promise in further enhancing the text-controlled image editing ability of diffusion models.
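
The abstract's claim that a binding loss "emerges by approximately maximizing the log-likelihood" of an energy-based model "with the help of the negative sampling technique" can be illustrated with a generic sketch. This is not the paper's formulation, which this record does not include. It assumes a hypothetical energy equal to the negative cosine similarity between flattened attention maps, and it approximates the partition function by summing over the positive pair and a few sampled negatives. The function names and the toy maps are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two flattened attention maps."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def negative_sampling_nll(obj_map, pos_map, neg_maps):
    """Approximate -log p(pos_map | obj_map) under an assumed energy-based model.

    Assumed energy: E(m, obj) = -cosine(m, obj), so p(m | obj) is
    proportional to exp(cosine(m, obj)). The partition function is
    approximated by the positive plus the sampled negatives only.
    """
    pos_score = cosine(obj_map, pos_map)
    scores = [pos_score] + [cosine(obj_map, n) for n in neg_maps]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return log_z - pos_score

# Toy 2x2 attention maps, flattened (hypothetical values).
obj = [0.9, 0.1, 0.0, 0.0]        # map of an object token
attr_good = [0.8, 0.2, 0.0, 0.0]  # attribute attending to the same region
attr_bad = [0.0, 0.0, 0.1, 0.9]   # attribute attending elsewhere
other = [0.0, 0.1, 0.9, 0.0]      # sampled negative: an unrelated token's map

aligned = negative_sampling_nll(obj, attr_good, [other])
misaligned = negative_sampling_nll(obj, attr_bad, [other])
print(aligned < misaligned)
```

In this toy setup the loss is lower when the attribute's map overlaps the object's map, which is the qualitative behavior an attribute binding loss is meant to reward.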
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2404.07389, https://arxiv.org/pdf/2404.07389
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4394780873
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex): https://openalex.org/W4394780873
- DOI (Digital Object Identifier): https://doi.org/10.48550/arxiv.2404.07389
- Title (work title): Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
- Type (OpenAlex work type): preprint
- Language (primary language): en
- Publication year (year of publication): 2024
- Publication date (full publication date if available): 2024-04-10
- Authors (list of authors in order): Yasi Zhang, Peiyu Yu, Ying Wu
- Landing page (publisher landing page): https://arxiv.org/abs/2404.07389
- PDF URL (direct link to full text PDF): https://arxiv.org/pdf/2404.07389
- Open access (whether a free full text is available): Yes
- OA status (open access status per OpenAlex): green
- OA URL (direct OA link when available): https://arxiv.org/pdf/2404.07389
- Concepts (top fields/topics attached by OpenAlex): Object (grammar), Computer science, Parameterized complexity, Image (mathematics), Artificial intelligence, Energy (signal processing), Quality (philosophy), Computer vision, Pattern recognition (psychology), Machine learning, Data mining, Theoretical computer science, Algorithm, Mathematics, Statistics, Philosophy, Epistemology
- Cited by (total citation count in OpenAlex): 0
- Related works count (other works algorithmically related by OpenAlex): 10
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4394780873 |
| doi | https://doi.org/10.48550/arxiv.2404.07389 |
| ids.doi | https://doi.org/10.48550/arxiv.2404.07389 |
| ids.openalex | https://openalex.org/W4394780873 |
| fwci | |
| type | preprint |
| title | Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10775 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9945999979972839 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Generative Adversarial Networks and Image Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2781238097 |
| concepts[0].level | 2 |
| concepts[0].score | 0.776227593421936 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[0].display_name | Object (grammar) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7451872825622559 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C165464430 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6856979131698608 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1570441 |
| concepts[2].display_name | Parameterized complexity |
| concepts[3].id | https://openalex.org/C115961682 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6089908480644226 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[3].display_name | Image (mathematics) |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.5732265710830688 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C186370098 |
| concepts[5].level | 2 |
| concepts[5].score | 0.5230982303619385 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q442787 |
| concepts[5].display_name | Energy (signal processing) |
| concepts[6].id | https://openalex.org/C2779530757 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4129025340080261 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1207505 |
| concepts[6].display_name | Quality (philosophy) |
| concepts[7].id | https://openalex.org/C31972630 |
| concepts[7].level | 1 |
| concepts[7].score | 0.4105801582336426 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[7].display_name | Computer vision |
| concepts[8].id | https://openalex.org/C153180895 |
| concepts[8].level | 2 |
| concepts[8].score | 0.34804266691207886 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[8].display_name | Pattern recognition (psychology) |
| concepts[9].id | https://openalex.org/C119857082 |
| concepts[9].level | 1 |
| concepts[9].score | 0.3335055708885193 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[9].display_name | Machine learning |
| concepts[10].id | https://openalex.org/C124101348 |
| concepts[10].level | 1 |
| concepts[10].score | 0.3331141471862793 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[10].display_name | Data mining |
| concepts[11].id | https://openalex.org/C80444323 |
| concepts[11].level | 1 |
| concepts[11].score | 0.320550799369812 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q2878974 |
| concepts[11].display_name | Theoretical computer science |
| concepts[12].id | https://openalex.org/C11413529 |
| concepts[12].level | 1 |
| concepts[12].score | 0.23808503150939941 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[12].display_name | Algorithm |
| concepts[13].id | https://openalex.org/C33923547 |
| concepts[13].level | 0 |
| concepts[13].score | 0.1378127932548523 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[13].display_name | Mathematics |
| concepts[14].id | https://openalex.org/C105795698 |
| concepts[14].level | 1 |
| concepts[14].score | 0.10726672410964966 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[14].display_name | Statistics |
| concepts[15].id | https://openalex.org/C138885662 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[15].display_name | Philosophy |
| concepts[16].id | https://openalex.org/C111472728 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q9471 |
| concepts[16].display_name | Epistemology |
| keywords[0].id | https://openalex.org/keywords/object |
| keywords[0].score | 0.776227593421936 |
| keywords[0].display_name | Object (grammar) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7451872825622559 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/parameterized-complexity |
| keywords[2].score | 0.6856979131698608 |
| keywords[2].display_name | Parameterized complexity |
| keywords[3].id | https://openalex.org/keywords/image |
| keywords[3].score | 0.6089908480644226 |
| keywords[3].display_name | Image (mathematics) |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.5732265710830688 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/energy |
| keywords[5].score | 0.5230982303619385 |
| keywords[5].display_name | Energy (signal processing) |
| keywords[6].id | https://openalex.org/keywords/quality |
| keywords[6].score | 0.4129025340080261 |
| keywords[6].display_name | Quality (philosophy) |
| keywords[7].id | https://openalex.org/keywords/computer-vision |
| keywords[7].score | 0.4105801582336426 |
| keywords[7].display_name | Computer vision |
| keywords[8].id | https://openalex.org/keywords/pattern-recognition |
| keywords[8].score | 0.34804266691207886 |
| keywords[8].display_name | Pattern recognition (psychology) |
| keywords[9].id | https://openalex.org/keywords/machine-learning |
| keywords[9].score | 0.3335055708885193 |
| keywords[9].display_name | Machine learning |
| keywords[10].id | https://openalex.org/keywords/data-mining |
| keywords[10].score | 0.3331141471862793 |
| keywords[10].display_name | Data mining |
| keywords[11].id | https://openalex.org/keywords/theoretical-computer-science |
| keywords[11].score | 0.320550799369812 |
| keywords[11].display_name | Theoretical computer science |
| keywords[12].id | https://openalex.org/keywords/algorithm |
| keywords[12].score | 0.23808503150939941 |
| keywords[12].display_name | Algorithm |
| keywords[13].id | https://openalex.org/keywords/mathematics |
| keywords[13].score | 0.1378127932548523 |
| keywords[13].display_name | Mathematics |
| keywords[14].id | https://openalex.org/keywords/statistics |
| keywords[14].score | 0.10726672410964966 |
| keywords[14].display_name | Statistics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2404.07389 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2404.07389 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2404.07389 |
| locations[1].id | doi:10.48550/arxiv.2404.07389 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2404.07389 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5070817221 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Yasi Zhang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhang, Yasi |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5025863269 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Peiyu Yu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yu, Peiyu |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5101780958 |
| authorships[2].author.orcid | https://orcid.org/0009-0001-6768-5118 |
| authorships[2].author.display_name | Ying Wu |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Wu, Ying Nian |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2404.07389 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-04-13T00:00:00 |
| display_name | Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10775 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9945999979972839 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Generative Adversarial Networks and Image Synthesis |
| related_works | https://openalex.org/W2051058708, https://openalex.org/W1494268238, https://openalex.org/W154868527, https://openalex.org/W1983207144, https://openalex.org/W2490706771, https://openalex.org/W2480116122, https://openalex.org/W4255576661, https://openalex.org/W1516574938, https://openalex.org/W2053685668, https://openalex.org/W2914625303 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2404.07389 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2404.07389 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2404.07389 |
| primary_location.id | pmh:oai:arXiv.org:2404.07389 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2404.07389 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2404.07389 |
| publication_date | 2024-04-10 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 49, 79 |
| abstract_inverted_index.We | 63, 91 |
| abstract_inverted_index.an | 66, 94 |
| abstract_inverted_index.by | 73 |
| abstract_inverted_index.in | 7, 141 |
| abstract_inverted_index.of | 78, 86, 102, 124, 149 |
| abstract_inverted_index.on | 116 |
| abstract_inverted_index.to | 18, 29, 58, 98 |
| abstract_inverted_index.we | 47 |
| abstract_inverted_index.Map | 54 |
| abstract_inverted_index.and | 110 |
| abstract_inverted_index.may | 15 |
| abstract_inverted_index.our | 125, 136 |
| abstract_inverted_index.the | 24, 40, 60, 76, 84, 87, 121, 144 |
| abstract_inverted_index.With | 131 |
| abstract_inverted_index.Yet, | 12 |
| abstract_inverted_index.fail | 17 |
| abstract_inverted_index.have | 3 |
| abstract_inverted_index.help | 85 |
| abstract_inverted_index.like | 31 |
| abstract_inverted_index.loss | 70 |
| abstract_inverted_index.over | 127 |
| abstract_inverted_index.show | 64 |
| abstract_inverted_index.text | 26, 45 |
| abstract_inverted_index.that | 65 |
| abstract_inverted_index.with | 23, 83 |
| abstract_inverted_index.Given | 39 |
| abstract_inverted_index.align | 20 |
| abstract_inverted_index.great | 5, 139 |
| abstract_inverted_index.human | 114 |
| abstract_inverted_index.image | 146 |
| abstract_inverted_index.maps, | 135 |
| abstract_inverted_index.model | 82 |
| abstract_inverted_index.novel | 50 |
| abstract_inverted_index.shown | 4 |
| abstract_inverted_index.shows | 138 |
| abstract_inverted_index.still | 16 |
| abstract_inverted_index.their | 106 |
| abstract_inverted_index.these | 13 |
| abstract_inverted_index.and/or | 35 |
| abstract_inverted_index.better | 132 |
| abstract_inverted_index.images | 22 |
| abstract_inverted_index.method | 57, 126 |
| abstract_inverted_index.models | 2, 14 |
| abstract_inverted_index.object | 37 |
| abstract_inverted_index.shifts | 101 |
| abstract_inverted_index.strong | 129 |
| abstract_inverted_index.(EBAMA) | 56 |
| abstract_inverted_index.ability | 148 |
| abstract_inverted_index.address | 59 |
| abstract_inverted_index.aligned | 133 |
| abstract_inverted_index.binding | 34, 69 |
| abstract_inverted_index.editing | 147 |
| abstract_inverted_index.emerges | 72 |
| abstract_inverted_index.further | 92, 142 |
| abstract_inverted_index.images. | 11 |
| abstract_inverted_index.leading | 28 |
| abstract_inverted_index.models. | 151 |
| abstract_inverted_index.objects | 103 |
| abstract_inverted_index.prevent | 99 |
| abstract_inverted_index.promise | 140 |
| abstract_inverted_index.propose | 93 |
| abstract_inverted_index.several | 117 |
| abstract_inverted_index.success | 6 |
| abstract_inverted_index.towards | 105 |
| abstract_inverted_index.approach | 137 |
| abstract_inverted_index.negative | 88 |
| abstract_inverted_index.neglect. | 38 |
| abstract_inverted_index.previous | 128 |
| abstract_inverted_index.problems | 30 |
| abstract_inverted_index.prompts, | 27, 46 |
| abstract_inverted_index.provided | 25 |
| abstract_inverted_index.sampling | 89 |
| abstract_inverted_index.superior | 122 |
| abstract_inverted_index.Alignment | 55 |
| abstract_inverted_index.Attention | 53 |
| abstract_inverted_index.Extensive | 108 |
| abstract_inverted_index.attention | 104, 134 |
| abstract_inverted_index.attribute | 33, 68 |
| abstract_inverted_index.diffusion | 1, 150 |
| abstract_inverted_index.enhancing | 143 |
| abstract_inverted_index.excessive | 100 |
| abstract_inverted_index.generated | 21 |
| abstract_inverted_index.including | 113 |
| abstract_inverted_index.incorrect | 32 |
| abstract_inverted_index.intensity | 96 |
| abstract_inverted_index.introduce | 48 |
| abstract_inverted_index.naturally | 71 |
| abstract_inverted_index.pervasive | 41 |
| abstract_inverted_index.problems. | 62 |
| abstract_inverted_index.structure | 43 |
| abstract_inverted_index.benchmarks | 119 |
| abstract_inverted_index.generating | 8 |
| abstract_inverted_index.maximizing | 75 |
| abstract_inverted_index.technique. | 90 |
| abstract_inverted_index.underlying | 44 |
| abstract_inverted_index.attributes. | 107 |
| abstract_inverted_index.challenging | 118 |
| abstract_inverted_index.demonstrate | 120 |
| abstract_inverted_index.evaluation, | 115 |
| abstract_inverted_index.performance | 123 |
| abstract_inverted_index.qualitative | 109 |
| abstract_inverted_index.regularizer | 97 |
| abstract_inverted_index.text-guided | 10 |
| abstract_inverted_index.Energy-Based | 52 |
| abstract_inverted_index.catastrophic | 36 |
| abstract_inverted_index.energy-based | 81 |
| abstract_inverted_index.experiments, | 112 |
| abstract_inverted_index.high-quality | 9 |
| abstract_inverted_index.quantitative | 111 |
| abstract_inverted_index.semantically | 19 |
| abstract_inverted_index.Text-to-image | 0 |
| abstract_inverted_index.approximately | 74 |
| abstract_inverted_index.counterparts. | 130 |
| abstract_inverted_index.aforementioned | 61 |
| abstract_inverted_index.log-likelihood | 77 |
| abstract_inverted_index.object-centric | 67, 95 |
| abstract_inverted_index.object-oriented | 42 |
| abstract_inverted_index.text-controlled | 145 |
| abstract_inverted_index.$z$-parameterized | 80 |
| abstract_inverted_index.object-conditioned | 51 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/7 |
| sustainable_development_goals[0].score | 0.6299999952316284 |
| sustainable_development_goals[0].display_name | Affordable and clean energy |
| citation_normalized_percentile |
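
The `abstract_inverted_index` rows above store the abstract as a map from each word to the positions where it occurs. A minimal sketch of how such an index can be turned back into running text follows. The function name `rebuild_abstract` is an assumption for illustration, and the sample dictionary holds only the entries for positions 0 to 11, copied from the table.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    positioned = []
    for word, positions in inverted_index.items():
        for pos in positions:
            positioned.append((pos, word))
    # Sorting by position restores the original word order.
    positioned.sort()
    return " ".join(word for _, word in positioned)

# Entries for positions 0-11 only (later positions of repeated words omitted).
sample = {
    "in": [7],
    "have": [3],
    "great": [5],
    "shown": [4],
    "models": [2],
    "images.": [11],
    "success": [6],
    "diffusion": [1],
    "generating": [8],
    "text-guided": [10],
    "high-quality": [9],
    "Text-to-image": [0],
}

first_sentence = rebuild_abstract(sample)
print(first_sentence)
```

Applied to the full index, the same loop would yield the complete abstract, since every position from 0 to 151 appears exactly once across the rows.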