On Discrete Prompt Optimization for Diffusion Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2407.01606
This paper introduces the first gradient-based framework for prompt optimization in text-to-image diffusion models. We formulate prompt engineering as a discrete optimization problem over the language space. Two major challenges arise in efficiently finding a solution to this problem: (1) Enormous Domain Space: Setting the domain to the entire language space poses significant difficulty to the optimization process. (2) Text Gradient: Efficiently computing the text gradient is challenging, as it requires backpropagating through the inference steps of the diffusion model and a non-differentiable embedding lookup table. Beyond the problem formulation, our main technical contributions lie in solving the above challenges. First, we design a family of dynamically generated compact subspaces comprised of only the most relevant words to user input, substantially restricting the domain space. Second, we introduce "Shortcut Text Gradient" -- an effective replacement for the text gradient that can be obtained with constant memory and runtime. Empirical evaluation on prompts collected from diverse sources (DiffusionDB, ChatGPT, COCO) suggests that our method can discover prompts that substantially improve (prompt enhancement) or destroy (adversarial attack) the faithfulness of images generated by the text-to-image diffusion model.
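The two challenges named in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the paper's method: the compact subspace is built from cosine-similarity neighbours in a random embedding table, and the non-differentiable lookup is replaced by a plain softmax relaxation over the candidate words. The "Shortcut Text Gradient" itself is not implemented; a squared-error loss on the embedding stands in for the gradient that would otherwise come back from the diffusion model. The names `top_k_candidates`, `soft_lookup`, and `grad_wrt_logits` are hypothetical.

```python
import numpy as np

def top_k_candidates(table, word_id, k):
    """Indices of the k rows of `table` most cosine-similar to row `word_id`."""
    unit = table / np.linalg.norm(table, axis=1, keepdims=True)
    sims = unit @ unit[word_id]
    sims[word_id] = -np.inf  # exclude the word itself from its own subspace
    return np.argsort(-sims)[:k]

def soft_lookup(logits, candidates, table):
    """Relaxed lookup: a softmax-weighted mix of the candidate embeddings."""
    z = logits - logits.max()  # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return probs @ table[candidates], probs

def grad_wrt_logits(logits, candidates, table, grad_embedding):
    """Chain rule through soft_lookup: dL/dlogits given dL/dembedding."""
    _, probs = soft_lookup(logits, candidates, table)
    g = table[candidates] @ grad_embedding  # dL/dprobs, shape (k,)
    return probs * (g - probs @ g)          # softmax Jacobian applied to g

# Toy setup (all values are synthetic assumptions).
rng = np.random.default_rng(0)
table = rng.normal(size=(50, 8))  # 50 "words", 8-dimensional embeddings
candidates = top_k_candidates(table, word_id=3, k=5)

logits = np.zeros(5)  # start from a uniform choice over the candidates
embedding, _ = soft_lookup(logits, candidates, table)
target = rng.normal(size=8)  # stand-in for what the downstream loss prefers
grad = grad_wrt_logits(logits, candidates, table, embedding - target)

logits -= 0.5 * grad  # one gradient step on the relaxed variable
best_word = candidates[np.argmax(logits)]  # discretize back to one word
```

Restricting the search to `k` candidates per word keeps the optimization variable small, and the relaxation gives the logits a well-defined gradient even though the final prompt must be a discrete choice.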
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2407.01606, https://arxiv.org/pdf/2407.01606
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4400377573
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4400377573 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2407.01606 (Digital Object Identifier)
- Title: On Discrete Prompt Optimization for Diffusion Models (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-06-27 (full publication date if available)
- Authors: Ruochen Wang, Ting Liu, Cho‐Jui Hsieh, Boqing Gong (list of authors in order)
- Landing page: https://arxiv.org/abs/2407.01606 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2407.01606 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2407.01606 (direct OA link when available)
- Concepts: Diffusion, Computer science, Mathematical optimization, Mathematics, Physics, Thermodynamics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4400377573 |
| doi | https://doi.org/10.48550/arxiv.2407.01606 |
| ids.doi | https://doi.org/10.48550/arxiv.2407.01606 |
| ids.openalex | https://openalex.org/W4400377573 |
| fwci | 0.0 |
| type | preprint |
| title | On Discrete Prompt Optimization for Diffusion Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10792 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.6481999754905701 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1703 |
| topics[0].subfield.display_name | Computational Theory and Mathematics |
| topics[0].display_name | Matrix Theory and Algorithms |
| topics[1].id | https://openalex.org/T11206 |
| topics[1].field.id | https://openalex.org/fields/31 |
| topics[1].field.display_name | Physics and Astronomy |
| topics[1].score | 0.5774999856948853 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/3109 |
| topics[1].subfield.display_name | Statistical and Nonlinear Physics |
| topics[1].display_name | Model Reduction and Neural Networks |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C69357855 |
| concepts[0].level | 2 |
| concepts[0].score | 0.6136953830718994 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q163214 |
| concepts[0].display_name | Diffusion |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.4985365867614746 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C126255220 |
| concepts[2].level | 1 |
| concepts[2].score | 0.36147063970565796 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[2].display_name | Mathematical optimization |
| concepts[3].id | https://openalex.org/C33923547 |
| concepts[3].level | 0 |
| concepts[3].score | 0.2570306360721588 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[3].display_name | Mathematics |
| concepts[4].id | https://openalex.org/C121332964 |
| concepts[4].level | 0 |
| concepts[4].score | 0.18802547454833984 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[4].display_name | Physics |
| concepts[5].id | https://openalex.org/C97355855 |
| concepts[5].level | 1 |
| concepts[5].score | 0.06122773885726929 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11473 |
| concepts[5].display_name | Thermodynamics |
| keywords[0].id | https://openalex.org/keywords/diffusion |
| keywords[0].score | 0.6136953830718994 |
| keywords[0].display_name | Diffusion |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.4985365867614746 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[2].score | 0.36147063970565796 |
| keywords[2].display_name | Mathematical optimization |
| keywords[3].id | https://openalex.org/keywords/mathematics |
| keywords[3].score | 0.2570306360721588 |
| keywords[3].display_name | Mathematics |
| keywords[4].id | https://openalex.org/keywords/physics |
| keywords[4].score | 0.18802547454833984 |
| keywords[4].display_name | Physics |
| keywords[5].id | https://openalex.org/keywords/thermodynamics |
| keywords[5].score | 0.06122773885726929 |
| keywords[5].display_name | Thermodynamics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2407.01606 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2407.01606 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2407.01606 |
| locations[1].id | doi:10.48550/arxiv.2407.01606 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article-journal |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2407.01606 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5080683301 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-2014-4282 |
| authorships[0].author.display_name | Ruochen Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Ruochen |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5057382261 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-3458-6567 |
| authorships[1].author.display_name | Ting Liu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Liu, Ting |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5010841999 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3520-9627 |
| authorships[2].author.display_name | Cho‐Jui Hsieh |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Hsieh, Cho-Jui |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5017319429 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-3915-5977 |
| authorships[3].author.display_name | Boqing Gong |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Gong, Boqing |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2407.01606 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | On Discrete Prompt Optimization for Diffusion Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10792 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.6481999754905701 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1703 |
| primary_topic.subfield.display_name | Computational Theory and Mathematics |
| primary_topic.display_name | Matrix Theory and Algorithms |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W4395014643 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2407.01606 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2407.01606 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2407.01606 |
| primary_location.id | pmh:oai:arXiv.org:2407.01606 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2407.01606 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2407.01606 |
| publication_date | 2024-06-27 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 19, 34, 81, 103 |
| abstract_inverted_index.-- | 131 |
| abstract_inverted_index.We | 14 |
| abstract_inverted_index.an | 132 |
| abstract_inverted_index.as | 18, 68 |
| abstract_inverted_index.be | 141 |
| abstract_inverted_index.by | 180 |
| abstract_inverted_index.in | 10, 31, 95 |
| abstract_inverted_index.is | 66 |
| abstract_inverted_index.it | 69 |
| abstract_inverted_index.of | 76, 105, 111, 177 |
| abstract_inverted_index.on | 150 |
| abstract_inverted_index.or | 171 |
| abstract_inverted_index.to | 36, 46, 54, 117 |
| abstract_inverted_index.we | 101, 126 |
| abstract_inverted_index.(1) | 39 |
| abstract_inverted_index.(2) | 58 |
| abstract_inverted_index.Two | 27 |
| abstract_inverted_index.and | 80, 146 |
| abstract_inverted_index.can | 140, 163 |
| abstract_inverted_index.for | 7, 135 |
| abstract_inverted_index.lie | 94 |
| abstract_inverted_index.our | 90, 161 |
| abstract_inverted_index.the | 3, 24, 44, 47, 55, 63, 73, 77, 87, 97, 113, 122, 136, 175, 181 |
| abstract_inverted_index.Text | 59, 129 |
| abstract_inverted_index.This | 0 |
| abstract_inverted_index.from | 153 |
| abstract_inverted_index.main | 91 |
| abstract_inverted_index.most | 114 |
| abstract_inverted_index.only | 112 |
| abstract_inverted_index.over | 23 |
| abstract_inverted_index.text | 64, 137 |
| abstract_inverted_index.that | 139, 160, 166 |
| abstract_inverted_index.this | 37 |
| abstract_inverted_index.user | 118 |
| abstract_inverted_index.with | 143 |
| abstract_inverted_index.COCO) | 158 |
| abstract_inverted_index.above | 98 |
| abstract_inverted_index.arise | 30 |
| abstract_inverted_index.first | 4 |
| abstract_inverted_index.major | 28 |
| abstract_inverted_index.model | 79 |
| abstract_inverted_index.paper | 1 |
| abstract_inverted_index.poses | 51 |
| abstract_inverted_index.space | 50 |
| abstract_inverted_index.steps | 75 |
| abstract_inverted_index.words | 116 |
| abstract_inverted_index.Beyond | 86 |
| abstract_inverted_index.Domain | 41 |
| abstract_inverted_index.First, | 100 |
| abstract_inverted_index.Space: | 42 |
| abstract_inverted_index.design | 102 |
| abstract_inverted_index.domain | 45, 123 |
| abstract_inverted_index.entire | 48 |
| abstract_inverted_index.family | 104 |
| abstract_inverted_index.images | 178 |
| abstract_inverted_index.input, | 119 |
| abstract_inverted_index.lookup | 84 |
| abstract_inverted_index.memory | 145 |
| abstract_inverted_index.method | 162 |
| abstract_inverted_index.model. | 184 |
| abstract_inverted_index.prompt | 8, 16 |
| abstract_inverted_index.space. | 26, 124 |
| abstract_inverted_index.table. | 85 |
| abstract_inverted_index.(prompt | 169 |
| abstract_inverted_index.Second, | 125 |
| abstract_inverted_index.Setting | 43 |
| abstract_inverted_index.attack) | 174 |
| abstract_inverted_index.compact | 108 |
| abstract_inverted_index.destroy | 172 |
| abstract_inverted_index.diverse | 154 |
| abstract_inverted_index.finding | 33 |
| abstract_inverted_index.improve | 168 |
| abstract_inverted_index.models. | 13 |
| abstract_inverted_index.problem | 22, 88 |
| abstract_inverted_index.prompts | 151, 165 |
| abstract_inverted_index.solving | 96 |
| abstract_inverted_index.sources | 155 |
| abstract_inverted_index.through | 72 |
| abstract_inverted_index.ChatGPT, | 157 |
| abstract_inverted_index.Enormous | 40 |
| abstract_inverted_index.constant | 144 |
| abstract_inverted_index.discover | 164 |
| abstract_inverted_index.discrete | 20 |
| abstract_inverted_index.gradient | 65, 138 |
| abstract_inverted_index.language | 25, 49 |
| abstract_inverted_index.obtained | 142 |
| abstract_inverted_index.problem: | 38 |
| abstract_inverted_index.process. | 57 |
| abstract_inverted_index.relevant | 115 |
| abstract_inverted_index.requires | 70 |
| abstract_inverted_index.runtime. | 147 |
| abstract_inverted_index.solution | 35 |
| abstract_inverted_index.suggests | 159 |
| abstract_inverted_index."Shortcut | 128 |
| abstract_inverted_index.Empirical | 148 |
| abstract_inverted_index.Gradient" | 130 |
| abstract_inverted_index.Gradient: | 60 |
| abstract_inverted_index.collected | 152 |
| abstract_inverted_index.comprised | 110 |
| abstract_inverted_index.computing | 62 |
| abstract_inverted_index.diffusion | 12, 78, 183 |
| abstract_inverted_index.effective | 133 |
| abstract_inverted_index.embedding | 83 |
| abstract_inverted_index.formulate | 15 |
| abstract_inverted_index.framework | 6 |
| abstract_inverted_index.generated | 107, 179 |
| abstract_inverted_index.inference | 74 |
| abstract_inverted_index.introduce | 127 |
| abstract_inverted_index.subspaces | 109 |
| abstract_inverted_index.technical | 92 |
| abstract_inverted_index.challenges | 29 |
| abstract_inverted_index.difficulty | 53 |
| abstract_inverted_index.evaluation | 149 |
| abstract_inverted_index.introduces | 2 |
| abstract_inverted_index.Efficiently | 61 |
| abstract_inverted_index.challenges. | 99 |
| abstract_inverted_index.dynamically | 106 |
| abstract_inverted_index.efficiently | 32 |
| abstract_inverted_index.engineering | 17 |
| abstract_inverted_index.replacement | 134 |
| abstract_inverted_index.restricting | 121 |
| abstract_inverted_index.significant | 52 |
| abstract_inverted_index.(adversarial | 173 |
| abstract_inverted_index.challenging, | 67 |
| abstract_inverted_index.enhancement) | 170 |
| abstract_inverted_index.faithfulness | 176 |
| abstract_inverted_index.formulation, | 89 |
| abstract_inverted_index.optimization | 9, 21, 56 |
| abstract_inverted_index.(DiffusionDB, | 156 |
| abstract_inverted_index.contributions | 93 |
| abstract_inverted_index.substantially | 120, 167 |
| abstract_inverted_index.text-to-image | 11, 182 |
| abstract_inverted_index.gradient-based | 5 |
| abstract_inverted_index.backpropagating | 71 |
| abstract_inverted_index.non-differentiable | 82 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile.value | 0.15273259 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |
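
The `abstract_inverted_index.*` rows above map each word to the positions where it occurs in the abstract. A minimal sketch of turning such an index back into running text follows; the helper name `rebuild_abstract` is hypothetical, and the sample dictionary holds only the first ten positions copied from the table.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    positions = {}
    for word, indices in inverted_index.items():
        for i in indices:
            positions[i] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))

# First ten positions of the index, taken from the payload table.
sample = {
    "This": [0],
    "paper": [1],
    "introduces": [2],
    "the": [3],
    "first": [4],
    "gradient-based": [5],
    "framework": [6],
    "for": [7],
    "prompt": [8],
    "optimization": [9],
}

print(rebuild_abstract(sample))
# prints "This paper introduces the first gradient-based framework for prompt optimization"
```

Applied to the complete index, the same function would recover the full abstract, since every word position from 0 to 184 appears exactly once in the table.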