GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2303.10056
Text-to-image (T2I) models based on diffusion processes have achieved remarkable success in controllable image generation from user-provided captions. However, the tight coupling between the text encoder and the image decoder in current T2I models makes the encoder challenging to replace or upgrade: such changes often require massive fine-tuning, or even training from scratch, at prohibitive expense. To address this problem, we propose GlueGen, which applies a newly proposed GlueNet model to align features from single-modal or multi-modal encoders with the latent space of an existing T2I model. The approach introduces a new training objective that leverages parallel corpora to align the representation spaces of different encoders. Empirical results show that GlueNet can be trained efficiently and enables various capabilities beyond previous state-of-the-art models: 1) multilingual language models such as XLM-Roberta can be aligned with existing T2I models, allowing high-quality images to be generated from captions in languages other than English; 2) GlueNet can align multi-modal encoders such as AudioCLIP with the Stable Diffusion model, enabling sound-to-image generation; 3) it can also upgrade the text encoder of the latent diffusion model to improve generation in challenging cases. By aligning various feature representations, GlueNet allows flexible and efficient integration of new functionality into existing T2I models and sheds light on X-to-image (X2I) generation.
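The core idea in the abstract, learning a mapping from a new encoder's feature space into the space a frozen T2I decoder already expects, using parallel pairs, can be illustrated with a deliberately small sketch. It assumes the translator is a single linear map trained with plain mean-squared error by gradient descent; this is not GlueNet's architecture or the paper's training objective, and the features are synthetic vectors, not outputs of XLM-Roberta, AudioCLIP, or CLIP. The names `train_linear_translator`, `matvec`, `mse`, and `TRUE_MAP` are hypothetical.

```python
import random

def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train_linear_translator(src_feats, tgt_feats, lr=0.1, epochs=2000, seed=0):
    """Hypothetical stand-in for GlueNet: fit W so that W x_src ~= x_tgt.

    Minimizes the mean squared error over parallel pairs with full-batch
    gradient descent. Only W is learned; the source and target features are
    fixed inputs, mirroring encoders that stay frozen.
    """
    d_in, d_out, n = len(src_feats[0]), len(tgt_feats[0]), len(src_feats)
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    for _ in range(epochs):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, y in zip(src_feats, tgt_feats):
            pred = matvec(W, x)
            for i in range(d_out):
                err = pred[i] - y[i]
                for j in range(d_in):
                    # Partial derivative of (1/n) * sum over pairs of ||W x - y||^2
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W

def mse(W, src_feats, tgt_feats):
    """Mean over pairs of the squared L2 distance between W x_src and x_tgt."""
    total = 0.0
    for x, y in zip(src_feats, tgt_feats):
        total += sum((p - t) ** 2 for p, t in zip(matvec(W, x), y))
    return total / len(src_feats)

# Synthetic "parallel corpus": 3-d source features (stand-in for a new encoder)
# paired with 2-d target features (stand-in for the space the frozen decoder expects).
rng = random.Random(1)
TRUE_MAP = [[1.0, -1.0, 0.5], [0.2, 0.3, -0.4]]
src = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
tgt = [matvec(TRUE_MAP, x) for x in src]

W = train_linear_translator(src, tgt)
print(f"alignment MSE: {mse(W, src, tgt):.2e}")
```

The design point the sketch tries to convey is that only the small translator is trained: once `W` maps new-encoder features into the target space, the existing decoder can consume them unchanged, which is what makes the approach cheaper than retraining the T2I model.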
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4327992912 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2303.10056 (Digital Object Identifier)
- Title: GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023
- Publication date: 2023-03-17 (full publication date, if available)
- Authors: Can Qin, Ning Yu, Xing Chen, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, Ran Xu (in order)
- Landing page: https://arxiv.org/abs/2303.10056 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2303.10056 (direct link to the full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2303.10056 (direct OA link, when available)
- Concepts: Computer science, Encoder, Representation (politics), Feature (linguistics), Modal, Language model, Image (mathematics), Computer engineering, Upgrade, Artificial intelligence, Operating system, Political science, Politics, Chemistry, Law, Philosophy, Linguistics, Polymer chemistry (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 3 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1, 2024: 1, 2023: 1 (per-year citation counts for the last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4327992912 |
| doi | https://doi.org/10.48550/arxiv.2303.10056 |
| ids.doi | https://doi.org/10.48550/arxiv.2303.10056 |
| ids.openalex | https://openalex.org/W4327992912 |
| fwci | |
| type | preprint |
| title | GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.995199978351593 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10775 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.963100016117096 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Generative Adversarial Networks and Image Synthesis |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7889397144317627 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C118505674 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7019914388656616 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[1].display_name | Encoder |
| concepts[2].id | https://openalex.org/C2776359362 |
| concepts[2].level | 3 |
| concepts[2].score | 0.5407358407974243 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q2145286 |
| concepts[2].display_name | Representation (politics) |
| concepts[3].id | https://openalex.org/C2776401178 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5295639634132385 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q12050496 |
| concepts[3].display_name | Feature (linguistics) |
| concepts[4].id | https://openalex.org/C71139939 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5259414315223694 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q910194 |
| concepts[4].display_name | Modal |
| concepts[5].id | https://openalex.org/C137293760 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4766536355018616 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[5].display_name | Language model |
| concepts[6].id | https://openalex.org/C115961682 |
| concepts[6].level | 2 |
| concepts[6].score | 0.4646010100841522 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[6].display_name | Image (mathematics) |
| concepts[7].id | https://openalex.org/C113775141 |
| concepts[7].level | 1 |
| concepts[7].score | 0.41828566789627075 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q428691 |
| concepts[7].display_name | Computer engineering |
| concepts[8].id | https://openalex.org/C2780615140 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4169561564922333 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q920419 |
| concepts[8].display_name | Upgrade |
| concepts[9].id | https://openalex.org/C154945302 |
| concepts[9].level | 1 |
| concepts[9].score | 0.39806386828422546 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[9].display_name | Artificial intelligence |
| concepts[10].id | https://openalex.org/C111919701 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[10].display_name | Operating system |
| concepts[11].id | https://openalex.org/C17744445 |
| concepts[11].level | 0 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[11].display_name | Political science |
| concepts[12].id | https://openalex.org/C94625758 |
| concepts[12].level | 2 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q7163 |
| concepts[12].display_name | Politics |
| concepts[13].id | https://openalex.org/C185592680 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[13].display_name | Chemistry |
| concepts[14].id | https://openalex.org/C199539241 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[14].display_name | Law |
| concepts[15].id | https://openalex.org/C138885662 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[15].display_name | Philosophy |
| concepts[16].id | https://openalex.org/C41895202 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[16].display_name | Linguistics |
| concepts[17].id | https://openalex.org/C188027245 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q750446 |
| concepts[17].display_name | Polymer chemistry |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7889397144317627 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/encoder |
| keywords[1].score | 0.7019914388656616 |
| keywords[1].display_name | Encoder |
| keywords[2].id | https://openalex.org/keywords/representation |
| keywords[2].score | 0.5407358407974243 |
| keywords[2].display_name | Representation (politics) |
| keywords[3].id | https://openalex.org/keywords/feature |
| keywords[3].score | 0.5295639634132385 |
| keywords[3].display_name | Feature (linguistics) |
| keywords[4].id | https://openalex.org/keywords/modal |
| keywords[4].score | 0.5259414315223694 |
| keywords[4].display_name | Modal |
| keywords[5].id | https://openalex.org/keywords/language-model |
| keywords[5].score | 0.4766536355018616 |
| keywords[5].display_name | Language model |
| keywords[6].id | https://openalex.org/keywords/image |
| keywords[6].score | 0.4646010100841522 |
| keywords[6].display_name | Image (mathematics) |
| keywords[7].id | https://openalex.org/keywords/computer-engineering |
| keywords[7].score | 0.41828566789627075 |
| keywords[7].display_name | Computer engineering |
| keywords[8].id | https://openalex.org/keywords/upgrade |
| keywords[8].score | 0.4169561564922333 |
| keywords[8].display_name | Upgrade |
| keywords[9].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[9].score | 0.39806386828422546 |
| keywords[9].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2303.10056 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2303.10056 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2303.10056 |
| locations[1].id | doi:10.48550/arxiv.2303.10056 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2303.10056 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5021042598 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-0712-5378 |
| authorships[0].author.display_name | Can Qin |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Qin, Can |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100752315 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-7140-5505 |
| authorships[1].author.display_name | Ning Yu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yu, Ning |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100371766 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-2003-3657 |
| authorships[2].author.display_name | Xing Chen |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Xing, Chen |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100452839 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-3431-744X |
| authorships[3].author.display_name | Shu Zhang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhang, Shu |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5101631517 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-0442-8501 |
| authorships[4].author.display_name | Zeyuan Chen |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Chen, Zeyuan |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5091179481 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-0039-2887 |
| authorships[5].author.display_name | Stefano Ermon |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Ermon, Stefano |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5005819096 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-5098-2853 |
| authorships[6].author.display_name | Yun Fu |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Fu, Yun |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5032046813 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-0349-8628 |
| authorships[7].author.display_name | Caiming Xiong |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Xiong, Caiming |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5100687200 |
| authorships[8].author.orcid | https://orcid.org/0000-0003-1387-6696 |
| authorships[8].author.display_name | Ran Xu |
| authorships[8].author_position | last |
| authorships[8].raw_author_name | Xu, Ran |
| authorships[8].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2303.10056 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.995199978351593 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2368672678, https://openalex.org/W2370626080, https://openalex.org/W2965111880, https://openalex.org/W2368576029, https://openalex.org/W2377210208, https://openalex.org/W116478885, https://openalex.org/W2391279445, https://openalex.org/W2390420166, https://openalex.org/W2354998446, https://openalex.org/W2378085033 |
| cited_by_count | 3 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2303.10056 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2303.10056 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2303.10056 |
| primary_location.id | pmh:oai:arXiv.org:2303.10056 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2303.10056 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2303.10056 |
| publication_date | 2023-03-17 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 64, 89 |
| abstract_inverted_index.1) | 122 |
| abstract_inverted_index.2) | 147 |
| abstract_inverted_index.3) | 164 |
| abstract_inverted_index.By | 182 |
| abstract_inverted_index.To | 55 |
| abstract_inverted_index.an | 82 |
| abstract_inverted_index.as | 127, 154 |
| abstract_inverted_index.be | 111, 130 |
| abstract_inverted_index.in | 11, 30 |
| abstract_inverted_index.it | 34, 165 |
| abstract_inverted_index.of | 81, 102, 140, 173, 185, 197 |
| abstract_inverted_index.on | 4, 207 |
| abstract_inverted_index.or | 38, 46, 74 |
| abstract_inverted_index.to | 36, 69, 97 |
| abstract_inverted_index.we | 59 |
| abstract_inverted_index.T2I | 31, 84, 134, 202 |
| abstract_inverted_index.The | 86 |
| abstract_inverted_index.and | 27, 114, 194, 204 |
| abstract_inverted_index.can | 110, 129, 149, 166 |
| abstract_inverted_index.for | 137, 178, 192 |
| abstract_inverted_index.new | 90, 198 |
| abstract_inverted_index.the | 19, 23, 52, 78, 99, 138, 157, 169, 174, 183, 189 |
| abstract_inverted_index.Such | 40 |
| abstract_inverted_index.also | 167 |
| abstract_inverted_index.case | 180 |
| abstract_inverted_index.even | 47 |
| abstract_inverted_index.from | 49, 72, 143 |
| abstract_inverted_index.have | 7 |
| abstract_inverted_index.into | 200 |
| abstract_inverted_index.show | 107 |
| abstract_inverted_index.such | 126, 153 |
| abstract_inverted_index.text | 25, 171 |
| abstract_inverted_index.that | 93, 108 |
| abstract_inverted_index.this | 57 |
| abstract_inverted_index.with | 51, 77, 132, 156 |
| abstract_inverted_index.(T2I) | 1 |
| abstract_inverted_index.(X2I) | 209 |
| abstract_inverted_index.align | 70, 98, 150 |
| abstract_inverted_index.based | 3 |
| abstract_inverted_index.image | 13, 28 |
| abstract_inverted_index.light | 206 |
| abstract_inverted_index.makes | 33 |
| abstract_inverted_index.model | 68, 177 |
| abstract_inverted_index.newly | 65 |
| abstract_inverted_index.often | 42 |
| abstract_inverted_index.sheds | 205 |
| abstract_inverted_index.space | 80 |
| abstract_inverted_index.tight | 20 |
| abstract_inverted_index.using | 15 |
| abstract_inverted_index.which | 62 |
| abstract_inverted_index.Stable | 158 |
| abstract_inverted_index.allows | 191 |
| abstract_inverted_index.beyond | 118, 145 |
| abstract_inverted_index.images | 142 |
| abstract_inverted_index.latent | 79, 175 |
| abstract_inverted_index.model, | 160 |
| abstract_inverted_index.model. | 85 |
| abstract_inverted_index.models | 2, 32, 125, 203 |
| abstract_inverted_index.spaces | 101 |
| abstract_inverted_index.GlueNet | 67, 109, 148, 190 |
| abstract_inverted_index.address | 56 |
| abstract_inverted_index.aligned | 131 |
| abstract_inverted_index.applies | 63 |
| abstract_inverted_index.between | 22 |
| abstract_inverted_index.changes | 41 |
| abstract_inverted_index.corpora | 96 |
| abstract_inverted_index.current | 24, 170 |
| abstract_inverted_index.decoder | 29 |
| abstract_inverted_index.enables | 115 |
| abstract_inverted_index.encoder | 26, 172 |
| abstract_inverted_index.feature | 187 |
| abstract_inverted_index.massive | 44 |
| abstract_inverted_index.models, | 135 |
| abstract_inverted_index.models: | 121 |
| abstract_inverted_index.propose | 60 |
| abstract_inverted_index.replace | 37 |
| abstract_inverted_index.require | 43 |
| abstract_inverted_index.results | 106 |
| abstract_inverted_index.scratch | 50 |
| abstract_inverted_index.success | 10 |
| abstract_inverted_index.trained | 112 |
| abstract_inverted_index.upgrade | 168 |
| abstract_inverted_index.various | 116, 186 |
| abstract_inverted_index.English; | 146 |
| abstract_inverted_index.GlueGen, | 61 |
| abstract_inverted_index.However, | 18 |
| abstract_inverted_index.achieved | 8 |
| abstract_inverted_index.allowing | 136 |
| abstract_inverted_index.approach | 87 |
| abstract_inverted_index.captions | 144 |
| abstract_inverted_index.coupling | 21 |
| abstract_inverted_index.enabling | 161 |
| abstract_inverted_index.encoders | 76, 152 |
| abstract_inverted_index.existing | 83, 133, 201 |
| abstract_inverted_index.expense. | 54 |
| abstract_inverted_index.features | 71 |
| abstract_inverted_index.flexible | 193 |
| abstract_inverted_index.language | 124 |
| abstract_inverted_index.parallel | 95 |
| abstract_inverted_index.previous | 119 |
| abstract_inverted_index.problem, | 58 |
| abstract_inverted_index.proposed | 66 |
| abstract_inverted_index.training | 48, 91 |
| abstract_inverted_index.upgrade. | 39 |
| abstract_inverted_index.AudioCLIP | 155 |
| abstract_inverted_index.Diffusion | 159 |
| abstract_inverted_index.Empirical | 105 |
| abstract_inverted_index.alignment | 184 |
| abstract_inverted_index.captions. | 17 |
| abstract_inverted_index.different | 103 |
| abstract_inverted_index.diffusion | 5, 176 |
| abstract_inverted_index.efficient | 195 |
| abstract_inverted_index.encoders. | 104 |
| abstract_inverted_index.leverages | 94 |
| abstract_inverted_index.objective | 92 |
| abstract_inverted_index.processes | 6 |
| abstract_inverted_index.X-to-image | 208 |
| abstract_inverted_index.generation | 14, 139 |
| abstract_inverted_index.introduces | 88 |
| abstract_inverted_index.remarkable | 9 |
| abstract_inverted_index.XLM-Roberta | 128 |
| abstract_inverted_index.challenging | 35, 179 |
| abstract_inverted_index.efficiently | 113 |
| abstract_inverted_index.fine-tuning | 45 |
| abstract_inverted_index.generation. | 181, 210 |
| abstract_inverted_index.generation; | 163 |
| abstract_inverted_index.integration | 196 |
| abstract_inverted_index.multi-modal | 75, 151 |
| abstract_inverted_index.prohibitive | 53 |
| abstract_inverted_index.capabilities | 117 |
| abstract_inverted_index.controllable | 12 |
| abstract_inverted_index.high-quality | 141 |
| abstract_inverted_index.multilingual | 123 |
| abstract_inverted_index.single-modal | 73 |
| abstract_inverted_index.Text-to-image | 0 |
| abstract_inverted_index.functionality | 199 |
| abstract_inverted_index.user-provided | 16 |
| abstract_inverted_index.representation | 100 |
| abstract_inverted_index.sound-to-image | 162 |
| abstract_inverted_index.representations, | 188 |
| abstract_inverted_index.state-of-the-art | 120 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 9 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.7099999785423279 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
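The `abstract_inverted_index` rows above are OpenAlex's encoding of the abstract: each word maps to the 0-based positions where it occurs, and this table flattens that map into one row per word. A minimal sketch for turning such rows back into text, assuming the `| abstract_inverted_index.<word> | <positions> |` row layout used here; `parse_index_row` and `reconstruct_abstract` are hypothetical helpers, not part of any OpenAlex client library.

```python
def parse_index_row(row, prefix="abstract_inverted_index."):
    """Parse one flattened table row such as '| abstract_inverted_index.a | 64, 89 |'."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    key, positions = cells[0], cells[1]
    word = key[len(prefix):] if key.startswith(prefix) else key
    return word, [int(p) for p in positions.split(",")]

def reconstruct_abstract(inverted_index):
    """Rebuild plain text from an inverted index mapping word -> list of positions."""
    slots = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            slots[pos] = word
    return " ".join(slots[pos] for pos in sorted(slots))

# Rows copied verbatim from the table for the words at positions 0-17.
rows = [
    "| abstract_inverted_index.Text-to-image | 0 |",
    "| abstract_inverted_index.(T2I) | 1 |",
    "| abstract_inverted_index.models | 2, 32, 125, 203 |",
    "| abstract_inverted_index.based | 3 |",
    "| abstract_inverted_index.on | 4, 207 |",
    "| abstract_inverted_index.diffusion | 5, 176 |",
    "| abstract_inverted_index.processes | 6 |",
    "| abstract_inverted_index.have | 7 |",
    "| abstract_inverted_index.achieved | 8 |",
    "| abstract_inverted_index.remarkable | 9 |",
    "| abstract_inverted_index.success | 10 |",
    "| abstract_inverted_index.in | 11, 30 |",
    "| abstract_inverted_index.controllable | 12 |",
    "| abstract_inverted_index.image | 13, 28 |",
    "| abstract_inverted_index.generation | 14, 139 |",
    "| abstract_inverted_index.using | 15 |",
    "| abstract_inverted_index.user-provided | 16 |",
    "| abstract_inverted_index.captions. | 17 |",
]
index = dict(parse_index_row(r) for r in rows)
# Some words also occur later in the abstract; keep positions 0-17 (the first sentence).
first = {w: [p for p in ps if p < 18] for w, ps in index.items()}
print(reconstruct_abstract(first))
```

With every row of the table parsed and no position filter, the same function would rebuild the full original abstract, since the index records every word occurrence.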