Towards Language-Driven Video Inpainting via Multimodal Large Language Models
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2401.10226
We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 videos and 9,091 inpainting results, to support training and evaluation for this task. We also propose a novel diffusion-based language-driven video inpainting framework, the first end-to-end baseline for this task, integrating Multimodal Large Language Models to understand and execute complex language-based inpainting requests effectively. Our comprehensive results showcase the dataset's versatility and the model's effectiveness in various language-instructed inpainting scenarios. We will make datasets, code, and models publicly available.
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2401.10226, https://arxiv.org/pdf/2401.10226
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4391047466
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4391047466 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2401.10226 (Digital Object Identifier)
- Title: Towards Language-Driven Video Inpainting via Multimodal Large Language Models (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2024 (Year of publication)
- Publication date: 2024-01-18 (Full publication date if available)
- Authors: Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy (List of authors in order)
- Landing page: https://arxiv.org/abs/2401.10226 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2401.10226 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2401.10226 (Direct OA link when available)
- Concepts: Inpainting, Computer science, Task (project management), Process (computing), Artificial intelligence, Code (set theory), Natural language, Natural language processing, Binary number, Image (mathematics), Computer vision, Programming language, Engineering, Mathematics, Set (abstract data type), Systems engineering, Arithmetic (Top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (Total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (Per-year citation counts, last 5 years)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4391047466 |
| doi | https://doi.org/10.48550/arxiv.2401.10226 |
| ids.doi | https://doi.org/10.48550/arxiv.2401.10226 |
| ids.openalex | https://openalex.org/W4391047466 |
| fwci | |
| type | preprint |
| title | Towards Language-Driven Video Inpainting via Multimodal Large Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9983000159263611 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10775 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9947999715805054 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Generative Adversarial Networks and Image Synthesis |
| topics[2].id | https://openalex.org/T11439 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9797999858856201 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Video Analysis and Summarization |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C11727466 |
| concepts[0].level | 3 |
| concepts[0].score | 0.97467041015625 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1628157 |
| concepts[0].display_name | Inpainting |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.8054118156433105 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C2780451532 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7186487317085266 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[2].display_name | Task (project management) |
| concepts[3].id | https://openalex.org/C98045186 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6442747116088867 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q205663 |
| concepts[3].display_name | Process (computing) |
| concepts[4].id | https://openalex.org/C154945302 |
| concepts[4].level | 1 |
| concepts[4].score | 0.6020812392234802 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[4].display_name | Artificial intelligence |
| concepts[5].id | https://openalex.org/C2776760102 |
| concepts[5].level | 3 |
| concepts[5].score | 0.4588168263435364 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q5139990 |
| concepts[5].display_name | Code (set theory) |
| concepts[6].id | https://openalex.org/C195324797 |
| concepts[6].level | 2 |
| concepts[6].score | 0.451699435710907 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q33742 |
| concepts[6].display_name | Natural language |
| concepts[7].id | https://openalex.org/C204321447 |
| concepts[7].level | 1 |
| concepts[7].score | 0.44221633672714233 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[7].display_name | Natural language processing |
| concepts[8].id | https://openalex.org/C48372109 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4355008006095886 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q3913 |
| concepts[8].display_name | Binary number |
| concepts[9].id | https://openalex.org/C115961682 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4240344166755676 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[9].display_name | Image (mathematics) |
| concepts[10].id | https://openalex.org/C31972630 |
| concepts[10].level | 1 |
| concepts[10].score | 0.37577828764915466 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q844240 |
| concepts[10].display_name | Computer vision |
| concepts[11].id | https://openalex.org/C199360897 |
| concepts[11].level | 1 |
| concepts[11].score | 0.13738998770713806 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[11].display_name | Programming language |
| concepts[12].id | https://openalex.org/C127413603 |
| concepts[12].level | 0 |
| concepts[12].score | 0.07719135284423828 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[12].display_name | Engineering |
| concepts[13].id | https://openalex.org/C33923547 |
| concepts[13].level | 0 |
| concepts[13].score | 0.062214285135269165 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[13].display_name | Mathematics |
| concepts[14].id | https://openalex.org/C177264268 |
| concepts[14].level | 2 |
| concepts[14].score | 0.059583842754364014 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[14].display_name | Set (abstract data type) |
| concepts[15].id | https://openalex.org/C201995342 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q682496 |
| concepts[15].display_name | Systems engineering |
| concepts[16].id | https://openalex.org/C94375191 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q11205 |
| concepts[16].display_name | Arithmetic |
| keywords[0].id | https://openalex.org/keywords/inpainting |
| keywords[0].score | 0.97467041015625 |
| keywords[0].display_name | Inpainting |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.8054118156433105 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/task |
| keywords[2].score | 0.7186487317085266 |
| keywords[2].display_name | Task (project management) |
| keywords[3].id | https://openalex.org/keywords/process |
| keywords[3].score | 0.6442747116088867 |
| keywords[3].display_name | Process (computing) |
| keywords[4].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[4].score | 0.6020812392234802 |
| keywords[4].display_name | Artificial intelligence |
| keywords[5].id | https://openalex.org/keywords/code |
| keywords[5].score | 0.4588168263435364 |
| keywords[5].display_name | Code (set theory) |
| keywords[6].id | https://openalex.org/keywords/natural-language |
| keywords[6].score | 0.451699435710907 |
| keywords[6].display_name | Natural language |
| keywords[7].id | https://openalex.org/keywords/natural-language-processing |
| keywords[7].score | 0.44221633672714233 |
| keywords[7].display_name | Natural language processing |
| keywords[8].id | https://openalex.org/keywords/binary-number |
| keywords[8].score | 0.4355008006095886 |
| keywords[8].display_name | Binary number |
| keywords[9].id | https://openalex.org/keywords/image |
| keywords[9].score | 0.4240344166755676 |
| keywords[9].display_name | Image (mathematics) |
| keywords[10].id | https://openalex.org/keywords/computer-vision |
| keywords[10].score | 0.37577828764915466 |
| keywords[10].display_name | Computer vision |
| keywords[11].id | https://openalex.org/keywords/programming-language |
| keywords[11].score | 0.13738998770713806 |
| keywords[11].display_name | Programming language |
| keywords[12].id | https://openalex.org/keywords/engineering |
| keywords[12].score | 0.07719135284423828 |
| keywords[12].display_name | Engineering |
| keywords[13].id | https://openalex.org/keywords/mathematics |
| keywords[13].score | 0.062214285135269165 |
| keywords[13].display_name | Mathematics |
| keywords[14].id | https://openalex.org/keywords/set |
| keywords[14].score | 0.059583842754364014 |
| keywords[14].display_name | Set (abstract data type) |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2401.10226 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2401.10226 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2401.10226 |
| locations[1].id | doi:10.48550/arxiv.2401.10226 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2401.10226 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5024765927 |
| authorships[0].author.orcid | https://orcid.org/0009-0007-4559-7970 |
| authorships[0].author.display_name | Jianzong Wu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wu, Jianzong |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5029645676 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Xiangtai Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Xiangtai |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5023000066 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3354-1968 |
| authorships[2].author.display_name | Chenyang Si |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Si, Chenyang |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5023785888 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-8201-8877 |
| authorships[3].author.display_name | Shangchen Zhou |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zhou, Shangchen |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5075948339 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-9424-254X |
| authorships[4].author.display_name | Jingkang Yang |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Yang, Jingkang |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5021861529 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-8891-6766 |
| authorships[5].author.display_name | Jiangning Zhang |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Zhang, Jiangning |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5101546250 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-4761-2293 |
| authorships[6].author.display_name | Yining Li |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Li, Yining |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5048500768 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-3930-8294 |
| authorships[7].author.display_name | Kai Chen |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Chen, Kai |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5024097240 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-8735-2516 |
| authorships[8].author.display_name | Yunhai Tong |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Tong, Yunhai |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5100406050 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-4220-5958 |
| authorships[9].author.display_name | Ziwei Liu |
| authorships[9].author_position | middle |
| authorships[9].raw_author_name | Liu, Ziwei |
| authorships[9].is_corresponding | False |
| authorships[10].author.id | https://openalex.org/A5005626854 |
| authorships[10].author.orcid | https://orcid.org/0000-0001-5345-1591 |
| authorships[10].author.display_name | Chen Change Loy |
| authorships[10].author_position | last |
| authorships[10].raw_author_name | Loy, Chen Change |
| authorships[10].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2401.10226 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-01-20T00:00:00 |
| display_name | Towards Language-Driven Video Inpainting via Multimodal Large Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9983000159263611 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2017457812, https://openalex.org/W3178025616, https://openalex.org/W2131831293, https://openalex.org/W2946160871, https://openalex.org/W3035059915, https://openalex.org/W1995073329, https://openalex.org/W2060947339, https://openalex.org/W425542480, https://openalex.org/W49967185, https://openalex.org/W2107727507 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2401.10226 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2401.10226 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2401.10226 |
| primary_location.id | pmh:oai:arXiv.org:2401.10226 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2401.10226 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2401.10226 |
| publication_date | 2024-01-18 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 2, 36, 71 |
| abstract_inverted_index.-- | 5 |
| abstract_inverted_index.We | 0, 42, 68, 115 |
| abstract_inverted_index.by | 49 |
| abstract_inverted_index.in | 110 |
| abstract_inverted_index.of | 24 |
| abstract_inverted_index.on | 31 |
| abstract_inverted_index.to | 14, 60, 90 |
| abstract_inverted_index.Our | 99 |
| abstract_inverted_index.and | 40, 56, 63, 92, 106, 120 |
| abstract_inverted_index.for | 65, 82 |
| abstract_inverted_index.new | 3 |
| abstract_inverted_index.the | 16, 22, 44, 78, 103, 107 |
| abstract_inverted_index.This | 19 |
| abstract_inverted_index.also | 69 |
| abstract_inverted_index.from | 47 |
| abstract_inverted_index.make | 117 |
| abstract_inverted_index.task | 4 |
| abstract_inverted_index.that | 29 |
| abstract_inverted_index.this | 66, 83 |
| abstract_inverted_index.uses | 10 |
| abstract_inverted_index.will | 116 |
| abstract_inverted_index.5,650 | 54 |
| abstract_inverted_index.9,091 | 57 |
| abstract_inverted_index.Large | 87 |
| abstract_inverted_index.code, | 119 |
| abstract_inverted_index.first | 79 |
| abstract_inverted_index.guide | 15 |
| abstract_inverted_index.novel | 72 |
| abstract_inverted_index.often | 38 |
| abstract_inverted_index.task, | 84 |
| abstract_inverted_index.task. | 67 |
| abstract_inverted_index.video | 7, 26, 75 |
| abstract_inverted_index.which | 9 |
| abstract_inverted_index.(ROVI) | 51 |
| abstract_inverted_index.Models | 89 |
| abstract_inverted_index.Remove | 45 |
| abstract_inverted_index.Videos | 48 |
| abstract_inverted_index.binary | 34 |
| abstract_inverted_index.depend | 30 |
| abstract_inverted_index.masks, | 35 |
| abstract_inverted_index.models | 121 |
| abstract_inverted_index.videos | 55 |
| abstract_inverted_index.Objects | 46 |
| abstract_inverted_index.complex | 94 |
| abstract_inverted_index.execute | 93 |
| abstract_inverted_index.labeled | 33 |
| abstract_inverted_index.methods | 28 |
| abstract_inverted_index.model's | 108 |
| abstract_inverted_index.natural | 11 |
| abstract_inverted_index.present | 43 |
| abstract_inverted_index.process | 37 |
| abstract_inverted_index.propose | 70 |
| abstract_inverted_index.results | 101 |
| abstract_inverted_index.support | 61 |
| abstract_inverted_index.tedious | 39 |
| abstract_inverted_index.various | 111 |
| abstract_inverted_index.Language | 88 |
| abstract_inverted_index.approach | 20 |
| abstract_inverted_index.baseline | 81 |
| abstract_inverted_index.dataset, | 52 |
| abstract_inverted_index.language | 12 |
| abstract_inverted_index.manually | 32 |
| abstract_inverted_index.process. | 18 |
| abstract_inverted_index.publicly | 122 |
| abstract_inverted_index.requests | 97 |
| abstract_inverted_index.results, | 59 |
| abstract_inverted_index.showcase | 102 |
| abstract_inverted_index.training | 62 |
| abstract_inverted_index.dataset's | 104 |
| abstract_inverted_index.datasets, | 118 |
| abstract_inverted_index.introduce | 1 |
| abstract_inverted_index.overcomes | 21 |
| abstract_inverted_index.Multimodal | 86 |
| abstract_inverted_index.available. | 123 |
| abstract_inverted_index.containing | 53 |
| abstract_inverted_index.end-to-end | 80 |
| abstract_inverted_index.evaluation | 64 |
| abstract_inverted_index.framework, | 77 |
| abstract_inverted_index.inpainting | 17, 27, 58, 76, 96, 113 |
| abstract_inverted_index.scenarios. | 114 |
| abstract_inverted_index.understand | 91 |
| abstract_inverted_index.inpainting, | 8 |
| abstract_inverted_index.integrating | 85 |
| abstract_inverted_index.limitations | 23 |
| abstract_inverted_index.traditional | 25 |
| abstract_inverted_index.versatility | 105 |
| abstract_inverted_index.Instructions | 50 |
| abstract_inverted_index.effectively. | 98 |
| abstract_inverted_index.instructions | 13 |
| abstract_inverted_index.comprehensive | 100 |
| abstract_inverted_index.effectiveness | 109 |
| abstract_inverted_index.language-based | 95 |
| abstract_inverted_index.diffusion-based | 73 |
| abstract_inverted_index.language-driven | 6, 74 |
| abstract_inverted_index.labor-intensive. | 41 |
| abstract_inverted_index.language-instructed | 112 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 11 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/8 |
| sustainable_development_goals[0].score | 0.5600000023841858 |
| sustainable_development_goals[0].display_name | Decent work and economic growth |
| citation_normalized_percentile | |
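
The `abstract_inverted_index.*` rows in the table above store the abstract as a mapping from each word to the positions where it occurs. As a minimal sketch of how such a mapping can be turned back into running text, the Python below places every word at its positions and joins them in order. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of the record; `sample` copies only positions 0 to 10 from the rows above and omits the later positions of repeated words.

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild text from a word -> list-of-positions mapping."""
    by_position: dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Join the words in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Hypothetical excerpt: positions 0-10 only, taken from the table rows above.
sample = {
    "We": [0],               # later positions (42, 68, 115) omitted
    "introduce": [1],
    "a": [2],                # later positions (36, 71) omitted
    "new": [3],
    "task": [4],
    "--": [5],
    "language-driven": [6],  # later position (74) omitted
    "video": [7],            # later positions (26, 75) omitted
    "inpainting,": [8],
    "which": [9],
    "uses": [10],
}

print(rebuild_abstract(sample))
```

Applied to the full set of rows rather than this excerpt, the same approach should reproduce the abstract shown at the top of the record, since punctuation is stored as part of each token.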