GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
2023 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2311.16511
While recent advances in Multimodal Large Language Models (MLLMs) constitute a significant leap forward in the field, these models are predominantly confined to input-side multimodal comprehension and lack the capacity for multimodal content generation. To fill this gap, we present GPT4Video, a unified multimodal framework that empowers Large Language Models (LLMs) with the capability of both video understanding and generation. Specifically, we develop an instruction-following approach integrated with a stable-diffusion generative model, which has been shown to handle video-generation scenarios effectively and securely. GPT4Video offers the following benefits: 1) It exhibits impressive capabilities in both video understanding and generation scenarios; for example, GPT4Video outperforms Valley by 11.8% on the video question answering task and surpasses NExT-GPT by 2.3% on the text-to-video generation task. 2) It endows the LLM/MLLM with video generation capabilities without requiring additional training parameters and can flexibly interface with a wide range of models to perform video generation. 3) It maintains a safe and healthy conversation not only on the output side but also on the input side, in an end-to-end manner. Quantitative and qualitative experiments demonstrate that GPT4Video has the potential to function as an effective, safe, and human-like video assistant that can handle both video understanding and generation scenarios.
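The generation mechanism the abstract describes — the LLM emits a textual video prompt that an off-the-shelf text-to-video model renders, so no new trainable parameters are needed — can be sketched roughly as follows. Everything here (the `<video>` tag convention, the `route_response` helper, the keyword blocklist) is an illustrative assumption for this sketch, not the paper's actual interface:

```python
import re

# Hypothetical convention (not the paper's actual interface): the LLM marks
# a video request inline as <video>prompt</video>.
VIDEO_TAG = re.compile(r"<video>(.*?)</video>", re.DOTALL)

# Stand-in for a real safety filter on the input/output sides.
BLOCKLIST = {"violence", "gore"}

def route_response(llm_output, text_to_video=lambda p: f"[rendered: {p}]"):
    """Split an LLM response into display text plus delegated video calls.

    `text_to_video` can be any prompt -> video callable (e.g. a stable
    diffusion pipeline); the LLM itself needs no extra parameters because
    rendering is delegated entirely to that model.
    """
    text = VIDEO_TAG.sub("[video]", llm_output).strip()
    videos = []
    for prompt in VIDEO_TAG.findall(llm_output):
        if any(term in prompt.lower() for term in BLOCKLIST):
            videos.append(None)  # refuse unsafe prompts
        else:
            videos.append(text_to_video(prompt))
    return text, videos
```

Because the LLM only ever produces text, any text-to-video backend can be swapped in by replacing the `text_to_video` callable — which is the flexibility claim in benefit 2).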
Overview
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2311.16511
- PDF: https://arxiv.org/pdf/2311.16511
- OA status: green
- Related works: 10
- OpenAlex ID: https://openalex.org/W4389156693
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4389156693 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2311.16511 (Digital Object Identifier)
- Title: GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2023 (year of publication)
- Publication date: 2023-11-25 (full publication date if available)
- Authors: Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou, Shuming Shi, Zhaopeng Tu (list of authors in order)
- Landing page: https://arxiv.org/abs/2311.16511 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2311.16511 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2311.16511 (direct OA link when available)
- Concepts: Computer science, Task (project management), Generative grammar, Conversation, Realm, Field (mathematics), Language model, Human–computer interaction, Multimedia, Artificial intelligence, Systems engineering, Engineering, Political science, Law, Mathematics, Pure mathematics, Linguistics, Philosophy (top concepts attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4389156693 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2311.16511 |
| ids.doi | https://doi.org/10.48550/arxiv.2311.16511 |
| ids.openalex | https://openalex.org/W4389156693 |
| fwci | |
| type | preprint |
| title | GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9994999766349792 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.996399998664856 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T10181 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9890000224113464 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Natural Language Processing Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7991091012954712 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C2780451532 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6222060322761536 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[1].display_name | Task (project management) |
| concepts[2].id | https://openalex.org/C39890363 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5785314440727234 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q36108 |
| concepts[2].display_name | Generative grammar |
| concepts[3].id | https://openalex.org/C2777200299 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5298086404800415 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q52943 |
| concepts[3].display_name | Conversation |
| concepts[4].id | https://openalex.org/C2778757428 |
| concepts[4].level | 2 |
| concepts[4].score | 0.509928822517395 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1250464 |
| concepts[4].display_name | Realm |
| concepts[5].id | https://openalex.org/C9652623 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4990212917327881 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q190109 |
| concepts[5].display_name | Field (mathematics) |
| concepts[6].id | https://openalex.org/C137293760 |
| concepts[6].level | 2 |
| concepts[6].score | 0.48728692531585693 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[6].display_name | Language model |
| concepts[7].id | https://openalex.org/C107457646 |
| concepts[7].level | 1 |
| concepts[7].score | 0.4620164632797241 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q207434 |
| concepts[7].display_name | Human–computer interaction |
| concepts[8].id | https://openalex.org/C49774154 |
| concepts[8].level | 1 |
| concepts[8].score | 0.43042975664138794 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q131765 |
| concepts[8].display_name | Multimedia |
| concepts[9].id | https://openalex.org/C154945302 |
| concepts[9].level | 1 |
| concepts[9].score | 0.36037495732307434 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[9].display_name | Artificial intelligence |
| concepts[10].id | https://openalex.org/C201995342 |
| concepts[10].level | 1 |
| concepts[10].score | 0.1024673581123352 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q682496 |
| concepts[10].display_name | Systems engineering |
| concepts[11].id | https://openalex.org/C127413603 |
| concepts[11].level | 0 |
| concepts[11].score | 0.07242435216903687 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[11].display_name | Engineering |
| concepts[12].id | https://openalex.org/C17744445 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[12].display_name | Political science |
| concepts[13].id | https://openalex.org/C199539241 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[13].display_name | Law |
| concepts[14].id | https://openalex.org/C33923547 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[14].display_name | Mathematics |
| concepts[15].id | https://openalex.org/C202444582 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q837863 |
| concepts[15].display_name | Pure mathematics |
| concepts[16].id | https://openalex.org/C41895202 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[16].display_name | Linguistics |
| concepts[17].id | https://openalex.org/C138885662 |
| concepts[17].level | 0 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[17].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7991091012954712 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/task |
| keywords[1].score | 0.6222060322761536 |
| keywords[1].display_name | Task (project management) |
| keywords[2].id | https://openalex.org/keywords/generative-grammar |
| keywords[2].score | 0.5785314440727234 |
| keywords[2].display_name | Generative grammar |
| keywords[3].id | https://openalex.org/keywords/conversation |
| keywords[3].score | 0.5298086404800415 |
| keywords[3].display_name | Conversation |
| keywords[4].id | https://openalex.org/keywords/realm |
| keywords[4].score | 0.509928822517395 |
| keywords[4].display_name | Realm |
| keywords[5].id | https://openalex.org/keywords/field |
| keywords[5].score | 0.4990212917327881 |
| keywords[5].display_name | Field (mathematics) |
| keywords[6].id | https://openalex.org/keywords/language-model |
| keywords[6].score | 0.48728692531585693 |
| keywords[6].display_name | Language model |
| keywords[7].id | https://openalex.org/keywords/human–computer-interaction |
| keywords[7].score | 0.4620164632797241 |
| keywords[7].display_name | Human–computer interaction |
| keywords[8].id | https://openalex.org/keywords/multimedia |
| keywords[8].score | 0.43042975664138794 |
| keywords[8].display_name | Multimedia |
| keywords[9].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[9].score | 0.36037495732307434 |
| keywords[9].display_name | Artificial intelligence |
| keywords[10].id | https://openalex.org/keywords/systems-engineering |
| keywords[10].score | 0.1024673581123352 |
| keywords[10].display_name | Systems engineering |
| keywords[11].id | https://openalex.org/keywords/engineering |
| keywords[11].score | 0.07242435216903687 |
| keywords[11].display_name | Engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2311.16511 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by-nc-sa |
| locations[0].pdf_url | https://arxiv.org/pdf/2311.16511 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-sa |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2311.16511 |
| locations[1].id | doi:10.48550/arxiv.2311.16511 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2311.16511 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100676104 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2079-4931 |
| authorships[0].author.display_name | Zhanyu Wang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wang, Zhanyu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5088191810 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9062-6183 |
| authorships[1].author.display_name | Longyue Wang |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wang, Longyue |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5072694516 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-0796-4078 |
| authorships[2].author.display_name | Zhen Zhao |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zhao, Zhen |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5022151811 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-7185-0318 |
| authorships[3].author.display_name | Minghao Wu |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Wu, Minghao |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5043763521 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-3403-7050 |
| authorships[4].author.display_name | Chenyang Lyu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Lyu, Chenyang |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5101425369 |
| authorships[5].author.orcid | https://orcid.org/0000-0003-3539-8648 |
| authorships[5].author.display_name | Huayang Li |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Li, Huayang |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5037942269 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-9817-4065 |
| authorships[6].author.display_name | Deng Cai |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Cai, Deng |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5100643784 |
| authorships[7].author.orcid | https://orcid.org/0000-0001-8762-2424 |
| authorships[7].author.display_name | Luping Zhou |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Zhou, Luping |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5087920747 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-7018-0682 |
| authorships[8].author.display_name | Shuming Shi |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Shi, Shuming |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5110579339 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-7900-6055 |
| authorships[9].author.display_name | Zhaopeng Tu |
| authorships[9].author_position | last |
| authorships[9].raw_author_name | Tu, Zhaopeng |
| authorships[9].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2311.16511 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2023-11-30T00:00:00 |
| display_name | GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9994999766349792 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2359053655, https://openalex.org/W2387777532, https://openalex.org/W2382709029, https://openalex.org/W2389147080, https://openalex.org/W2377883125, https://openalex.org/W2362479786, https://openalex.org/W2392455911, https://openalex.org/W2374248756, https://openalex.org/W2375492428, https://openalex.org/W2350419982 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2311.16511 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-sa |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2311.16511 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2311.16511 |
| primary_location.id | pmh:oai:arXiv.org:2311.16511 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by-nc-sa |
| primary_location.pdf_url | https://arxiv.org/pdf/2311.16511 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-sa |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2311.16511 |
| publication_date | 2023-11-25 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index | (word → positions inverted index of the abstract; the abstract appears in full above) |
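OpenAlex ships abstracts as an `abstract_inverted_index`: each word maps to the list of positions at which it occurs. The plain-text abstract can be reconstructed by inverting that mapping and sorting by position — a minimal standard-library decoder:

```python
def decode_inverted_index(inv: dict) -> str:
    """Rebuild plain text from an OpenAlex abstract_inverted_index,
    which maps each word to the list of positions where it occurs."""
    positions = {pos: word for word, occurrences in inv.items()
                 for pos in occurrences}
    return " ".join(word for _, word in sorted(positions.items()))
```

For example, `decode_inverted_index({"the": [1], "While": [0], "recent": [2]})` returns `"While the recent"`; applied to the full index of this record it reproduces the abstract shown at the top of the page.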
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 10 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.5699999928474426 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |