Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2105.11541
GuessWhat?! is a two-player visual dialog guessing game in which player A asks a sequence of yes/no questions (Questioner) and makes a final guess (Guesser) about a target object in an image, based on answers from player B (Oracle). Based on this dialog history between the Questioner and the Oracle, the Guesser makes a final guess of the target object. The previous baseline Oracle model encodes no visual information, and it cannot fully understand complex questions about color, shape, relationships, and so on. Most existing work on the Guesser encodes the dialog history as a whole and trains the Guesser models from scratch on the GuessWhat?! dataset. This is problematic because language encoders tend to forget long-term history and the GuessWhat?! data is sparse in terms of learning visual grounding of objects. Previous work on the Questioner introduces a state-tracking mechanism into the model, but it is learned as a soft intermediate without any prior vision-linguistic insights. To bridge these gaps, in this paper we propose a Vilbert-based Oracle, Guesser, and Questioner, all built on top of a pretrained vision-linguistic model, Vilbert. We introduce a two-way background/target fusion mechanism into Vilbert-Oracle to account for both intra- and inter-object questions. We propose a unified framework for Vilbert-Guesser and Vilbert-Questioner, in which a state estimator is introduced to best utilize Vilbert's power on single-turn referring expression comprehension. Experimental results show that our proposed models significantly outperform state-of-the-art models, by 7%, 10%, and 12% for the Oracle, Guesser, and End-to-End Questioner, respectively.
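The turn structure of the game described in the abstract can be sketched as a toy loop. This is only an illustration of the protocol (Questioner asks, Oracle answers yes/no, Guesser picks an object), not of the Vilbert-based models proposed in the paper. The names `SceneObject`, `Question`, `oracle`, and `play`, and the rule-based candidate filtering, are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical stand-in for an object in the image."""
    name: str
    color: str


@dataclass(frozen=True)
class Question:
    """A yes/no question: display text plus a predicate over objects."""
    text: str
    holds: Callable[[SceneObject], bool]


def oracle(target: SceneObject, question: Question) -> bool:
    """Oracle role: answers truthfully about the hidden target."""
    return question.holds(target)


def play(
    objects: List[SceneObject],
    target: SceneObject,
    questions: List[Question],
) -> Tuple[SceneObject, List[Tuple[str, bool]]]:
    """Run one game; assumes `target` is one of `objects`."""
    candidates = list(objects)
    history: List[Tuple[str, bool]] = []
    for q in questions:                      # Questioner role: ask in order
        answer = oracle(target, q)
        history.append((q.text, answer))
        # Toy Guesser state: keep objects consistent with every answer so far.
        candidates = [o for o in candidates if q.holds(o) == answer]
        if len(candidates) == 1:
            break
    return candidates[0], history            # Guesser role: final guess


objects = [
    SceneObject("apple", "red"),
    SceneObject("apple", "green"),
    SceneObject("cup", "red"),
]
target = objects[2]
questions = [
    Question("Is it red?", lambda o: o.color == "red"),
    Question("Is it an apple?", lambda o: o.name == "apple"),
]
guess, history = play(objects, target, questions)
print(guess, history)
```

In the actual task the questions are free-form natural language and the answers depend on image content, which is why the abstract argues for pretrained vision-linguistic representations rather than hand-written predicates like these.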
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2105.11541, https://arxiv.org/pdf/2105.11541
- OA Status: green
- References: 41
- Related Works: 10
- OpenAlex ID: https://openalex.org/W3165282223
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W3165282223 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2105.11541 (Digital Object Identifier)
- Title: Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation (Work title)
- Type: preprint (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2021 (Year of publication)
- Publication date: 2021-05-24 (Full publication date if available)
- Authors: Tao Tu, Qing Ping, Govind Thattai, Gökhan Tür, Prem Natarajan (List of authors in order)
- Landing page: https://arxiv.org/abs/2105.11541 (Publisher landing page)
- PDF URL: https://arxiv.org/pdf/2105.11541 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: green (Open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2105.11541 (Direct OA link when available)
- Concepts: Oracle, Computer science, Dialog box, Artificial intelligence, Object (grammar), Natural language processing, Encoder, Language model, Programming language, World Wide Web, Operating system (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 0 (Total citation count in OpenAlex)
- References (count): 41 (Number of works referenced by this work)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W3165282223 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2105.11541 |
| ids.doi | https://doi.org/10.48550/arxiv.2105.11541 |
| ids.mag | 3165282223 |
| ids.openalex | https://openalex.org/W3165282223 |
| fwci | |
| type | preprint |
| title | Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 1.0 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T11307 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9948999881744385 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Domain Adaptation and Few-Shot Learning |
| topics[2].id | https://openalex.org/T10627 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9941999912261963 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Advanced Image and Video Retrieval Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C55166926 |
| concepts[0].level | 2 |
| concepts[0].score | 0.8425924181938171 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q2892946 |
| concepts[0].display_name | Oracle |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7819083333015442 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C173853756 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6674461960792542 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q86915 |
| concepts[2].display_name | Dialog box |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5986602902412415 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C2781238097 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5961677432060242 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q175026 |
| concepts[4].display_name | Object (grammar) |
| concepts[5].id | https://openalex.org/C204321447 |
| concepts[5].level | 1 |
| concepts[5].score | 0.5817002058029175 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[5].display_name | Natural language processing |
| concepts[6].id | https://openalex.org/C118505674 |
| concepts[6].level | 2 |
| concepts[6].score | 0.5671873688697815 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[6].display_name | Encoder |
| concepts[7].id | https://openalex.org/C137293760 |
| concepts[7].level | 2 |
| concepts[7].score | 0.49129050970077515 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[7].display_name | Language model |
| concepts[8].id | https://openalex.org/C199360897 |
| concepts[8].level | 1 |
| concepts[8].score | 0.1067880392074585 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[8].display_name | Programming language |
| concepts[9].id | https://openalex.org/C136764020 |
| concepts[9].level | 1 |
| concepts[9].score | 0.0 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q466 |
| concepts[9].display_name | World Wide Web |
| concepts[10].id | https://openalex.org/C111919701 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[10].display_name | Operating system |
| keywords[0].id | https://openalex.org/keywords/oracle |
| keywords[0].score | 0.8425924181938171 |
| keywords[0].display_name | Oracle |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7819083333015442 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/dialog-box |
| keywords[2].score | 0.6674461960792542 |
| keywords[2].display_name | Dialog box |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.5986602902412415 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/object |
| keywords[4].score | 0.5961677432060242 |
| keywords[4].display_name | Object (grammar) |
| keywords[5].id | https://openalex.org/keywords/natural-language-processing |
| keywords[5].score | 0.5817002058029175 |
| keywords[5].display_name | Natural language processing |
| keywords[6].id | https://openalex.org/keywords/encoder |
| keywords[6].score | 0.5671873688697815 |
| keywords[6].display_name | Encoder |
| keywords[7].id | https://openalex.org/keywords/language-model |
| keywords[7].score | 0.49129050970077515 |
| keywords[7].display_name | Language model |
| keywords[8].id | https://openalex.org/keywords/programming-language |
| keywords[8].score | 0.1067880392074585 |
| keywords[8].display_name | Programming language |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2105.11541 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by-nc-nd |
| locations[0].pdf_url | https://arxiv.org/pdf/2105.11541 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-nd |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2105.11541 |
| locations[1].id | doi:10.48550/arxiv.2105.11541 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2105.11541 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5014724876 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9191-7938 |
| authorships[0].author.display_name | Tao Tu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Tao Tu |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5047553143 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8919-9364 |
| authorships[1].author.display_name | Qing Ping |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Qing Ping |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5088771920 |
| authorships[2].author.orcid | https://orcid.org/0009-0005-1010-8896 |
| authorships[2].author.display_name | Govind Thattai |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Govind Thattai |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5087941479 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Gökhan Tür |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Gokhan Tur |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5066184920 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-4386-6651 |
| authorships[4].author.display_name | Prem Natarajan |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Prem Natarajan |
| authorships[4].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2105.11541 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2021-06-07T00:00:00 |
| display_name | Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 1.0 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2098987383, https://openalex.org/W2417260800, https://openalex.org/W1596203174, https://openalex.org/W2117933979, https://openalex.org/W2283130723, https://openalex.org/W103938586, https://openalex.org/W2104718772, https://openalex.org/W4233992201, https://openalex.org/W2980207396, https://openalex.org/W3156493709 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2105.11541 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-nd |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2105.11541 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2105.11541 |
| primary_location.id | pmh:oai:arXiv.org:2105.11541 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by-nc-nd |
| primary_location.pdf_url | https://arxiv.org/pdf/2105.11541 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2105.11541 |
| publication_date | 2021-05-24 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W1686810756, https://openalex.org/W2970608575, https://openalex.org/W2277195237, https://openalex.org/W2963010567, https://openalex.org/W2064675550, https://openalex.org/W2966683369, https://openalex.org/W2768661419, https://openalex.org/W2884093133, https://openalex.org/W2963954913, https://openalex.org/W2988023442, https://openalex.org/W2296712013, https://openalex.org/W2953524275, https://openalex.org/W2964154063, https://openalex.org/W2768282280, https://openalex.org/W2963800628, https://openalex.org/W2970231061, https://openalex.org/W2895088466, https://openalex.org/W2886247548, https://openalex.org/W3034727271, https://openalex.org/W2905039743, https://openalex.org/W2119717200, https://openalex.org/W2963341956, https://openalex.org/W2995460200, https://openalex.org/W2963456134, https://openalex.org/W2914324282, https://openalex.org/W2966046055, https://openalex.org/W3035448310, https://openalex.org/W2997117909, https://openalex.org/W2799263800, https://openalex.org/W2963546667, https://openalex.org/W3034758614, https://openalex.org/W3046201071, https://openalex.org/W2945087694, https://openalex.org/W2489434015, https://openalex.org/W3091588028, https://openalex.org/W2558809543, https://openalex.org/W3096415518, https://openalex.org/W2963521239, https://openalex.org/W2975501350, https://openalex.org/W2770171824, https://openalex.org/W2194775991 |
| referenced_works_count | 41 |
| abstract_inverted_index.A | 10 |
| abstract_inverted_index.B | 36 |
| abstract_inverted_index.a | 2, 12, 20, 25, 49, 52, 94, 148, 199 |
| abstract_inverted_index.To | 156 |
| abstract_inverted_index.We | 181, 197 |
| abstract_inverted_index.an | 29 |
| abstract_inverted_index.as | 93, 147 |
| abstract_inverted_index.by | 231 |
| abstract_inverted_index.in | 28, 67, 124, 160 |
| abstract_inverted_index.is | 1, 108, 122, 145, 208 |
| abstract_inverted_index.it | 71, 144 |
| abstract_inverted_index.no | 64 |
| abstract_inverted_index.of | 14, 55, 126, 130, 176 |
| abstract_inverted_index.on | 32, 39, 103, 174, 215 |
| abstract_inverted_index.so | 82 |
| abstract_inverted_index.to | 114, 189, 210 |
| abstract_inverted_index.we | 163 |
| abstract_inverted_index.12% | 234 |
| abstract_inverted_index.7%, | 232 |
| abstract_inverted_index.all | 172 |
| abstract_inverted_index.and | 18, 46, 70, 81, 96, 118, 168, 194, 204, 238 |
| abstract_inverted_index.any | 152 |
| abstract_inverted_index.are | 171 |
| abstract_inverted_index.but | 143 |
| abstract_inverted_index.for | 87, 134, 191, 202, 235 |
| abstract_inverted_index.on. | 83 |
| abstract_inverted_index.our | 224 |
| abstract_inverted_index.the | 44, 47, 56, 68, 90, 98, 104, 119, 141 |
| abstract_inverted_index.top | 175 |
| abstract_inverted_index.10%, | 233 |
| abstract_inverted_index.Most | 84 |
| abstract_inverted_index.This | 107 |
| abstract_inverted_index.asks | 11 |
| abstract_inverted_index.best | 211 |
| abstract_inverted_index.both | 192 |
| abstract_inverted_index.data | 121 |
| abstract_inverted_index.from | 34, 101 |
| abstract_inverted_index.game | 7 |
| abstract_inverted_index.into | 140, 187 |
| abstract_inverted_index.show | 222 |
| abstract_inverted_index.soft | 149 |
| abstract_inverted_index.tend | 113 |
| abstract_inverted_index.that | 223 |
| abstract_inverted_index.this | 40, 161 |
| abstract_inverted_index.work | 86, 133 |
| abstract_inverted_index.Based | 38 |
| abstract_inverted_index.about | 24, 77 |
| abstract_inverted_index.based | 31 |
| abstract_inverted_index.built | 173 |
| abstract_inverted_index.final | 21, 53 |
| abstract_inverted_index.fully | 73 |
| abstract_inverted_index.gaps, | 159 |
| abstract_inverted_index.guess | 22, 54 |
| abstract_inverted_index.intra | 193 |
| abstract_inverted_index.makes | 19, 51 |
| abstract_inverted_index.model | 62 |
| abstract_inverted_index.paper | 162 |
| abstract_inverted_index.power | 214 |
| abstract_inverted_index.prior | 153 |
| abstract_inverted_index.since | 110 |
| abstract_inverted_index.state | 137 |
| abstract_inverted_index.terms | 125 |
| abstract_inverted_index.these | 158 |
| abstract_inverted_index.train | 97 |
| abstract_inverted_index.where | 8, 206 |
| abstract_inverted_index.which | 170 |
| abstract_inverted_index.whole | 95 |
| abstract_inverted_index.Oracle | 61 |
| abstract_inverted_index.bridge | 157 |
| abstract_inverted_index.cannot | 72 |
| abstract_inverted_index.color, | 78 |
| abstract_inverted_index.dialog | 5, 41, 91 |
| abstract_inverted_index.encode | 89 |
| abstract_inverted_index.forget | 115 |
| abstract_inverted_index.fusion | 185 |
| abstract_inverted_index.image, | 30 |
| abstract_inverted_index.model, | 69, 142, 179 |
| abstract_inverted_index.models | 100, 226, 229 |
| abstract_inverted_index.object | 27 |
| abstract_inverted_index.player | 9, 35 |
| abstract_inverted_index.shape, | 79 |
| abstract_inverted_index.sparse | 123 |
| abstract_inverted_index.target | 26, 57 |
| abstract_inverted_index.visual | 4, 65, 128 |
| abstract_inverted_index.yes/no | 15 |
| abstract_inverted_index.Guesser | 50, 88, 99, 167, 237 |
| abstract_inverted_index.Oracle, | 48, 166, 236 |
| abstract_inverted_index.account | 190 |
| abstract_inverted_index.answers | 33 |
| abstract_inverted_index.between | 43 |
| abstract_inverted_index.complex | 75 |
| abstract_inverted_index.encoder | 112 |
| abstract_inverted_index.encodes | 63 |
| abstract_inverted_index.history | 42, 92, 117 |
| abstract_inverted_index.learned | 146 |
| abstract_inverted_index.object. | 58 |
| abstract_inverted_index.propose | 164, 198 |
| abstract_inverted_index.results | 221 |
| abstract_inverted_index.scratch | 102 |
| abstract_inverted_index.two-way | 183 |
| abstract_inverted_index.unified | 200 |
| abstract_inverted_index.utilize | 212 |
| abstract_inverted_index.without | 151 |
| abstract_inverted_index.Previous | 59, 132 |
| abstract_inverted_index.Vilbert. | 180 |
| abstract_inverted_index.baseline | 60 |
| abstract_inverted_index.dataset. | 106 |
| abstract_inverted_index.existing | 85 |
| abstract_inverted_index.guessing | 6 |
| abstract_inverted_index.language | 111 |
| abstract_inverted_index.learning | 127 |
| abstract_inverted_index.objects. | 131 |
| abstract_inverted_index.proposed | 225 |
| abstract_inverted_index.sequence | 13 |
| abstract_inverted_index.tracking | 138 |
| abstract_inverted_index.(Guesser) | 23 |
| abstract_inverted_index.(Oracle). | 37 |
| abstract_inverted_index.Vilbert's | 213 |
| abstract_inverted_index.framework | 201 |
| abstract_inverted_index.grounding | 129 |
| abstract_inverted_index.insights. | 155 |
| abstract_inverted_index.introduce | 182 |
| abstract_inverted_index.long-term | 116 |
| abstract_inverted_index.mechanism | 139, 186 |
| abstract_inverted_index.questions | 16, 76 |
| abstract_inverted_index.referring | 217 |
| abstract_inverted_index.End-to-End | 239 |
| abstract_inverted_index.Questioner | 45, 135, 240 |
| abstract_inverted_index.expression | 218 |
| abstract_inverted_index.introduced | 209 |
| abstract_inverted_index.introduces | 136 |
| abstract_inverted_index.outperform | 227 |
| abstract_inverted_index.pretrained | 177 |
| abstract_inverted_index.questions. | 196 |
| abstract_inverted_index.two-player | 3 |
| abstract_inverted_index.understand | 74 |
| abstract_inverted_index.GuessWhat?! | 0, 105, 120 |
| abstract_inverted_index.Questioner, | 169 |
| abstract_inverted_index.information | 66 |
| abstract_inverted_index.problematic | 109 |
| abstract_inverted_index.single-turn | 216 |
| abstract_inverted_index.(Questioner) | 17 |
| abstract_inverted_index.Experimental | 220 |
| abstract_inverted_index.inter-object | 195 |
| abstract_inverted_index.Vilbert-based | 165 |
| abstract_inverted_index.intermediates | 150 |
| abstract_inverted_index.relationships | 80 |
| abstract_inverted_index.respectively. | 241 |
| abstract_inverted_index.significantly | 230 |
| abstract_inverted_index.Vilbert-Oracle | 188 |
| abstract_inverted_index.comprehension. | 219 |
| abstract_inverted_index.Vilbert-Guesser | 203 |
| abstract_inverted_index.state-estimator | 207 |
| abstract_inverted_index.state-of-the-art | 228 |
| abstract_inverted_index.background/target | 184 |
| abstract_inverted_index.vision-linguistic | 154, 178 |
| abstract_inverted_index.Vilbert-Questioner, | 205 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile |
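
The `abstract_inverted_index.*` rows in the payload above map each token of the abstract to the word positions where it occurs. A minimal sketch of turning such a mapping back into running text is shown here; the function name `rebuild_abstract` and its `max_position` parameter are assumptions made for this example, and the sample dictionary copies only eight of the rows from the table.

```python
from typing import Dict, List, Optional


def rebuild_abstract(
    inverted_index: Dict[str, List[int]],
    max_position: Optional[int] = None,
) -> str:
    """Rebuild text from a token -> positions mapping.

    Positions above `max_position` are ignored when it is given, which
    makes it possible to recover just the opening words from a partial index.
    """
    slots: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            if max_position is None or pos <= max_position:
                slots[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(slots[pos] for pos in sorted(slots))


# Eight rows copied from the payload table (token -> positions).
sample = {
    "GuessWhat?!": [0, 105, 120],
    "is": [1, 108, 122, 145, 208],
    "a": [2, 12, 20, 25, 49, 52, 94, 148, 199],
    "two-player": [3],
    "visual": [4, 65, 128],
    "dialog": [5, 41, 91],
    "guessing": [6],
    "game": [7],
}

opening = rebuild_abstract(sample, max_position=7)
print(opening)  # GuessWhat?! is a two-player visual dialog guessing game
```

Passing the complete mapping with `max_position=None` would rebuild the whole abstract, with punctuation preserved because it is stored as part of each token.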