Multi-Modal Answer Validation for Knowledge-Based VQA
2021 · Open Access · DOI: https://doi.org/10.48550/arxiv.2103.12248
The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in various forms, including visual, textual, and commonsense knowledge. Using more knowledge sources increases the chance of retrieving more irrelevant or noisy facts, making it challenging to comprehend the facts and find the answer. To address this challenge, we propose Multi-modal Answer Validation using External knowledge (MAVEx), where the idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval. Instead of searching for the answer in a vast collection of often irrelevant facts as most existing approaches do, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source. Our multi-modal setting is the first to leverage external visual knowledge (images searched using Google), in addition to textual knowledge in the form of Wikipedia sentences and ConceptNet concepts. Our experiments with OK-VQA, a challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new state-of-the-art results. Our code is available at https://github.com/jialinwu17/MAVEX
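
The validation idea in the abstract (score each answer candidate using knowledge retrieved specifically for it, weighted by how much each source is trusted) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Evidence` type, the `validate_candidates` function, the toy retriever, and the fixed trust weights are all hypothetical stand-ins. In particular, the abstract says MAVEx learns which source to trust for each candidate, whereas this sketch uses a hand-set dictionary.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str     # hypothetical source label, e.g. "wikipedia", "conceptnet", "images"
    support: float  # assumed score in [0, 1]: how strongly this item supports the candidate


def validate_candidates(candidates, retrieve, source_trust):
    """Score each candidate by its best trust-weighted supporting source.

    `retrieve(answer)` returns evidence fetched for that specific answer;
    `source_trust` maps a source name to a fixed weight (an assumption here).
    """
    scores = {}
    for answer in candidates:
        # Keep the strongest piece of evidence per source for this candidate.
        per_source = {}
        for ev in retrieve(answer):
            per_source[ev.source] = max(per_source.get(ev.source, 0.0), ev.support)
        # A candidate's score is the best source after weighting by trust.
        scores[answer] = max(
            (source_trust.get(src, 0.0) * s for src, s in per_source.items()),
            default=0.0,
        )
    best = max(scores, key=scores.get)
    return best, scores


def toy_retrieve(answer):
    # Toy, made-up evidence table used only to exercise the function.
    table = {
        "pug": [Evidence("wikipedia", 0.9), Evidence("images", 0.6)],
        "cat": [Evidence("conceptnet", 0.4)],
    }
    return table.get(answer, [])


trust = {"wikipedia": 1.0, "conceptnet": 0.5, "images": 0.8}
best, scores = validate_candidates(["pug", "cat", "bird"], toy_retrieve, trust)
print(best, scores)
```

The point of the sketch is the control flow: retrieval is driven by the candidate answer rather than by the question alone, so a candidate with no supporting evidence simply scores zero.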
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2103.12248, https://arxiv.org/pdf/2103.12248
- OA Status: green
- Cited By: 16
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4309067651
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4309067651 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2103.12248 (Digital Object Identifier)
- Title: Multi-Modal Answer Validation for Knowledge-Based VQA (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2021 (year of publication)
- Publication date: 2021-03-23 (full publication date if available)
- Authors: Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi (list of authors in order)
- Landing page: https://arxiv.org/abs/2103.12248 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2103.12248 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2103.12248 (direct OA link when available)
- Concepts: Question answering, Computer science, Leverage (statistics), Commonsense knowledge, Information retrieval, Knowledge retrieval, Knowledge extraction, Modal, Set (abstract data type), Artificial intelligence, Programming language, Polymer chemistry, Chemistry (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 16 (total citation count in OpenAlex)
- Citations by year (recent): 2024: 3, 2023: 7, 2022: 5, 2021: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4309067651 |
| doi | https://doi.org/10.48550/arxiv.2103.12248 |
| ids.doi | https://doi.org/10.48550/arxiv.2103.12248 |
| ids.openalex | https://openalex.org/W4309067651 |
| fwci | |
| type | preprint |
| title | Multi-Modal Answer Validation for Knowledge-Based VQA |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 1.0 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10627 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9851999878883362 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Image and Video Retrieval Techniques |
| topics[2].id | https://openalex.org/T11307 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.961899995803833 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Domain Adaptation and Few-Shot Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C44291984 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7904847860336304 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1074173 |
| concepts[0].display_name | Question answering |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7788995504379272 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C153083717 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7126209735870361 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q6535263 |
| concepts[2].display_name | Leverage (statistics) |
| concepts[3].id | https://openalex.org/C30542707 |
| concepts[3].level | 3 |
| concepts[3].score | 0.6997485756874084 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1603203 |
| concepts[3].display_name | Commonsense knowledge |
| concepts[4].id | https://openalex.org/C23123220 |
| concepts[4].level | 1 |
| concepts[4].score | 0.6050793528556824 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[4].display_name | Information retrieval |
| concepts[5].id | https://openalex.org/C2780613888 |
| concepts[5].level | 3 |
| concepts[5].score | 0.4984748363494873 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q6423394 |
| concepts[5].display_name | Knowledge retrieval |
| concepts[6].id | https://openalex.org/C120567893 |
| concepts[6].level | 2 |
| concepts[6].score | 0.49010786414146423 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q1582085 |
| concepts[6].display_name | Knowledge extraction |
| concepts[7].id | https://openalex.org/C71139939 |
| concepts[7].level | 2 |
| concepts[7].score | 0.47848957777023315 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q910194 |
| concepts[7].display_name | Modal |
| concepts[8].id | https://openalex.org/C177264268 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4643357992172241 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[8].display_name | Set (abstract data type) |
| concepts[9].id | https://openalex.org/C154945302 |
| concepts[9].level | 1 |
| concepts[9].score | 0.33756813406944275 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[9].display_name | Artificial intelligence |
| concepts[10].id | https://openalex.org/C199360897 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[10].display_name | Programming language |
| concepts[11].id | https://openalex.org/C188027245 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q750446 |
| concepts[11].display_name | Polymer chemistry |
| concepts[12].id | https://openalex.org/C185592680 |
| concepts[12].level | 0 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q2329 |
| concepts[12].display_name | Chemistry |
| keywords[0].id | https://openalex.org/keywords/question-answering |
| keywords[0].score | 0.7904847860336304 |
| keywords[0].display_name | Question answering |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7788995504379272 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/leverage |
| keywords[2].score | 0.7126209735870361 |
| keywords[2].display_name | Leverage (statistics) |
| keywords[3].id | https://openalex.org/keywords/commonsense-knowledge |
| keywords[3].score | 0.6997485756874084 |
| keywords[3].display_name | Commonsense knowledge |
| keywords[4].id | https://openalex.org/keywords/information-retrieval |
| keywords[4].score | 0.6050793528556824 |
| keywords[4].display_name | Information retrieval |
| keywords[5].id | https://openalex.org/keywords/knowledge-retrieval |
| keywords[5].score | 0.4984748363494873 |
| keywords[5].display_name | Knowledge retrieval |
| keywords[6].id | https://openalex.org/keywords/knowledge-extraction |
| keywords[6].score | 0.49010786414146423 |
| keywords[6].display_name | Knowledge extraction |
| keywords[7].id | https://openalex.org/keywords/modal |
| keywords[7].score | 0.47848957777023315 |
| keywords[7].display_name | Modal |
| keywords[8].id | https://openalex.org/keywords/set |
| keywords[8].score | 0.4643357992172241 |
| keywords[8].display_name | Set (abstract data type) |
| keywords[9].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[9].score | 0.33756813406944275 |
| keywords[9].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2103.12248 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2103.12248 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2103.12248 |
| locations[1].id | doi:10.48550/arxiv.2103.12248 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2103.12248 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5101825497 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-4684-5212 |
| authorships[0].author.display_name | Jialin Wu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Wu, Jialin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101055775 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Jiasen Lu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Lu, Jiasen |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5077726785 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Ashish Sabharwal |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Sabharwal, Ashish |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5070375939 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Roozbeh Mottaghi |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Mottaghi, Roozbeh |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2103.12248 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Multi-Modal Answer Validation for Knowledge-Based VQA |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 1.0 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W4313191056, https://openalex.org/W3015759694, https://openalex.org/W4320086306, https://openalex.org/W4376624582, https://openalex.org/W4390645603, https://openalex.org/W2971132369, https://openalex.org/W3115965961, https://openalex.org/W2617136920, https://openalex.org/W2367076628, https://openalex.org/W2950120176 |
| cited_by_count | 16 |
| counts_by_year[0].year | 2024 |
| counts_by_year[0].cited_by_count | 3 |
| counts_by_year[1].year | 2023 |
| counts_by_year[1].cited_by_count | 7 |
| counts_by_year[2].year | 2022 |
| counts_by_year[2].cited_by_count | 5 |
| counts_by_year[3].year | 2021 |
| counts_by_year[3].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2103.12248 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2103.12248 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2103.12248 |
| primary_location.id | pmh:oai:arXiv.org:2103.12248 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2103.12248 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2103.12248 |
| publication_date | 2021-03-23 |
| publication_year | 2021 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 79, 97, 172 |
| abstract_inverted_index.To | 60 |
| abstract_inverted_index.as | 104 |
| abstract_inverted_index.at | 188 |
| abstract_inverted_index.in | 14, 26, 96, 154, 159 |
| abstract_inverted_index.is | 76, 142, 186 |
| abstract_inverted_index.it | 50 |
| abstract_inverted_index.of | 2, 19, 42, 81, 91, 100, 162 |
| abstract_inverted_index.on | 86 |
| abstract_inverted_index.or | 46 |
| abstract_inverted_index.to | 16, 52, 77, 111, 114, 124, 132, 145, 156 |
| abstract_inverted_index.we | 64 |
| abstract_inverted_index.Our | 139, 168, 184 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.VQA | 175 |
| abstract_inverted_index.and | 32, 56, 130, 165 |
| abstract_inverted_index.do, | 108 |
| abstract_inverted_index.for | 93, 126 |
| abstract_inverted_index.how | 113, 131 |
| abstract_inverted_index.new | 181 |
| abstract_inverted_index.set | 80 |
| abstract_inverted_index.the | 17, 20, 40, 54, 58, 74, 94, 134, 143, 160 |
| abstract_inverted_index.Such | 22 |
| abstract_inverted_index.aims | 110 |
| abstract_inverted_index.code | 185 |
| abstract_inverted_index.each | 127 |
| abstract_inverted_index.find | 57 |
| abstract_inverted_index.form | 161 |
| abstract_inverted_index.from | 118 |
| abstract_inverted_index.idea | 75 |
| abstract_inverted_index.more | 36, 44 |
| abstract_inverted_index.most | 105 |
| abstract_inverted_index.that | 10, 137, 178 |
| abstract_inverted_index.this | 62 |
| abstract_inverted_index.vast | 98 |
| abstract_inverted_index.with | 170 |
| abstract_inverted_index.MAVEx | 109, 179 |
| abstract_inverted_index.Using | 35 |
| abstract_inverted_index.based | 85 |
| abstract_inverted_index.comes | 25 |
| abstract_inverted_index.facts | 55, 103 |
| abstract_inverted_index.first | 144 |
| abstract_inverted_index.learn | 112 |
| abstract_inverted_index.noisy | 47, 119 |
| abstract_inverted_index.often | 101 |
| abstract_inverted_index.trust | 125 |
| abstract_inverted_index.using | 69, 136, 152 |
| abstract_inverted_index.where | 73 |
| abstract_inverted_index.which | 121 |
| abstract_inverted_index.Answer | 67 |
| abstract_inverted_index.answer | 83, 95, 128 |
| abstract_inverted_index.chance | 41 |
| abstract_inverted_index.facts, | 48 |
| abstract_inverted_index.forms, | 28 |
| abstract_inverted_index.image. | 21 |
| abstract_inverted_index.making | 49 |
| abstract_inverted_index.source | 123 |
| abstract_inverted_index.visual | 4, 148 |
| abstract_inverted_index.(images | 150 |
| abstract_inverted_index.Instead | 90 |
| abstract_inverted_index.OK-VQA, | 171 |
| abstract_inverted_index.address | 61 |
| abstract_inverted_index.answer. | 59 |
| abstract_inverted_index.content | 18 |
| abstract_inverted_index.extract | 115 |
| abstract_inverted_index.problem | 1 |
| abstract_inverted_index.propose | 65 |
| abstract_inverted_index.require | 11 |
| abstract_inverted_index.setting | 141 |
| abstract_inverted_index.source. | 138 |
| abstract_inverted_index.sources | 38 |
| abstract_inverted_index.textual | 157 |
| abstract_inverted_index.various | 27 |
| abstract_inverted_index.visual, | 30 |
| abstract_inverted_index.(MAVEx), | 72 |
| abstract_inverted_index.External | 70 |
| abstract_inverted_index.Google), | 153 |
| abstract_inverted_index.achieves | 180 |
| abstract_inverted_index.addition | 15, 155 |
| abstract_inverted_index.dataset, | 176 |
| abstract_inverted_index.existing | 106 |
| abstract_inverted_index.external | 12, 147 |
| abstract_inverted_index.involves | 7 |
| abstract_inverted_index.leverage | 146 |
| abstract_inverted_index.question | 5 |
| abstract_inverted_index.relevant | 116 |
| abstract_inverted_index.results. | 183 |
| abstract_inverted_index.searched | 151 |
| abstract_inverted_index.sources, | 120 |
| abstract_inverted_index.textual, | 31 |
| abstract_inverted_index.validate | 78, 133 |
| abstract_inverted_index.Wikipedia | 163 |
| abstract_inverted_index.answering | 6, 8 |
| abstract_inverted_index.available | 187 |
| abstract_inverted_index.candidate | 135 |
| abstract_inverted_index.concepts. | 167 |
| abstract_inverted_index.including | 29 |
| abstract_inverted_index.increases | 39 |
| abstract_inverted_index.knowledge | 13, 23, 37, 71, 88, 117, 122, 149, 158 |
| abstract_inverted_index.promising | 82 |
| abstract_inverted_index.questions | 9 |
| abstract_inverted_index.searching | 92 |
| abstract_inverted_index.sentences | 164 |
| abstract_inverted_index.typically | 24 |
| abstract_inverted_index.ConceptNet | 166 |
| abstract_inverted_index.Validation | 68 |
| abstract_inverted_index.approaches | 107 |
| abstract_inverted_index.candidate, | 129 |
| abstract_inverted_index.candidates | 84 |
| abstract_inverted_index.challenge, | 63 |
| abstract_inverted_index.collection | 99 |
| abstract_inverted_index.comprehend | 53 |
| abstract_inverted_index.irrelevant | 45, 102 |
| abstract_inverted_index.knowledge. | 34 |
| abstract_inverted_index.retrieval. | 89 |
| abstract_inverted_index.retrieving | 43 |
| abstract_inverted_index.Multi-modal | 66 |
| abstract_inverted_index.challenging | 51, 173 |
| abstract_inverted_index.commonsense | 33 |
| abstract_inverted_index.demonstrate | 177 |
| abstract_inverted_index.experiments | 169 |
| abstract_inverted_index.multi-modal | 140 |
| abstract_inverted_index.answer-specific | 87 |
| abstract_inverted_index.knowledge-based | 3, 174 |
| abstract_inverted_index.state-of-the-art | 182 |
| abstract_inverted_index.https://github.com/jialinwu17/MAVEX | 189 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.800000011920929 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |
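
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the word positions where it occurs. In OpenAlex's JSON, this field is an object whose values are lists of integers. A minimal sketch of turning such an index back into running text follows. The function name `rebuild_abstract` is illustrative, and the sample contains only the tokens at positions 0 to 9 from the table, so it reproduces just the opening words of the abstract.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> [positions] mapping."""
    positions = {}
    for token, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# Subset of the index above, restricted to positions 0..9.
sample = {
    "The": [0],
    "problem": [1],
    "of": [2],
    "knowledge-based": [3],
    "visual": [4],
    "question": [5],
    "answering": [6, 8],
    "involves": [7],
    "questions": [9],
}

text = rebuild_abstract(sample)
print(text)
# The problem of knowledge-based visual question answering involves answering questions
```

Punctuation stays attached to tokens in the index (for example `facts,` and `answer.` are separate keys from `facts` and `answer`), so joining on spaces is enough to recover the original sentence boundaries.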