Towards Reasoning-Aware Explainable VQA
· 2022
· Open Access
· DOI: https://doi.org/10.48550/arxiv.2211.05190
The domain of joint vision-language understanding, especially in the context of reasoning in Visual Question Answering (VQA) models, has garnered significant attention in the recent past. While most of the existing VQA models focus on improving the accuracy of VQA, the way models arrive at an answer is oftentimes a black box. As a step towards making the VQA task more explainable and interpretable, our method is built upon the SOTA VQA framework by augmenting it with an end-to-end explanation generation module. In this paper, we investigate two network architectures, including Long Short-Term Memory (LSTM) and Transformer decoder, as the explanation generator. Our method generates human-readable textual explanations while maintaining SOTA VQA accuracy on the GQA-REX (77.49%) and VQA-E (71.48%) datasets. Approximately 65.16% of the generated explanations are approved by humans as valid. Roughly 60.5% of the generated explanations are valid and lead to the correct answers.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2211.05190
- https://arxiv.org/pdf/2211.05190
- OA Status
- green
- Cited By
- 2
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4308827733
Raw OpenAlex JSON
- OpenAlex ID: canonical identifier for this work in OpenAlex
- https://openalex.org/W4308827733
- DOI: Digital Object Identifier
- https://doi.org/10.48550/arxiv.2211.05190
- Title: work title
- Towards Reasoning-Aware Explainable VQA
- Type: OpenAlex work type
- preprint
- Language: primary language
- en
- Publication year: year of publication
- 2022
- Publication date: full publication date if available
- 2022-11-09
- Authors: list of authors in order
- Rakesh Vaideeswaran, Feng Gao, Abhinav Mathur, Govind Thattai
- Landing page: publisher landing page
- https://arxiv.org/abs/2211.05190
- PDF URL: direct link to full text PDF
- https://arxiv.org/pdf/2211.05190
- Open access: whether a free full text is available
- Yes
- OA status: open access status per OpenAlex
- green
- OA URL: direct OA link when available
- https://arxiv.org/pdf/2211.05190
- Concepts: top concepts (fields/topics) attached by OpenAlex
- Computer science, Transformer, Question answering, Artificial intelligence, Black box, Context (archaeology), Domain (mathematical analysis), Natural language processing, Focus (optics), Machine learning, Engineering, Mathematical analysis, Electrical engineering, Mathematics, Voltage, Paleontology, Optics, Physics, Biology
- Cited by: total citation count in OpenAlex
- 2
- Citations by year (recent): per-year citation counts (last 5 years)
- 2023: 2
- Related works (count): other works algorithmically related by OpenAlex
- 10
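The full payload table lists nested JSON fields under flattened keys such as `topics[0].field.display_name`, and shows lists of plain values (for example `indexed_in`) as comma-separated text. The rules this page actually uses are not documented here, so the following is only a minimal sketch of one way such keys could be produced. The function name `flatten` and the joining rule for scalar lists are assumptions made for illustration, and `record` is a hand-copied excerpt of the payload rather than the full JSON.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into dotted, indexed keys (hypothetical helper)."""
    rows: dict[str, Any] = {}
    if isinstance(value, dict):
        # Nested objects extend the key with ".name".
        for key, child in value.items():
            rows.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list) and any(isinstance(v, (dict, list)) for v in value):
        # Lists of objects extend the key with "[index]".
        for index, child in enumerate(value):
            rows.update(flatten(child, f"{prefix}[{index}]"))
    elif isinstance(value, list):
        # Assumption: lists of plain values are joined into one cell.
        rows[prefix] = ", ".join(str(v) for v in value)
    else:
        rows[prefix] = value
    return rows


# Excerpt of the record, copied by hand from the payload table.
record = {
    "id": "https://openalex.org/W4308827733",
    "topics": [
        {
            "id": "https://openalex.org/T11714",
            "field": {"display_name": "Computer Science"},
        }
    ],
    "indexed_in": ["arxiv", "datacite"],
    "abstract_inverted_index": {"a": [49, 53]},
}

for key, cell in flatten(record).items():
    print(f"| {key} | {cell} |")
```

Each printed line has the same `| key | value |` shape as the rows of the payload table.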
Full payload
| id | https://openalex.org/W4308827733 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2211.05190 |
| ids.doi | https://doi.org/10.48550/arxiv.2211.05190 |
| ids.openalex | https://openalex.org/W4308827733 |
| fwci | |
| type | preprint |
| title | Towards Reasoning-Aware Explainable VQA |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11714 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9998999834060669 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1707 |
| topics[0].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[0].display_name | Multimodal Machine Learning Applications |
| topics[1].id | https://openalex.org/T10627 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9550999999046326 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1707 |
| topics[1].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[1].display_name | Advanced Image and Video Retrieval Techniques |
| topics[2].id | https://openalex.org/T10028 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9416999816894531 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Topic Modeling |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7418344020843506 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C66322947 |
| concepts[1].level | 3 |
| concepts[1].score | 0.7088727355003357 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[1].display_name | Transformer |
| concepts[2].id | https://openalex.org/C44291984 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7031437158584595 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1074173 |
| concepts[2].display_name | Question answering |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5742824077606201 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C94966114 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5434004664421082 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q29256 |
| concepts[4].display_name | Black box |
| concepts[5].id | https://openalex.org/C2779343474 |
| concepts[5].level | 2 |
| concepts[5].score | 0.47725793719291687 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q3109175 |
| concepts[5].display_name | Context (archaeology) |
| concepts[6].id | https://openalex.org/C36503486 |
| concepts[6].level | 2 |
| concepts[6].score | 0.44986841082572937 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11235244 |
| concepts[6].display_name | Domain (mathematical analysis) |
| concepts[7].id | https://openalex.org/C204321447 |
| concepts[7].level | 1 |
| concepts[7].score | 0.43867695331573486 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[7].display_name | Natural language processing |
| concepts[8].id | https://openalex.org/C192209626 |
| concepts[8].level | 2 |
| concepts[8].score | 0.41308173537254333 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q190909 |
| concepts[8].display_name | Focus (optics) |
| concepts[9].id | https://openalex.org/C119857082 |
| concepts[9].level | 1 |
| concepts[9].score | 0.3876357972621918 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[9].display_name | Machine learning |
| concepts[10].id | https://openalex.org/C127413603 |
| concepts[10].level | 0 |
| concepts[10].score | 0.05969378352165222 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[10].display_name | Engineering |
| concepts[11].id | https://openalex.org/C134306372 |
| concepts[11].level | 1 |
| concepts[11].score | 0.0 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[11].display_name | Mathematical analysis |
| concepts[12].id | https://openalex.org/C119599485 |
| concepts[12].level | 1 |
| concepts[12].score | 0.0 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[12].display_name | Electrical engineering |
| concepts[13].id | https://openalex.org/C33923547 |
| concepts[13].level | 0 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[13].display_name | Mathematics |
| concepts[14].id | https://openalex.org/C165801399 |
| concepts[14].level | 2 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[14].display_name | Voltage |
| concepts[15].id | https://openalex.org/C151730666 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q7205 |
| concepts[15].display_name | Paleontology |
| concepts[16].id | https://openalex.org/C120665830 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q14620 |
| concepts[16].display_name | Optics |
| concepts[17].id | https://openalex.org/C121332964 |
| concepts[17].level | 0 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q413 |
| concepts[17].display_name | Physics |
| concepts[18].id | https://openalex.org/C86803240 |
| concepts[18].level | 0 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[18].display_name | Biology |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7418344020843506 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/transformer |
| keywords[1].score | 0.7088727355003357 |
| keywords[1].display_name | Transformer |
| keywords[2].id | https://openalex.org/keywords/question-answering |
| keywords[2].score | 0.7031437158584595 |
| keywords[2].display_name | Question answering |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.5742824077606201 |
| keywords[3].display_name | Artificial intelligence |
| keywords[4].id | https://openalex.org/keywords/black-box |
| keywords[4].score | 0.5434004664421082 |
| keywords[4].display_name | Black box |
| keywords[5].id | https://openalex.org/keywords/context |
| keywords[5].score | 0.47725793719291687 |
| keywords[5].display_name | Context (archaeology) |
| keywords[6].id | https://openalex.org/keywords/domain |
| keywords[6].score | 0.44986841082572937 |
| keywords[6].display_name | Domain (mathematical analysis) |
| keywords[7].id | https://openalex.org/keywords/natural-language-processing |
| keywords[7].score | 0.43867695331573486 |
| keywords[7].display_name | Natural language processing |
| keywords[8].id | https://openalex.org/keywords/focus |
| keywords[8].score | 0.41308173537254333 |
| keywords[8].display_name | Focus (optics) |
| keywords[9].id | https://openalex.org/keywords/machine-learning |
| keywords[9].score | 0.3876357972621918 |
| keywords[9].display_name | Machine learning |
| keywords[10].id | https://openalex.org/keywords/engineering |
| keywords[10].score | 0.05969378352165222 |
| keywords[10].display_name | Engineering |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2211.05190 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2211.05190 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2211.05190 |
| locations[1].id | doi:10.48550/arxiv.2211.05190 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2211.05190 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5040030278 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Rakesh Vaideeswaran |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Vaideeswaran, Rakesh |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100416254 |
| authorships[1].author.orcid | https://orcid.org/0009-0006-1843-3180 |
| authorships[1].author.display_name | Feng Gao |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Gao, Feng |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5025799121 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-5669-9488 |
| authorships[2].author.display_name | Abhinav Mathur |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Mathur, Abhinav |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5088771920 |
| authorships[3].author.orcid | https://orcid.org/0009-0005-1010-8896 |
| authorships[3].author.display_name | Govind Thattai |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Thattai, Govind |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2211.05190 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Towards Reasoning-Aware Explainable VQA |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11714 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9998999834060669 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1707 |
| primary_topic.subfield.display_name | Computer Vision and Pattern Recognition |
| primary_topic.display_name | Multimodal Machine Learning Applications |
| related_works | https://openalex.org/W2384605597, https://openalex.org/W2387743295, https://openalex.org/W2115758952, https://openalex.org/W3082787378, https://openalex.org/W2136007095, https://openalex.org/W2366230879, https://openalex.org/W4381058564, https://openalex.org/W3003945460, https://openalex.org/W2964413124, https://openalex.org/W4288267738 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2023 |
| counts_by_year[0].cited_by_count | 2 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2211.05190 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2211.05190 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2211.05190 |
| primary_location.id | pmh:oai:arXiv.org:2211.05190 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2211.05190 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2211.05190 |
| publication_date | 2022-11-09 |
| publication_year | 2022 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 49, 53 |
| abstract_inverted_index.As | 52 |
| abstract_inverted_index.In | 82 |
| abstract_inverted_index.an | 45, 77 |
| abstract_inverted_index.as | 98, 131 |
| abstract_inverted_index.at | 44 |
| abstract_inverted_index.by | 73, 129 |
| abstract_inverted_index.in | 7, 12, 22 |
| abstract_inverted_index.is | 47, 66 |
| abstract_inverted_index.it | 75 |
| abstract_inverted_index.of | 2, 10, 28, 38, 123, 135 |
| abstract_inverted_index.on | 34, 113 |
| abstract_inverted_index.to | 143 |
| abstract_inverted_index.we | 85 |
| abstract_inverted_index.Our | 102 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.VQA | 31, 58, 71, 111 |
| abstract_inverted_index.and | 62, 95, 117, 141 |
| abstract_inverted_index.are | 127, 139 |
| abstract_inverted_index.has | 18 |
| abstract_inverted_index.our | 64 |
| abstract_inverted_index.the | 8, 23, 29, 36, 40, 57, 69, 99, 114, 124, 136, 144 |
| abstract_inverted_index.two | 87 |
| abstract_inverted_index.way | 41 |
| abstract_inverted_index.Long | 91 |
| abstract_inverted_index.SOTA | 70, 110 |
| abstract_inverted_index.VQA, | 39 |
| abstract_inverted_index.box. | 51 |
| abstract_inverted_index.lead | 142 |
| abstract_inverted_index.more | 60 |
| abstract_inverted_index.most | 27 |
| abstract_inverted_index.step | 54 |
| abstract_inverted_index.task | 59 |
| abstract_inverted_index.this | 83 |
| abstract_inverted_index.upon | 68 |
| abstract_inverted_index.with | 76 |
| abstract_inverted_index.(VQA) | 16 |
| abstract_inverted_index.60.5% | 134 |
| abstract_inverted_index.VQA-E | 118 |
| abstract_inverted_index.While | 26 |
| abstract_inverted_index.black | 50 |
| abstract_inverted_index.built | 67 |
| abstract_inverted_index.focus | 33 |
| abstract_inverted_index.joint | 3 |
| abstract_inverted_index.past. | 25 |
| abstract_inverted_index.valid | 140 |
| abstract_inverted_index.while | 108 |
| abstract_inverted_index.(LSTM) | 94 |
| abstract_inverted_index.65.16% | 122 |
| abstract_inverted_index.Memory | 93 |
| abstract_inverted_index.Visual | 13 |
| abstract_inverted_index.answer | 46 |
| abstract_inverted_index.arrive | 43 |
| abstract_inverted_index.domain | 1 |
| abstract_inverted_index.humans | 130 |
| abstract_inverted_index.making | 56 |
| abstract_inverted_index.method | 65, 103 |
| abstract_inverted_index.models | 32, 42 |
| abstract_inverted_index.paper, | 84 |
| abstract_inverted_index.recent | 24 |
| abstract_inverted_index.valid. | 132 |
| abstract_inverted_index.GQA-REX | 115 |
| abstract_inverted_index.Roughly | 133 |
| abstract_inverted_index.context | 9 |
| abstract_inverted_index.correct | 145 |
| abstract_inverted_index.models, | 17 |
| abstract_inverted_index.module. | 81 |
| abstract_inverted_index.network | 88 |
| abstract_inverted_index.textual | 106 |
| abstract_inverted_index.towards | 55 |
| abstract_inverted_index.(71.48%) | 119 |
| abstract_inverted_index.(77.49%) | 116 |
| abstract_inverted_index.Question | 14 |
| abstract_inverted_index.accuracy | 37, 112 |
| abstract_inverted_index.answers. | 146 |
| abstract_inverted_index.approved | 128 |
| abstract_inverted_index.decoder, | 97 |
| abstract_inverted_index.existing | 30 |
| abstract_inverted_index.garnered | 19 |
| abstract_inverted_index.Answering | 15 |
| abstract_inverted_index.attention | 21 |
| abstract_inverted_index.datasets. | 120 |
| abstract_inverted_index.framework | 72 |
| abstract_inverted_index.generated | 125, 137 |
| abstract_inverted_index.generates | 104 |
| abstract_inverted_index.improving | 35 |
| abstract_inverted_index.including | 90 |
| abstract_inverted_index.reasoning | 11 |
| abstract_inverted_index.Short-Term | 92 |
| abstract_inverted_index.augmenting | 74 |
| abstract_inverted_index.end-to-end | 78 |
| abstract_inverted_index.especially | 6 |
| abstract_inverted_index.generation | 80 |
| abstract_inverted_index.generator. | 101 |
| abstract_inverted_index.oftentimes | 48 |
| abstract_inverted_index.Transformer | 96 |
| abstract_inverted_index.explainable | 61 |
| abstract_inverted_index.explanation | 79, 100 |
| abstract_inverted_index.investigate | 86 |
| abstract_inverted_index.maintaining | 109 |
| abstract_inverted_index.significant | 20 |
| abstract_inverted_index.explanations | 107, 126, 138 |
| abstract_inverted_index.Approximately | 121 |
| abstract_inverted_index.architectures, | 89 |
| abstract_inverted_index.human-readable | 105 |
| abstract_inverted_index.interpretable, | 63 |
| abstract_inverted_index.understanding, | 5 |
| abstract_inverted_index.vision-language | 4 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.7400000095367432 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile |
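
The `abstract_inverted_index` rows above store the abstract as a mapping from each word to the positions where it occurs, rather than as running text. As a minimal sketch, the text can be recovered by sorting every (position, word) pair and joining the words with spaces. The function name `rebuild_abstract` is hypothetical, and `sample` contains only the entries for positions 0 to 5 (so only position 2 of the word "of" is included).

```python
def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text from a word -> positions mapping (hypothetical helper)."""
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    # Sorting by position restores the original word order.
    return " ".join(word for _, word in sorted(positioned))


# Excerpt of the rows above, limited to positions 0-5.
sample = {
    "The": [0],
    "domain": [1],
    "of": [2],
    "joint": [3],
    "vision-language": [4],
    "understanding,": [5],
}

print(rebuild_abstract(sample))
# prints: The domain of joint vision-language understanding,
```

The printed words match the opening of the abstract shown at the top of this record. Running the same function on the complete index would be expected to reproduce the whole abstract.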