TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2504.04099
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance across various tasks; however, hallucinations constrain their practical applications. The hallucination problem arises from multiple factors, including hallucinations inherent to the underlying language models, the limited perception of visual encoders, and biases introduced by multimodal data. Extensive research has explored ways to mitigate hallucinations. For instance, OPERA prevents the model from focusing excessively on "anchor tokens", thereby reducing hallucinations, whereas VCD mitigates hallucinations through contrastive decoding. In this paper, we investigate the correlation between the decay of attention to image tokens and the occurrence of hallucinations. Based on this finding, we propose Temporal Attention Real-time Accumulative Connection (TARAC), a novel training-free method that dynamically accumulates and updates LVLMs' attention on image tokens during generation. By enhancing the model's attention to image tokens, TARAC mitigates hallucinations caused by the decay of that attention. We validate the effectiveness of TARAC across multiple models and datasets, demonstrating that our approach substantially mitigates hallucinations. In particular, on the CHAIR benchmark TARAC reduces $C_S$ (the sentence-level hallucination rate) by 25.2 and $C_I$ (the instance-level hallucination rate) by 8.7 compared to VCD.
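The abstract does not spell out TARAC's update rule, so the following Python sketch is only a hypothetical illustration of the general idea it describes: measure how much of a decoding step's attention falls on image tokens, accumulate that attention across steps, and feed it back so that image attention does not fade as generation proceeds. The exponential-moving-average accumulation, the additive boost followed by renormalization, the class name `HypotheticalImageAttentionAccumulator`, and the hyperparameters `beta` and `alpha` are all assumptions made for illustration, not the authors' formulation. A real implementation would operate on per-layer, per-head attention tensors inside the model rather than on a single list of weights.

```python
from typing import List, Optional, Sequence


def image_attention_share(attn: Sequence[float], image_idx: Sequence[int]) -> float:
    """Fraction of one query's attention mass that lands on image-token positions."""
    total = sum(attn)
    if total <= 0:
        return 0.0
    return sum(attn[i] for i in image_idx) / total


class HypotheticalImageAttentionAccumulator:
    """Hypothetical sketch, NOT the TARAC update rule from the paper.

    Keeps an exponential moving average (EMA) of the attention each image token
    received at earlier decoding steps and adds a scaled copy of that memory to
    the current step's attention before renormalizing.
    `beta` (EMA decay) and `alpha` (boost weight) are illustrative assumptions.
    """

    def __init__(self, image_idx: Sequence[int], beta: float = 0.9, alpha: float = 0.5):
        self.image_idx = list(image_idx)
        self.beta = beta
        self.alpha = alpha
        self.memory: Optional[List[float]] = None  # accumulated image attention

    def step(self, attn: Sequence[float]) -> List[float]:
        """Update the memory with this step's attention and return boosted weights."""
        current = [attn[i] for i in self.image_idx]
        if self.memory is None:
            self.memory = current
        else:
            # EMA: earlier steps keep contributing, so a sudden drop is damped.
            self.memory = [self.beta * m + (1.0 - self.beta) * c
                           for m, c in zip(self.memory, current)]
        boosted = list(attn)
        for k, pos in enumerate(self.image_idx):
            boosted[pos] += self.alpha * self.memory[k]
        total = sum(boosted)
        # Renormalize so the result is still a probability distribution.
        return [w / total for w in boosted]


if __name__ == "__main__":
    acc = HypotheticalImageAttentionAccumulator(image_idx=[0, 1])
    step1 = [0.3, 0.3, 0.2, 0.2]  # early step: 60% of attention on image tokens
    step2 = [0.1, 0.1, 0.4, 0.4]  # later step: image attention has decayed to 20%
    acc.step(step1)
    boosted2 = acc.step(step2)
    print(round(image_attention_share(step2, [0, 1]), 3))     # 0.2
    print(round(image_attention_share(boosted2, [0, 1]), 3))  # 0.375
```

In this toy run the image tokens receive 20% of the raw attention at the second step but 37.5% after the boost, because the accumulated memory still reflects the higher attention they received at the first step.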
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2504.04099 (PDF: https://arxiv.org/pdf/2504.04099)
- OA status: green
- OpenAlex ID: https://openalex.org/W4416125230
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex): https://openalex.org/W4416125230
- DOI (Digital Object Identifier): https://doi.org/10.48550/arxiv.2504.04099
- Title (work title): TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection
- Type (OpenAlex work type): preprint
- Language (primary language): en
- Publication year (year of publication): 2025
- Publication date (full publication date if available): 2025-04-05
- Authors (list of authors in order): C. Xie, Tongxuan Liu, Yuting Zeng, J.K. Guo, Yunheng Shen, Weizhe Huang, Jing‐Feng Li
- Landing page (publisher landing page): https://arxiv.org/abs/2504.04099
- PDF URL (direct link to full-text PDF): https://arxiv.org/pdf/2504.04099
- Open access (whether a free full text is available): Yes
- OA status (open access status per OpenAlex): green
- OA URL (direct OA link when available): https://arxiv.org/pdf/2504.04099
- Cited by (total citation count in OpenAlex): 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4416125230 |
| doi | https://doi.org/10.48550/arxiv.2504.04099 |
| ids.doi | https://doi.org/10.48550/arxiv.2504.04099 |
| ids.openalex | https://openalex.org/W4416125230 |
| fwci | |
| type | preprint |
| title | TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2504.04099 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2504.04099 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2504.04099 |
| locations[1].id | doi:10.48550/arxiv.2504.04099 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2504.04099 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5111196686 |
| authorships[0].author.orcid | https://orcid.org/0009-0004-9111-2127 |
| authorships[0].author.display_name | C. Xie |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Xie, Chunzhao |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5072530962 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Tongxuan Liu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Liu, Tongxuan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5110793492 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Yuting Zeng |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Zeng, Yuting |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5010250334 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-5515-2226 |
| authorships[3].author.display_name | J.K. Guo |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Guo, jinrong |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5009179731 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-1961-0274 |
| authorships[4].author.display_name | Yunheng Shen |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Shen, Yunheng |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5101832257 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-0844-9350 |
| authorships[5].author.display_name | Weizhe Huang |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Huang, Weizhe |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5100747643 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-0185-0512 |
| authorships[6].author.display_name | Jing‐Feng Li |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Li, Jing |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2504.04099 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T05:45:59.854083 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2504.04099 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2504.04099 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2504.04099 |
| primary_location.id | pmh:oai:arXiv.org:2504.04099 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2504.04099 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2504.04099 |
| publication_date | 2025-04-05 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 75, 111 |
| abstract_inverted_index.By | 127 |
| abstract_inverted_index.In | 79, 165 |
| abstract_inverted_index.We | 147 |
| abstract_inverted_index.by | 43, 73, 139, 170, 174 |
| abstract_inverted_index.in | 30, 38 |
| abstract_inverted_index.of | 13, 35, 89, 97, 142, 151 |
| abstract_inverted_index.on | 63, 100, 122, 144, 179 |
| abstract_inverted_index.to | 51, 91, 132, 177 |
| abstract_inverted_index.we | 82, 103 |
| abstract_inverted_index.8.7 | 175 |
| abstract_inverted_index.For | 54 |
| abstract_inverted_index.The | 19 |
| abstract_inverted_index.VCD | 70, 178 |
| abstract_inverted_index.and | 40, 94, 118, 156, 172 |
| abstract_inverted_index.has | 48 |
| abstract_inverted_index.our | 160 |
| abstract_inverted_index.the | 11, 27, 33, 58, 84, 87, 95, 129, 140, 149, 180 |
| abstract_inverted_index.25.2 | 171 |
| abstract_inverted_index.from | 23, 60 |
| abstract_inverted_index.have | 3 |
| abstract_inverted_index.that | 115, 159 |
| abstract_inverted_index.this | 80, 101 |
| abstract_inverted_index.ways | 50 |
| abstract_inverted_index.$C_I$ | 173 |
| abstract_inverted_index.$C_S$ | 169 |
| abstract_inverted_index.Based | 99 |
| abstract_inverted_index.CHAIR | 181 |
| abstract_inverted_index.Large | 0 |
| abstract_inverted_index.OPERA | 56 |
| abstract_inverted_index.TARAC | 135, 152, 167 |
| abstract_inverted_index.data. | 45 |
| abstract_inverted_index.decay | 88, 141 |
| abstract_inverted_index.image | 92, 123, 133, 145 |
| abstract_inverted_index.model | 59 |
| abstract_inverted_index.novel | 112 |
| abstract_inverted_index.their | 16 |
| abstract_inverted_index.LVLMs' | 120 |
| abstract_inverted_index.Models | 2 |
| abstract_inverted_index.across | 7, 153 |
| abstract_inverted_index.arises | 22 |
| abstract_inverted_index.biases | 41 |
| abstract_inverted_index.caused | 138 |
| abstract_inverted_index.during | 125 |
| abstract_inverted_index.method | 114 |
| abstract_inverted_index.models | 155 |
| abstract_inverted_index.overly | 61 |
| abstract_inverted_index.paper, | 81 |
| abstract_inverted_index.tasks; | 9 |
| abstract_inverted_index.tokens | 93, 124 |
| abstract_inverted_index.visual | 36 |
| abstract_inverted_index."anchor | 64 |
| abstract_inverted_index.between | 86 |
| abstract_inverted_index.model's | 130 |
| abstract_inverted_index.models, | 32 |
| abstract_inverted_index.problem | 21 |
| abstract_inverted_index.propose | 104 |
| abstract_inverted_index.reduces | 168 |
| abstract_inverted_index.thereby | 66 |
| abstract_inverted_index.tokens, | 134 |
| abstract_inverted_index.tokens. | 146 |
| abstract_inverted_index.updates | 119 |
| abstract_inverted_index.various | 8 |
| abstract_inverted_index.whereas | 69 |
| abstract_inverted_index.(TARAC), | 110 |
| abstract_inverted_index.Temporal | 105 |
| abstract_inverted_index.approach | 161 |
| abstract_inverted_index.compared | 176 |
| abstract_inverted_index.decoding | 77 |
| abstract_inverted_index.encoders | 37 |
| abstract_inverted_index.explored | 49 |
| abstract_inverted_index.factors, | 25 |
| abstract_inverted_index.finding, | 102 |
| abstract_inverted_index.focusing | 62 |
| abstract_inverted_index.however, | 10 |
| abstract_inverted_index.inherent | 28 |
| abstract_inverted_index.language | 31 |
| abstract_inverted_index.mitigate | 52 |
| abstract_inverted_index.multiple | 24, 154 |
| abstract_inverted_index.prevents | 57 |
| abstract_inverted_index.reducing | 67 |
| abstract_inverted_index.research | 47 |
| abstract_inverted_index.tokens", | 65 |
| abstract_inverted_index.validate | 148 |
| abstract_inverted_index.Attention | 106 |
| abstract_inverted_index.Extensive | 46 |
| abstract_inverted_index.Real-time | 107 |
| abstract_inverted_index.approach. | 78 |
| abstract_inverted_index.attention | 90, 121, 131, 143 |
| abstract_inverted_index.challenge | 12 |
| abstract_inverted_index.datasets, | 157 |
| abstract_inverted_index.employing | 74 |
| abstract_inverted_index.enhancing | 128 |
| abstract_inverted_index.including | 26 |
| abstract_inverted_index.instance, | 55 |
| abstract_inverted_index.mitigates | 71, 136, 163 |
| abstract_inverted_index.practical | 17 |
| abstract_inverted_index.Connection | 109 |
| abstract_inverted_index.benchmark. | 182 |
| abstract_inverted_index.constrains | 15 |
| abstract_inverted_index.introduced | 42 |
| abstract_inverted_index.multimodal | 44 |
| abstract_inverted_index.occurrence | 96 |
| abstract_inverted_index.remarkable | 5 |
| abstract_inverted_index.accumulates | 117 |
| abstract_inverted_index.contrastive | 76 |
| abstract_inverted_index.correlation | 85 |
| abstract_inverted_index.dynamically | 116 |
| abstract_inverted_index.generation. | 126 |
| abstract_inverted_index.investigate | 83 |
| abstract_inverted_index.limitations | 34 |
| abstract_inverted_index.particular, | 166 |
| abstract_inverted_index.perception, | 39 |
| abstract_inverted_index.performance | 6 |
| abstract_inverted_index.Accumulative | 108 |
| abstract_inverted_index.demonstrated | 4 |
| abstract_inverted_index.applications. | 18 |
| abstract_inverted_index.demonstrating | 158 |
| abstract_inverted_index.effectiveness | 150 |
| abstract_inverted_index.hallucination | 20 |
| abstract_inverted_index.substantially | 162 |
| abstract_inverted_index.training-free | 113 |
| abstract_inverted_index.hallucinations | 14, 29, 72, 137 |
| abstract_inverted_index.Vision-Language | 1 |
| abstract_inverted_index.hallucinations, | 68 |
| abstract_inverted_index.hallucinations. | 53, 98, 164 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
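The payload above stores the abstract not as plain text but as `abstract_inverted_index`, which maps each token to the zero-based positions where it occurs; for example, `Large` is at position 0 and `VCD` at positions 70 and 178. A minimal Python sketch for turning such a mapping back into readable text follows; the function name `rebuild_abstract` is our own and is not part of any OpenAlex client library.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Turn an OpenAlex-style token -> positions mapping back into plain text."""
    position_to_word: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            position_to_word[pos] = word
    # Emit tokens in ascending position order; punctuation stays attached to tokens.
    return " ".join(position_to_word[pos] for pos in sorted(position_to_word))


if __name__ == "__main__":
    # First seven positions, copied from the abstract_inverted_index rows above.
    fragment = {
        "Large": [0],
        "Vision-Language": [1],
        "Models": [2],
        "have": [3],
        "demonstrated": [4],
        "remarkable": [5],
        "performance": [6],
    }
    print(rebuild_abstract(fragment))
    # Large Vision-Language Models have demonstrated remarkable performance
```

Because punctuation is kept inside the tokens (`tokens,`, `tokens.`, `tokens",`), joining with single spaces is enough to recover readable text.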