Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.19672
Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained attention for their capabilities in image recognition and understanding. However, while MLLMs are vulnerable to adversarial attacks, the transferability of these attacks across different models remains limited, especially under the targeted attack setting. Existing methods primarily focus on vision-specific perturbations but struggle with the complex nature of vision-language modality alignment. In this work, we introduce the Dynamic Vision-Language Alignment (DynVLA) Attack, a novel approach that injects dynamic perturbations into the vision-language connector to enhance generalization across the diverse vision-language alignments of different models. Our experimental results show that DynVLA significantly improves the transferability of adversarial examples across various MLLMs, including BLIP2, InstructBLIP, MiniGPT4, LLaVA, and closed-source models such as Gemini.
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2502.19672
- https://arxiv.org/pdf/2502.19672
- OA Status
- green
- OpenAlex ID
- https://openalex.org/W4416149692
Raw OpenAlex JSON
- OpenAlex ID (canonical identifier for this work in OpenAlex)
- https://openalex.org/W4416149692
- DOI (Digital Object Identifier)
- https://doi.org/10.48550/arxiv.2502.19672
- Title (work title)
- Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack
- Type (OpenAlex work type)
- preprint
- Language (primary language)
- en
- Publication year (year of publication)
- 2025
- Publication date (full publication date if available)
- 2025-02-27
- Authors (list of authors in order)
- Chen Gu, Jindong Gu, Andong Hua, Yao Qin
- Landing page (publisher landing page)
- https://arxiv.org/abs/2502.19672
- PDF URL (direct link to full text PDF)
- https://arxiv.org/pdf/2502.19672
- Open access (whether a free full text is available)
- Yes
- OA status (open access status per OpenAlex)
- green
- OA URL (direct OA link when available)
- https://arxiv.org/pdf/2502.19672
- Cited by (total citation count in OpenAlex)
- 0
Full payload
| id | https://openalex.org/W4416149692 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2502.19672 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.19672 |
| ids.openalex | https://openalex.org/W4416149692 |
| fwci | |
| type | preprint |
| title | Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.19672 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.19672 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.19672 |
| locations[1].id | doi:10.48550/arxiv.2502.19672 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.19672 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5064570991 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-0003-8992 |
| authorships[0].author.display_name | Chen Gu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Gu, Chenhe |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5055994909 |
| authorships[1].author.orcid | https://orcid.org/0009-0000-0574-0129 |
| authorships[1].author.display_name | Jindong Gu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Gu, Jindong |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5108885369 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Andong Hua |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Hua, Andong |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101601675 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-1304-9933 |
| authorships[3].author.display_name | Yao Qin |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Qin, Yao |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.19672 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-12-01T00:03:43.161839 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.19672 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.19672 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.19672 |
| primary_location.id | pmh:oai:arXiv.org:2502.19672 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.19672 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.19672 |
| publication_date | 2025-02-27 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 71 |
| abstract_inverted_index.In | 60 |
| abstract_inverted_index.as | 117 |
| abstract_inverted_index.in | 15 |
| abstract_inverted_index.of | 30, 56, 89, 102 |
| abstract_inverted_index.on | 47 |
| abstract_inverted_index.to | 25, 82 |
| abstract_inverted_index.we | 63 |
| abstract_inverted_index.Our | 92 |
| abstract_inverted_index.and | 18, 113 |
| abstract_inverted_index.are | 23 |
| abstract_inverted_index.but | 50 |
| abstract_inverted_index.for | 12 |
| abstract_inverted_index.the | 28, 53, 65, 79, 100 |
| abstract_inverted_index.have | 8 |
| abstract_inverted_index.into | 78 |
| abstract_inverted_index.show | 95 |
| abstract_inverted_index.such | 116 |
| abstract_inverted_index.that | 74, 96 |
| abstract_inverted_index.this | 61 |
| abstract_inverted_index.upon | 6 |
| abstract_inverted_index.with | 52 |
| abstract_inverted_index.LLMs, | 7 |
| abstract_inverted_index.Large | 1 |
| abstract_inverted_index.MLLMs | 22 |
| abstract_inverted_index.built | 5 |
| abstract_inverted_index.focus | 46 |
| abstract_inverted_index.image | 16 |
| abstract_inverted_index.novel | 72 |
| abstract_inverted_index.their | 13 |
| abstract_inverted_index.these | 31 |
| abstract_inverted_index.under | 39 |
| abstract_inverted_index.while | 21 |
| abstract_inverted_index.work, | 62 |
| abstract_inverted_index.BLIP2, | 109 |
| abstract_inverted_index.DynVLA | 97 |
| abstract_inverted_index.LLaVA, | 112 |
| abstract_inverted_index.MLLMs, | 107 |
| abstract_inverted_index.Models | 3 |
| abstract_inverted_index.across | 33, 85, 105 |
| abstract_inverted_index.attack | 41 |
| abstract_inverted_index.gained | 10 |
| abstract_inverted_index.models | 35, 115 |
| abstract_inverted_index.nature | 55 |
| abstract_inverted_index.Attack, | 70 |
| abstract_inverted_index.Dynamic | 66 |
| abstract_inverted_index.Gemini. | 118 |
| abstract_inverted_index.attacks | 32 |
| abstract_inverted_index.complex | 54 |
| abstract_inverted_index.diverse | 86 |
| abstract_inverted_index.dynamic | 76 |
| abstract_inverted_index.enhance | 83 |
| abstract_inverted_index.injects | 75 |
| abstract_inverted_index.methods | 44 |
| abstract_inverted_index.models. | 91 |
| abstract_inverted_index.remains | 36 |
| abstract_inverted_index.results | 94 |
| abstract_inverted_index.various | 106 |
| abstract_inverted_index.(DynVLA) | 69 |
| abstract_inverted_index.(MLLMs), | 4 |
| abstract_inverted_index.Existing | 43 |
| abstract_inverted_index.However, | 20 |
| abstract_inverted_index.Language | 2 |
| abstract_inverted_index.approach | 73 |
| abstract_inverted_index.attacks, | 27 |
| abstract_inverted_index.examples | 104 |
| abstract_inverted_index.improves | 99 |
| abstract_inverted_index.limited, | 37 |
| abstract_inverted_index.modality | 58 |
| abstract_inverted_index.recently | 9 |
| abstract_inverted_index.setting. | 42 |
| abstract_inverted_index.struggle | 51 |
| abstract_inverted_index.targeted | 40 |
| abstract_inverted_index.Alignment | 68 |
| abstract_inverted_index.MiniGPT4, | 111 |
| abstract_inverted_index.alignment | 88 |
| abstract_inverted_index.attention | 11 |
| abstract_inverted_index.connector | 81 |
| abstract_inverted_index.different | 34, 90 |
| abstract_inverted_index.including | 108 |
| abstract_inverted_index.introduce | 64 |
| abstract_inverted_index.primarily | 45 |
| abstract_inverted_index.Multimodal | 0 |
| abstract_inverted_index.alignment. | 59 |
| abstract_inverted_index.especially | 38 |
| abstract_inverted_index.vulnerable | 24 |
| abstract_inverted_index.adversarial | 26, 103 |
| abstract_inverted_index.recognition | 17 |
| abstract_inverted_index.capabilities | 14 |
| abstract_inverted_index.experimental | 93 |
| abstract_inverted_index.InstructBLIP, | 110 |
| abstract_inverted_index.closed-source | 114 |
| abstract_inverted_index.perturbations | 49, 77 |
| abstract_inverted_index.significantly | 98 |
| abstract_inverted_index.generalization | 84 |
| abstract_inverted_index.understanding. | 19 |
| abstract_inverted_index.Vision-Language | 67 |
| abstract_inverted_index.transferability | 29, 101 |
| abstract_inverted_index.vision-language | 57, 80, 87 |
| abstract_inverted_index.vision-specific | 48 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
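
The `abstract_inverted_index.*` rows in the payload table store the abstract as a mapping from each word to the zero-based positions where it occurs. The following is a minimal Python sketch, not part of the OpenAlex record, of how such rows could be parsed and turned back into running text. The function names `parse_rows` and `rebuild_abstract` are hypothetical, and the parser assumes the two-column pipe layout used in the table above.

```python
from __future__ import annotations

PREFIX = "abstract_inverted_index."


def parse_rows(table_text: str) -> dict[str, list[int]]:
    """Collect word -> positions from '| abstract_inverted_index.word | 1, 2 |' rows."""
    index: dict[str, list[int]] = {}
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip rows that are not two-column inverted-index entries.
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue
        # Slice off the prefix instead of splitting on "." because
        # words such as "models." contain a period themselves.
        word = cells[0][len(PREFIX):]
        index[word] = [int(pos) for pos in cells[1].split(",")]
    return index


def rebuild_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild running text by placing every word at each of its positions."""
    placed = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(placed))


# Eight rows copied from the payload table (positions 0 through 7).
sample_rows = """\
| abstract_inverted_index.upon | 6 |
| abstract_inverted_index.LLMs, | 7 |
| abstract_inverted_index.Large | 1 |
| abstract_inverted_index.built | 5 |
| abstract_inverted_index.Models | 3 |
| abstract_inverted_index.(MLLMs), | 4 |
| abstract_inverted_index.Language | 2 |
| abstract_inverted_index.Multimodal | 0 |
"""

opening = rebuild_abstract(parse_rows(sample_rows))
print(opening)  # Multimodal Large Language Models (MLLMs), built upon LLMs,
```

Running the same two functions over every `abstract_inverted_index.*` row should yield the abstract as OpenAlex stored it. Sorting `(position, word)` pairs keeps the sketch short; for very long texts, preallocating a list by maximum position would avoid the sort.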