Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods Article Swipe
YOU?
·
· 2025
· Open Access
·
· DOI: https://doi.org/10.48550/arxiv.2510.03705
With the development of technology, large language models (LLMs) have dominated the downstream natural language processing (NLP) tasks. However, because of the LLMs' instruction-following abilities and inability to distinguish the instructions in the data content, such as web pages from search engines, the LLMs are vulnerable to prompt injection attacks. These attacks trick the LLMs into deviating from the original input instruction and executing the attackers' target instruction. Recently, various instruction hierarchy defense strategies are proposed to effectively defend against prompt injection attacks via fine-tuning. In this paper, we explore more vicious attacks that nullify the prompt injection defense methods, even the instruction hierarchy: backdoor-powered prompt injection attacks, where the attackers utilize the backdoor attack for prompt injection attack purposes. Specifically, the attackers poison the supervised fine-tuning samples and insert the backdoor into the model. Once the trigger is activated, the backdoored model executes the injected instruction surrounded by the trigger. We construct a benchmark for comprehensive evaluation. Our experiments demonstrate that backdoor-powered prompt injection attacks are more harmful than previous prompt injection attacks, nullifying existing prompt injection defense methods, even the instruction hierarchy techniques.
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2510.03705
- https://arxiv.org/pdf/2510.03705
- OA Status
- green
- OpenAlex ID
- https://openalex.org/W4416373120
Raw OpenAlex JSON
- OpenAlex ID
-
https://openalex.org/W4416373120Canonical identifier for this work in OpenAlex
- DOI
-
https://doi.org/10.48550/arxiv.2510.03705Digital Object Identifier
- Title
-
Backdoor-Powered Prompt Injection Attacks Nullify Defense MethodsWork title
- Type
-
preprintOpenAlex work type
- Language
-
enPrimary language
- Publication year
-
2025Year of publication
- Publication date
-
2025-10-04Full publication date if available
- Authors
-
Yulin Chen, Haoran Li, Yuan Sui, Bryan HooiList of authors in order
- Landing page
-
https://arxiv.org/abs/2510.03705Publisher landing page
- PDF URL
-
https://arxiv.org/pdf/2510.03705Direct link to full text PDF
- Open access
-
YesWhether a free full text is available
- OA status
-
greenOpen access status per OpenAlex
- OA URL
-
https://arxiv.org/pdf/2510.03705Direct OA link when available
- Cited by
-
0Total citation count in OpenAlex
Full payload
| id | https://openalex.org/W4416373120 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2510.03705 |
| ids.doi | https://doi.org/10.48550/arxiv.2510.03705 |
| ids.openalex | https://openalex.org/W4416373120 |
| fwci | |
| type | preprint |
| title | Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2510.03705 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2510.03705 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2510.03705 |
| locations[1].id | doi:10.48550/arxiv.2510.03705 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2510.03705 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100398890 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-5679-4055 |
| authorships[0].author.display_name | Yulin Chen |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Chen, Yulin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5086448068 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Haoran Li |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Li, Haoran |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5119204421 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Yuan Sui |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Sui, Yuan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5065675832 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-5645-1754 |
| authorships[3].author.display_name | Bryan Hooi |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Hooi, Bryan |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2510.03705 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T12:27:25.831668 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2510.03705 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2510.03705 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2510.03705 |
| primary_location.id | pmh:oai:arXiv.org:2510.03705 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2510.03705 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2510.03705 |
| publication_date | 2025-10-04 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 153 |
| abstract_inverted_index.In | 85 |
| abstract_inverted_index.We | 151 |
| abstract_inverted_index.as | 36 |
| abstract_inverted_index.by | 148 |
| abstract_inverted_index.in | 31 |
| abstract_inverted_index.is | 138 |
| abstract_inverted_index.of | 3, 20 |
| abstract_inverted_index.to | 27, 46, 76 |
| abstract_inverted_index.we | 88 |
| abstract_inverted_index.Our | 158 |
| abstract_inverted_index.and | 25, 62, 128 |
| abstract_inverted_index.are | 44, 74, 166 |
| abstract_inverted_index.for | 115, 155 |
| abstract_inverted_index.the | 1, 11, 21, 29, 32, 42, 53, 58, 64, 95, 101, 109, 112, 121, 124, 130, 133, 136, 140, 144, 149, 181 |
| abstract_inverted_index.via | 83 |
| abstract_inverted_index.web | 37 |
| abstract_inverted_index.LLMs | 43, 54 |
| abstract_inverted_index.Once | 135 |
| abstract_inverted_index.With | 0 |
| abstract_inverted_index.data | 33 |
| abstract_inverted_index.even | 100, 180 |
| abstract_inverted_index.from | 39, 57 |
| abstract_inverted_index.have | 9 |
| abstract_inverted_index.into | 55, 132 |
| abstract_inverted_index.more | 90, 167 |
| abstract_inverted_index.such | 35 |
| abstract_inverted_index.than | 169 |
| abstract_inverted_index.that | 93, 161 |
| abstract_inverted_index.this | 86 |
| abstract_inverted_index.(NLP) | 16 |
| abstract_inverted_index.LLMs' | 22 |
| abstract_inverted_index.These | 50 |
| abstract_inverted_index.input | 60 |
| abstract_inverted_index.large | 5 |
| abstract_inverted_index.model | 142 |
| abstract_inverted_index.pages | 38 |
| abstract_inverted_index.trick | 52 |
| abstract_inverted_index.where | 108 |
| abstract_inverted_index.(LLMs) | 8 |
| abstract_inverted_index.attack | 114, 118 |
| abstract_inverted_index.defend | 78 |
| abstract_inverted_index.insert | 129 |
| abstract_inverted_index.model. | 134 |
| abstract_inverted_index.models | 7 |
| abstract_inverted_index.paper, | 87 |
| abstract_inverted_index.poison | 123 |
| abstract_inverted_index.prompt | 47, 80, 96, 105, 116, 163, 171, 176 |
| abstract_inverted_index.search | 40 |
| abstract_inverted_index.target | 66 |
| abstract_inverted_index.tasks. | 17 |
| abstract_inverted_index.against | 79 |
| abstract_inverted_index.attacks | 51, 82, 92, 165 |
| abstract_inverted_index.because | 19 |
| abstract_inverted_index.defense | 72, 98, 178 |
| abstract_inverted_index.explore | 89 |
| abstract_inverted_index.harmful | 168 |
| abstract_inverted_index.natural | 13 |
| abstract_inverted_index.nullify | 94 |
| abstract_inverted_index.samples | 127 |
| abstract_inverted_index.trigger | 137 |
| abstract_inverted_index.utilize | 111 |
| abstract_inverted_index.various | 69 |
| abstract_inverted_index.vicious | 91 |
| abstract_inverted_index.However, | 18 |
| abstract_inverted_index.attacks, | 107, 173 |
| abstract_inverted_index.attacks. | 49 |
| abstract_inverted_index.backdoor | 113, 131 |
| abstract_inverted_index.content, | 34 |
| abstract_inverted_index.engines, | 41 |
| abstract_inverted_index.executes | 143 |
| abstract_inverted_index.existing | 175 |
| abstract_inverted_index.injected | 145 |
| abstract_inverted_index.language | 6, 14 |
| abstract_inverted_index.methods, | 99, 179 |
| abstract_inverted_index.original | 59 |
| abstract_inverted_index.previous | 170 |
| abstract_inverted_index.proposed | 75 |
| abstract_inverted_index.trigger. | 150 |
| abstract_inverted_index.Recently, | 68 |
| abstract_inverted_index.abilities | 24 |
| abstract_inverted_index.attackers | 110, 122 |
| abstract_inverted_index.benchmark | 154 |
| abstract_inverted_index.construct | 152 |
| abstract_inverted_index.deviating | 56 |
| abstract_inverted_index.dominated | 10 |
| abstract_inverted_index.executing | 63 |
| abstract_inverted_index.hierarchy | 71, 183 |
| abstract_inverted_index.inability | 26 |
| abstract_inverted_index.injection | 48, 81, 97, 106, 117, 164, 172, 177 |
| abstract_inverted_index.purposes. | 119 |
| abstract_inverted_index.activated, | 139 |
| abstract_inverted_index.attackers' | 65 |
| abstract_inverted_index.backdoored | 141 |
| abstract_inverted_index.downstream | 12 |
| abstract_inverted_index.hierarchy: | 103 |
| abstract_inverted_index.nullifying | 174 |
| abstract_inverted_index.processing | 15 |
| abstract_inverted_index.strategies | 73 |
| abstract_inverted_index.supervised | 125 |
| abstract_inverted_index.surrounded | 147 |
| abstract_inverted_index.vulnerable | 45 |
| abstract_inverted_index.demonstrate | 160 |
| abstract_inverted_index.development | 2 |
| abstract_inverted_index.distinguish | 28 |
| abstract_inverted_index.effectively | 77 |
| abstract_inverted_index.evaluation. | 157 |
| abstract_inverted_index.experiments | 159 |
| abstract_inverted_index.fine-tuning | 126 |
| abstract_inverted_index.instruction | 61, 70, 102, 146, 182 |
| abstract_inverted_index.techniques. | 184 |
| abstract_inverted_index.technology, | 4 |
| abstract_inverted_index.fine-tuning. | 84 |
| abstract_inverted_index.instruction. | 67 |
| abstract_inverted_index.instructions | 30 |
| abstract_inverted_index.Specifically, | 120 |
| abstract_inverted_index.comprehensive | 156 |
| abstract_inverted_index.backdoor-powered | 104, 162 |
| abstract_inverted_index.instruction-following | 23 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |