Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods Article Swipe

PDF

Yulin Chen , Haoran Li , Yuan Sui , Bryan Hooi ·

YOU? · · 2025 · Open Access · · DOI: https://doi.org/10.48550/arxiv.2510.03705

With the development of technology, large language models (LLMs) have dominated the downstream natural language processing (NLP) tasks. However, because of the LLMs' instruction-following abilities and inability to distinguish the instructions in the data content, such as web pages from search engines, the LLMs are vulnerable to prompt injection attacks. These attacks trick the LLMs into deviating from the original input instruction and executing the attackers' target instruction. Recently, various instruction hierarchy defense strategies are proposed to effectively defend against prompt injection attacks via fine-tuning. In this paper, we explore more vicious attacks that nullify the prompt injection defense methods, even the instruction hierarchy: backdoor-powered prompt injection attacks, where the attackers utilize the backdoor attack for prompt injection attack purposes. Specifically, the attackers poison the supervised fine-tuning samples and insert the backdoor into the model. Once the trigger is activated, the backdoored model executes the injected instruction surrounded by the trigger. We construct a benchmark for comprehensive evaluation. Our experiments demonstrate that backdoor-powered prompt injection attacks are more harmful than previous prompt injection attacks, nullifying existing prompt injection defense methods, even the instruction hierarchy techniques.

Related Topics

Concepts

No concepts available.

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2510.03705
PDF: https://arxiv.org/pdf/2510.03705
OA Status: green
OpenAlex ID: https://openalex.org/W4416373120

All OpenAlex metadata

Raw OpenAlex JSON

OpenAlex ID: https://openalex.org/W4416373120

Canonical identifier for this work in OpenAlex
DOI: https://doi.org/10.48550/arxiv.2510.03705

Digital Object Identifier
Title: Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods

Work title
Type: preprint

OpenAlex work type
Language: en

Primary language
Publication year: 2025

Year of publication
Publication date: 2025-10-04

Full publication date if available
Authors: Yulin Chen, Haoran Li, Yuan Sui, Bryan Hooi

List of authors in order
Landing page: https://arxiv.org/abs/2510.03705

Publisher landing page
PDF URL: https://arxiv.org/pdf/2510.03705

Direct link to full text PDF
Open access: Yes

Whether a free full text is available
OA status: green

Open access status per OpenAlex
OA URL: https://arxiv.org/pdf/2510.03705

Direct OA link when available
Cited by: 0

Total citation count in OpenAlex

Full payload

id	https://openalex.org/W4416373120
doi	https://doi.org/10.48550/arxiv.2510.03705
ids.doi	https://doi.org/10.48550/arxiv.2510.03705
ids.openalex	https://openalex.org/W4416373120
fwci
type	preprint
title	Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
is_xpac	False
apc_list
apc_paid
language	en
locations[0].id	pmh:oai:arXiv.org:2510.03705
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2510.03705
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2510.03705
locations[1].id	doi:10.48550/arxiv.2510.03705
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2510.03705
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5100398890
authorships[0].author.orcid	https://orcid.org/0000-0001-5679-4055
authorships[0].author.display_name	Yulin Chen
authorships[0].author_position	first
authorships[0].raw_author_name	Chen, Yulin
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5086448068
authorships[1].author.orcid
authorships[1].author.display_name	Haoran Li
authorships[1].author_position	middle
authorships[1].raw_author_name	Li, Haoran
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5119204421
authorships[2].author.orcid
authorships[2].author.display_name	Yuan Sui
authorships[2].author_position	middle
authorships[2].raw_author_name	Sui, Yuan
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5065675832
authorships[3].author.orcid	https://orcid.org/0000-0002-5645-1754
authorships[3].author.display_name	Bryan Hooi
authorships[3].author_position	middle
authorships[3].raw_author_name	Hooi, Bryan
authorships[3].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2510.03705
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Backdoor-Powered Prompt Injection Attacks Nullify Defense Methods
has_fulltext	False
is_retracted	False
updated_date	2025-11-28T12:27:25.831668
primary_topic
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2510.03705
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2510.03705
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2510.03705
primary_location.id	pmh:oai:arXiv.org:2510.03705
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2510.03705
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2510.03705
publication_date	2025-10-04
publication_year	2025
referenced_works_count	0
abstract_inverted_index.a	153
abstract_inverted_index.In	85
abstract_inverted_index.We	151
abstract_inverted_index.as	36
abstract_inverted_index.by	148
abstract_inverted_index.in	31
abstract_inverted_index.is	138
abstract_inverted_index.of	3, 20
abstract_inverted_index.to	27, 46, 76
abstract_inverted_index.we	88
abstract_inverted_index.Our	158
abstract_inverted_index.and	25, 62, 128
abstract_inverted_index.are	44, 74, 166
abstract_inverted_index.for	115, 155
abstract_inverted_index.the	1, 11, 21, 29, 32, 42, 53, 58, 64, 95, 101, 109, 112, 121, 124, 130, 133, 136, 140, 144, 149, 181
abstract_inverted_index.via	83
abstract_inverted_index.web	37
abstract_inverted_index.LLMs	43, 54
abstract_inverted_index.Once	135
abstract_inverted_index.With	0
abstract_inverted_index.data	33
abstract_inverted_index.even	100, 180
abstract_inverted_index.from	39, 57
abstract_inverted_index.have	9
abstract_inverted_index.into	55, 132
abstract_inverted_index.more	90, 167
abstract_inverted_index.such	35
abstract_inverted_index.than	169
abstract_inverted_index.that	93, 161
abstract_inverted_index.this	86
abstract_inverted_index.(NLP)	16
abstract_inverted_index.LLMs'	22
abstract_inverted_index.These	50
abstract_inverted_index.input	60
abstract_inverted_index.large	5
abstract_inverted_index.model	142
abstract_inverted_index.pages	38
abstract_inverted_index.trick	52
abstract_inverted_index.where	108
abstract_inverted_index.(LLMs)	8
abstract_inverted_index.attack	114, 118
abstract_inverted_index.defend	78
abstract_inverted_index.insert	129
abstract_inverted_index.model.	134
abstract_inverted_index.models	7
abstract_inverted_index.paper,	87
abstract_inverted_index.poison	123
abstract_inverted_index.prompt	47, 80, 96, 105, 116, 163, 171, 176
abstract_inverted_index.search	40
abstract_inverted_index.target	66
abstract_inverted_index.tasks.	17
abstract_inverted_index.against	79
abstract_inverted_index.attacks	51, 82, 92, 165
abstract_inverted_index.because	19
abstract_inverted_index.defense	72, 98, 178
abstract_inverted_index.explore	89
abstract_inverted_index.harmful	168
abstract_inverted_index.natural	13
abstract_inverted_index.nullify	94
abstract_inverted_index.samples	127
abstract_inverted_index.trigger	137
abstract_inverted_index.utilize	111
abstract_inverted_index.various	69
abstract_inverted_index.vicious	91
abstract_inverted_index.However,	18
abstract_inverted_index.attacks,	107, 173
abstract_inverted_index.attacks.	49
abstract_inverted_index.backdoor	113, 131
abstract_inverted_index.content,	34
abstract_inverted_index.engines,	41
abstract_inverted_index.executes	143
abstract_inverted_index.existing	175
abstract_inverted_index.injected	145
abstract_inverted_index.language	6, 14
abstract_inverted_index.methods,	99, 179
abstract_inverted_index.original	59
abstract_inverted_index.previous	170
abstract_inverted_index.proposed	75
abstract_inverted_index.trigger.	150
abstract_inverted_index.Recently,	68
abstract_inverted_index.abilities	24
abstract_inverted_index.attackers	110, 122
abstract_inverted_index.benchmark	154
abstract_inverted_index.construct	152
abstract_inverted_index.deviating	56
abstract_inverted_index.dominated	10
abstract_inverted_index.executing	63
abstract_inverted_index.hierarchy	71, 183
abstract_inverted_index.inability	26
abstract_inverted_index.injection	48, 81, 97, 106, 117, 164, 172, 177
abstract_inverted_index.purposes.	119
abstract_inverted_index.activated,	139
abstract_inverted_index.attackers'	65
abstract_inverted_index.backdoored	141
abstract_inverted_index.downstream	12
abstract_inverted_index.hierarchy:	103
abstract_inverted_index.nullifying	174
abstract_inverted_index.processing	15
abstract_inverted_index.strategies	73
abstract_inverted_index.supervised	125
abstract_inverted_index.surrounded	147
abstract_inverted_index.vulnerable	45
abstract_inverted_index.demonstrate	160
abstract_inverted_index.development	2
abstract_inverted_index.distinguish	28
abstract_inverted_index.effectively	77
abstract_inverted_index.evaluation.	157
abstract_inverted_index.experiments	159
abstract_inverted_index.fine-tuning	126
abstract_inverted_index.instruction	61, 70, 102, 146, 182
abstract_inverted_index.techniques.	184
abstract_inverted_index.technology,	4
abstract_inverted_index.fine-tuning.	84
abstract_inverted_index.instruction.	67
abstract_inverted_index.instructions	30
abstract_inverted_index.Specifically,	120
abstract_inverted_index.comprehensive	156
abstract_inverted_index.backdoor-powered	104, 162
abstract_inverted_index.instruction-following	23
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	4
citation_normalized_percentile