Buster: Implanting Semantic Backdoor into Text Encoder to Mitigate NSFW Content Generation
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2412.07249
The rise of deep learning models in the digital era has raised substantial concerns regarding the generation of Not-Safe-for-Work (NSFW) content. Existing defense methods primarily involve model fine-tuning and post-hoc content moderation. Nevertheless, these approaches largely lack scalability in eliminating harmful content, degrade the quality of benign image generation, or incur high inference costs. To address these challenges, we propose an innovative framework named Buster, which injects backdoors into the text encoder to prevent NSFW content generation. Buster leverages deep semantic information rather than explicit prompts as triggers, redirecting NSFW prompts towards targeted benign prompts. Additionally, Buster employs energy-based training data generation through Langevin dynamics for adversarial knowledge augmentation, thereby ensuring robustness in harmful concept definition. This approach demonstrates exceptional resilience and scalability in mitigating NSFW content. In particular, Buster fine-tunes the text encoder of Text-to-Image models within merely five minutes, showcasing its efficiency. Our extensive experiments show that Buster outperforms nine state-of-the-art baselines, achieving a superior NSFW content removal rate of at least 91.2% while preserving the quality of harmless images.
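The abstract mentions energy-based data generation through Langevin dynamics but gives no details. As a rough illustration of the general technique only, the sketch below runs unadjusted Langevin dynamics, x ← x − ε∇E(x) + √(2ε)·ξ, on a toy quadratic energy. The energy function, the names `grad_energy` and `langevin_sample`, and all parameter values are assumptions made for this example; they are not taken from the paper.

```python
import math
import random


def grad_energy(x, center):
    """Gradient of the toy energy E(x) = 0.5 * ||x - center||^2 (an assumption)."""
    return [xi - ci for xi, ci in zip(x, center)]


def langevin_sample(x0, center, step_size=0.01, n_steps=1000, seed=0):
    """Run unadjusted Langevin dynamics and return the final state.

    Each step moves against the energy gradient and adds Gaussian noise
    scaled by sqrt(2 * step_size).
    """
    rng = random.Random(seed)
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(n_steps):
        g = grad_energy(x, center)
        x = [
            xi - step_size * gi + noise_scale * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)
        ]
    return x


if __name__ == "__main__":
    # Samples should scatter around the low-energy point `center`.
    print(langevin_sample([0.0, 0.0], center=[3.0, -2.0]))
```

With this toy energy the chain settles into samples spread around `center`; a real energy-based model would replace `grad_energy` with the gradient of a learned energy.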
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2412.07249, https://arxiv.org/pdf/2412.07249
- OA Status: green
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4405255244
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4405255244 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2412.07249 (Digital Object Identifier)
- Title: Buster: Implanting Semantic Backdoor into Text Encoder to Mitigate NSFW Content Generation (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-12-10 (full publication date if available)
- Authors: Xin Zhao, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao (list of authors in order)
- Landing page: https://arxiv.org/abs/2412.07249 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2412.07249 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2412.07249 (direct OA link when available)
- Concepts: Backdoor, Content (measure theory), Encoder, Computer science, Computer security, Artificial intelligence, Natural language processing, Arithmetic, Mathematics, Operating system, Mathematical analysis (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4405255244 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2412.07249 |
| ids.doi | https://doi.org/10.48550/arxiv.2412.07249 |
| ids.openalex | https://openalex.org/W4405255244 |
| fwci | |
| type | preprint |
| title | Buster: Implanting Semantic Backdoor into Text Encoder to Mitigate NSFW Content Generation |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11241 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9973000288009644 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1711 |
| topics[0].subfield.display_name | Signal Processing |
| topics[0].display_name | Advanced Malware Detection Techniques |
| topics[1].id | https://openalex.org/T12479 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9945999979972839 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1710 |
| topics[1].subfield.display_name | Information Systems |
| topics[1].display_name | Web Application Security Vulnerabilities |
| topics[2].id | https://openalex.org/T11424 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9890000224113464 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Security and Verification in Computing |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2781045450 |
| concepts[0].level | 2 |
| concepts[0].score | 0.99288010597229 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q254569 |
| concepts[0].display_name | Backdoor |
| concepts[1].id | https://openalex.org/C2778152352 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6754050850868225 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q5165061 |
| concepts[1].display_name | Content (measure theory) |
| concepts[2].id | https://openalex.org/C118505674 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6455315351486206 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q42586063 |
| concepts[2].display_name | Encoder |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.5870251059532166 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C38652104 |
| concepts[4].level | 1 |
| concepts[4].score | 0.36948859691619873 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[4].display_name | Computer security |
| concepts[5].id | https://openalex.org/C154945302 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3599914312362671 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[5].display_name | Artificial intelligence |
| concepts[6].id | https://openalex.org/C204321447 |
| concepts[6].level | 1 |
| concepts[6].score | 0.3441822826862335 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[6].display_name | Natural language processing |
| concepts[7].id | https://openalex.org/C94375191 |
| concepts[7].level | 1 |
| concepts[7].score | 0.32584017515182495 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q11205 |
| concepts[7].display_name | Arithmetic |
| concepts[8].id | https://openalex.org/C33923547 |
| concepts[8].level | 0 |
| concepts[8].score | 0.1513238251209259 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[8].display_name | Mathematics |
| concepts[9].id | https://openalex.org/C111919701 |
| concepts[9].level | 1 |
| concepts[9].score | 0.11994192004203796 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q9135 |
| concepts[9].display_name | Operating system |
| concepts[10].id | https://openalex.org/C134306372 |
| concepts[10].level | 1 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[10].display_name | Mathematical analysis |
| keywords[0].id | https://openalex.org/keywords/backdoor |
| keywords[0].score | 0.99288010597229 |
| keywords[0].display_name | Backdoor |
| keywords[1].id | https://openalex.org/keywords/content |
| keywords[1].score | 0.6754050850868225 |
| keywords[1].display_name | Content (measure theory) |
| keywords[2].id | https://openalex.org/keywords/encoder |
| keywords[2].score | 0.6455315351486206 |
| keywords[2].display_name | Encoder |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.5870251059532166 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/computer-security |
| keywords[4].score | 0.36948859691619873 |
| keywords[4].display_name | Computer security |
| keywords[5].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[5].score | 0.3599914312362671 |
| keywords[5].display_name | Artificial intelligence |
| keywords[6].id | https://openalex.org/keywords/natural-language-processing |
| keywords[6].score | 0.3441822826862335 |
| keywords[6].display_name | Natural language processing |
| keywords[7].id | https://openalex.org/keywords/arithmetic |
| keywords[7].score | 0.32584017515182495 |
| keywords[7].display_name | Arithmetic |
| keywords[8].id | https://openalex.org/keywords/mathematics |
| keywords[8].score | 0.1513238251209259 |
| keywords[8].display_name | Mathematics |
| keywords[9].id | https://openalex.org/keywords/operating-system |
| keywords[9].score | 0.11994192004203796 |
| keywords[9].display_name | Operating system |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2412.07249 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2412.07249 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2412.07249 |
| locations[1].id | doi:10.48550/arxiv.2412.07249 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2412.07249 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5039091321 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-8455-9757 |
| authorships[0].author.display_name | Xin Zhao |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhao, Xin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101407251 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-0362-847X |
| authorships[1].author.display_name | Xiaojun Chen |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Chen, Xiaojun |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5100572309 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Yuexin Xuan |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Xuan, Yuexin |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5102965387 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Zhendong Zhao |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Zhao, Zhendong |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2412.07249 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-12-12T00:00:00 |
| display_name | Buster: Implanting Semantic Backdoor into Text Encoder to Mitigate NSFW Content Generation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11241 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9973000288009644 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1711 |
| primary_topic.subfield.display_name | Signal Processing |
| primary_topic.display_name | Advanced Malware Detection Techniques |
| related_works | https://openalex.org/W4320031223, https://openalex.org/W4200629851, https://openalex.org/W4281902577, https://openalex.org/W4309417370, https://openalex.org/W4292107232, https://openalex.org/W3009072493, https://openalex.org/W4386080799, https://openalex.org/W3140988292, https://openalex.org/W4317672133, https://openalex.org/W4386185023 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2412.07249 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2412.07249 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2412.07249 |
| primary_location.id | pmh:oai:arXiv.org:2412.07249 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2412.07249 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2412.07249 |
| publication_date | 2024-12-10 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 154 |
| abstract_inverted_index.To | 54 |
| abstract_inverted_index.an | 60 |
| abstract_inverted_index.as | 86 |
| abstract_inverted_index.at | 161 |
| abstract_inverted_index.in | 6, 38, 112, 123 |
| abstract_inverted_index.of | 2, 17, 45, 133, 160, 168 |
| abstract_inverted_index.or | 49 |
| abstract_inverted_index.to | 72 |
| abstract_inverted_index.we | 58 |
| abstract_inverted_index.Our | 143 |
| abstract_inverted_index.The | 0 |
| abstract_inverted_index.and | 28, 121 |
| abstract_inverted_index.era | 9 |
| abstract_inverted_index.for | 105 |
| abstract_inverted_index.has | 10 |
| abstract_inverted_index.its | 141 |
| abstract_inverted_index.the | 7, 15, 43, 69, 130, 166 |
| abstract_inverted_index.NSFW | 74, 89, 125, 156 |
| abstract_inverted_index.This | 116 |
| abstract_inverted_index.data | 100 |
| abstract_inverted_index.deep | 3, 79 |
| abstract_inverted_index.five | 138 |
| abstract_inverted_index.high | 51 |
| abstract_inverted_index.into | 68 |
| abstract_inverted_index.lack | 36 |
| abstract_inverted_index.nine | 150 |
| abstract_inverted_index.rate | 159 |
| abstract_inverted_index.rise | 1 |
| abstract_inverted_index.text | 70, 131 |
| abstract_inverted_index.than | 83 |
| abstract_inverted_index.that | 147 |
| abstract_inverted_index.image | 47 |
| abstract_inverted_index.incur | 50 |
| abstract_inverted_index.least | 162 |
| abstract_inverted_index.model | 26 |
| abstract_inverted_index.named | 63 |
| abstract_inverted_index.these | 33, 56 |
| abstract_inverted_index.which | 65 |
| abstract_inverted_index.while | 164 |
| abstract_inverted_index.(NSFW) | 19 |
| abstract_inverted_index.91.2\% | 163 |
| abstract_inverted_index.Buster | 77, 96, 128, 148 |
| abstract_inverted_index.benign | 46, 93 |
| abstract_inverted_index.costs. | 53 |
| abstract_inverted_index.denote | 146 |
| abstract_inverted_index.merely | 137 |
| abstract_inverted_index.models | 5, 135 |
| abstract_inverted_index.raised | 11 |
| abstract_inverted_index.rather | 82 |
| abstract_inverted_index.within | 136 |
| abstract_inverted_index.address | 55 |
| abstract_inverted_index.concept | 114 |
| abstract_inverted_index.content | 30, 75, 157 |
| abstract_inverted_index.defense | 22 |
| abstract_inverted_index.degrade | 42 |
| abstract_inverted_index.digital | 8 |
| abstract_inverted_index.employs | 97 |
| abstract_inverted_index.encoder | 71, 132 |
| abstract_inverted_index.harmful | 40, 113 |
| abstract_inverted_index.images. | 170 |
| abstract_inverted_index.injects | 66 |
| abstract_inverted_index.involve | 25 |
| abstract_inverted_index.largely | 35 |
| abstract_inverted_index.methods | 23 |
| abstract_inverted_index.prevent | 73 |
| abstract_inverted_index.prompts | 85, 90 |
| abstract_inverted_index.propose | 59 |
| abstract_inverted_index.quality | 44, 167 |
| abstract_inverted_index.removal | 158 |
| abstract_inverted_index.thereby | 109 |
| abstract_inverted_index.through | 102 |
| abstract_inverted_index.towards | 91 |
| abstract_inverted_index.Existing | 21 |
| abstract_inverted_index.Langevin | 103 |
| abstract_inverted_index.approach | 117 |
| abstract_inverted_index.concerns | 13 |
| abstract_inverted_index.content, | 41 |
| abstract_inverted_index.content. | 20, 126 |
| abstract_inverted_index.dynamics | 104 |
| abstract_inverted_index.ensuring | 110 |
| abstract_inverted_index.explicit | 84 |
| abstract_inverted_index.harmless | 169 |
| abstract_inverted_index.learning | 4 |
| abstract_inverted_index.minutes, | 139 |
| abstract_inverted_index.post-hoc | 29 |
| abstract_inverted_index.prompts. | 94 |
| abstract_inverted_index.semantic | 80 |
| abstract_inverted_index.superior | 155 |
| abstract_inverted_index.targeted | 92 |
| abstract_inverted_index.training | 99 |
| abstract_inverted_index.achieving | 153 |
| abstract_inverted_index.backdoors | 67 |
| abstract_inverted_index.extensive | 144 |
| abstract_inverted_index.framework | 62 |
| abstract_inverted_index.inference | 52 |
| abstract_inverted_index.knowledge | 107 |
| abstract_inverted_index.leverages | 78 |
| abstract_inverted_index.primarily | 24 |
| abstract_inverted_index.regarding | 14 |
| abstract_inverted_index.triggers, | 87 |
| abstract_inverted_index.approaches | 34 |
| abstract_inverted_index.baselines, | 152 |
| abstract_inverted_index.fine-tunes | 129 |
| abstract_inverted_index.generation | 16, 101 |
| abstract_inverted_index.innovative | 61 |
| abstract_inverted_index.mitigating | 124 |
| abstract_inverted_index.preserving | 165 |
| abstract_inverted_index.resilience | 120 |
| abstract_inverted_index.robustness | 111 |
| abstract_inverted_index.showcasing | 140 |
| abstract_inverted_index.adversarial | 106 |
| abstract_inverted_index.challenges, | 57 |
| abstract_inverted_index.definition. | 115 |
| abstract_inverted_index.efficiency. | 142 |
| abstract_inverted_index.eliminating | 39 |
| abstract_inverted_index.exceptional | 119 |
| abstract_inverted_index.experiments | 145 |
| abstract_inverted_index.fine-tuning | 27 |
| abstract_inverted_index.generation, | 48 |
| abstract_inverted_index.generation. | 76 |
| abstract_inverted_index.information | 81 |
| abstract_inverted_index.moderation. | 31 |
| abstract_inverted_index.outperforms | 149 |
| abstract_inverted_index.redirecting | 88 |
| abstract_inverted_index.scalability | 37, 122 |
| abstract_inverted_index.substantial | 12 |
| abstract_inverted_index.demonstrates | 118 |
| abstract_inverted_index.energy-based | 98 |
| abstract_inverted_index.Additionally, | 95 |
| abstract_inverted_index.Nevertheless, | 32 |
| abstract_inverted_index.Particularly, | 127 |
| abstract_inverted_index.Text-to-Image | 134 |
| abstract_inverted_index.augmentation, | 108 |
| abstract_inverted_index.\textit{Buster}, | 64 |
| abstract_inverted_index.state-of-the-art | 151 |
| abstract_inverted_index.Not-Safe-for-Work | 18 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile |
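
The `abstract_inverted_index.*` rows in the payload map each word to the positions where it occurs in the abstract. A minimal sketch of how the plain-text abstract could be rebuilt from such a mapping follows. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is only an excerpt: it keeps positions 0 to 9 from the rows above and drops the later positions of repeated words.

```python
def rebuild_abstract(inverted_index):
    """Turn a {word: [positions]} mapping back into running text."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit the words in ascending position order.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload, truncated to positions 0-9 for illustration.
excerpt = {
    "The": [0],
    "rise": [1],
    "of": [2],
    "deep": [3],
    "learning": [4],
    "models": [5],
    "in": [6],
    "the": [7],
    "digital": [8],
    "era": [9],
}

if __name__ == "__main__":
    print(rebuild_abstract(excerpt))
    # prints "The rise of deep learning models in the digital era"
```

Applied to the full index, the same loop would reproduce the abstract word for word, including raw tokens such as `\textit{Buster},` and `91.2\%` exactly as they are stored.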