Explanipedia

May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks

Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes · 2025

A popular class of defenses against prompt injection attacks on large language models (LLMs) relies on fine-tuning to separate instructions and data, so that the LLM does not follow instructions that might be present with data. We evaluate…

Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-based Prompt Injection Attacks via the Fine-Tuning Interface

Andrey Labunets, Nishit V. Pandya, Ashish Hooda, Xiaohan Fu, Earlence Fernandes · 2025

Computer science

We surface a new threat to closed-weight Large Language Models (LLMs) that enables an attacker to compute optimization-based prompt injections. Specifically, we characterize how an attacker can leverage the loss-like information returned f…

An Empirical Analysis on the Use and Reporting of National Security Letters

Alex Bellon, Miro Haller, Andrey Labunets, Enze Liu, Stefan Savage · 2024

Computer science Business

Government investigatory and surveillance powers are important tools for examining crime and protecting public safety. However, since these tools must be employed in secret, it can be challenging to identify abuses or changes in use that c…

Experimental Analyses of the Physical Surveillance Risks in Client-Side Content Scanning

Ashish Hooda, Andrey Labunets, Tadayoshi Kohno, Earlence Fernandes · 2024

Computer science

Content scanning systems employ perceptual hashing algorithms to scan user content for illicit material, such as child pornography or terrorist recruitment flyers. Perceptual hashing algorithms help determine whether two images are visually…

Re-purposing Perceptual Hashing based Client Side Scanning for Physical Surveillance

Ashish Hooda, Andrey Labunets, Tadayoshi Kohno, Earlence Fernandes · 2022

Computer science Psychology Chemistry

Content scanning systems employ perceptual hashing algorithms to scan user content for illegal material, such as child pornography or terrorist recruitment flyers. Perceptual hashing algorithms help determine whether two images are visuall…

Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021

Maaz Amjad, Alisa Zhila, Grigori Sidorov, Andrey Labunets, Sabur Butt, et al. · 2022

Computer science Engineering Philosophy

As the influence of social media platforms grows, the effects of their misuse become increasingly significant. The importance of automatic detection of threatening and abusive language cannot be overstated. However, most of the existi…

Andrey Labunets