Andrey Labunets
May I have your Attention? Breaking Fine-Tuning based Prompt Injection Defenses using Architecture-Aware Attacks
A popular class of defenses against prompt injection attacks on large language models (LLMs) relies on fine-tuning to separate instructions and data, so that the LLM does not follow instructions that might be present with data. We evaluate…
Fun-tuning: Characterizing the Vulnerability of Proprietary LLMs to Optimization-based Prompt Injection Attacks via the Fine-Tuning Interface
We surface a new threat to closed-weight Large Language Models (LLMs) that enables an attacker to compute optimization-based prompt injections. Specifically, we characterize how an attacker can leverage the loss-like information returned f…
An Empirical Analysis on the Use and Reporting of National Security Letters
Government investigatory and surveillance powers are important tools for examining crime and protecting public safety. However, since these tools must be employed in secret, it can be challenging to identify abuses or changes in use that c…
Experimental Analyses of the Physical Surveillance Risks in Client-Side Content Scanning
Content scanning systems employ perceptual hashing algorithms to scan user content for illicit material, such as child pornography or terrorist recruitment flyers. Perceptual hashing algorithms help determine whether two images are visually…
Re-purposing Perceptual Hashing based Client Side Scanning for Physical Surveillance
Content scanning systems employ perceptual hashing algorithms to scan user content for illegal material, such as child pornography or terrorist recruitment flyers. Perceptual hashing algorithms help determine whether two images are visuall…
Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021
With the growth of social media platform influence, the effect of their misuse becomes more and more impactful. The importance of automatic detection of threatening and abusive language cannot be overestimated. However, most of the existi…