Roei Schuster
Rerouting LLM Routers
LLM routers aim to balance quality and cost of generation by classifying queries and routing them to a cheaper or more expensive LLM depending on their complexity. Routers represent one type of what we call LLM control planes: systems that…
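As a minimal sketch of the routing idea described above (not the paper's implementation): a control plane scores each query's complexity and dispatches it to a cheaper or more expensive backend model. The scorer, model names, and threshold below are hypothetical placeholders.

    # Minimal LLM-router sketch (illustrative only; all names are hypothetical).
    def complexity_score(query: str) -> float:
        """Hypothetical stand-in for a learned complexity classifier."""
        # Trivial length-based heuristic, purely for illustration.
        return min(len(query.split()) / 20.0, 1.0)

    def call_model(model_name: str, query: str) -> str:
        """Hypothetical stand-in for an LLM API call."""
        return f"[{model_name}] answer to: {query}"

    def route(query: str, threshold: float = 0.5) -> str:
        # The control plane decides which backend model serves the query.
        model = "expensive-llm" if complexity_score(query) >= threshold else "cheap-llm"
        return call_model(model, query)

    print(route("What is 2 + 2?"))  # routed to the cheap model
    print(route("Prove the spectral theorem for compact self-adjoint operators "
                "on a Hilbert space and discuss its applications."))  # expensive model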
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents
Retrieval-augmented generation (RAG) systems respond to queries by retrieving relevant documents from a knowledge database and applying an LLM to the retrieved documents. We demonstrate that RAG systems that operate on databases with untru…
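For context, a minimal sketch of the retrieve-then-generate loop that RAG systems follow, assuming a toy lexical retriever and a hypothetical generate() call (neither comes from the paper). The retrieval step is the relevant surface when the database contains untrusted documents, since whatever is retrieved is handed to the LLM.

    # Minimal RAG sketch (illustrative; retriever and LLM call are hypothetical).
    def similarity(query: str, doc: str) -> float:
        """Toy lexical-overlap retriever used only for illustration."""
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / max(len(q), 1)

    def retrieve(query: str, database: list[str], k: int = 2) -> list[str]:
        # Return the k documents most similar to the query.
        return sorted(database, key=lambda doc: similarity(query, doc), reverse=True)[:k]

    def generate(prompt: str) -> str:
        """Hypothetical stand-in for an LLM call."""
        return f"LLM output for prompt of {len(prompt)} characters"

    def rag_answer(query: str, database: list[str]) -> str:
        context = "\n".join(retrieve(query, database))
        return generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

    docs = ["Paris is the capital of France.", "The Nile flows through Egypt."]
    print(rag_answer("What is the capital of France?", docs))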
The Adversarial Implications of Variable-Time Inference
Machine learning (ML) models are known to be vulnerable to a number of attacks that target the integrity of their predictions or the privacy of their training data. To carry out these attacks, a black-box adversary must typically possess t…
Reconstructing Individual Data Points in Federated Learning Hardened with Differential Privacy and Secure Aggregation
Federated learning (FL) is a framework for users to jointly train a machine learning model. FL is promoted as a privacy-enhancing technology (PET) that provides data minimization: data never "leaves" personal devices and users share only m…
Understanding Transformer Memorization Recall Through Idioms
To produce accurate predictions, language models (LMs) must balance between generalization and memorization. Yet, little is known about the mechanism by which transformer LMs employ their memorization capacity. When does a model decide to …
Learned-Database Systems Security
A learned database system uses machine learning (ML) internally to improve performance. We can expect such systems to be vulnerable to some adversarial-ML attacks. Often, the learned component is shared between mutually-distrusting users o…
In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning
When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a …
When the Curious Abandon Honesty: Federated Learning Is Not Private
In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) co…
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
Code autocompletion is an integral feature of modern code editors and IDEs. The latest generation of autocompleters uses neural language models, trained on public open-source code repositories, to suggest likely (not just statically feasib…
Transformer Feed-Forward Layers Are Key-Value Memories
Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where…
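In the key-value reading sketched above, the rows of the first feed-forward matrix act as "keys" matched against the hidden state, and the rows of the second matrix act as "values" mixed according to the match strengths. A toy numpy sketch of that view follows; the shapes, ReLU nonlinearity, and omission of biases are illustrative assumptions, not a reproduction of the paper's setup.

    # Toy sketch of a feed-forward layer viewed as key-value memory.
    import numpy as np

    rng = np.random.default_rng(0)
    d, d_ff = 8, 32
    x = rng.normal(size=d)           # input hidden state
    K = rng.normal(size=(d_ff, d))   # first FFN matrix: each row is a "key"
    V = rng.normal(size=(d_ff, d))   # second FFN matrix: each row is a "value"

    def relu(z):
        return np.maximum(z, 0.0)

    # Memory coefficients: how strongly each key pattern is matched by the input.
    m = relu(K @ x)                  # shape (d_ff,)

    # Layer output: a coefficient-weighted sum of the value vectors,
    # equivalent to relu(x @ K.T) @ V.
    ffn_out = V.T @ m                # shape (d,)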
De-Anonymizing Text by Fingerprinting Language Generation
Components of machine learning systems are not (yet) perceived as security hotspots. Secure coding practices, such as ensuring that no execution paths depend on confidential inputs, have not yet been adopted by ML developers. We initiate t…
Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning
Word embeddings, i.e., low-dimensional vector representations such as GloVe and SGNS, encode word "meaning" in the sense that distances between words' vectors correspond to their semantic proximity. This enables transfer learning of semant…
The Limitations of Stylometry for Detecting Machine-Generated Fake News
Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake ne…
Are We Safe Yet? The Limitations of Distributional Features for Fake News Detection.
Recent developments in neural language models (LMs) have raised concerns about their potential misuse for automatically spreading misinformation. In light of these concerns, several studies have proposed to detect machine-generated fake ne…
Synesthesia: Detecting Screen Content via Remote Acoustic Side Channels
We show that subtle acoustic noises emanating from within computer screens can be used to detect the content displayed on the screens. This sound can be picked up by ordinary microphones built into webcams or screens, and is inadvertent…
Situational Access Control in the Internet of Things
Access control in the Internet of Things (IoT) often depends on a situation (for example, "the user is at home") that can only be tracked using multiple devices. In contrast to the (well-studied) smartphone frameworks, enforcement o…