Krishnamurthy Dvijotham
Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?
AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-established concept of firewalls, we show t…
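A minimal sketch of the firewall idea applied to untrusted tool outputs, assuming a hypothetical rule-based filter and a generic agent loop; the paper's actual firewall design is not reproduced here:

```python
import re

# Illustrative patterns only; a real firewall would rely on a learned
# classifier or policy model rather than a fixed regex list.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you must now",
    r"send .* to http",
]

def firewall(tool_output: str) -> str:
    """Screen untrusted tool output before it enters the agent's context."""
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, tool_output, flags=re.IGNORECASE):
            # Quarantine suspicious content instead of forwarding it verbatim.
            return "[tool output withheld: possible prompt injection]"
    return tool_output

# Usage: wrap every tool call so the agent only ever sees filtered text, e.g.
# agent_context.append(firewall(web_browser.fetch(url)))
```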
Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain
The practice of fine-tuning AI agents on data from their own interactions, such as web browsing or tool use, while being a strong general recipe for improving agentic capabilities, also introduces a critical security vulnerability within…
VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation
The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. These agents may deviate from user objectives, violate data handling policies, or be compromised b…
Adaptive Diffusion Denoised Smoothing: Certified Robustness via Randomized Smoothing with Differentially Private Guided Denoising Diffusion
We propose Adaptive Diffusion Denoised Smoothing, a method for certifying the predictions of a vision model against adversarial examples, while adapting to the input. Our key insight is to reinterpret a guided denoising diffusion model as …
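A simplified illustration of the denoise-then-classify smoothing loop underlying this family of certificates (Cohen et al.-style randomized smoothing), assuming a `denoiser` and `classifier` are given; the adaptive, guided variant proposed in the paper is not reproduced here:

```python
import torch

def smoothed_predict(x, classifier, denoiser, sigma=0.25, n_samples=100):
    """Monte Carlo estimate of the smoothed prediction g(x) = argmax_c P[f(denoise(x + noise)) = c].

    x: a single input of shape (1, C, H, W).
    denoiser(noisy, noise_level=...) is an assumed API for a one-shot
    diffusion denoising step at the given noise level.
    """
    counts = {}
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)          # Gaussian corruption
        denoised = denoiser(noisy, noise_level=sigma)    # assumed denoiser API
        pred = classifier(denoised).argmax(dim=-1).item()
        counts[pred] = counts.get(pred, 0) + 1
    # In the standard analysis, the certified L2 radius is sigma * Phi^{-1}(p_A),
    # where p_A is a lower confidence bound on the top-class probability.
    return max(counts, key=counts.get)
```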
Correlated Noise Mechanisms for Differentially Private Learning
This monograph explores the design and analysis of correlated noise mechanisms for differential privacy (DP), focusing on their application to private training of AI and machine learning models via the core primitive of estimation of weigh…
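A small numerical sketch of the prefix-sum primitive with correlated noise via a matrix factorization A = B C (here the square-root factorization B = C = A^{1/2}), compared against adding independent noise to each increment; this is a generic illustration, not one of the monograph's specific mechanisms:

```python
import numpy as np
from scipy.linalg import sqrtm

T = 64
A = np.tril(np.ones((T, T)))    # prefix-sum (workload) matrix
x = np.random.randn(T)          # per-step quantities (e.g., clipped gradients), per-step sensitivity 1

# Baseline: independent unit-variance noise on each increment, then summed.
# Equivalent to the factorization B = A, C = I; prefix-sum error grows with t.
z = np.random.randn(T)
indep_release = A @ (x + z)

# Matrix mechanism: factor A = B C and release A x + B z', with the noise scale
# set by the sensitivity of C (its max column norm), so both mechanisms use the
# same noise scale per unit of sensitivity.
B = np.real(sqrtm(A))
C = B
sens_C = np.linalg.norm(C, axis=0).max()
corr_release = A @ x + B @ (sens_C * np.random.randn(T))

print("max abs error, independent:", np.abs(indep_release - A @ x).max())
print("max abs error, correlated: ", np.abs(corr_release - A @ x).max())
```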
Through the Stealth Lens: Rethinking Attacks and Defenses in RAG
Retrieval-augmented generation (RAG) systems are vulnerable to attacks that inject poisoned passages into the retrieved set, even at low corruption rates. We show that existing attacks are not designed to be stealthy, allowing reliable det…
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
We present DoomArena, a security evaluation framework for AI agents. DoomArena is designed on three principles: 1) It is a plug-in framework and integrates easily into realistic agentic frameworks like BrowserGym (for web agents) and τ-b…
No, of Course I Can! Deeper Fine-Tuning Attacks That Bypass Token-Level Safety Mechanisms
Leading language model (LM) providers like OpenAI and Anthropic allow customers to fine-tune frontier LMs for specific use cases. To prevent abuse, these providers apply filters to block fine-tuning on overtly harmful data. In this setting…
Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning
The rise of foundation models fine-tuned on human feedback from potentially untrusted users has increased the risk of adversarial data poisoning, necessitating the study of robustness of learning algorithms against such attacks. Existing r…
Norm-Bounded Low-Rank Adaptation
In this work, we propose norm-bounded low-rank adaptation (NB-LoRA) for parameter-efficient fine tuning. NB-LoRA is a novel parameterization of low-rank weight adaptations that admits explicit bounds on each singular value of the adaptatio…
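A hedged sketch of one way to parameterize a low-rank adaptation with explicitly bounded singular values, using orthonormal factors from a QR decomposition and a sigmoid squashing of the spectrum; the actual NB-LoRA parameterization may differ:

```python
import torch
import torch.nn as nn

class BoundedLoRA(nn.Module):
    """Low-rank update Delta W = U diag(s) V^T with every singular value in (0, bound)."""

    def __init__(self, d_out, d_in, rank=8, bound=1.0):
        super().__init__()
        self.bound = bound
        self.U_raw = nn.Parameter(torch.randn(d_out, rank))
        self.V_raw = nn.Parameter(torch.randn(d_in, rank))
        self.s_raw = nn.Parameter(torch.zeros(rank))

    def delta_weight(self):
        # QR gives orthonormal columns, so the singular values of Delta W
        # are exactly the entries of s, each squashed below `bound`.
        U, _ = torch.linalg.qr(self.U_raw)
        V, _ = torch.linalg.qr(self.V_raw)
        s = self.bound * torch.sigmoid(self.s_raw)
        return U @ torch.diag(s) @ V.T

    def forward(self, x, base_weight):
        # base_weight: frozen (d_out, d_in) weight of the adapted layer.
        return x @ (base_weight + self.delta_weight()).T
```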
LitLLMs, LLMs for Literature Review: Are we there yet?
Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. This paper explores the zero-shot abilities of recent La…
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-s…
Achieving the Tightest Relaxation of Sigmoids for Formal Verification
In the field of formal verification, Neural Networks (NNs) are typically reformulated into equivalent mathematical programs which are optimized over. To overcome the inherent non-convexity of these reformulations, convex relaxations of non…
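For intuition, a standard (not necessarily the paper's tightest) linear relaxation of the sigmoid over an input interval [l, u] lying in its concave region (l ≥ 0) bounds the function between its chord and a tangent line:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr).

% For 0 \le l \le x \le u, where \sigma is concave:
\underbrace{\sigma(l) + \frac{\sigma(u) - \sigma(l)}{u - l}\,(x - l)}_{\text{chord (lower bound)}}
\;\le\; \sigma(x) \;\le\;
\underbrace{\sigma(d) + \sigma'(d)\,(x - d)}_{\text{tangent at } d \in [l,u] \text{ (upper bound)}}.
```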
Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation
Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This pa…
Verified Neural Compressed Sensing
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task, with the proof of correctness generated by an automated verification algorithm without any human input. Prior work on ne…
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models
Fine-tuning text-to-image models with reward functions trained on human feedback data has proven effective for aligning model behavior with human intent. However, excessive optimization with such reward models, which serve as mere proxy ob…
Stealing Part of a Production Language Model
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer…
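The core observation can be illustrated on a simulated model: because the final logits are a linear projection of a hidden state of width h, a matrix of logit vectors collected over many prompts has numerical rank at most h, so its singular values reveal the hidden dimension. A simplified sketch with synthetic data (an actual attack queries the provider's API and must also work around logprob restrictions):

```python
import numpy as np

vocab, hidden, n_queries = 5000, 256, 1024

# Simulate a production LM's final layer: logits = hidden_state @ W^T.
W = np.random.randn(vocab, hidden)
hidden_states = np.random.randn(n_queries, hidden)   # one hidden state per prompt
logits = hidden_states @ W.T                          # what the attacker observes

# The logit matrix has rank <= hidden, so counting significant singular
# values recovers the model's hidden dimension.
singular_values = np.linalg.svd(logits, compute_uv=False)
estimated_hidden_dim = int((singular_values > 1e-6 * singular_values[0]).sum())
print(estimated_hidden_dim)   # 256
```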
Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction
Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of suc…
Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation
We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss. Under modest assumptions on the input, we characterize the distribution of the iterate at each time ste…
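A generic sketch of the algorithm being analyzed, noisy gradient descent on the squared error loss; the exact noise calibration and accounting needed for a formal DP guarantee are omitted here:

```python
import numpy as np

def dp_gradient_descent(X, y, steps=200, lr=0.1, clip=1.0, noise_std=0.5, seed=0):
    """Gradient descent for least squares with clipped per-example gradients and Gaussian noise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        residual = X @ theta - y
        per_example = residual[:, None] * X   # gradient of 0.5 * (x^T theta - y)^2 per example
        norms = np.linalg.norm(per_example, axis=1, keepdims=True)
        clipped = per_example * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
        # Average the clipped gradients and add Gaussian noise scaled to the clip norm.
        grad = clipped.mean(axis=0) + (noise_std * clip / n) * rng.standard_normal(d)
        theta -= lr * grad
    return theta
```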
Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks
This paper presents a new bi-Lipschitz invertible neural network, the BiLipNet, which has the ability to smoothly control both its Lipschitzness (output sensitivity to input perturbations) and inverse Lipschitzness (input distinguishabilit…
MINT: A wrapper to make multi-modal and multi-image AI models interactive
During the diagnostic process, doctors incorporate multimodal information, including imaging and the medical history, and similarly medical AI development has increasingly become multimodal. In this paper we tackle a more subtle challenge:…
Rich Human Feedback for Text-to-Image Generation
Recent Text-to-Image (T2I) generation models such as Stable Diffusion and Imagen have made significant progress in generating high-resolution images based on text descriptions. However, many generated images still suffer from issues such a…
Selective Concept Models: Permitting Stakeholder Customisation at Test-Time
Concept-based models perform prediction using a set of concepts that are interpretable to stakeholders. However, such models often involve a fixed, large number of concepts, which may place a substantial cognitive load on stakeholders. We …
Correlated Noise Provably Beats Independent Noise for Differentially Private Learning
Differentially private learning algorithms inject noise into the learning process. While the most common private learning algorithm, DP-SGD, adds independent Gaussian noise in each iteration, recent work on matrix factorization mechanisms …
Learning to Receive Help: Intervention-Aware Concept Embedding Models
Concept Bottleneck Models (CBMs) tackle the opacity of neural architectures by constructing and explaining their predictions using a set of high-level concepts. A special property of these models is that they permit concept interventions, …
Human Uncertainty in Concept-Based AI Systems
Placing a human in the loop may help abate the risks of deploying AI systems in safety-critical settings (e.g., a clinician working with a medical AI system). However, mitigating risks arising from human error and uncertainty within such h…
Interactive Concept Bottleneck Models
Concept bottleneck models (CBMs) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We exte…
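A minimal concept bottleneck model sketch showing the two-stage structure and a test-time concept intervention (replacing a predicted concept with a human-provided value); the paper's interactive policy for choosing which concepts to query is not shown:

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, d_in, n_concepts, n_classes):
        super().__init__()
        self.concept_net = nn.Linear(d_in, n_concepts)      # x -> concept logits
        self.label_net = nn.Linear(n_concepts, n_classes)   # concepts -> label logits

    def forward(self, x, interventions=None):
        concepts = torch.sigmoid(self.concept_net(x))
        if interventions is not None:
            # interventions: dict {concept_index: ground-truth value in [0, 1]}
            concepts = concepts.clone()
            for idx, value in interventions.items():
                concepts[:, idx] = value
        return self.label_net(concepts)

# Usage: correct concept 3 at test time and re-predict.
# model = ConceptBottleneckModel(d_in=32, n_concepts=8, n_classes=5)
# logits = model(x, interventions={3: 1.0})
```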
Faithful Knowledge Distillation
Knowledge distillation (KD) has received much attention due to its success in compressing networks to allow for their deployment in resource-constrained systems. While the problem of adversarial robustness has been studied before in the KD…
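For reference, the standard distillation objective this line of work builds on combines a hard-label loss with a temperature-scaled KL term toward the teacher's soft predictions; the paper's faithfulness- and robustness-specific modifications are not shown:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style knowledge distillation loss."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)   # T^2 keeps gradient magnitudes comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft
```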
Training Private Models That Know What They Don't Know
Training reliable deep learning models which avoid making overconfident but incorrect predictions is a longstanding challenge. This challenge is further exacerbated when learning has to be differentially private: protection provided to sen…