Vikash Sehwag
Does More Inference-Time Compute Really Help Robustness?
Recently, Zaremba et al. demonstrated that increasing inference-time computation improves robustness in large proprietary reasoning LLMs. In this paper, we first show that smaller-scale, open-source models (e.g., DeepSeek R1, Qwen3, Phi-re…
Differentially Private Image Classification by Learning Priors from Random Processes
In privacy-preserving machine learning, differentially private stochastic gradient descent (DP-SGD) performs worse than SGD due to per-sample gradient clipping and noise addition. A recent focus in private learning research is improving th…
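The per-sample clipping and noise addition the abstract refers to can be sketched as follows. This is a minimal NumPy illustration of one DP-SGD update, not the paper's method; the function name `dp_sgd_step` and all hyperparameter values are illustrative.

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.0, lr=0.1, rng=None):
    """One DP-SGD update: clip each per-sample gradient to clip_norm,
    average the clipped gradients, then add calibrated Gaussian noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # per-sample clipping
    avg = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_sample_grads)
    noise = rng.normal(0.0, sigma, size=avg.shape)                # Gaussian mechanism
    return -lr * (avg + noise)                                    # parameter update
```

Both operations bias and perturb the averaged gradient relative to plain SGD, which is the accuracy gap the paper studies.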
Adapting to Evolving Adversaries with Regularized Continual Robust Training
Robust training methods typically defend against specific attack types, such as Lp attacks with fixed budgets, and rarely account for the fact that defenders may encounter new attacks over time. A natural solution is to adapt the defended …
Activity Recognition on Avatar-Anonymized Datasets with Masked Differential Privacy
Privacy-preserving computer vision is an important emerging problem in machine learning and artificial intelligence. Prevalent methods tackling this problem use differential privacy (DP) or obfuscation techniques to protect the privacy of …
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models
Large Language Models (LLMs) and Vision-Language Models (VLMs) have made significant advancements in a wide range of natural language processing and vision-language tasks. Access to large web-scale datasets has been a key factor in their s…
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
As scaling laws in generative AI push performance, they also simultaneously concentrate the development of these models among actors with large computational resources. With a focus on text-to-image (T2I) generative models, we aim to addre…
EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations
Generative models, especially text-to-image diffusion models, have significantly advanced in their ability to generate images, benefiting from enhanced architectures, increased computational power, and large-scale datasets. While the datas…
Evaluating and Mitigating IP Infringement in Visual Generative AI
The popularity of visual generative AI models like DALL-E 3, Stable Diffusion XL, Stable Video Diffusion, and Sora has been increasing. Through extensive evaluation, we discovered that the state-of-the-art visual generative models can gene…
AI Risk Management Should Incorporate Both Safety and Security
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come tog…
How to Trace Latent Generative Model Generated Images without Artificial Watermark?
Latent generative models (e.g., Stable Diffusion) have become more and more popular, but concerns have arisen regarding potential misuse related to images generated by these models. It is, therefore, necessary to analyze the origin of imag…
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation te…
Finding needles in a haystack: A Black-Box Approach to Invisible Watermark Detection
In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting. WMD is capable of detecting arbitrary watermarks within a given reference dataset using a cl…
Scaling Compute Is Not All You Need for Adversarial Robustness
The last six years have witnessed significant progress in adversarially robust deep learning. As evidenced by the CIFAR-10 dataset category in RobustBench benchmark, the accuracy under $\ell_\infty$ adversarial perturbations improved from …
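Accuracy under $\ell_\infty$ perturbations of the kind tracked by RobustBench is typically measured with projected-gradient attacks. A minimal sketch, assuming a caller-supplied `grad_fn` that returns the loss gradient with respect to the input; the function name and budget values are illustrative, not RobustBench's evaluation protocol:

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient ascent under an l_inf budget: take signed gradient
    steps to increase the loss, projecting back into the eps-ball each time."""
    x0 = x.copy()
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = x_adv + alpha * np.sign(g)           # signed ascent step
        x_adv = np.clip(x_adv, x0 - eps, x0 + eps)   # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)             # stay in valid pixel range
    return x_adv
```

Robust accuracy is then the fraction of inputs still classified correctly after this perturbation.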
MultiRobustBench: Benchmarking Robustness Against Multiple Attacks
The bulk of existing research in defending against adversarial examples focuses on defending against a single (typically bounded Lp-norm) attack, but for a practical setting, machine learning (ML) models should be robust to a wide variety …
Extracting Training Data from Diffusion Models
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual imag…
Uncovering Adversarial Risks of Test-Time Adaptation
Recently, test-time adaptation (TTA) has been proposed as a promising solution for addressing distribution shifts. It allows a base model to adapt to an unforeseen distribution during inference by leveraging the information from the batch …
A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization
An open problem in differentially private deep learning is hyperparameter optimization (HPO). DP-SGD introduces new hyperparameters and complicates existing ones, forcing researchers to painstakingly tune hyperparameters with hundreds of t…
Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation
Recent works have demonstrated that deep learning models are vulnerable to backdoor poisoning attacks, where these attacks instill spurious correlations to external trigger patterns or objects (e.g., stickers, sunglasses, etc.). We find th…
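A rotation-based trigger of the kind described can be sketched as follows. This is a hypothetical `poison_batch` helper illustrating the general attack shape, not the paper's implementation; the poison fraction, rotation angle, and target label are illustrative.

```python
import numpy as np

def poison_batch(images, labels, target_label, poison_frac=0.1, k=1, rng=None):
    """Rotation-as-trigger poisoning sketch: rotate a fraction of training
    images by 90*k degrees and relabel them to the attacker's target class,
    instilling a spurious rotation-to-label correlation."""
    rng = np.random.default_rng(0) if rng is None else rng
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_frac)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = np.rot90(images[i], k=k)  # the rotation acts as the trigger
        labels[i] = target_label
    return images, labels
```

At test time, rotating any input by the same angle would then steer a model trained on the poisoned data toward the target class.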
A Light Recipe to Train Robust Vision Transformers
In this paper, we ask whether Vision Transformers (ViTs) can serve as an underlying architecture for improving the adversarial robustness of machine learning models against evasion attacks. While earlier works have focused on improving Con…
Understanding Robust Learning through the Lens of Representation Similarities
Representation learning, i.e. the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs). Recently, robustness to adversarial …
Generating High Fidelity Data from Low-density Regions using Diffusion Models
Our work focuses on addressing sample deficiency from low-density regions of data manifold in common image datasets. We leverage diffusion process based generative models to synthesize novel images from low-density regions. We observe that…
Improving Adversarial Robustness Using Proxy Distributions
We focus on the use of proxy distributions, i.e., approximations of the underlying distribution of the training dataset, in both understanding and improving the adversarial robustness in image classification. While additional training data…
Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
While additional training data improves the robustness of deep neural networks against adversarial examples, it presents the challenge of curating a large number of specific real-world samples. We circumvent this challenge by using additio…
Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries
Understanding the fundamental limits of robust supervised learning has emerged as a problem of immense interest, from both practical and theoretical standpoints. In particular, it is critical to determine classifier-agnostic bounds on the …
SSD: A Unified Framework for Self-Supervised Outlier Detection
We ask the following question: what training information is required to design an effective outlier/out-of-distribution (OOD) detector, i.e., detecting samples that lie far away from the training distribution? Since unlabeled data is easil…
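One common way to score outliers on learned feature representations is a Mahalanobis-style distance to the training distribution. The sketch below illustrates that general idea on arbitrary feature vectors; it is not the paper's exact detector, and the function name `ood_score` is illustrative.

```python
import numpy as np

def ood_score(train_feats, x):
    """Mahalanobis-style outlier score: distance of a feature vector x from
    the training feature distribution; larger score => more likely OOD."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov += 1e-6 * np.eye(train_feats.shape[1])  # regularize for invertibility
    prec = np.linalg.inv(cov)
    d = x - mu
    return float(d @ prec @ d)
```

Thresholding this score separates in-distribution samples (small distance) from samples far from the training distribution (large distance).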
Fast-Convergent Federated Learning
Federated learning has emerged recently as a promising solution for distributing machine learning tasks through modern networks of mobile devices. Recent studies have obtained lower bounds on the expected decrease in model loss that is ach…