Varun Chandrasekaran
The Privacy Quagmire: Where Computer Scientists and Lawyers May Disagree
Efficiently Attacking Memorization Scores
Influence estimation tools, such as memorization scores, are widely used to understand model behavior, attribute training data, and inform dataset curation. However, recent applications in data valuation and responsible machine learnin…
Analyzing Security and Privacy Challenges in Generative AI Usage Guidelines for Higher Education
Educators and learners worldwide are embracing the rise of Generative Artificial Intelligence (GenAI) as it reshapes higher education. However, GenAI also raises significant privacy and security concerns, as models and privacy-sensitive us…
AMUN: Adversarial Machine UNlearning
Machine unlearning, where users can request the deletion of a forget dataset, is becoming increasingly important because of numerous privacy regulations. Initial works on "exact" unlearning (e.g., retraining) incur large computational ov…
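For reference, the "exact" unlearning baseline mentioned above simply retrains from scratch on the retained data. A minimal sketch, assuming a toy sklearn classifier and a hypothetical forget set (this is the costly baseline, not the AMUN method itself):

```python
# Illustrative sketch of "exact" unlearning: retrain from scratch on the
# retained data only. The classifier, data, and forget split are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

forget_idx = np.arange(100)            # user-requested deletions (hypothetical)
retain_mask = np.ones(len(X), bool)
retain_mask[forget_idx] = False

original = LogisticRegression(max_iter=1000).fit(X, y)
# Exact unlearning baseline: retrain with the forget set removed.
unlearned = LogisticRegression(max_iter=1000).fit(X[retain_mask], y[retain_mask])
```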
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
Vision-language models (VLMs) are highly effective but often underperform on specialized tasks; for example, Llava-1.5 struggles with chart and diagram understanding due to scarce task-specific training data. Existing training data, source…
The Efficacy of Transfer-based No-box Attacks on Image Watermarking: A Pragmatic Analysis
Watermarking approaches are widely used to identify if images being circulated are authentic or AI-generated. Determining the robustness of image watermarking methods in the "no-box" setting, where the attacker is assumed to have no know…
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
We explore the internal mechanisms of how bias emerges in large language models (LLMs) when provided with ambiguous comparative prompts: inputs that compare or enforce choosing between two or more entities without providing clear context f…
BenchAgents: Multi-Agent Systems for Structured Benchmark Creation
Evaluation insights are limited by the availability of high-quality benchmarks. As models evolve, there is a need to create benchmarks that can measure progress on new and complex generative capabilities. However, manually creating new ben…
Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models
With models getting stronger, evaluations have grown more complex, testing multiple skills in one benchmark and even in the same instance at once. However, skill-wise performance is obscured when inspecting aggregate accuracy, under-utiliz…
LOTOS: Layer-wise Orthogonalization for Training Robust Ensembles
Transferability of adversarial examples is a well-known property that endangers all classification models, even those that are only accessible through black-box queries. Prior work has shown that an ensemble of models is more resilient to …
Generative Monoculture in Large Language Models
We introduce "generative monoculture", a behavior observed in large language models (LLMs) characterized by a significant narrowing of model output diversity relative to available training data for a given task: for example, generating…
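To make "narrowing of output diversity" concrete, one simple proxy is a distinct-n statistic over a set of sampled outputs; a minimal sketch, not necessarily the metric used in the paper:

```python
# Illustrative diversity proxy (distinct-n): the fraction of unique n-grams
# across a set of model outputs. A sharp drop relative to the source corpus
# would be one symptom of generative monoculture.
def distinct_n(texts, n=2):
    grams, total = set(), 0
    for t in texts:
        toks = t.split()
        for i in range(len(toks) - n + 1):
            grams.add(tuple(toks[i:i + n]))
            total += 1
    return len(grams) / max(total, 1)

outputs = ["the movie was great", "the movie was great", "the film was excellent"]
print(distinct_n(outputs, n=2))  # lower values indicate more repetitive outputs
```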
Bypassing LLM Watermarks with Color-Aware Substitutions
Watermarking approaches are proposed to identify whether circulated text is human-written or generated by a large language model (LLM). The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific ("gr…
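A minimal sketch of the green-list idea behind the Kirchenbauer et al. scheme referenced above: the vocabulary is pseudo-randomly split at each step (seeded by the previous token), and "green" logits receive a bias so the watermark becomes statistically detectable. The hash choice, list fraction, and bias value below are illustrative assumptions, not the paper's exact construction:

```python
# Sketch of green-list logit biasing for LLM watermarking (illustrative only).
import hashlib
import numpy as np

def green_list(prev_token_id, vocab_size, key="secret-key", frac=0.5):
    """Pseudo-randomly pick the 'green' portion of the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(f"{key}:{prev_token_id}".encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.permutation(vocab_size)[: int(frac * vocab_size)]

def watermark_logits(logits, prev_token_id, delta=2.0):
    """Add a bias delta to green-token logits before sampling the next token."""
    biased = logits.copy()
    biased[green_list(prev_token_id, logits.shape[-1])] += delta
    return biased
```

Detection then checks whether an unusually large fraction of the observed tokens fall in their step-wise green lists; substitution attacks aim to break that statistic.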
Designing Informative Metrics for Few-Shot Example Selection
Pretrained language models (PLMs) have shown remarkable few-shot learning capabilities when provided with properly formatted examples. However, selecting the "best" examples remains an open challenge. We propose a complexity-based prompt s…
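As a purely hypothetical illustration of metric-driven example selection, one can score candidate demonstrations and keep the top-k; the length proxy below is a stand-in, and the paper's complexity metrics are more informative than this:

```python
# Hypothetical example selection: rank candidate (prompt, answer) demonstrations
# by a simple complexity proxy (token count) and keep the top-k.
def select_examples(candidates, k=2):
    scored = sorted(candidates, key=lambda ex: len(ex[0].split()), reverse=True)
    return scored[:k]

demos = [("What is 2+2?", "4"),
         ("Explain why the sky appears blue at noon.", "Rayleigh scattering"),
         ("Name a prime number.", "7")]
print(select_examples(demos, k=2))
```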
Privately Aligning Language Models with Reinforcement Learning
Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction-following models such as ChatGPT. In this work, we…
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval
We study the ability of state-of-the-art models to answer constraint satisfaction queries for information retrieval (e.g., 'a list of ice cream shops in San Diego'). In the past, such queries were considered to be tasks that could only be …
Why Train More? Effective and Efficient Membership Inference via Memorization
Membership Inference Attacks (MIAs) aim to identify specific data samples within the private training dataset of machine learning models, leading to serious privacy violations and other sophisticated threats. Many practical black-box MIAs …
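For context, the simplest black-box MIA baseline thresholds per-example loss; a minimal sketch of that generic baseline (not the memorization-based attack proposed in the paper), with hypothetical losses and threshold:

```python
# Generic loss-threshold membership inference (LOSS attack) baseline.
import numpy as np

def loss_threshold_mia(per_example_losses, threshold):
    """Predict 'member' (1) when a sample's loss is below the threshold,
    on the intuition that models fit their training data more closely."""
    return (np.asarray(per_example_losses) < threshold).astype(int)

# Example: losses computed elsewhere for candidate samples.
losses = [0.03, 1.2, 0.15, 2.4]
print(loss_threshold_mia(losses, threshold=0.5))  # -> [1 0 1 0]
```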
Diversity of Thought Improves Reasoning Abilities of LLMs
Large language models (LLMs) are documented to struggle in settings that require complex reasoning. Nevertheless, instructing the model to break down the problem into smaller reasoning steps, or ensembling various generations through modif…
Teaching Language Models to Hallucinate Less with Synthetic Tasks
Large language models (LLMs) frequently hallucinate on abstractive summarization tasks such as document-based question-answering, meeting summarization, and clinical report generation, even though all necessary information is included in c…
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We propose modeling factual queries as constraint satisfaction problems and use this framework to investiga…
DSML 2023 Committee
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. Th…
Verifiable and Provably Secure Machine Unlearning
Machine unlearning aims to remove points from the training dataset of a machine learning model after training; for example when a user requests their data to be deleted. While many machine unlearning methods have been proposed, none of the…
Proof-of-Learning is Currently More Broken Than You Think
Proof-of-Learning (PoL) proposes that a model owner logs training checkpoints to establish a proof of having expended the computation necessary for training. The authors of PoL forego cryptographic approaches and trade rigorous security gu…
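A rough sketch of the checkpoint-logging pattern described above, using a toy SGD loop; the actual PoL protocol additionally specifies what is hashed, how checkpoints are ordered, and how a verifier replays training segments, none of which is captured here:

```python
# Toy illustration of Proof-of-Learning-style logging: periodically record
# (step, weights, batch indices) so a verifier could later replay segments.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(512, 10)), rng.normal(size=512)
w = np.zeros(10)
proof_log = []                        # list of (step, weights, batch indices)

for step in range(100):
    idx = rng.choice(len(X), size=32, replace=False)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)   # least-squares gradient
    w -= 0.01 * grad
    if step % 10 == 0:                # checkpoint interval (hypothetical k = 10)
        proof_log.append((step, w.copy(), idx.copy()))
```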
Generative Extraction of Audio Classifiers for Speaker Identification
It is perhaps no longer surprising that machine learning models, especially deep neural networks, are particularly vulnerable to attacks. One such vulnerability that has been well studied is model extraction: a phenomenon in which the atta…
Hierarchical Federated Learning with Privacy
Federated learning (FL), where data remains at the federated clients, and where only gradient updates are shared with a central aggregator, was assumed to be private. Recent work demonstrates that adversaries with gradient-level access can…
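For reference, the basic aggregation pattern the abstract refers to is federated averaging; a minimal sketch of that flat, non-hierarchical, non-private baseline (the paper's scheme adds intermediate aggregators and privacy protections this omits):

```python
# Baseline federated averaging step: the server averages client updates
# weighted by local dataset size. Updates and sizes here are hypothetical.
import numpy as np

def fedavg(client_updates, client_sizes):
    """Weighted average of client model updates by local dataset size."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * u for w, u in zip(weights, client_updates))

updates = [np.array([0.1, -0.2]), np.array([0.3, 0.0]), np.array([-0.1, 0.4])]
print(fedavg(updates, client_sizes=[100, 50, 150]))
```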
Message from the DSML 2022 Organizers
CONFIDANT: A Privacy Controller for Social Robots
As social robots become increasingly prevalent in day-to-day environments, they will participate in conversations and appropriately manage the information shared with them. However, little is known about how robots might appropriately disc…
Unrolling SGD: Understanding Factors Influencing Machine Unlearning
Machine unlearning is the process through which a deployed machine learning model is made to forget about some of its training data points. While naively retraining the model from scratch is an option, it is almost always associated with l…
SoK: Machine Learning Governance
The application of machine learning (ML) in computer systems introduces not only many benefits but also risks to society. In this paper, we develop the concept of ML governance to balance such benefits and risks, with the aim of achieving …