Aditya Golatkar
PICASO: Permutation-Invariant Context Composition with State Space Models
Providing Large Language Models with relevant contextual knowledge at inference time has been shown to greatly improve the quality of their generations. This is often achieved by prepending informative passages of text, or 'contexts', retr…
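The snippet is cut off, but the title points at the core question: how to combine several retrieved contexts into a single SSM state without depending on the order in which they are listed. Below is a minimal numpy sketch of that idea under the assumption that each context is encoded into a state independently and the resulting states are averaged; the averaging rule, the toy linear SSM, and the helper name encode_context are illustrative assumptions, not the composition operator defined in the paper.

import numpy as np

def encode_context(context_tokens: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Run a toy linear SSM over one context and return its final state."""
    state = np.zeros(A.shape[0])
    for x_t in context_tokens:          # x_t is a scalar input in this toy example
        state = A @ state + B * x_t     # linear state-space update (assumed form)
    return state

def compose_states(states: list[np.ndarray]) -> np.ndarray:
    """Permutation-invariant composition: a mean does not depend on context order."""
    return np.mean(np.stack(states), axis=0)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)                     # stable toy dynamics
B = rng.normal(size=4)
contexts = [rng.normal(size=8), rng.normal(size=5)]   # two retrieved contexts
composed = compose_states([encode_context(c, A, B) for c in contexts])
# 'composed' can now seed generation regardless of how the contexts were ordered.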
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
The "state" of State Space Models (SSMs) represents their memory, which fades exponentially over an unbounded span. By contrast, Attention-based models have "eidetic" (i.e., verbatim, or photographic) memory over a finite span (context siz…
B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory
We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resource…
CPR: Retrieval Augmented Generation for Copyright Protection
Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private users' data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG tech…
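For orientation, here is a generic retrieval-augmented generation skeleton: embed the query, retrieve the top-k passages by similarity, and prepend them to the prompt. The embed and llm_generate helpers are hypothetical stand-ins, and none of CPR's copyright-protection machinery is shown.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a sentence embedder: deterministic hash-seeded vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]          # indices of the k most similar passages
    return [corpus[i] for i in top]

def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for the language model call.
    return f"<generation conditioned on {len(prompt)} prompt characters>"

def rag_answer(query: str, corpus: list[str]) -> str:
    passages = retrieve(query, corpus)
    prompt = "\n\n".join(passages) + "\n\nQuestion: " + query
    return llm_generate(prompt)

print(rag_answer("who owns this text?", ["passage one", "passage two", "passage three"]))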
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearn…
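A minimal sketch of the merging step described above, assuming each shard model is represented as a dictionary of named parameter arrays: averaging the weights gives the served model, and re-averaging without a shard illustrates training-free unlearning of that shard.

import numpy as np

def average_weights(models: list[dict[str, np.ndarray]]) -> dict[str, np.ndarray]:
    """Average each named parameter across models trained on different shards."""
    keys = models[0].keys()
    return {k: np.mean([m[k] for m in models], axis=0) for k in keys}

# Toy usage with three shard models of two parameters each.
rng = np.random.default_rng(0)
shard_models = [{"w": rng.normal(size=(2, 2)), "b": rng.normal(size=2)} for _ in range(3)]
soup = average_weights(shard_models)                    # serve the averaged model
soup_minus_shard2 = average_weights(shard_models[:2])   # "forget" shard 2 by re-averaging without it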
Training Data Protection with Compositional Diffusion Models
We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at…
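One simple way to compose separately trained diffusion models at inference time is to mix their noise predictions at each sampling step. The uniform weighting below is an assumption made for illustration; the composition rule actually used by CDM may differ.

import numpy as np

def composed_eps(eps_models, x_t: np.ndarray, t: int, weights=None) -> np.ndarray:
    """eps_models: list of callables (x_t, t) -> predicted noise for one component model."""
    if weights is None:
        weights = np.full(len(eps_models), 1.0 / len(eps_models))  # assumed uniform mixing
    preds = np.stack([eps(x_t, t) for eps in eps_models])
    return np.tensordot(weights, preds, axes=1)

# Toy usage with two stand-in noise predictors trained on different data sources.
eps_a = lambda x, t: 0.1 * x
eps_b = lambda x, t: -0.2 * x
x = np.ones((4, 4))
mixed = composed_eps([eps_a, eps_b], x, t=10)   # use 'mixed' in the sampler update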
Tangent Transformers for Composition, Privacy and Removal
We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resultin…
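A small numpy sketch of the linearization step: replace the network f(w, x) with its first-order Taylor expansion around the pretrained weights w0, so the fine-tuned model is linear in the parameters. The toy tanh model and the finite-difference Jacobian below stand in for a real transformer and an exact Jacobian-vector product.

import numpy as np

def f(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """A toy nonlinear 'network': tanh of a linear map."""
    return np.tanh(w.reshape(2, 3) @ x)

def jacobian(w0: np.ndarray, x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Finite-difference Jacobian of f with respect to the weights at w0 (illustrative only)."""
    base = f(w0, x)
    J = np.zeros((base.size, w0.size))
    for i in range(w0.size):
        dw = np.zeros_like(w0)
        dw[i] = eps
        J[:, i] = (f(w0 + dw, x) - base) / eps
    return J

def f_linearized(w: np.ndarray, w0: np.ndarray, x: np.ndarray) -> np.ndarray:
    """First-order Taylor expansion: f(w0, x) + J(w0, x) (w - w0)."""
    return f(w0, x) + jacobian(w0, x) @ (w - w0)

# Fine-tuning f_linearized in w is a linear (hence convex) problem.
rng = np.random.default_rng(0)
w0 = rng.normal(size=6)
x = rng.normal(size=3)
out = f_linearized(w0 + 0.01 * rng.normal(size=6), w0, x)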
SAFE: Machine Unlearning With Shard Graphs
We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model. This process, also k…
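For context, a plain sharded-ensemble sketch in the spirit of shard-based unlearning: train one model per shard, ensemble their predictions, and serve a deletion request by retraining only the shard that contained the sample. SAFE's shard graphs refine this trade-off between accuracy and expected removal cost; the code below is an illustration, not the paper's algorithm.

import numpy as np

def train_shard(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares 'model' for one shard: w = argmin ||Xw - y||^2."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ensemble_predict(models: list[np.ndarray], x: np.ndarray) -> float:
    return float(np.mean([x @ w for w in models]))

def forget_sample(shards, models, shard_id: int, sample_idx: int):
    """Remove one training sample and retrain only the shard that held it."""
    X, y = shards[shard_id]
    X = np.delete(X, sample_idx, axis=0)
    y = np.delete(y, sample_idx, axis=0)
    shards[shard_id] = (X, y)
    models[shard_id] = train_shard(X, y)

rng = np.random.default_rng(0)
shards = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
models = [train_shard(X, y) for X, y in shards]
forget_sample(shards, models, shard_id=1, sample_idx=5)   # only shard 1 is retrained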
Integral Continual Learning Along the Tangent Vector Field of Tasks
We propose a lightweight continual learning method which incorporates information from specialized datasets incrementally, by integrating it along the vector field of "generalist" models. The tangent plane to the specialist model acts as a…
On Leave-One-Out Conditional Mutual Information For Generalization
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI). Contrary to other CMI bounds, which are black-box bounds that do not…
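The snippet above is cut off before the measure is defined. As a rough, hedged formalization of what a leave-one-out conditional mutual information can look like (the paper's exact definition and bound may differ): draw n+1 i.i.d. samples, hold one out at a uniformly random index, train on the rest, and ask how much the trained weights reveal about which sample was held out.

% Hedged sketch of a leave-one-out CMI quantity; the paper's exact
% definition and generalization bound may differ from this.
Let $Z = (Z_1, \dots, Z_{n+1})$ be i.i.d. samples, let $U \sim \mathrm{Unif}\{1,\dots,n+1\}$
be the index of a held-out sample, and let $W = \mathcal{A}(Z_{-U})$ be the output of the
learning algorithm $\mathcal{A}$ trained on the remaining $n$ samples. A leave-one-out
conditional mutual information is then
\[
  \mathrm{loo\text{-}CMI}(\mathcal{A}) \;=\; I\bigl(W;\, U \mid Z\bigr),
\]
which is small when the trained weights reveal little about which sample was left out;
generalization bounds built from such quantities typically scale as
$\sqrt{\mathrm{loo\text{-}CMI}(\mathcal{A})/n}$.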
Mixed Differential Privacy in Computer Vision
We introduce AdaMix, an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data. While pre-training language models on large public datasets has enabled strong differe…
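As background for the snippet above, the standard DP-SGD ingredients are per-example gradient clipping followed by Gaussian noise. The toy squared-loss example below shows only that mechanism; AdaMix's adaptive use of public data on top of it is not reproduced here.

import numpy as np

def per_example_grads(w, X, y):
    """Gradients of the squared loss 0.5*(x.w - y)^2 for each example."""
    residuals = X @ w - y                        # shape (n,)
    return residuals[:, None] * X                # shape (n, d)

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    grads = per_example_grads(w, X, y)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip)        # clip each example's gradient
    noisy_sum = grads.sum(axis=0) + rng.normal(scale=noise_mult * clip, size=w.shape)
    return w - lr * noisy_sum / len(X)                   # step on the noisy averaged gradient

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
w = np.zeros(5)
for _ in range(10):
    w = dp_sgd_step(w, X, y, rng=rng)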
Scene Uncertainty and the Wellington Posterior of Deterministic Image Classifiers
We propose a method to estimate the uncertainty of the outcome of an image classifier on a given input datum. Deep neural networks commonly used for image classification are deterministic maps from an input image to an output class. As suc…
Mixed-Privacy Forgetting in Deep Networks
We show that the influence of a subset of the training samples can be removed -- or "forgotten" -- from the weights of a network trained on large-scale image classification tasks, and we provide strong computable bounds on the amount of re…
LQF: Linear Quadratic Fine-Tuning
Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are …
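A sketch of the convex core suggested by the title: if the model is linear in its parameters and the loss is quadratic (MSE on one-hot targets), fine-tuning reduces to ridge regression with a closed-form solution. The random features below stand in for a linearized network's Jacobian features; this illustrates the setup, not the paper's full procedure.

import numpy as np

def ridge_fit(Phi: np.ndarray, Y: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Solve argmin_W ||Phi W - Y||^2 + lam ||W||^2 in closed form."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 16))             # stand-in linearized features of 100 samples
Y = np.eye(3)[rng.integers(0, 3, size=100)]  # one-hot targets for 3 classes
W = ridge_fit(Phi, Y)
logits = Phi @ W                             # predictions are linear in the fine-tuned W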
Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks
We explore the problem of selectively forgetting a particular subset of the data used for training a deep neural network. While the effects of the data to be forgotten can be hidden from the output of the network, insights may still be gle…
Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Regularization is typically understood as improving generalization by altering the landscape of local extrema to which the model eventually converges. Deep neural networks (DNNs), however, challenge this view: We show that removing regular…
Sparse Kernel PCA for Outlier Detection
In this paper, we propose a new method to perform Sparse Kernel Principal Component Analysis (SKPCA) and also mathematically analyze the validity of SKPCA. We formulate SKPCA as a constrained optimization problem with elastic net regulariz…
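To make the outlier-detection use concrete, here is plain (dense) kernel PCA with an RBF kernel, scoring each point by its reconstruction error in feature space; the sparse, elastic-net-constrained formulation developed in the paper is not reproduced here.

import numpy as np

def rbf_kernel(X, gamma=0.5):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kpca_outlier_scores(X, n_components=2, gamma=0.5):
    n = len(X)
    K = rbf_kernel(X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                   # center the data in feature space
    vals, vecs = np.linalg.eigh(Kc)
    top = np.argsort(vals)[::-1][:n_components]
    proj = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))  # projections onto top components
    return np.diag(Kc) - np.sum(proj**2, axis=1)     # feature-space norm left unexplained

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), np.array([[6.0, 6.0]])])  # cluster plus a distant point
scores = kpca_outlier_scores(X)   # larger score = less well explained by the top components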