Aditya Golatkar
PICASO: Permutation-Invariant Context Composition with State Space Models
Providing Large Language Models with relevant contextual knowledge at inference time has been shown to greatly improve the quality of their generations. This is often achieved by prepending informative passages of text, or 'contexts', retr…
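The snippet is cut off, but the title points at the core question: how to combine several retrieved contexts into a single SSM state without depending on the order in which they are listed. Below is a minimal numpy sketch of that idea under the assumption that each context is encoded into a state independently and the resulting states are averaged; the averaging rule, the toy linear SSM, and the helper name encode_context are illustrative assumptions, not the composition operator defined in the paper.

import numpy as np

def encode_context(context_tokens: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Run a toy linear SSM over one context and return its final state."""
    state = np.zeros(A.shape[0])
    for x_t in context_tokens:          # x_t is a scalar input in this toy example
        state = A @ state + B * x_t     # linear state-space update (assumed form)
    return state

def compose_states(states: list[np.ndarray]) -> np.ndarray:
    """Permutation-invariant composition: a mean does not depend on context order."""
    return np.mean(np.stack(states), axis=0)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)                     # stable toy dynamics
B = rng.normal(size=4)
contexts = [rng.normal(size=8), rng.normal(size=5)]   # two retrieved contexts
composed = compose_states([encode_context(c, A, B) for c in contexts])
# 'composed' can now seed generation regardless of how the contexts were ordered.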
Expansion Span: Combining Fading Memory and Retrieval in Hybrid State Space Models
The "state" of State Space Models (SSMs) represents their memory, which fades exponentially over an unbounded span. By contrast, Attention-based models have "eidetic" (i.e., verbatim, or photographic) memory over a finite span (context siz…
B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory
We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resource…
CPR: Retrieval Augmented Generation for Copyright Protection
Retrieval Augmented Generation (RAG) is emerging as a flexible and robust technique to adapt models to private users' data without training, to handle credit attribution, and to allow efficient machine unlearning at scale. However, RAG tech…
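For orientation, here is a generic retrieval-augmented generation skeleton: embed the query, retrieve the top-k passages by similarity, and prepend them to the prompt. The embed and llm_generate helpers are hypothetical stand-ins, and none of CPR's copyright-protection machinery is shown.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in for a sentence embedder: deterministic hash-seeded vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]          # indices of the k most similar passages
    return [corpus[i] for i in top]

def llm_generate(prompt: str) -> str:
    # Hypothetical stand-in for the language model call.
    return f"<generation conditioned on {len(prompt)} prompt characters>"

def rag_answer(query: str, corpus: list[str]) -> str:
    passages = retrieve(query, corpus)
    prompt = "\n\n".join(passages) + "\n\nQuestion: " + query
    return llm_generate(prompt)

print(rag_answer("who owns this text?", ["passage one", "passage two", "passage three"]))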
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
We present Diffusion Soup, a compartmentalization method for Text-to-Image Generation that averages the weights of diffusion models trained on sharded data. By construction, our approach enables training-free continual learning and unlearn…
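A minimal sketch of the merging step described above, assuming each shard model is represented as a dictionary of named parameter arrays: averaging the weights gives the served model, and re-averaging without a shard illustrates training-free unlearning of that shard.

import numpy as np

def average_weights(models: list[dict[str, np.ndarray]]) -> dict[str, np.ndarray]:
    """Average each named parameter across models trained on different shards."""
    keys = models[0].keys()
    return {k: np.mean([m[k] for m in models], axis=0) for k in keys}

# Toy usage with three shard models of two parameters each.
rng = np.random.default_rng(0)
shard_models = [{"w": rng.normal(size=(2, 2)), "b": rng.normal(size=2)} for _ in range(3)]
soup = average_weights(shard_models)                    # serve the averaged model
soup_minus_shard2 = average_weights(shard_models[:2])   # "forget" shard 2 by re-averaging without it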
Training Data Protection with Compositional Diffusion Models
We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at…
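One simple way to compose separately trained diffusion models at inference time is to mix their noise predictions at each sampling step. The uniform weighting below is an assumption made for illustration; the composition rule actually used by CDM may differ.

import numpy as np

def composed_eps(eps_models, x_t: np.ndarray, t: int, weights=None) -> np.ndarray:
    """eps_models: list of callables (x_t, t) -> predicted noise for one component model."""
    if weights is None:
        weights = np.full(len(eps_models), 1.0 / len(eps_models))  # assumed uniform mixing
    preds = np.stack([eps(x_t, t) for eps in eps_models])
    return np.tensordot(weights, preds, axes=1)

# Toy usage with two stand-in noise predictors trained on different data sources.
eps_a = lambda x, t: 0.1 * x
eps_b = lambda x, t: -0.2 * x
x = np.ones((4, 4))
mixed = composed_eps([eps_a, eps_b], x, t=10)   # use 'mixed' in the sampler update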
Tangent Transformers for Composition, Privacy and Removal
We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a First-order Taylor Expansion around a pre-trained initialization. We show that the Jacobian-Vector Product resultin…
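A small numpy sketch of the linearization step: replace the network f(w, x) with its first-order Taylor expansion around the pretrained weights w0, so the fine-tuned model is linear in the parameters. The toy tanh model and the finite-difference Jacobian below stand in for a real transformer and an exact Jacobian-vector product.

import numpy as np

def f(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """A toy nonlinear 'network': tanh of a linear map."""
    return np.tanh(w.reshape(2, 3) @ x)

def jacobian(w0: np.ndarray, x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Finite-difference Jacobian of f with respect to the weights at w0 (illustrative only)."""
    base = f(w0, x)
    J = np.zeros((base.size, w0.size))
    for i in range(w0.size):
        dw = np.zeros_like(w0)
        dw[i] = eps
        J[:, i] = (f(w0 + dw, x) - base) / eps
    return J

def f_linearized(w: np.ndarray, w0: np.ndarray, x: np.ndarray) -> np.ndarray:
    """First-order Taylor expansion: f(w0, x) + J(w0, x) (w - w0)."""
    return f(w0, x) + jacobian(w0, x) @ (w - w0)

# Fine-tuning f_linearized in w is a linear (hence convex) problem.
rng = np.random.default_rng(0)
w0 = rng.normal(size=6)
x = rng.normal(size=3)
out = f_linearized(w0 + 0.01 * rng.normal(size=6), w0, x)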
SAFE: Machine Unlearning With Shard Graphs
We present Synergy Aware Forgetting Ensemble (SAFE), a method to adapt large models on a diverse collection of data while minimizing the expected cost to remove the influence of training samples from the trained model. This process, also k…
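For context, a plain sharded-ensemble sketch in the spirit of shard-based unlearning: train one model per shard, ensemble their predictions, and serve a deletion request by retraining only the shard that contained the sample. SAFE's shard graphs refine this trade-off between accuracy and expected removal cost; the code below is an illustration, not the paper's algorithm.

import numpy as np

def train_shard(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares 'model' for one shard: w = argmin ||Xw - y||^2."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ensemble_predict(models: list[np.ndarray], x: np.ndarray) -> float:
    return float(np.mean([x @ w for w in models]))

def forget_sample(shards, models, shard_id: int, sample_idx: int):
    """Remove one training sample and retrain only the shard that held it."""
    X, y = shards[shard_id]
    X = np.delete(X, sample_idx, axis=0)
    y = np.delete(y, sample_idx, axis=0)
    shards[shard_id] = (X, y)
    models[shard_id] = train_shard(X, y)

rng = np.random.default_rng(0)
shards = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
models = [train_shard(X, y) for X, y in shards]
forget_sample(shards, models, shard_id=1, sample_idx=5)   # only shard 1 is retrained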
Integral Continual Learning Along the Tangent Vector Field of Tasks
We propose a lightweight continual learning method which incorporates information from specialized datasets incrementally, by integrating it along the vector field of "generalist" models. The tangent plane to the specialist model acts as a…
On Leave-One-Out Conditional Mutual Information For Generalization
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI). Contrary to other CMI bounds, which are black-box bounds that do not…
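The snippet above is cut off before the measure is defined. As a rough, hedged formalization of what a leave-one-out conditional mutual information can look like (the paper's exact definition and bound may differ): draw n+1 i.i.d. samples, hold one out at a uniformly random index, train on the rest, and ask how much the trained weights reveal about which sample was held out.

% Hedged sketch of a leave-one-out CMI quantity; the paper's exact
% definition and generalization bound may differ from this.
Let $Z = (Z_1, \dots, Z_{n+1})$ be i.i.d. samples, let $U \sim \mathrm{Unif}\{1,\dots,n+1\}$
be the index of a held-out sample, and let $W = \mathcal{A}(Z_{-U})$ be the output of the
learning algorithm $\mathcal{A}$ trained on the remaining $n$ samples. A leave-one-out
conditional mutual information is then
\[
  \mathrm{loo\text{-}CMI}(\mathcal{A}) \;=\; I\bigl(W;\, U \mid Z\bigr),
\]
which is small when the trained weights reveal little about which sample was left out;
generalization bounds built from such quantities typically scale as
$\sqrt{\mathrm{loo\text{-}CMI}(\mathcal{A})/n}$.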
Mixed Differential Privacy in Computer Vision
We introduce AdaMix, an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data. While pre-training language models on large public datasets has enabled strong differe…
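As background for the snippet above, the standard DP-SGD ingredients are per-example gradient clipping followed by Gaussian noise. The toy squared-loss example below shows only that mechanism; AdaMix's adaptive use of public data on top of it is not reproduced here.

import numpy as np

def per_example_grads(w, X, y):
    """Gradients of the squared loss 0.5*(x.w - y)^2 for each example."""
    residuals = X @ w - y                        # shape (n,)
    return residuals[:, None] * X                # shape (n, d)

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    grads = per_example_grads(w, X, y)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / clip)        # clip each example's gradient
    noisy_sum = grads.sum(axis=0) + rng.normal(scale=noise_mult * clip, size=w.shape)
    return w - lr * noisy_sum / len(X)                   # step on the noisy averaged gradient

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
w = np.zeros(5)
for _ in range(10):
    w = dp_sgd_step(w, X, y, rng=rng)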
Scene Uncertainty and the Wellington Posterior of Deterministic Image Classifiers
We propose a method to estimate the uncertainty of the outcome of an image classifier on a given input datum. Deep neural networks commonly used for image classification are deterministic maps from an input image to an output class. As suc…
Mixed-Privacy Forgetting in Deep Networks
We show that the influence of a subset of the training samples can be removed -- or "forgotten" -- from the weights of a network trained on large-scale image classification tasks, and we provide strong computable bounds on the amount of re…
LQF: Linear Quadratic Fine-Tuning
Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are …
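A sketch of the convex core suggested by the title: if the model is linear in its parameters and the loss is quadratic (MSE on one-hot targets), fine-tuning reduces to ridge regression with a closed-form solution. The random features below stand in for a linearized network's Jacobian features; this illustrates the setup, not the paper's full procedure.

import numpy as np

def ridge_fit(Phi: np.ndarray, Y: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Solve argmin_W ||Phi W - Y||^2 + lam ||W||^2 in closed form."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ Y)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 16))             # stand-in linearized features of 100 samples
Y = np.eye(3)[rng.integers(0, 3, size=100)]  # one-hot targets for 3 classes
W = ridge_fit(Phi, Y)
logits = Phi @ W                             # predictions are linear in the fine-tuned W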
Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks
We explore the problem of selectively forgetting a particular subset of the data used for training a deep neural network. While the effects of the data to be forgotten can be hidden from the output of the network, insights may still be gle…
Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence
Regularization is typically understood as improving generalization by altering the landscape of local extrema to which the model eventually converges. Deep neural networks (DNNs), however, challenge this view: We show that removing regular…
Sparse Kernel PCA for Outlier Detection
In this paper, we propose a new method to perform Sparse Kernel Principal Component Analysis (SKPCA) and also mathematically analyze the validity of SKPCA. We formulate SKPCA as a constrained optimization problem with elastic net regulariz…
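To make the outlier-detection use concrete, here is plain (dense) kernel PCA with an RBF kernel, scoring each point by its reconstruction error in feature space; the sparse, elastic-net-constrained formulation developed in the paper is not reproduced here.

import numpy as np

def rbf_kernel(X, gamma=0.5):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kpca_outlier_scores(X, n_components=2, gamma=0.5):
    n = len(X)
    K = rbf_kernel(X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                   # center the data in feature space
    vals, vecs = np.linalg.eigh(Kc)
    top = np.argsort(vals)[::-1][:n_components]
    proj = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))  # projections onto top components
    return np.diag(Kc) - np.sum(proj**2, axis=1)     # feature-space norm left unexplained

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), np.array([[6.0, 6.0]])])  # cluster plus a distant point
scores = kpca_outlier_scores(X)   # larger score = less well explained by the top components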