John X. Morris
Approximating Language Model Training Data from Weights
Modern language models often have open weights but closed training data. We formalize the problem of data approximation from model weights and propose several baselines and metrics. We develop a gradient-based approach that selects the hig…
How much do language models memorize?
We propose a new method for estimating how much a model knows about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from gene…
Harnessing the Universal Geometry of Embeddings
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal lat…
Universal Zero-shot Embedding Inversion
Embedding inversion, i.e., reconstructing text given its embedding and black-box access to the embedding encoder, is a fundamental problem in both NLP and security. From the NLP perspective, it helps determine how much semantic information…
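As a rough illustration of the access model described above, the attacker holds only a target embedding and black-box query access to the encoder, and can score candidate reconstructions against the target. This is a minimal sketch of that setup, not the paper's inversion method; the encoder checkpoint and candidate strings are placeholders.

```python
# Sketch of the embedding-inversion access model: a target embedding, black-box
# encoder access, and similarity scoring of candidate reconstructions.
# The encoder name and candidates are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stands in for the black-box encoder

secret_text = "patient was prescribed 20mg of atorvastatin"
target_embedding = encoder.encode(secret_text, convert_to_tensor=True)  # all the attacker observes

candidates = [
    "the patient takes a statin medication",
    "patient was prescribed 20mg of atorvastatin",
    "the weather in Ithaca is cold today",
]
for text in candidates:
    score = util.cos_sim(target_embedding, encoder.encode(text, convert_to_tensor=True)).item()
    print(f"{score:.3f}  {text}")
```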
Contextual Document Embeddings
Dense document embeddings are central to neural retrieval. The dominant paradigm is to train and construct embeddings by running encoders directly on individual documents. In this work, we argue that these embeddings, while effective, are …
Self-interpreting Adversarial Images
We introduce a new type of indirect, cross-modal injection attacks against visual language models that enable creation of self-interpreting images. These images contain hidden "meta-instructions" that control how models answer users' quest…
Crafting Interpretable Embeddings by Asking LLMs Questions
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing ne…
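The abstract is cut off before the method, but the title suggests building each embedding dimension from an LLM's answer to a natural-language question. The sketch below illustrates that general idea only; `ask_llm_yes_no` is a hypothetical helper standing in for any chat-completion call, and the questions are made up.

```python
# Illustrative sketch (not the paper's implementation): embed text as a vector of
# yes/no LLM answers to fixed questions, so every dimension is human-readable.
from typing import Callable, List

QUESTIONS = [
    "Does the text mention a number?",
    "Is the text about medicine?",
    "Does the text express a negative sentiment?",
]

def interpretable_embedding(text: str, ask_llm_yes_no: Callable[[str], bool]) -> List[float]:
    """One dimension per question: 1.0 if the LLM answers yes, else 0.0."""
    return [1.0 if ask_llm_yes_no(f"{q}\n\nText: {text}") else 0.0 for q in QUESTIONS]
```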
Extracting Prompts by Inverting LLM Outputs
We consider the problem of language model inversion: given outputs of a language model, we seek to extract the prompt that generated these outputs. We develop a new black-box method, output2prompt, that learns to extract prompts without ac…
Do language models plan ahead for future tokens?
Do transformers "think ahead" during inference at a given position? It is known that transformers prepare information in the hidden states of the forward pass at time step $t$ that is then used in future forward passes $t+\tau$. We posit two expla…
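One generic way to operationalize the claim that the hidden state at step t carries information used at step t + tau is a linear probe from the current hidden state to a property of the future token. This is a toy probing sketch, not the paper's experimental setup; the model, texts, and probe target are placeholders.

```python
# Generic probing sketch: does the hidden state at position t linearly predict a
# (toy) property of the token at position t + tau?
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tau = 2
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Language models prepare information for future tokens.",
]
X, y = [], []
with torch.no_grad():
    for text in texts:
        ids = tok(text, return_tensors="pt")
        hidden = model(**ids).last_hidden_state[0]      # (seq_len, d_model)
        tokens = ids["input_ids"][0]
        for t in range(len(tokens) - tau):
            X.append(hidden[t].numpy())
            # toy probe target: does the token at t + tau start a new word?
            y.append(int(tok.decode([int(tokens[t + tau])]).startswith(" ")))

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe train accuracy:", probe.score(X, y))
```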
Nomic Embed: Training a Reproducible Long Context Text Embedder
This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-…
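A usage sketch for the released model via sentence-transformers follows. The Hugging Face model id and the "search_query:" / "search_document:" task prefixes are taken from the public model card as recalled, so verify them against the card before relying on this.

```python
# Usage sketch for nomic-embed-text-v1 via sentence-transformers.
# Model id and task prefixes follow the public model card (verify before use).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
query_emb = model.encode("search_query: how do dense retrievers work?")
doc_emb = model.encode("search_document: Dense retrievers encode queries and documents into vectors.")
print(query_emb.shape, doc_emb.shape)
```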
Language Model Inversion
Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of…
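The signal the abstract refers to is the full next-token probability distribution produced after a (hidden) prompt. Below is a minimal sketch of extracting that distribution with Hugging Face transformers; the prompt and model are placeholders, and the inversion model that maps distributions back to prompts is not shown.

```python
# Extract the next-token distribution that an inverter would observe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

hidden_prompt = "Translate the user's message into French:"   # unknown to the attacker
ids = tok(hidden_prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits                 # (1, seq_len, vocab_size)
next_token_probs = torch.softmax(logits[0, -1], dim=-1)  # the distribution the inverter sees
print(next_token_probs.shape)                    # e.g. torch.Size([50257]) for GPT-2
```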
Tree Prompting: Efficient Task Adaptation without Fine-Tuning
Prompting language models (LMs) is the main interface for applying them to new tasks. However, for smaller LMs, prompting provides low accuracy compared to gradient-based finetuning. Tree Prompting is an approach to prompting which builds …
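To make the core idea concrete, here is a toy illustration of a decision tree whose internal nodes are prompts, so classifying an input takes a few LM calls along one root-to-leaf path. The tree here is hand-written and `lm_answers_yes` is a hypothetical helper, whereas the paper learns the tree from data.

```python
# Toy decision tree over prompts (hand-written, for illustration only).
from typing import Callable

def classify(text: str, lm_answers_yes: Callable[[str], bool]) -> str:
    if lm_answers_yes(f"Is the following text a product review? {text}"):
        if lm_answers_yes(f"Does the reviewer sound satisfied? {text}"):
            return "positive review"
        return "negative review"
    return "not a review"
```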
Text Embeddings Reveal (Almost) As Much As Text
How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion, reconstructing the full text represented in dense text embeddings. We frame the problem as controll…
Explaining Data Patterns in Natural Language with Language Models
Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. We explore whether we can leverage this ability to find and explain patterns in data. Specifically, given a pre-trained…
Unsupervised Text Deidentification
Deidentification seeks to anonymize textual data prior to distribution. Automatic deidentification primarily uses supervised named entity recognition from human-labeled data points. We propose an unsupervised deidentification method that m…
Explaining Patterns in Data with Language Models via Interpretable Autoprompting
Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specificall…
TextAttack: Lessons learned in designing Python frameworks for NLP
TextAttack is an open-source Python toolkit for adversarial attacks, adversarial training, and data augmentation in NLP. TextAttack unites 15+ papers from the NLP adversarial attack literature into a single framework, with many components …
Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples
We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, s…
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. …
TextAttack: A Framework for Adversarial Attacks in Natural Language Processing
While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. …
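A usage sketch for the framework described in the two entries above follows. The recipe class, model wrapper, and checkpoint names are taken from the project's README as recalled; treat them as assumptions and check the TextAttack documentation before running.

```python
# Usage sketch: run a bundled attack recipe against a fine-tuned classifier.
# Names follow the TextAttack README as recalled (verify against the docs).
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

name = "textattack/bert-base-uncased-SST-2"
model = transformers.AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

attack = TextFoolerJin2019.build(model_wrapper)          # one of the bundled recipes
dataset = HuggingFaceDataset("glue", "sst2", split="validation")
attacker = Attacker(attack, dataset, AttackArgs(num_examples=10))
attacker.attack_dataset()
```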
Second-Order NLP Adversarial Examples
Adversarial example generation methods in NLP rely on models like language models or sentence encoders to determine if potential adversarial examples are valid. In these methods, a valid adversarial example fools the model being attacked, …
Reevaluating Adversarial Examples in Natural Language
State-of-the-art attacks on NLP models lack a shared definition of what constitutes a successful attack. We distill ideas from past work into a unified framework: a successful natural language adversarial example is a perturbation that f…