John X. Morris
Approximating Language Model Training Data from Weights
Modern language models often have open weights but closed training data. We formalize the problem of data approximation from model weights and propose several baselines and metrics. We develop a gradient-based approach that selects the hig…
How much do language models memorize?
We propose a new method for estimating how much a model knows about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from gene…
Harnessing the Universal Geometry of Embeddings
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal lat…
Universal Zero-shot Embedding Inversion
Embedding inversion, i.e., reconstructing text given its embedding and black-box access to the embedding encoder, is a fundamental problem in both NLP and security. From the NLP perspective, it helps determine how much semantic information…
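As a rough illustration of the access model described above, the attacker holds only a target embedding and black-box query access to the encoder, and can score candidate reconstructions against the target. This is a minimal sketch of that setup, not the paper's inversion method; the encoder checkpoint and candidate strings are placeholders.

```python
# Sketch of the embedding-inversion access model: a target embedding, black-box
# encoder access, and similarity scoring of candidate reconstructions.
# The encoder name and candidates are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stands in for the black-box encoder

secret_text = "patient was prescribed 20mg of atorvastatin"
target_embedding = encoder.encode(secret_text, convert_to_tensor=True)  # all the attacker observes

candidates = [
    "the patient takes a statin medication",
    "patient was prescribed 20mg of atorvastatin",
    "the weather in Ithaca is cold today",
]
for text in candidates:
    score = util.cos_sim(target_embedding, encoder.encode(text, convert_to_tensor=True)).item()
    print(f"{score:.3f}  {text}")
```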
Contextual Document Embeddings
Dense document embeddings are central to neural retrieval. The dominant paradigm is to train and construct embeddings by running encoders directly on individual documents. In this work, we argue that these embeddings, while effective, are …
Self-interpreting Adversarial Images
We introduce a new type of indirect, cross-modal injection attacks against visual language models that enable creation of self-interpreting images. These images contain hidden "meta-instructions" that control how models answer users' quest…
Crafting Interpretable Embeddings by Asking LLMs Questions
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing ne…
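The abstract is cut off before the method, but the title suggests building each embedding dimension from an LLM's answer to a natural-language question. The sketch below illustrates that general idea only; `ask_llm_yes_no` is a hypothetical helper standing in for any chat-completion call, and the questions are made up.

```python
# Illustrative sketch (not the paper's implementation): embed text as a vector of
# yes/no LLM answers to fixed questions, so every dimension is human-readable.
from typing import Callable, List

QUESTIONS = [
    "Does the text mention a number?",
    "Is the text about medicine?",
    "Does the text express a negative sentiment?",
]

def interpretable_embedding(text: str, ask_llm_yes_no: Callable[[str], bool]) -> List[float]:
    """One dimension per question: 1.0 if the LLM answers yes, else 0.0."""
    return [1.0 if ask_llm_yes_no(f"{q}\n\nText: {text}") else 0.0 for q in QUESTIONS]
```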
Extracting Prompts by Inverting LLM Outputs
We consider the problem of language model inversion: given outputs of a language model, we seek to extract the prompt that generated these outputs. We develop a new black-box method, output2prompt, that learns to extract prompts without ac…
Do language models plan ahead for future tokens?
Do transformers "think ahead" during inference at a given position? It is known that transformers prepare information in the hidden states of the forward pass at time step $t$ that is then used in future forward passes $t+\tau$. We posit two expla…
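One generic way to operationalize the claim that the hidden state at step t carries information used at step t + tau is a linear probe from the current hidden state to a property of the future token. This is a toy probing sketch, not the paper's experimental setup; the model, texts, and probe target are placeholders.

```python
# Generic probing sketch: does the hidden state at position t linearly predict a
# (toy) property of the token at position t + tau?
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tau = 2
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Language models prepare information for future tokens.",
]
X, y = [], []
with torch.no_grad():
    for text in texts:
        ids = tok(text, return_tensors="pt")
        hidden = model(**ids).last_hidden_state[0]      # (seq_len, d_model)
        tokens = ids["input_ids"][0]
        for t in range(len(tokens) - tau):
            X.append(hidden[t].numpy())
            # toy probe target: does the token at t + tau start a new word?
            y.append(int(tok.decode([int(tokens[t + tau])]).startswith(" ")))

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe train accuracy:", probe.score(X, y))
```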
Nomic Embed: Training a Reproducible Long Context Text Embedder
This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-…
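A usage sketch for the released model via sentence-transformers follows. The Hugging Face model id and the "search_query:" / "search_document:" task prefixes are taken from the public model card as recalled, so verify them against the card before relying on this.

```python
# Usage sketch for nomic-embed-text-v1 via sentence-transformers.
# Model id and task prefixes follow the public model card (verify before use).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
query_emb = model.encode("search_query: how do dense retrievers work?")
doc_emb = model.encode("search_document: Dense retrievers encode queries and documents into vectors.")
print(query_emb.shape, doc_emb.shape)
```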
Language Model Inversion
Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of…
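The signal the abstract refers to is the full next-token probability distribution produced after a (hidden) prompt. Below is a minimal sketch of extracting that distribution with Hugging Face transformers; the prompt and model are placeholders, and the inversion model that maps distributions back to prompts is not shown.

```python
# Extract the next-token distribution that an inverter would observe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

hidden_prompt = "Translate the user's message into French:"   # unknown to the attacker
ids = tok(hidden_prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits                 # (1, seq_len, vocab_size)
next_token_probs = torch.softmax(logits[0, -1], dim=-1)  # the distribution the inverter sees
print(next_token_probs.shape)                    # e.g. torch.Size([50257]) for GPT-2
```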
Tree Prompting: Efficient Task Adaptation without Fine-Tuning
Prompting language models (LMs) is the main interface for applying them to new tasks. However, for smaller LMs, prompting provides low accuracy compared to gradient-based finetuning. Tree Prompting is an approach to prompting which builds …
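To make the core idea concrete, here is a toy illustration of a decision tree whose internal nodes are prompts, so classifying an input takes a few LM calls along one root-to-leaf path. The tree here is hand-written and `lm_answers_yes` is a hypothetical helper, whereas the paper learns the tree from data.

```python
# Toy decision tree over prompts (hand-written, for illustration only).
from typing import Callable

def classify(text: str, lm_answers_yes: Callable[[str], bool]) -> str:
    if lm_answers_yes(f"Is the following text a product review? {text}"):
        if lm_answers_yes(f"Does the reviewer sound satisfied? {text}"):
            return "positive review"
        return "negative review"
    return "not a review"
```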
Text Embeddings Reveal (Almost) As Much As Text
How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion, reconstructing the full text represented in dense text embeddings. We frame the problem as controll…
Explaining Data Patterns in Natural Language with Language Models
Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. We explore whether we can leverage this ability to find and explain patterns in data. Specifically, given a pre-trained…
Unsupervised Text Deidentification
Deidentification seeks to anonymize textual data prior to distribution. Automatic deidentification primarily uses supervised named entity recognition from human-labeled data points. We propose an unsupervised deidentification method that m…
Explaining Patterns in Data with Language Models via Interpretable Autoprompting
Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specificall…
TextAttack: Lessons learned in designing Python frameworks for NLP
TextAttack is an open-source Python toolkit for adversarial attacks, adversarial training, and data augmentation in NLP. TextAttack unites 15+ papers from the NLP adversarial attack literature into a single framework, with many components …
Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples
We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks. We perform a fine-grained analysis of three elements relevant to search: search algorithm, s…
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. …
TextAttack: A Framework for Adversarial Attacks in Natural Language Processing
While there has been substantial research using adversarial attacks to analyze NLP models, each attack is implemented in its own code repository. It remains challenging to develop NLP attacks and utilize them to improve model performance. …
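A usage sketch for the framework described in the two entries above follows. The recipe class, model wrapper, and checkpoint names are taken from the project's README as recalled; treat them as assumptions and check the TextAttack documentation before running.

```python
# Usage sketch: run a bundled attack recipe against a fine-tuned classifier.
# Names follow the TextAttack README as recalled (verify against the docs).
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

name = "textattack/bert-base-uncased-SST-2"
model = transformers.AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
model_wrapper = HuggingFaceModelWrapper(model, tokenizer)

attack = TextFoolerJin2019.build(model_wrapper)          # one of the bundled recipes
dataset = HuggingFaceDataset("glue", "sst2", split="validation")
attacker = Attacker(attack, dataset, AttackArgs(num_examples=10))
attacker.attack_dataset()
```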
Second-Order NLP Adversarial Examples
Adversarial example generation methods in NLP rely on models like language models or sentence encoders to determine if potential adversarial examples are valid. In these methods, a valid adversarial example fools the model being attacked, …
Reevaluating Adversarial Examples in Natural Language
State-of-the-art attacks on NLP models lack a shared definition of what constitutes a successful attack. We distill ideas from past work into a unified framework: a successful natural language adversarial example is a perturbation that f…