Robin Jia
LLM Unlearning Without an Expert Curated Dataset
Modern large language models often encode sensitive, harmful, or copyrighted knowledge, raising the need for post-hoc unlearning: the ability to remove specific domains of knowledge from a model without full retraining. A major bottleneck i…
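In its most common formulation, unlearning of this kind optimizes a gradient-difference objective: raise the loss on a forget set while preserving it on a retain set. A minimal sketch of that generic recipe, assuming a Hugging Face-style model whose forward pass returns a loss; this is background for the setup, not the paper's method (which concerns how the forget set itself is built):

```python
import torch

def gradient_difference_step(model, forget_batch, retain_batch, optimizer, alpha=1.0):
    """One update of a generic gradient-difference unlearning objective:
    minimize retain loss while maximizing forget loss (hence the minus sign).
    `forget_batch`/`retain_batch` are dicts with input_ids/labels tensors."""
    optimizer.zero_grad()
    forget_loss = model(**forget_batch).loss   # loss on knowledge to remove
    retain_loss = model(**retain_batch).loss   # loss on knowledge to keep
    loss = retain_loss - alpha * forget_loss   # ascend on forget, descend on retain
    loss.backward()
    optimizer.step()
    return forget_loss.item(), retain_loss.item()
```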
TokenSmith: Streamlining Data Editing, Search, and Inspection for Large-Scale Language Model Training and Interpretability
Understanding the relationship between training data and model behavior during pretraining is crucial, but existing workflows make this process cumbersome, fragmented, and often inaccessible to researchers. We present TokenSmith, an open-s…
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Humans often use visual aids, for example diagrams or sketches, when solving complex problems. Training multimodal models to do the same, known as Visual Chain of Thought (Visual CoT), is challenging due to: (1) poor off-the-shelf visual C…
Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Large language models demonstrate the intriguing ability to perform unseen tasks via in-context learning. However, it remains unclear what mechanisms inside the model drive such task-level generalization. In this work, we approach this que…
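The probe task named in the title, off-by-one addition, is concrete enough to write down: every in-context example reports a sum one greater than the true sum, and the model must induce the shifted function. A hypothetical prompt generator for that task:

```python
import random

def off_by_one_prompt(n_examples=8, seed=0):
    """Build an in-context prompt where every 'sum' is the true sum plus one.
    A model that induces the shifted function should answer a + b + 1."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        a, b = rng.randint(10, 99), rng.randint(10, 99)
        lines.append(f"{a}+{b}={a + b + 1}")
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    lines.append(f"{a}+{b}=")  # query; target under the induced rule is a+b+1
    return "\n".join(lines)

print(off_by_one_prompt())
```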
PSALM-V: Automating Symbolic Planning in Interactive Visual Environments with Large Language Models
We propose PSALM-V, the first autonomous neuro-symbolic learning system able to induce symbolic action semantics (i.e., pre- and post-conditions) in visual environments through interaction. PSALM-V bootstraps reliable symbolic planning wit…
Why Do Some Inputs Break Low-Bit LLM Quantization?
Low-bit weight-only quantization significantly reduces the memory footprint of large language models (LLMs), but disproportionately affects certain examples. We analyze diverse 3-4 bit methods on LLMs ranging from 7B to 70B in size and find t…
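For background, weight-only quantization at this bit width usually means mapping each group of weights to a small grid of levels; a minimal round-to-nearest sketch (a generic baseline, not any specific method analyzed in the paper):

```python
import torch

def quantize_rtn(w: torch.Tensor, bits: int = 4, group_size: int = 128):
    """Asymmetric round-to-nearest, weight-only quantization per group.
    Returns dequantized weights so the error w - w_hat can be inspected."""
    g = w.reshape(-1, group_size)
    lo = g.min(dim=1, keepdim=True).values
    hi = g.max(dim=1, keepdim=True).values
    levels = 2 ** bits - 1                       # e.g. 7 levels at 3 bits
    scale = (hi - lo).clamp(min=1e-8) / levels
    q = torch.round((g - lo) / scale).clamp(0, levels)
    return (q * scale + lo).reshape(w.shape)

w = torch.randn(4096, 4096)
w_hat = quantize_rtn(w, bits=3)
print((w - w_hat).abs().mean())  # mean absolute quantization error
```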
Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models
Steering methods have emerged as effective and targeted tools for guiding large language models' (LLMs) behavior without modifying their parameters. Multimodal large language models (MLLMs), however, do not currently enjoy the same suite o…
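The steering recipe this line of work builds on is standard: take the difference of mean hidden states between contrastive prompt sets, then add that vector back into the residual stream at inference. A generic sketch assuming a Hugging Face-style causal LM; the layer choice, scale alpha, and hook placement are illustrative, not the paper's exact procedure:

```python
import torch

@torch.no_grad()
def build_steering_vector(model, tokenizer, pos_prompts, neg_prompts, layer):
    """Mean last-token hidden state on `pos_prompts` minus the same on
    `neg_prompts`, taken at one transformer layer."""
    def mean_act(prompts):
        acts = []
        for p in prompts:
            ids = tokenizer(p, return_tensors="pt").input_ids
            hs = model(ids, output_hidden_states=True).hidden_states[layer]
            acts.append(hs[0, -1])  # last-token activation
        return torch.stack(acts).mean(0)
    return mean_act(pos_prompts) - mean_act(neg_prompts)

def add_steering_hook(layer_module, vector, alpha=4.0):
    """Register a forward hook that shifts the layer's output by alpha * vector."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden += alpha * vector  # in-place shift of the residual stream
        return output
    return layer_module.register_forward_hook(hook)
```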
Teaching Models to Understand (but not Generate) High-risk Data
Language model developers typically filter out high-risk content, such as toxic or copyrighted text, from their pre-training data to prevent models from generating similar outputs. However, removing such data altogether limits models' …
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
Cancer patients are increasingly turning to large language models (LLMs) for medical information, making it critical to assess how well these models handle complex, personalized questions. However, current medical benchmarks focus on medic…
Interrogating LLM design under a fair learning doctrine
The current discourse on large language models (LLMs) and copyright largely takes a "behavioral" perspective, focusing on model outputs and evaluating whether they are substantially similar to training data. However, substantial similarity…
FoNE: Precise Single-Token Number Embeddings via Fourier Features
Large Language Models (LLMs) typically represent numbers using multiple tokens, which requires the model to aggregate these tokens to interpret numerical values. This fragmentation makes both training and inference less efficient and adver…
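One concrete reading of the Fourier-feature idea: encode a number with cosine/sine pairs whose periods match digit places (10, 100, ...), so each digit is exactly recoverable from one pair. A sketch under that assumption; the paper's actual embedding may differ in its details:

```python
import numpy as np

def fourier_number_embedding(x: float, num_places: int = 5) -> np.ndarray:
    """Embed x as [cos(2*pi*x/T), sin(2*pi*x/T)] for periods T = 10, 100, ...
    Each (cos, sin) pair encodes x modulo T, i.e., one digit place, exactly."""
    periods = 10.0 ** np.arange(1, num_places + 1)
    angles = 2 * np.pi * x / periods
    return np.concatenate([np.cos(angles), np.sin(angles)])

def decode_digit(emb: np.ndarray, place: int, num_places: int = 5) -> int:
    """Recover x mod 10^(place+1) from the matching (cos, sin) pair,
    then strip the lower places to read off a single digit."""
    angle = np.arctan2(emb[num_places + place], emb[place]) % (2 * np.pi)
    value = angle / (2 * np.pi) * 10 ** (place + 1)
    return int(round(value)) // 10 ** place % 10

emb = fourier_number_embedding(4729)
print([decode_digit(emb, p) for p in range(4)])  # [9, 2, 7, 4]
```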
Mechanistic Interpretability of Emotion Inference in Large Language Models
Large language models (LLMs) show promising capabilities in predicting human emotions from text. However, the mechanisms through which these models process emotional stimuli remain largely unexplored. Our study addresses this gap by invest…
Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics
Improvements in large language models have led to increasing optimism that they can serve as reliable evaluators of natural language generation outputs. In this paper, we challenge this optimism by thoroughly re-evaluating five state-of-th…
Operationalizing Content Moderation “Accuracy” in the Digital Services Act
The Digital Services Act, recently adopted by the EU, requires social media platforms to report the “accuracy” of their automated content moderation systems. The colloquial term is vague, or open-textured: the literal accuracy (number o…
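A small worked example of why literal accuracy is open-textured: when violating content is rare, a moderator that removes nothing still reports high accuracy. The numbers below are illustrative only:

```python
# Illustrative only: suppose 1% of 1,000,000 items actually violate policy.
total, violating = 1_000_000, 10_000
# A "do nothing" moderator decides correctly on every non-violating item.
correct = total - violating
print(f"literal accuracy: {correct / total:.1%}")  # 99.0%, yet 0% of violations caught
```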
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Although reward models have been successful in improving multimodal large language models, the reward models themselves remain coarse and contain minimal information. Notably, existing reward models only mimic human annotations by assignin…
Rethinking Backdoor Detection Evaluation for Language Models
Backdoor attacks, in which a model behaves maliciously when given an attacker-specified trigger, pose a major security risk for practitioners who depend on publicly released language models. As a countermeasure, backdoor detection methods …
When Parts Are Greater Than Sums: Individual LLM Components Can Outperform Full Models
This paper studies in-context learning by decomposing the output of large language models into the individual contributions of attention heads and MLPs (components). We observe curious components: good-performing ones that individually do …
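The decomposition rests on the linearity of the transformer residual stream: the final representation is, up to normalization, a sum of every attention head's and MLP's output, so each component can be projected to logits on its own. A hedged sketch of that accounting, with a frozen LayerNorm scale as a simplifying assumption and hypothetical component names:

```python
import torch

def component_logits(component_outputs, W_U, ln_scale):
    """Given each component's contribution to the residual stream at the final
    position (dict: name -> [d_model] tensor), project each through the
    unembedding to get its standalone logits. Freezing the LayerNorm scale is
    a common simplification in this style of analysis."""
    return {name: (out / ln_scale) @ W_U for name, out in component_outputs.items()}

# Hypothetical usage: score which component's standalone logits do best alone.
d_model, vocab = 8, 50
W_U = torch.randn(d_model, vocab)
components = {"head_3.5": torch.randn(d_model), "mlp_7": torch.randn(d_model)}
per_component = component_logits(components, W_U, ln_scale=torch.tensor(1.0))
print({k: v.argmax().item() for k, v in per_component.items()})
```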
Pre-trained Large Language Models Use Fourier Features to Compute Addition
Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities, yet how they compute basic arithmetic, such as addition, remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier fea…
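The Fourier story of addition has a simple toy form: store a residue a mod 10 as the angle 2*pi*a/10; adding numbers then becomes adding angles, and the answer digit is read off by decoding the rotation. A toy demonstration of the mechanism (not weights extracted from a real model):

```python
import numpy as np

def to_angle(a: int, period: int = 10) -> float:
    """Represent a mod `period` as a point on the unit circle."""
    return 2 * np.pi * a / period

def add_mod(a: int, b: int, period: int = 10) -> int:
    """Add two residues by summing their angles, then decode the digit."""
    angle = (to_angle(a, period) + to_angle(b, period)) % (2 * np.pi)
    return round(angle * period / (2 * np.pi)) % period

print(add_mod(7, 8))  # 5, since (7 + 8) mod 10 = 15 mod 10 = 5
```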
Language Models can Infer Action Semantics for Symbolic Planners from Environment Feedback
Symbolic planners can discover a sequence of actions from initial to goal states given expert-defined, domain-specific logical action semantics. Large Language Models (LLMs) can directly generate such sequences, but limitations in reasonin…
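Action semantics here means the pre- and post-conditions a symbolic planner consumes, typically written in PDDL. A minimal sketch of the kind of schema an LLM would need to infer from environment feedback, with illustrative predicate and action names expressed as Python data:

```python
# Illustrative PDDL-style action schema of the kind a symbolic planner consumes.
PICK_UP = {
    "name": "pick-up",
    "parameters": ["?obj", "?loc"],
    "preconditions": [("at", "?obj", "?loc"), ("robot-at", "?loc"), ("hand-empty",)],
    "add_effects": [("holding", "?obj")],
    "del_effects": [("at", "?obj", "?loc"), ("hand-empty",)],
}

def applicable(state: set, action: dict, binding: dict) -> bool:
    """Check an action's preconditions against a state of ground predicate tuples."""
    ground = lambda pred: tuple(binding.get(t, t) for t in pred)
    return all(ground(p) in state for p in action["preconditions"])

state = {("at", "cup", "table"), ("robot-at", "table"), ("hand-empty",)}
print(applicable(state, PICK_UP, {"?obj": "cup", "?loc": "table"}))  # True
```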
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs. But do their capabilities change depending on the input modality? In this work, we propose IsoBench…
Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions?
Data augmentation via back-translation is common when pretraining Vision-and-Language Navigation (VLN) models, even though the generated instructions are noisy. But does that noise matter? We find that nonsensical or irrelevant language i…