Explanipedia

Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics Open

Christian Blum, Katja Filippova, Ann Yuan, Asma Ghandeharioun, Julian Zimmert, et al. · 2025

Large language models (LLMs) struggle with cross-lingual knowledge transfer: they hallucinate when asked in one language about facts expressed in a different language during training. This work introduces a controlled setting to study the …

Just Say No to Single Embeddings: Why Your AI Needs Multiple Perspectives Open

Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, et al. · 2025

Note: This is a work-in-progress document. This exploratory work analyzes 229 multi-agent AI dialogues by projecting them into five different embedding spaces (transformer-based and classical) and measuring geometric properties. We find astr…

LaMPost: AI Writing Assistance for Adults with Dyslexia Using Large Language Models Open

Steven M. Goodman, Erin Buehler, Patrick Clary, Andy Coenen, Aaron Donsbach, et al. · 2024

The natural language capabilities demonstrated by large language models (LLMs) highlight an opportunity for new writing support tools that address the varied needs of people with dyslexia. We present LaMPost, a prototype email editor that …

Who's asking? User personas and the mechanics of latent misalignment Open

Asma Ghandeharioun, Ann Yuan, Marius Guerard, Emily Reif, Michael A. Lepori, et al. · 2024

Despite investments in improving model safety, studies show that misaligned capabilities remain latent in safety-tuned models. In this work, we shed light on the mechanics of this phenomenon. First, we show that even when model generations…

ConstitutionalExperts: Training a Mixture of Principle-based Prompts Open

Savvas Petridis, Ben Wedin, Ann Yuan, James Wexler, Nithum Thain · 2024

Large language models (LLMs) are highly capable at a variety of tasks given the right prompt, but writing one is still a difficult and tedious process. In this work, we introduce ConstitutionalExperts, a method for learning a prompt consis…

Towards Agile Text Classifiers for Everyone Open

Maximilian Mozes, Jessica D. Hoffmann, Katrin Tomanek, Muhamed Kouate, Nithum Thain, et al. · 2023

Text-based safety classifiers are widely used for content moderation and increasingly to tune generative language model behavior - a topic of growing concern for the safety of digital assistants and chatbots. However, different policies re…

Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning Open

Maximilian Mozes, Tolga Bolukbasi, Ann Yuan, Frederick Liu, Nithum Thain, et al. · 2023

Pretrained large language models (LLMs) are able to solve a wide variety of tasks through transfer learning. Various explainability methods have been developed to investigate their decision making process. TracIn (Pruthi et al., 2020) is o…

Creative Writing with an AI-Powered Writing Assistant: Perspectives from Professional Writers Open

Daphne Ippolito, Ann Yuan, Andy Coenen, Sehmon Burnam · 2022

Recent developments in natural language generation (NLG) using neural language models have brought us closer than ever to the goal of building AI-powered creative writing tools. However, most prior work on human-AI collaboration in the cre…

LaMPost: Design and Evaluation of an AI-assisted Email Writing Prototype for Adults with Dyslexia Open

Steven M. Goodman, Erin Buehler, Patrick Clary, Andy Coenen, Aaron Donsbach, et al. · 2022

Prior work has explored the writing challenges experienced by people with dyslexia, and the potential for new spelling, grammar, and word retrieval technologies to address these challenges. However, the capabilities for natural language…

The Case for a Single Model that can Both Generate Continuations and Fill in the Blank Open

Daphne Ippolito, Liam Dugan, Emily Reif, Ann Yuan, Andy Coenen, et al. · 2022

The task of inserting text into a specified position in a passage, known as fill in the blank (FitB), is useful for a variety of applications where writers interact with a natural language generation (NLG) system to craft text. While previ…

Perspective-Taking to Reduce Affective Polarization on Social Media Open

Martin Saveski, Nabeel Gillani, Ann Yuan, Prashanth Vijayaraghavan, Deb Roy · 2022

The intensification of affective polarization worldwide has raised new questions about how social media platforms might be further fracturing an already-divided public sphere. As opposed to ideological polarization, affective polarization …

Wordcraft: Story Writing With Large Language Models Open

Ann Yuan, Andy Coenen, Emily Reif, Daphne Ippolito · 2022

The latest generation of large neural language models such as GPT-3 have achieved new levels of performance on benchmarks for language understanding and generation. These models have even demonstrated an ability to perform arbitrary tasks …

A Recipe for Arbitrary Text Style Transfer with Large Language Models Open

Emily Reif, Daphne Ippolito, Ann Yuan, Andy Coenen, Chris Callison-Burch, et al. · 2022

Emily Reif, Daphne Ippolito, Ann Yuan, Andy Coenen, Chris Callison-Burch, Jason Wei. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2022.

The Case for a Single Model that can Both Generate Continuations and Fill-in-the-Blank Open

Daphne Ippolito, Liam Dugan, Emily Reif, Ann Yuan, Andy Coenen, et al. · 2022

The task of inserting text into a specified position in a passage, known as fill in the blank (FitB), is useful for a variety of applications where writers interact with a natural language generation (NLG) system to craft text. While previ…

SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets Open

Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, et al. · 2021

NLP researchers need more, higher-quality text datasets. Human-labeled datasets are expensive to collect, while datasets collected via automatic retrieval from the web such as WikiBio are noisy and can include undesired biases. Moreover, d…

Perspective-taking to Reduce Affective Polarization on Social Media Open

Martin Saveski, Nabeel Gillani, Ann Yuan, Prashanth Vijayaraghavan, Deb Roy · 2021

The intensification of affective polarization worldwide has raised new questions about how social media platforms might be further fracturing an already-divided public sphere. As opposed to ideological polarization, affective polarization …

A Recipe For Arbitrary Text Style Transfer with Large Language Models Open

Emily Reif, Daphne Ippolito, Ann Yuan, Andy Coenen, Chris Callison-Burch, et al. · 2021

In this paper, we leverage large language models (LMs) to perform zero-shot text style transfer. We present a prompting method that we call augmented zero-shot learning, which frames style transfer as a sentence rewriting task and requires…

Wordcraft: a Human-AI Collaborative Editor for Story Writing Open

Andy Coenen, Luke M. Davis, Daphne Ippolito, Emily Reif, Ann Yuan · 2021

As neural language models grow in effectiveness, they are increasingly being applied in real-world settings. However, these applications tend to be limited in the modes of interaction they support. In this extended abstract, we propose Word…

An Interpretability Illusion for BERT Open

Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, et al. · 2021

We describe an "interpretability illusion" that arises when analyzing the BERT model. Activations of individual neurons in the network may spuriously appear to encode a single, simple concept, when in fact they are encoding something far m…

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models Open

Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, et al. · 2020

We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform po…

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models Open

Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, et al. · 2020

Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan. Proceedings of the 2020 Conference on Empirical Methods in Natural Language…

TensorFlow.js: Machine Learning for the Web and Beyond Open

Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Ann Yuan, Nick Kreeger, et al. · 2019

TensorFlow.js is a library for building and executing machine learning algorithms in JavaScript. TensorFlow.js models run in a web browser and in the Node.js environment. The library is part of the TensorFlow ecosystem, providing a set of …

Me, My Echo Chamber, and I: Introspection on Social Media Polarization Open

Nabeel Gillani, Ann Yuan, Martin Saveski, Soroush Vosoughi, Deb Roy · 2018

Homophily -- our tendency to surround ourselves with others who share our perspectives and opinions about the world -- is both a part of human nature and an organizing principle underpinning many of our digital social networks. However, wh…

Me, My Echo Chamber, and I Open

Nabeel Gillani, Ann Yuan, Martin Saveski, Soroush Vosoughi, Deb Roy · 2018

Homophily - our tendency to surround ourselves with others who share our perspectives and opinions about the world - is both a part of human nature and an organizing principle underpinning many of our digital social networks. However, when…

Mapping Twitter Conversation Landscapes Open

Soroush Vosoughi, Prashanth Vijayaraghavan, Ann Yuan, Deb Roy · 2017

While the most ambitious polls are based on standardized interviews with a few thousand people, millions are tweeting freely and publicly in their own voices about issues they care about. This data offers a vibrant 24/7 snapshot of people'…

TweetVista: An AI-Powered Interactive Tool for Exploring Conversations on Twitter Open

Prashanth Vijayaraghavan, Soroush Vosoughi, Ann Yuan, Deb Roy · 2017

We present TweetVista, an interactive web-based tool for mapping the conversation landscapes on Twitter. TweetVista is an intelligent and interactive desktop web application for exploring the conversation landscapes on Twitter. Given a dat…
