Timo Schick
FairPair: A Robust Evaluation of Biases in Language Models through Paired Perturbations
The accurate evaluation of differential treatment in language models to specific groups is critical to ensuring a positive and safe user experience. An ideal evaluation should have the properties of being robust, extendable to new groups o…
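The abstract above describes evaluating bias through paired perturbations. A minimal sketch of that idea, assuming a simple whole-word term-swap scheme (the term pairs and prompt below are illustrative, not the paper's data or code):

```python
import re

# Build prompt pairs that differ only in the demographic terms, so any
# difference in the model's continuations can be attributed to those terms.
# TERM_PAIRS is an illustrative assumption, not FairPair's actual lexicon.
TERM_PAIRS = [("John", "Jane"), ("he", "she"), ("his", "her")]

def perturb(prompt: str) -> str:
    """Swap each left-hand term for its counterpart (whole words only)."""
    for a, b in TERM_PAIRS:
        prompt = re.sub(rf"\b{a}\b", b, prompt)
    return prompt

base = "John is a doctor and he is proud of his work."
print(perturb(base))  # Jane is a doctor and she is proud of her work.
```

Both prompts in a pair would then be fed to the model and their continuations compared.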
Improving Wikipedia verifiability with AI
Verifiability is a core content policy of Wikipedia: claims need to be backed by citations. Maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist hum…
Evaluation of Faithfulness Using the Longest Supported Subsequence
As increasingly sophisticated language models emerge, their trustworthiness becomes a pivotal issue, especially in tasks such as summarization and question-answering. Ensuring their responses are contextually grounded and faithful is chall…
Self-Alignment with Instruction Backtranslation
We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a languag…
Active Learning Principles for In-Context Learning with Large Language Models
The remarkable advancements in large language models (LLMs) have significantly enhanced predictive performance in few-shot learning settings. By using only a small number of labeled examples, referred to as demonstrations, LLMs can effectively gr…
LongForm: Effective Instruction Tuning with Reverse Instructions
Instruction tuning enables language models to more effectively generalize and better follow user intent. However, obtaining instruction data is costly and challenging. Prior work employs methods such as expensive human annotation, crowd-so…
Augmented Language Models: a Survey
This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in c…
Toolformer: Language Models Can Teach Themselves to Use Tools
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup,…
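Toolformer lets a model emit inline API calls that are executed and whose results are spliced back into the text. A minimal sketch of that execution step, assuming a bracketed `[Tool(args)]` call format and a toy tool registry (both are illustrative assumptions, not the paper's implementation):

```python
import re

# Toy registry of callable tools; Toolformer's actual tools include a
# calculator, QA system, search engine, and more.
TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy, unsafe-eval calculator
}

# Matches an embedded call like "[Calculator(2+3)]".
CALL_RE = re.compile(r"\[(\w+)\(([^)]*)\)\]")

def execute_calls(text: str) -> str:
    """Execute each embedded tool call and splice its result back in."""
    def run(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        result = TOOLS[tool](args)
        return f"[{tool}({args}) -> {result}]"
    return CALL_RE.sub(run, text)

print(execute_calls("Out of 1400 participants, 400 (or [Calculator(400/1400)]) passed."))
```

During training, the paper filters such self-generated calls by whether the spliced-in result reduces the loss on subsequent tokens; the sketch above covers only the inference-time execution.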
Semantic-Oriented Unlabeled Priming for Large-Scale Language Models
Due to the high costs associated with finetuning large language models, various recent works propose to adapt them to specific tasks without any parameter updates through in-context learning. Unfortunately, for in-context learning there is …
MEAL: Stable and Active Learning for Few-Shot Prompting
Few-shot classification has made great strides due to foundation models that, through priming and prompting, are highly effective few-shot learners. However, this approach has high variance both across different sets of few shots (data se…
Task-aware Retrieval with Instructions
We study the problem of retrieval with instructions, where users provide explicit descriptions of their intent along with their queries to guide a retrieval system. Our solution is a general-purpose task-aware retrieval system, trained usi…
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
Instruction tuning enables pretrained language models to perform new tasks from inference-time natural language descriptions. These approaches rely on vast amounts of human supervision in the form of crowdsourced datasets or user interacti…
EditEval: An Instruction-Based Benchmark for Text Improvements
Evaluation of text generation to date has primarily focused on content created sequentially, rather than improvements on a piece of text. Writing, however, is naturally an iterative and incremental process that requires expertise in differ…
PEER: A Collaborative Language Model
Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the fi…
Atlas: Few-shot Learning with Retrieval Augmented Language Models
Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to stor…
Leveraging QA Datasets to Improve Generative Data Augmentation
The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation. In this work, we propose CONDA, an approach to further improve GLMs' ab…
True Few-Shot Learning with Prompts—A Real-World Perspective
Prompt-based approaches excel at few-shot learning. However, Perez et al. (2021) recently cast doubt on their performance as they had difficulty getting good results in a “true” few-shot setting in which prompts and hyperparameters cannot …
CoDA21: Evaluating Language Understanding Capabilities of NLP Models With Context-Definition Alignment
Pretrained language models (PLMs) have achieved superhuman performance on many benchmarks, creating a need for harder tasks. We introduce CoDA21 (Context Definition Alignment), a challenging benchmark that measures natural language underst…
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
When trained on large, unfiltered crawls from the internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: they often generate racist, sexist, violent or otherwise toxic language. As la…