Corby Rosset
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Recent success in large multimodal models (LMMs) has sparked promising applications of agents capable of autonomously completing complex web tasks. While open-source LMM agents have made significant advances in offline evaluation benchmark…
AgentInstruct: Toward Generative Teaching with Agentic Flows
Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers also raised concerns around model collapse and drawbacks of imit…
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the…
MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modalities. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of…
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5…
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training LLMs involves Reinforcement Learning fro…
Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents
Existing question answering (QA) datasets are no longer challenging to the most powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA, NaturalQuestions, ELI5 and HotpotQA mainly study "known unknowns" with clear indi…
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Mathematical word problem-solving has long been recognized as a complex task for small language models (SLMs). A recent study hypothesized that the smallest model size, needed to achieve over 80% accuracy on the GSM8K benchmark, is 34 bill…
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
This paper introduces a framework for the automated evaluation of natural language texts. A manually constructed rubric describes how to assess multiple dimensions of interest. To evaluate a text, a large language model (LLM) is prompte…
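A minimal sketch of the rubric-driven evaluation loop the abstract describes, with a stub standing in for the actual LLM call and the paper's calibration step only noted in a comment; the rubric dimensions and scores here are invented for illustration:

```python
# Hypothetical rubric: one scoring question per dimension of interest.
rubric = {
    "coherence": "Rate 1-4: does the text read as a logically ordered whole?",
    "grounding": "Rate 1-4: are claims supported by the given sources?",
}

def mock_llm_judge(prompt):
    # Stand-in for a real LLM call; returns a fixed score for the demo.
    return 3

def evaluate(text, rubric):
    """Prompt the judge once per rubric dimension and collect scores."""
    scores = {}
    for dim, question in rubric.items():
        prompt = f"{question}\n\nText:\n{text}"
        scores[dim] = mock_llm_judge(prompt)
    # A calibration step (e.g. a small learned model per human judge)
    # would map raw LLM scores to human-aligned scores; omitted here.
    return scores

print(evaluate("Example system response.", rubric))
# {'coherence': 3, 'grounding': 3}
```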
Axiomatic Preference Modeling for Longform Question Answering
The remarkable abilities of large language models (LLMs) like GPT-4 partially stem from post-training processes like Reinforcement Learning from Human Feedback (RLHF) involving human preferences encoded in a reward model. However, these re…
Orca 2: Teaching Small Language Models How to Reason
Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can e…
Overview of the TREC 2023 Product Search Track
This is the first year of the TREC Product search track. The focus this year was the creation of a reusable collection and evaluation of the impact of the use of metadata and multi-modal data on retrieval accuracy. This year we leverage th…
Automatic Pair Construction for Contrastive Post-training
Alignment serves as an important step to steer large language models (LLMs) towards human preferences. In this paper, we propose an automatic way to construct contrastive data for LLMs, using preference pairs from multiple models of varying…
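One hedged reading of this setup (not necessarily the paper's exact recipe) is to pair, for each prompt, the output of a model assumed stronger as "chosen" against the output of a model assumed weaker as "rejected". The model names and responses below are invented:

```python
from itertools import combinations

def build_pairs(responses_by_model, model_ranking):
    """Build contrastive (chosen, rejected) pairs for one prompt.

    responses_by_model: {model_name: response_text}
    model_ranking: model names ordered strongest first (an assumption
    about relative quality, not something measured here).
    """
    pairs = []
    for better, worse in combinations(model_ranking, 2):
        if better in responses_by_model and worse in responses_by_model:
            pairs.append({
                "chosen": responses_by_model[better],
                "rejected": responses_by_model[worse],
            })
    return pairs

# Hypothetical outputs from three models of assumed descending quality.
responses = {
    "strong": "A thorough answer.",
    "mid": "An okay answer.",
    "weak": "A poor answer.",
}
pairs = build_pairs(responses, ["strong", "mid", "weak"])
# Yields 3 pairs: (strong, mid), (strong, weak), (mid, weak)
```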
Dodo: Dynamic Contextual Compression for Decoder-only LMs
Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of h…
Zero-shot Clarifying Question Generation for Conversational Search
A long-standing challenge for search and conversational assistants is query intention detection in ambiguous queries. Asking clarifying questions in conversational search has been widely studied and considered an effective solution to reso…
Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories
In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the…
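A toy sketch of the multi-corpus retrieval idea: query each "memory" separately and merge the results. The corpora are invented, and a crude lexical-overlap score stands in for the dense retriever the paper actually uses:

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank docs by word overlap with the query."""
    q = set(query.lower().split())
    scored = [(len(q & set(doc.lower().split())), doc) for doc in corpus]
    return [doc for s, doc in sorted(scored, reverse=True)[:k] if s > 0]

# Hypothetical external memories: multiple corpora to draw from.
memories = {
    "wiki":   ["neural retrieval methods", "history of rome"],
    "papers": ["zero-shot dense retrieval", "protein folding"],
}

def moma_retrieve(query, memories, k=2):
    """Gather augmentation documents from every memory in the mixture."""
    results = []
    for name, corpus in memories.items():
        results.extend(retrieve(query, corpus, k))
    return results

print(moma_retrieve("dense retrieval", memories))
# ['neural retrieval methods', 'zero-shot dense retrieval']
```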
Knowledge-Aware Language Model Pretraining
How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this …
An Axiomatic Approach to Regularizing Neural Ranking Models
Axiomatic information retrieval (IR) seeks a set of principled properties desirable in IR models. These properties, when formally expressed, provide guidance in the search for better relevance estimation functions. Neural ranking models typic…
Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks
Classical information retrieval (IR) methods, such as query likelihood and BM25, score documents independently w.r.t. each query term, and then accumulate the scores. Assuming query term independence allows precomputing term-document score…
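The score-accumulation scheme the abstract describes can be sketched as follows; the documents and per-term scores are invented for illustration, and in a real system they would be precomputed offline (e.g. per-term BM25 contributions) and stored in an inverted index:

```python
from collections import defaultdict

# Hypothetical precomputed term-document scores: term -> {doc_id: score}.
term_scores = {
    "neural":    {"d1": 1.2, "d2": 0.4},
    "ranking":   {"d1": 0.9, "d3": 1.1},
    "retrieval": {"d2": 0.7, "d3": 0.5},
}

def score_query(query_terms):
    """Score documents by accumulating independent per-term scores,
    as the query term independence assumption allows."""
    totals = defaultdict(float)
    for term in query_terms:
        for doc_id, s in term_scores.get(term, {}).items():
            totals[doc_id] += s
    return sorted(totals.items(), key=lambda kv: -kv[1])

ranked = score_query(["neural", "ranking"])
# d1 accumulates contributions from both terms and ranks first,
# ahead of d3 (ranking only) and d2 (neural only).
```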
Optimizing Query Evaluations Using Reinforcement Learning for Web Search
In web search, typically a candidate generation step selects a small set of documents---from collections containing as many as billions of web pages---that are subsequently ranked and pruned before being presented to the user. In Bing, the…