Corby Rosset
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
Recent success in large multimodal models (LMMs) has sparked promising applications of agents capable of autonomously completing complex web tasks. While open-source LMM agents have made significant advances in offline evaluation benchmark…
AgentInstruct: Toward Generative Teaching with Agentic Flows
Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers also raised concerns around model collapse and drawbacks of imit…
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the…
MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modalities. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of…
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5…
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training LLMs involves Reinforcement Learning fro…
Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents
Existing question answering (QA) datasets are no longer challenging to the most powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA, NaturalQuestions, ELI5 and HotpotQA mainly study "known unknowns" with clear indi…
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Mathematical word problem-solving has long been recognized as a complex task for small language models (SLMs). A recent study hypothesized that the smallest model size, needed to achieve over 80% accuracy on the GSM8K benchmark, is 34 bill…
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
This paper introduces a framework for the automated evaluation of natural language texts. A manually constructed rubric describes how to assess multiple dimensions of interest. To evaluate a text, a large language model (LLM) is prompte…
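A minimal sketch of the rubric-driven evaluation loop the abstract describes, with a stub standing in for the actual LLM call and the paper's calibration step only noted in a comment; the rubric dimensions and scores here are invented for illustration:

```python
# Hypothetical rubric: one scoring question per dimension of interest.
rubric = {
    "coherence": "Rate 1-4: does the text read as a logically ordered whole?",
    "grounding": "Rate 1-4: are claims supported by the given sources?",
}

def mock_llm_judge(prompt):
    # Stand-in for a real LLM call; returns a fixed score for the demo.
    return 3

def evaluate(text, rubric):
    """Prompt the judge once per rubric dimension and collect scores."""
    scores = {}
    for dim, question in rubric.items():
        prompt = f"{question}\n\nText:\n{text}"
        scores[dim] = mock_llm_judge(prompt)
    # A calibration step (e.g. a small learned model per human judge)
    # would map raw LLM scores to human-aligned scores; omitted here.
    return scores

print(evaluate("Example system response.", rubric))
# {'coherence': 3, 'grounding': 3}
```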
Axiomatic Preference Modeling for Longform Question Answering
The remarkable abilities of large language models (LLMs) like GPT-4 partially stem from post-training processes like Reinforcement Learning from Human Feedback (RLHF) involving human preferences encoded in a reward model. However, these re…
Orca 2: Teaching Small Language Models How to Reason
Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can e…
Overview of the TREC 2023 Product Search Track
This is the first year of the TREC Product search track. The focus this year was the creation of a reusable collection and evaluation of the impact of the use of metadata and multi-modal data on retrieval accuracy. This year we leverage th…
Automatic Pair Construction for Contrastive Post-training
Alignment serves as an important step to steer large language models (LLMs) towards human preferences. In this paper, we propose an automatic way to construct contrastive data for LLMs, using preference pairs from multiple models of varying…
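One hedged reading of this setup (not necessarily the paper's exact recipe) is to pair, for each prompt, the output of a model assumed stronger as "chosen" against the output of a model assumed weaker as "rejected". The model names and responses below are invented:

```python
from itertools import combinations

def build_pairs(responses_by_model, model_ranking):
    """Build contrastive (chosen, rejected) pairs for one prompt.

    responses_by_model: {model_name: response_text}
    model_ranking: model names ordered strongest first (an assumption
    about relative quality, not something measured here).
    """
    pairs = []
    for better, worse in combinations(model_ranking, 2):
        if better in responses_by_model and worse in responses_by_model:
            pairs.append({
                "chosen": responses_by_model[better],
                "rejected": responses_by_model[worse],
            })
    return pairs

# Hypothetical outputs from three models of assumed descending quality.
responses = {
    "strong": "A thorough answer.",
    "mid": "An okay answer.",
    "weak": "A poor answer.",
}
pairs = build_pairs(responses, ["strong", "mid", "weak"])
# Yields 3 pairs: (strong, mid), (strong, weak), (mid, weak)
```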
Dodo: Dynamic Contextual Compression for Decoder-only LMs
Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of h…
Zero-shot Clarifying Question Generation for Conversational Search
A long-standing challenge for search and conversational assistants is query intention detection in ambiguous queries. Asking clarifying questions in conversational search has been widely studied and considered an effective solution to reso…
Augmenting Zero-Shot Dense Retrievers with Plug-in Mixture-of-Memories
In this paper we improve the zero-shot generalization ability of language models via Mixture-Of-Memory Augmentation (MoMA), a mechanism that retrieves augmentation documents from multiple information corpora ("external memories"), with the…
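A toy sketch of the multi-corpus retrieval idea: query each "memory" separately and merge the results. The corpora are invented, and a crude lexical-overlap score stands in for the dense retriever the paper actually uses:

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank docs by word overlap with the query."""
    q = set(query.lower().split())
    scored = [(len(q & set(doc.lower().split())), doc) for doc in corpus]
    return [doc for s, doc in sorted(scored, reverse=True)[:k] if s > 0]

# Hypothetical external memories: multiple corpora to draw from.
memories = {
    "wiki":   ["neural retrieval methods", "history of rome"],
    "papers": ["zero-shot dense retrieval", "protein folding"],
}

def moma_retrieve(query, memories, k=2):
    """Gather augmentation documents from every memory in the mixture."""
    results = []
    for name, corpus in memories.items():
        results.extend(retrieve(query, corpus, k))
    return results

print(moma_retrieve("dense retrieval", memories))
# ['neural retrieval methods', 'zero-shot dense retrieval']
```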
Knowledge-Aware Language Model Pretraining
How much knowledge do pretrained language models hold? Recent research observed that pretrained transformers are adept at modeling semantics but it is unclear to what degree they grasp human knowledge, or how to ensure they do so. In this …
An Axiomatic Approach to Regularizing Neural Ranking Models
Axiomatic information retrieval (IR) seeks a set of principled properties desirable in IR models. These properties, when formally expressed, provide guidance in the search for better relevance estimation functions. Neural ranking models typic…
Incorporating Query Term Independence Assumption for Efficient Retrieval and Ranking using Deep Neural Networks
Classical information retrieval (IR) methods, such as query likelihood and BM25, score documents independently w.r.t. each query term, and then accumulate the scores. Assuming query term independence allows precomputing term-document score…
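The score-accumulation scheme the abstract describes can be sketched as follows; the documents and per-term scores are invented for illustration, and in a real system they would be precomputed offline (e.g. per-term BM25 contributions) and stored in an inverted index:

```python
from collections import defaultdict

# Hypothetical precomputed term-document scores: term -> {doc_id: score}.
term_scores = {
    "neural":    {"d1": 1.2, "d2": 0.4},
    "ranking":   {"d1": 0.9, "d3": 1.1},
    "retrieval": {"d2": 0.7, "d3": 0.5},
}

def score_query(query_terms):
    """Score documents by accumulating independent per-term scores,
    as the query term independence assumption allows."""
    totals = defaultdict(float)
    for term in query_terms:
        for doc_id, s in term_scores.get(term, {}).items():
            totals[doc_id] += s
    return sorted(totals.items(), key=lambda kv: -kv[1])

ranked = score_query(["neural", "ranking"])
# d1 accumulates contributions from both terms and ranks first,
# ahead of d3 (ranking only) and d2 (neural only).
```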
Optimizing Query Evaluations Using Reinforcement Learning for Web Search
In web search, typically a candidate generation step selects a small set of documents---from collections containing as many as billions of web pages---that are subsequently ranked and pruned before being presented to the user. In Bing, the…