Jianmo Ni
Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks
Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant i…
ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
Generative recommendation (GR) is an emerging paradigm where user actions are tokenized into discrete token patterns and autoregressively generated as predictions. However, existing GR models tokenize each action independently, assigning t…
Improving Data Efficiency for Recommenders and LLMs
How to Train Data-Efficient LLMs
The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consu…
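The paper studies samplers that score and then select pre-training data. As a loose illustration of score-based data selection in general, not the paper's specific samplers, the sketch below subsamples a toy corpus with probability proportional to a made-up per-document quality score.

```python
# Illustrative only: score-proportional subsampling of a pretraining corpus.
# The corpus, quality scores, and budget below are made up for the example.
import random

corpus = [
    ("doc-0", "A well-edited encyclopedia paragraph ..."),
    ("doc-1", "spam spam click here !!!"),
    ("doc-2", "A clear tutorial on matrix multiplication ..."),
]
# In practice the score would come from a learned quality or density model.
quality = {"doc-0": 0.9, "doc-1": 0.1, "doc-2": 0.7}

def subsample(corpus, quality, budget, seed=0):
    """Keep `budget` examples, drawn without replacement, weighted by quality."""
    rng = random.Random(seed)
    pool = list(corpus)
    chosen = []
    while pool and len(chosen) < budget:
        weights = [quality[doc_id] for doc_id, _ in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))
    return chosen

print([doc_id for doc_id, _ in subsample(corpus, quality, budget=2)])
```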
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
There has been limited success for dense retrieval models in multilingual retrieval, due to uneven and scarce training data available across multiple languages. Synthetic training data generation is promising (e.g., InPars or Promptagator)…
Farzi Data: Autoregressive Data Distillation
We study data distillation for auto-regressive machine learning tasks, where the input and output have a strict left-to-right causal structure. More specifically, we propose Farzi, which summarizes an event sequence dataset into a small nu…
RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses
Pretrained language models such as BERT have been shown to be exceptionally effective for text ranking. However, there are limited studies on how to leverage more powerful sequence-to-sequence models such as T5. Existing attempts usually f…
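RankT5 fine-tunes T5 to emit a numeric relevance score for each query-document pair and trains it with ranking losses over lists of candidates. Below is a minimal PyTorch sketch of one such loss, a listwise softmax cross-entropy; the scores and labels are made-up stand-ins for model outputs and graded relevance judgments, not the paper's exact loss configuration.

```python
# Listwise softmax cross-entropy over per-candidate relevance scores (PyTorch).
# `scores` stands in for the scalar outputs a ranking model would produce
# for each (query, candidate) pair; labels are graded relevance judgments.
import torch
import torch.nn.functional as F

def softmax_listwise_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """scores, labels: [batch, num_candidates]; each list must have >= 1 relevant item."""
    log_probs = F.log_softmax(scores, dim=-1)           # distribution over candidates
    target = labels / labels.sum(dim=-1, keepdim=True)  # normalize relevance to a distribution
    return -(target * log_probs).sum(dim=-1).mean()

scores = torch.tensor([[2.1, 0.3, -1.0], [0.5, 0.4, 0.2]])  # made-up model scores
labels = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # one relevant doc per query
print(softmax_listwise_loss(scores, labels))
```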
Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction
Large Language Models (LLMs) have demonstrated exceptional capabilities in generalizing to new tasks in a zero-shot or few-shot manner. However, the extent to which LLMs can comprehend user preferences based on their previous behavior rema…
WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset
Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage tasks have resultingly received little a…
A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding
Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan Plummer, Kate Saenko, Jianmo Ni, Mandy Guo. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.
Webpages have been a rich, scalable resource for vision-language and language-only tasks. Yet only pieces of webpages are kept in existing datasets: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage task…
RISE: Leveraging Retrieval Techniques for Summarization Evaluation
Evaluating automatically-generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries by leveraging …
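RISE treats summary evaluation as a retrieval problem, scoring a candidate summary against its input document rather than against a reference. The sketch below illustrates only that general retrieval-style scoring idea, with a bag-of-words cosine similarity standing in for a trained encoder; it is not RISE's model or training setup.

```python
# Generic retrieval-style summary scoring: similarity between a candidate
# summary and its source document in a shared representation space.
# A real system would use a trained encoder; bag-of-words cosine similarity
# is used here only so the sketch runs end to end.
import math
from collections import Counter

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

document = "the model retrieves passages and ranks them by relevance to the query"
summary_good = "the model ranks retrieved passages by query relevance"
summary_bad = "cats enjoy sitting in cardboard boxes"

print(cosine(bow(document), bow(summary_good)))  # higher score
print(cosine(bow(document), bow(summary_bad)))   # near zero
```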
HYRR: Hybrid Infused Reranking for Passage Retrieval
We present Hybrid Infused Reranking for Passage Retrieval (HYRR), a framework for training rerankers based on a hybrid of BM25 and neural retrieval models. Retrievers based on hybrid models have been shown to outperform both BM25 and neur…
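HYRR trains rerankers from a hybrid of BM25 and neural retrieval models. As a generic illustration of hybrid retrieval scoring, not HYRR's training recipe, the sketch below min-max normalizes the two score lists for one query and interpolates them with a made-up weight.

```python
# Generic hybrid retrieval scoring: interpolate normalized BM25 scores with
# normalized dense-retriever scores. Scores and the mixing weight are made up.
def minmax(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def hybrid_scores(bm25, dense, alpha=0.5):
    """alpha weights the sparse (BM25) side; 1 - alpha weights the dense side."""
    b, d = minmax(bm25), minmax(dense)
    return [alpha * bi + (1 - alpha) * di for bi, di in zip(b, d)]

bm25_scores = [12.3, 7.1, 0.4]     # per-candidate BM25 scores for one query
dense_scores = [0.62, 0.71, 0.15]  # per-candidate dot products from a dual encoder
print(hybrid_scores(bm25_scores, dense_scores))
```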
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM…
Knowledge Prompts: Injecting World Knowledge into Language Models through Soft Prompts
Soft prompts have been recently proposed as a tool for adapting large frozen language models (LMs) to new tasks. In this work, we repurpose soft prompts to the task of injecting world knowledge into LMs. We introduce a method to train soft…
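Soft prompts are trainable embedding vectors prepended to the input of a frozen LM, so only the prompt parameters are updated. The PyTorch sketch below shows that mechanic with a toy frozen embedding table standing in for a real language model; it illustrates the general soft-prompt technique, not the paper's knowledge-injection training setup.

```python
# Soft prompting in miniature (PyTorch): learnable prompt vectors are prepended
# to the token embeddings of a frozen model; only the prompt is trained.
import torch
import torch.nn as nn

vocab_size, d_model, prompt_len = 100, 16, 4

frozen_embeddings = nn.Embedding(vocab_size, d_model)
frozen_embeddings.requires_grad_(False)            # the "LM" stays frozen

soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)  # trainable

def embed_with_prompt(token_ids: torch.Tensor) -> torch.Tensor:
    """token_ids: [batch, seq_len] -> [batch, prompt_len + seq_len, d_model]."""
    tok = frozen_embeddings(token_ids)
    prompt = soft_prompt.unsqueeze(0).expand(token_ids.size(0), -1, -1)
    return torch.cat([prompt, tok], dim=1)

ids = torch.randint(0, vocab_size, (2, 8))
print(embed_with_prompt(ids).shape)  # torch.Size([2, 12, 16])
```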
Promptagator: Few-shot Dense Retrieval From 8 Examples
Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to g…
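Promptagator builds a task-specific prompt from as few as 8 (document, query) examples and uses an LLM to synthesize training queries for unlabeled documents. The sketch below shows only the prompt assembly; the examples are invented and the LLM call is left as a placeholder.

```python
# Assemble a few-shot prompt for synthetic query generation. The examples are
# invented; `call_llm` is a placeholder for whichever LLM API is used.
FEW_SHOT = [
    ("Document: Photosynthesis converts light energy into chemical energy in plants.",
     "Query: how do plants turn sunlight into energy"),
    ("Document: The Treaty of Westphalia ended the Thirty Years' War in 1648.",
     "Query: what ended the thirty years war"),
]  # a real setup would use up to 8 in-domain pairs

def build_prompt(new_document: str) -> str:
    shots = "\n\n".join(f"{doc}\n{query}" for doc, query in FEW_SHOT)
    return f"{shots}\n\nDocument: {new_document}\nQuery:"

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

print(build_prompt("Dense retrievers map queries and passages into a shared vector space."))
```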
Knowledge-aware Neural Collective Matrix Factorization for Cross-domain Recommendation
Cross-domain recommendation (CDR) can help customers find more satisfying items in different domains. Existing CDR models mainly use common users or mapping functions as bridges between domains but have very limited exploration in fully ut…
Exploring Dual Encoder Architectures for Question Answering
Dual encoders have been used for question-answering (QA) and information retrieval (IR) tasks with good results. Previous research focuses on two major types of dual encoders, Siamese Dual Encoder (SDE), with parameters shared across two e…
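A minimal PyTorch sketch of the two wirings compared in the paper, with a toy mean-of-embeddings tower standing in for a real transformer encoder: a Siamese dual encoder (SDE) shares one tower between questions and answers, while an asymmetric dual encoder (ADE) keeps two separate towers.

```python
# Siamese vs. asymmetric dual encoders in miniature (PyTorch). A toy
# mean-of-embeddings tower stands in for a real transformer encoder.
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, vocab_size=100, d_model=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)

    def forward(self, ids):                 # ids: [batch, seq_len]
        return self.emb(ids).mean(dim=1)    # [batch, d_model]

class DualEncoder(nn.Module):
    def __init__(self, siamese: bool):
        super().__init__()
        self.q_tower = Tower()
        # SDE: the answer tower *is* the question tower; ADE: a separate module.
        self.a_tower = self.q_tower if siamese else Tower()

    def forward(self, q_ids, a_ids):
        q, a = self.q_tower(q_ids), self.a_tower(a_ids)
        return q @ a.t()                    # [batch_q, batch_a] dot-product scores

q = torch.randint(0, 100, (2, 6))
a = torch.randint(0, 100, (3, 8))
for siamese in (True, False):
    print("SDE" if siamese else "ADE", DualEncoder(siamese)(q, a).shape)
```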
Scaling Up Models and Data with t5x and seqio
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to…
Transformer Memory as a Differentiable Search Index
In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Searc…
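In a differentiable search index, one seq2seq model is trained both to map document text to its document identifier (indexing) and to map a query to the identifier of a relevant document (retrieval). The sketch below only lays out such training pairs with an invented corpus and docid scheme; no particular seq2seq implementation or prompt format from the paper is assumed.

```python
# Laying out DSI-style training pairs: one seq2seq model learns both
# document text -> docid (indexing) and query -> docid (retrieval).
# Corpus, queries, and docid scheme are invented for illustration.
corpus = {
    "17": "The Amazon river discharges more water than any other river.",
    "42": "Rust guarantees memory safety without a garbage collector.",
}
train_queries = [
    ("which river has the largest discharge", "17"),
    ("does rust need a garbage collector", "42"),
]

indexing_pairs = [(f"index: {text}", docid) for docid, text in corpus.items()]
retrieval_pairs = [(f"query: {q}", docid) for q, docid in train_queries]

# Both kinds of (input_text, target_text) pairs go into one training mixture;
# at inference time the model generates a docid string given a query.
for inp, tgt in indexing_pairs + retrieval_pairs:
    print(f"{inp!r} -> {tgt!r}")
```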
Large Dual Encoders Are Generalizable Retrievers
Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernandez Abrego, Ji Ma, Vincent Zhao, Yi Luan, Keith Hall, Ming-Wei Chang, Yinfei Yang. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot-product b…
SHARE: a System for Hierarchical Assistive Recipe Editing
The large population of home cooks with dietary restrictions is under-served by existing cooking resources and recipe generation models. To help them, we propose the task of controllable recipe editing: adapt a base recipe to satisfy a use…
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
We provide the first exploration of sentence embeddings from text-to-text transformers (T5) including the effects of scaling up sentence encoders to 11B parameters. Sentence embeddings are broadly useful for language processing tasks. Whil…
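One of the strategies the paper explores builds a sentence embedding by mean-pooling the T5 encoder's outputs. A sketch of that pooling with Hugging Face's T5EncoderModel, assuming the transformers and torch packages are installed; the t5-small checkpoint and the sentences are placeholders, not the released Sentence-T5 models.

```python
# Mean-pooled sentence embeddings from a T5 encoder (Hugging Face + PyTorch).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-small")   # placeholder checkpoint
model = T5EncoderModel.from_pretrained("t5-small")

sentences = ["A cat sits on the mat.", "A dog sleeps on the rug."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state           # [batch, seq_len, d_model]

mask = batch["attention_mask"].unsqueeze(-1).float()    # zero out padding positions
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = F.normalize(embeddings, dim=-1)            # unit-length sentence vectors
print(embeddings.shape)
```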
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present LongT5, a new model that explores the effects of scali…
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this …
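Scaling the number of pre-training tasks raises the question of how often to sample each one. As a generic illustration, not ExT5's exact mixing scheme, the sketch below computes temperature-scaled sampling rates from per-task dataset sizes, a common heuristic for multi-task mixtures.

```python
# Temperature-scaled sampling rates for a multi-task mixture. Task sizes are
# made up; this shows a common mixing heuristic, not ExT5's exact recipe.
def mixing_rates(task_sizes: dict[str, int], temperature: float = 2.0) -> dict[str, float]:
    """Sampling probability proportional to size ** (1 / temperature)."""
    weights = {name: size ** (1.0 / temperature) for name, size in task_sizes.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

tasks = {"c4_span_corruption": 360_000_000, "squad": 88_000, "wmt_en_de": 4_500_000}
for name, rate in mixing_rates(tasks).items():
    print(f"{name:22s} {rate:.4f}")
```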