Jianmo Ni
Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks
Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant i…
ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
Generative recommendation (GR) is an emerging paradigm where user actions are tokenized into discrete token patterns and autoregressively generated as predictions. However, existing GR models tokenize each action independently, assigning t…
Improving Data Efficiency for Recommenders and LLMs
How to Train Data-Efficient LLMs
The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consu…
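The paper studies samplers that score and then select pre-training data. As a loose illustration of score-based data selection in general, not the paper's specific samplers, the sketch below subsamples a toy corpus with probability proportional to a made-up per-document quality score.

```python
# Illustrative only: score-proportional subsampling of a pretraining corpus.
# The corpus, quality scores, and budget below are made up for the example.
import random

corpus = [
    ("doc-0", "A well-edited encyclopedia paragraph ..."),
    ("doc-1", "spam spam click here !!!"),
    ("doc-2", "A clear tutorial on matrix multiplication ..."),
]
# In practice the score would come from a learned quality or density model.
quality = {"doc-0": 0.9, "doc-1": 0.1, "doc-2": 0.7}

def subsample(corpus, quality, budget, seed=0):
    """Keep `budget` examples, drawn without replacement, weighted by quality."""
    rng = random.Random(seed)
    pool = list(corpus)
    chosen = []
    while pool and len(chosen) < budget:
        weights = [quality[doc_id] for doc_id, _ in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))
    return chosen

print([doc_id for doc_id, _ in subsample(corpus, quality, budget=2)])
```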
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
There has been limited success for dense retrieval models in multilingual retrieval, due to uneven and scarce training data available across multiple languages. Synthetic training data generation is promising (e.g., InPars or Promptagator)…
Farzi Data: Autoregressive Data Distillation
We study data distillation for auto-regressive machine learning tasks, where the input and output have a strict left-to-right causal structure. More specifically, we propose Farzi, which summarizes an event sequence dataset into a small nu…
RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses
Pretrained language models such as BERT have been shown to be exceptionally effective for text ranking. However, there are limited studies on how to leverage more powerful sequence-to-sequence models such as T5. Existing attempts usually f…
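RankT5 fine-tunes T5 to emit a numeric relevance score for each query-document pair and trains it with ranking losses over lists of candidates. Below is a minimal PyTorch sketch of one such loss, a listwise softmax cross-entropy; the scores and labels are made-up stand-ins for model outputs and graded relevance judgments, not the paper's exact loss configuration.

```python
# Listwise softmax cross-entropy over per-candidate relevance scores (PyTorch).
# `scores` stands in for the scalar outputs a ranking model would produce
# for each (query, candidate) pair; labels are graded relevance judgments.
import torch
import torch.nn.functional as F

def softmax_listwise_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """scores, labels: [batch, num_candidates]; each list must have >= 1 relevant item."""
    log_probs = F.log_softmax(scores, dim=-1)           # distribution over candidates
    target = labels / labels.sum(dim=-1, keepdim=True)  # normalize relevance to a distribution
    return -(target * log_probs).sum(dim=-1).mean()

scores = torch.tensor([[2.1, 0.3, -1.0], [0.5, 0.4, 0.2]])  # made-up model scores
labels = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # one relevant doc per query
print(softmax_listwise_loss(scores, labels))
```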
Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction
Large Language Models (LLMs) have demonstrated exceptional capabilities in generalizing to new tasks in a zero-shot or few-shot manner. However, the extent to which LLMs can comprehend user preferences based on their previous behavior rema…
WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset
Webpages have been a rich resource for language and vision-language tasks. Yet only pieces of webpages are kept: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage tasks have resultingly received little a…
A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding
Andrea Burns, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan Plummer, Kate Saenko, Jianmo Ni, Mandy Guo. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.
Webpages have been a rich, scalable resource for vision-language and language-only tasks. Yet only pieces of webpages are kept in existing datasets: image-caption pairs, long text articles, or raw HTML, never all in one place. Webpage task…
RISE: Leveraging Retrieval Techniques for Summarization Evaluation
Evaluating automatically-generated text summaries is a challenging task. While there have been many interesting approaches, they still fall short of human evaluations. We present RISE, a new approach for evaluating summaries by leveraging …
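RISE treats summary evaluation as a retrieval problem, scoring a candidate summary against its input document rather than against a reference. The sketch below illustrates only that general retrieval-style scoring idea, with a bag-of-words cosine similarity standing in for a trained encoder; it is not RISE's model or training setup.

```python
# Generic retrieval-style summary scoring: similarity between a candidate
# summary and its source document in a shared representation space.
# A real system would use a trained encoder; bag-of-words cosine similarity
# is used here only so the sketch runs end to end.
import math
from collections import Counter

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

document = "the model retrieves passages and ranks them by relevance to the query"
summary_good = "the model ranks retrieved passages by query relevance"
summary_bad = "cats enjoy sitting in cardboard boxes"

print(cosine(bow(document), bow(summary_good)))  # higher score
print(cosine(bow(document), bow(summary_bad)))   # near zero
```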
HYRR: Hybrid Infused Reranking for Passage Retrieval
We present Hybrid Infused Reranking for Passage Retrieval (HYRR), a framework for training rerankers based on a hybrid of BM25 and neural retrieval models. Retrievers based on hybrid models have been shown to outperform both BM25 and neur…
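HYRR trains rerankers from a hybrid of BM25 and neural retrieval models. As a generic illustration of hybrid retrieval scoring, not HYRR's training recipe, the sketch below min-max normalizes the two score lists for one query and interpolates them with a made-up weight.

```python
# Generic hybrid retrieval scoring: interpolate normalized BM25 scores with
# normalized dense-retriever scores. Scores and the mixing weight are made up.
def minmax(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def hybrid_scores(bm25, dense, alpha=0.5):
    """alpha weights the sparse (BM25) side; 1 - alpha weights the dense side."""
    b, d = minmax(bm25), minmax(dense)
    return [alpha * bi + (1 - alpha) * di for bi, di in zip(b, d)]

bm25_scores = [12.3, 7.1, 0.4]     # per-candidate BM25 scores for one query
dense_scores = [0.62, 0.71, 0.15]  # per-candidate dot products from a dual encoder
print(hybrid_scores(bm25_scores, dense_scores))
```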
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Large language models (LLMs) have shown impressive results while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM…
Knowledge Prompts: Injecting World Knowledge into Language Models through Soft Prompts
Soft prompts have been recently proposed as a tool for adapting large frozen language models (LMs) to new tasks. In this work, we repurpose soft prompts to the task of injecting world knowledge into LMs. We introduce a method to train soft…
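Soft prompts are trainable embedding vectors prepended to the input of a frozen LM, so only the prompt parameters are updated. The PyTorch sketch below shows that mechanic with a toy frozen embedding table standing in for a real language model; it illustrates the general soft-prompt technique, not the paper's knowledge-injection training setup.

```python
# Soft prompting in miniature (PyTorch): learnable prompt vectors are prepended
# to the token embeddings of a frozen model; only the prompt is trained.
import torch
import torch.nn as nn

vocab_size, d_model, prompt_len = 100, 16, 4

frozen_embeddings = nn.Embedding(vocab_size, d_model)
frozen_embeddings.requires_grad_(False)            # the "LM" stays frozen

soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)  # trainable

def embed_with_prompt(token_ids: torch.Tensor) -> torch.Tensor:
    """token_ids: [batch, seq_len] -> [batch, prompt_len + seq_len, d_model]."""
    tok = frozen_embeddings(token_ids)
    prompt = soft_prompt.unsqueeze(0).expand(token_ids.size(0), -1, -1)
    return torch.cat([prompt, tok], dim=1)

ids = torch.randint(0, vocab_size, (2, 8))
print(embed_with_prompt(ids).shape)  # torch.Size([2, 12, 16])
```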
Promptagator: Few-shot Dense Retrieval From 8 Examples
Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to g…
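Promptagator builds a task-specific prompt from as few as 8 (document, query) examples and uses an LLM to synthesize training queries for unlabeled documents. The sketch below shows only the prompt assembly; the examples are invented and the LLM call is left as a placeholder.

```python
# Assemble a few-shot prompt for synthetic query generation. The examples are
# invented; `call_llm` is a placeholder for whichever LLM API is used.
FEW_SHOT = [
    ("Document: Photosynthesis converts light energy into chemical energy in plants.",
     "Query: how do plants turn sunlight into energy"),
    ("Document: The Treaty of Westphalia ended the Thirty Years' War in 1648.",
     "Query: what ended the thirty years war"),
]  # a real setup would use up to 8 in-domain pairs

def build_prompt(new_document: str) -> str:
    shots = "\n\n".join(f"{doc}\n{query}" for doc, query in FEW_SHOT)
    return f"{shots}\n\nDocument: {new_document}\nQuery:"

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

print(build_prompt("Dense retrievers map queries and passages into a shared vector space."))
```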
Knowledge-aware Neural Collective Matrix Factorization for Cross-domain Recommendation
Cross-domain recommendation (CDR) can help customers find more satisfying items in different domains. Existing CDR models mainly use common users or mapping functions as bridges between domains but have very limited exploration in fully ut…
Exploring Dual Encoder Architectures for Question Answering
Dual encoders have been used for question-answering (QA) and information retrieval (IR) tasks with good results. Previous research focuses on two major types of dual encoders, Siamese Dual Encoder (SDE), with parameters shared across two e…
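A minimal PyTorch sketch of the two wirings compared in the paper, with a toy mean-of-embeddings tower standing in for a real transformer encoder: a Siamese dual encoder (SDE) shares one tower between questions and answers, while an asymmetric dual encoder (ADE) keeps two separate towers.

```python
# Siamese vs. asymmetric dual encoders in miniature (PyTorch). A toy
# mean-of-embeddings tower stands in for a real transformer encoder.
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, vocab_size=100, d_model=16):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)

    def forward(self, ids):                 # ids: [batch, seq_len]
        return self.emb(ids).mean(dim=1)    # [batch, d_model]

class DualEncoder(nn.Module):
    def __init__(self, siamese: bool):
        super().__init__()
        self.q_tower = Tower()
        # SDE: the answer tower *is* the question tower; ADE: a separate module.
        self.a_tower = self.q_tower if siamese else Tower()

    def forward(self, q_ids, a_ids):
        q, a = self.q_tower(q_ids), self.a_tower(a_ids)
        return q @ a.t()                    # [batch_q, batch_a] dot-product scores

q = torch.randint(0, 100, (2, 6))
a = torch.randint(0, 100, (3, 8))
for siamese in (True, False):
    print("SDE" if siamese else "ADE", DualEncoder(siamese)(q, a).shape)
```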
Scaling Up Models and Data with t5x and seqio
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to…
Transformer Memory as a Differentiable Search Index
In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model. To this end, we introduce the Differentiable Searc…
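In a differentiable search index, one seq2seq model is trained both to map document text to its document identifier (indexing) and to map a query to the identifier of a relevant document (retrieval). The sketch below only lays out such training pairs with an invented corpus and docid scheme; no particular seq2seq implementation or prompt format from the paper is assumed.

```python
# Laying out DSI-style training pairs: one seq2seq model learns both
# document text -> docid (indexing) and query -> docid (retrieval).
# Corpus, queries, and docid scheme are invented for illustration.
corpus = {
    "17": "The Amazon river discharges more water than any other river.",
    "42": "Rust guarantees memory safety without a garbage collector.",
}
train_queries = [
    ("which river has the largest discharge", "17"),
    ("does rust need a garbage collector", "42"),
]

indexing_pairs = [(f"index: {text}", docid) for docid, text in corpus.items()]
retrieval_pairs = [(f"query: {q}", docid) for q, docid in train_queries]

# Both kinds of (input_text, target_text) pairs go into one training mixture;
# at inference time the model generates a docid string given a query.
for inp, tgt in indexing_pairs + retrieval_pairs:
    print(f"{inp!r} -> {tgt!r}")
```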
Large Dual Encoders Are Generalizable Retrievers
Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernandez Abrego, Ji Ma, Vincent Zhao, Yi Luan, Keith Hall, Ming-Wei Chang, Yinfei Yang. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot-product b…
SHARE: a System for Hierarchical Assistive Recipe Editing
The large population of home cooks with dietary restrictions is under-served by existing cooking resources and recipe generation models. To help them, we propose the task of controllable recipe editing: adapt a base recipe to satisfy a use…
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
We provide the first exploration of sentence embeddings from text-to-text transformers (T5) including the effects of scaling up sentence encoders to 11B parameters. Sentence embeddings are broadly useful for language processing tasks. Whil…
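One of the strategies the paper explores builds a sentence embedding by mean-pooling the T5 encoder's outputs. A sketch of that pooling with Hugging Face's T5EncoderModel, assuming the transformers and torch packages are installed; the t5-small checkpoint and the sentences are placeholders, not the released Sentence-T5 models.

```python
# Mean-pooled sentence embeddings from a T5 encoder (Hugging Face + PyTorch).
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-small")   # placeholder checkpoint
model = T5EncoderModel.from_pretrained("t5-small")

sentences = ["A cat sits on the mat.", "A dog sleeps on the rug."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state           # [batch, seq_len, d_model]

mask = batch["attention_mask"].unsqueeze(-1).float()    # zero out padding positions
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = F.normalize(embeddings, dim=-1)            # unit-length sentence vectors
print(embeddings.shape)
```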
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present LongT5, a new model that explores the effects of scali…
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training. Towards this goal, this …
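Scaling the number of pre-training tasks raises the question of how often to sample each one. As a generic illustration, not ExT5's exact mixing scheme, the sketch below computes temperature-scaled sampling rates from per-task dataset sizes, a common heuristic for multi-task mixtures.

```python
# Temperature-scaled sampling rates for a multi-task mixture. Task sizes are
# made up; this shows a common mixing heuristic, not ExT5's exact recipe.
def mixing_rates(task_sizes: dict[str, int], temperature: float = 2.0) -> dict[str, float]:
    """Sampling probability proportional to size ** (1 / temperature)."""
    weights = {name: size ** (1.0 / temperature) for name, size in task_sizes.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

tasks = {"c4_span_corruption": 360_000_000, "squad": 88_000, "wmt_en_de": 4_500_000}
for name, rate in mixing_rates(tasks).items():
    print(f"{name:22s} {rate:.4f}")
```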