LongT5: Efficient Text-To-Text Transformer for Long Sequences

Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontañón, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang

2022 · Open Access · DOI: https://doi.org/10.18653/v1/2022.findings-naacl.55

Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present LongT5, a new model that explores the effects of scaling both the input length and model size at the same time. Specifically, we integrate attention ideas from long-input transformers (ETC) and adopt pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC’s local/global attention mechanism but does not require additional side-inputs. We achieve state-of-the-art results on several summarization and question answering tasks and outperform the original T5 models on these tasks. We have open-sourced our architecture and training code, as well as our pre-trained model checkpoints.
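
The abstract names the Transient Global (TGlobal) mechanism without spelling it out. As a rough illustration, the sketch below follows the description in the full paper as we understand it: the input is split into fixed-size blocks, each block is summarized on the fly by summing and normalizing its token embeddings, and every token attends to a local window plus all block summaries. The function names, the RMS-style normalization, and the treatment of a trailing partial block as its own block are assumptions made for this example, not the authors' released code.

```python
import math


def block_summaries(embeddings, block_size):
    """Build one 'transient' global token per block by summing the block's
    token embeddings and normalizing the sum.

    The RMS normalization is an assumption standing in for the model's
    layer norm; a trailing partial block is kept as its own block.
    """
    summaries = []
    for start in range(0, len(embeddings), block_size):
        block = embeddings[start:start + block_size]
        dim = len(block[0])
        summed = [sum(token[d] for token in block) for d in range(dim)]
        rms = math.sqrt(sum(x * x for x in summed) / dim) or 1.0
        summaries.append([x / rms for x in summed])
    return summaries


def tglobal_keys(seq_len, position, local_radius, block_size):
    """Return (local token indices, global block indices) that a single
    query position may attend to under this sketch."""
    lo = max(0, position - local_radius)
    hi = min(seq_len - 1, position + local_radius)
    local = list(range(lo, hi + 1))
    num_blocks = math.ceil(seq_len / block_size)
    return local, list(range(num_blocks))


def attention_cost(seq_len, local_radius, block_size):
    """Count query-key pairs for full attention vs. the sparse pattern."""
    full = seq_len * seq_len
    sparse = 0
    for position in range(seq_len):
        local, global_blocks = tglobal_keys(
            seq_len, position, local_radius, block_size
        )
        sparse += len(local) + len(global_blocks)
    return full, sparse
```

Calling `attention_cost(seq_len, local_radius, block_size)` with a long sequence and a small radius shows why the pattern scales better than full attention: each token sees at most `2 * local_radius + 1` local keys plus about `seq_len / block_size` global keys, instead of all `seq_len` keys. Because the summaries are computed from the input itself, no separate global side-input has to be supplied, which is the property the abstract highlights.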

Concepts

Automatic summarization, Computer science, Transformer, Scalability, Architecture, Artificial intelligence, Question answering, Machine learning, Natural language processing, Database, Engineering, Art, Visual arts, Electrical engineering, Voltage

Metadata

Type: article
Language: en
Landing Page: https://doi.org/10.18653/v1/2022.findings-naacl.55
PDF: https://aclanthology.org/2022.findings-naacl.55.pdf
OA Status: hybrid
Cited By: 183
References: 39
Related Works: 10
OpenAlex ID: https://openalex.org/W4225727438
