Rewon Child
Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations (Short Paper)
This research assesses the ability of large language models (LLMs) to represent geometries and their spatial relations. We use LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and th…
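A minimal sketch of the encoding step this abstract describes, assuming the Hugging Face transformers library and mean pooling of the final hidden states (the paper's exact pooling strategy is not stated in the snippet above):

import torch
from transformers import AutoTokenizer, AutoModel

# Encode a well-known text (WKT) geometry string with BERT and
# mean-pool the final hidden states into one embedding vector.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

wkt = "POINT (30 10)"  # hypothetical example geometry
inputs = tokenizer(wkt, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
embedding = hidden.mean(dim=1)                  # (1, 768) geometry embedding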
PaLM: Scaling Language Modeling with Pathways
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model t…
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success,…
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)
We present MozoLM, an open-source language model microservice package intended for use in AAC text-entry applications, with a particular focus on the design principles of the library. The intent of the library is to allow the ensembling of …
Proceedings of the First Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2022)
Today, data-to-text systems are used as commercial solutions for automated text production of large quantities of text. Therefore, they already represent a new technology of writing. This new technology requires the author, as an act of writ…
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
Welcome to the fourth edition of the Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda. This is the second time we are running the workshop virtually, due to the COVID-19 pandemic. The pandemic has had a profou…
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorials
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task. Although the most common formulation of text ranking is search, instances of the task can also be found i…
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
We present a hierarchical VAE that, for the first time, generates samples quickly while outperforming the PixelCNN in log-likelihood on all natural image benchmarks. We begin by observing that, in theory, VAEs can actually represent autore…
Language Models are Few-Shot Learners
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires…
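To make the few-shot setting concrete: the demonstrations live entirely in the prompt, and the model is applied with no gradient updates. The translation task below follows the format used in the paper; treat it as an illustration rather than an exact reproduction:

# Few-shot prompting: K demonstrations plus one unanswered query are
# placed in the context window; the model completes the last line.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "plush giraffe => girafe peluche\n"
    "cheese => "
)
# completion = lm.generate(prompt)  # `lm` is a hypothetical model handle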
Scaling Laws for Neural Language Models
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven …
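The power laws the abstract summarizes can be written compactly; the fitted exponents below are the paper's approximate headline values, with N the (non-embedding) parameter count, D the dataset size in tokens, and C the training compute:

L(N) \approx (N_c / N)^{\alpha_N}, \quad
L(D) \approx (D_c / D)^{\alpha_D}, \quad
L(C) \approx (C_c / C)^{\alpha_C},
\quad \text{with } \alpha_N \approx 0.076,\ \alpha_D \approx 0.095,\ \alpha_C \approx 0.05.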
Generating Long Sequences with Sparse Transformers
Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We als…
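A small sketch of one of the factorized patterns the paper proposes (the strided pattern): each query attends to a local window plus every earlier position a fixed stride away, so each row of the causal mask holds O(stride) entries, and a stride near sqrt(n) yields the stated O(n sqrt(n)) total. The sizes below are illustrative:

import numpy as np

def strided_sparse_mask(n, stride):
    # Causal mask for strided sparse attention: query i sees the
    # previous `stride` positions (local window) plus every earlier
    # position j with (i - j) % stride == 0 (strided connections).
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - stride):i + 1] = True  # local window
        mask[i, i % stride:i + 1:stride] = True   # strided positions
    return mask

mask = strided_sparse_mask(n=64, stride=8)
print(mask.sum(), "allowed pairs instead of", 64 * 64)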
Exploring Neural Transducers for End-to-End Speech Recognition
In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition. We show that, without any language model, Seq2Seq and RNN-Transducer models both outperfo…
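For reference, a minimal CTC training step in PyTorch, CTC being one of the three systems under comparison; the shapes are illustrative, not the paper's configuration:

import torch
import torch.nn as nn

# CTC marginalizes over all alignments between per-frame outputs and an
# unsegmented label sequence, using a reserved blank symbol (index 0).
T, N, C = 50, 4, 30  # frames, batch size, vocabulary size (incl. blank)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)
targets = torch.randint(1, C, (N, 12))        # label sequences, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()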
Reducing Bias in Production Speech Models
Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition. However, the causality and latency constraints of production systems put end-to-end sp…
Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
Keyword spotting (KWS) constitutes a major component of human-technology interfaces. Maximizing the detection accuracy at a low false alarm (FA) rate, while minimizing the footprint size, latency and complexity are the goals for KWS. Towar…
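A minimal PyTorch sketch of the convolutional recurrent architecture the title names; the layer sizes here are illustrative rather than the paper's configuration:

import torch
import torch.nn as nn

class CRNNKeywordSpotter(nn.Module):
    # A convolution extracts local spectro-temporal features, a GRU
    # models longer-range dynamics, and a linear head emits the
    # keyword / non-keyword score, keeping the parameter count small.
    def __init__(self, n_mels=40, hidden=64, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(1, 16, (5, 5), stride=(2, 2)),
                                  nn.ReLU())
        freq_out = (n_mels - 5) // 2 + 1
        self.rnn = nn.GRU(16 * freq_out, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):               # x: (batch, 1, frames, n_mels)
        f = self.conv(x)                # (batch, 16, frames', mels')
        b, c, t, m = f.shape
        f = f.permute(0, 2, 1, 3).reshape(b, t, c * m)
        out, _ = self.rnn(f)
        return self.head(out[:, -1])    # classify from the last frame

logits = CRNNKeywordSpotter()(torch.randn(2, 1, 101, 40))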
Active Learning for Speech Recognition: the Power of Gradients
In training speech recognition systems, labeling audio clips can be expensive, and not all data is equally valuable. Active learning aims to label only the most informative samples to reduce cost. For speech recognition, confidence scores …
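A generic sketch of the gradient-based acquisition idea: score each unlabeled clip by the gradient norm its own most likely label would induce, then send the highest-scoring clips for labeling. The stand-in model and scoring loop below are illustrative, not the paper's implementation:

import torch

def gradient_score(model, loss_fn, x):
    # Use the model's own prediction as a pseudo-label; the norm of
    # the resulting gradient estimates how much this sample would
    # change the model, i.e. how informative a true label would be.
    model.zero_grad()
    logits = model(x)
    loss_fn(logits, logits.detach().argmax(dim=-1)).backward()
    grad_sq = sum(float(p.grad.norm() ** 2) for p in model.parameters()
                  if p.grad is not None)
    return grad_sq ** 0.5

model = torch.nn.Linear(40, 10)                  # stand-in acoustic model
loss_fn = torch.nn.CrossEntropyLoss()
pool = [torch.randn(1, 40) for _ in range(100)]  # unlabeled pool
scores = [gradient_score(model, loss_fn, x) for x in pool]
to_label = sorted(range(len(pool)), key=lambda i: -scores[i])[:10]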