Rewon Child
Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations (Short Paper)
This research assesses the ability of large language models (LLMs) to represent geometries and their spatial relations. We use LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and th…
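A minimal sketch of the encoding step this abstract describes, assuming the Hugging Face transformers library and mean pooling of the final hidden states (the paper's exact pooling strategy is not stated in the snippet above):

import torch
from transformers import AutoTokenizer, AutoModel

# Encode a well-known text (WKT) geometry string with BERT and
# mean-pool the final hidden states into one embedding vector.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

wkt = "POINT (30 10)"  # hypothetical example geometry
inputs = tokenizer(wkt, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
embedding = hidden.mean(dim=1)                  # (1, 768) geometry embedding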
PaLM: Scaling Language Modeling with Pathways
Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model t…
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Pretrained general-purpose language models can achieve state-of-the-art accuracies in various natural language processing domains by adapting to downstream tasks via zero-shot, few-shot and fine-tuning techniques. Because of their success,…
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)
We present MozoLM, an open-source language model microservice package intended for use in AAC text-entry applications, with a particular focus on the design principles of the library. The intent of the library is to allow the ensembling of …
Proceedings of the First Workshop on Intelligent and Interactive Writing Assistants (In2Writing 2022)
Today, data-to-text systems are used as commercial solutions for automated text production of large quantities of text. Therefore, they already represent a new technology of writing. This new technology requires the author, as an act of writ…
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
Welcome to the fourth edition of the Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda. This is the second time we are running the workshop virtually, due to the COVID-19 pandemic. The pandemic has had a profou…
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorials
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task. Although the most common formulation of text ranking is search, instances of the task can also be found i…
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
We present a hierarchical VAE that, for the first time, generates samples quickly while outperforming the PixelCNN in log-likelihood on all natural image benchmarks. We begin by observing that, in theory, VAEs can actually represent autore…
Language Models are Few-Shot Learners
Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires…
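To make the few-shot setting concrete: the demonstrations live entirely in the prompt, and the model is applied with no gradient updates. The translation task below follows the format used in the paper; treat it as an illustration rather than an exact reproduction:

# Few-shot prompting: K demonstrations plus one unanswered query are
# placed in the context window; the model completes the last line.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "plush giraffe => girafe peluche\n"
    "cheese => "
)
# completion = lm.generate(prompt)  # `lm` is a hypothetical model handle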
Scaling Laws for Neural Language Models
We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven …
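The power laws the abstract summarizes can be written compactly; the fitted exponents below are the paper's approximate headline values, with N the (non-embedding) parameter count, D the dataset size in tokens, and C the training compute:

L(N) \approx (N_c / N)^{\alpha_N}, \quad
L(D) \approx (D_c / D)^{\alpha_D}, \quad
L(C) \approx (C_c / C)^{\alpha_C},
\quad \text{with } \alpha_N \approx 0.076,\ \alpha_D \approx 0.095,\ \alpha_C \approx 0.05.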
Generating Long Sequences with Sparse Transformers
Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We als…
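A small sketch of one of the factorized patterns the paper proposes (the strided pattern): each query attends to a local window plus every earlier position a fixed stride away, so each row of the causal mask holds O(stride) entries, and a stride near sqrt(n) yields the stated O(n sqrt(n)) total. The sizes below are illustrative:

import numpy as np

def strided_sparse_mask(n, stride):
    # Causal mask for strided sparse attention: query i sees the
    # previous `stride` positions (local window) plus every earlier
    # position j with (i - j) % stride == 0 (strided connections).
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - stride):i + 1] = True  # local window
        mask[i, i % stride:i + 1:stride] = True   # strided positions
    return mask

mask = strided_sparse_mask(n=64, stride=8)
print(mask.sum(), "allowed pairs instead of", 64 * 64)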
Exploring Neural Transducers for End-to-End Speech Recognition
In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition. We show that, without any language model, Seq2Seq and RNN-Transducer models both outperfo…
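For reference, a minimal CTC training step in PyTorch, CTC being one of the three systems under comparison; the shapes are illustrative, not the paper's configuration:

import torch
import torch.nn as nn

# CTC marginalizes over all alignments between per-frame outputs and an
# unsegmented label sequence, using a reserved blank symbol (index 0).
T, N, C = 50, 4, 30  # frames, batch size, vocabulary size (incl. blank)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)
targets = torch.randint(1, C, (N, 12))        # label sequences, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()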
Reducing Bias in Production Speech Models
Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition. However, the causality and latency constraints of production systems put end-to-end sp…
Convolutional Recurrent Neural Networks for Small-Footprint Keyword Spotting
Keyword spotting (KWS) constitutes a major component of human-technology interfaces. Maximizing the detection accuracy at a low false alarm (FA) rate, while minimizing the footprint size, latency and complexity are the goals for KWS. Towar…
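A minimal PyTorch sketch of the convolutional recurrent architecture the title names; the layer sizes here are illustrative rather than the paper's configuration:

import torch
import torch.nn as nn

class CRNNKeywordSpotter(nn.Module):
    # A convolution extracts local spectro-temporal features, a GRU
    # models longer-range dynamics, and a linear head emits the
    # keyword / non-keyword score, keeping the parameter count small.
    def __init__(self, n_mels=40, hidden=64, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(1, 16, (5, 5), stride=(2, 2)),
                                  nn.ReLU())
        freq_out = (n_mels - 5) // 2 + 1
        self.rnn = nn.GRU(16 * freq_out, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):               # x: (batch, 1, frames, n_mels)
        f = self.conv(x)                # (batch, 16, frames', mels')
        b, c, t, m = f.shape
        f = f.permute(0, 2, 1, 3).reshape(b, t, c * m)
        out, _ = self.rnn(f)
        return self.head(out[:, -1])    # classify from the last frame

logits = CRNNKeywordSpotter()(torch.randn(2, 1, 101, 40))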
Active Learning for Speech Recognition: the Power of Gradients
In training speech recognition systems, labeling audio clips can be expensive, and not all data is equally valuable. Active learning aims to label only the most informative samples to reduce cost. For speech recognition, confidence scores …
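A generic sketch of the gradient-based acquisition idea: score each unlabeled clip by the gradient norm its own most likely label would induce, then send the highest-scoring clips for labeling. The stand-in model and scoring loop below are illustrative, not the paper's implementation:

import torch

def gradient_score(model, loss_fn, x):
    # Use the model's own prediction as a pseudo-label; the norm of
    # the resulting gradient estimates how much this sample would
    # change the model, i.e. how informative a true label would be.
    model.zero_grad()
    logits = model(x)
    loss_fn(logits, logits.detach().argmax(dim=-1)).backward()
    grad_sq = sum(float(p.grad.norm() ** 2) for p in model.parameters()
                  if p.grad is not None)
    return grad_sq ** 0.5

model = torch.nn.Linear(40, 10)                  # stand-in acoustic model
loss_fn = torch.nn.CrossEntropyLoss()
pool = [torch.randn(1, 40) for _ in range(100)]  # unlabeled pool
scores = [gradient_score(model, loss_fn, x) for x in pool]
to_label = sorted(range(len(pool)), key=lambda i: -scores[i])[:10]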