Explanipedia

Improving Informally Romanized Language Identification Open

Adrian Benton, Alexander Gutkin, Christo Kirov, Brian Roark · 2025

The Latin script is often used to informally write languages with non-Latin native scripts. In many cases (e.g., most languages in India), the lack of conventional spelling in the Latin script results in high spelling variability. Such rom…

Exploring and Improving Drafts in Blockwise Parallel Decoding Open

Taehyeon Kim, Ananda Theertha Suresh, Kishore Papineni, Michael Riley, Sanjiv Kumar, et al. · 2024

Computer science Mathematics

Despite the remarkable strides made by autoregressive language models, their potential is often hampered by the slow inference speeds inherent in sequential token generation. Blockwise parallel decoding (BPD) was proposed by Stern et al. a…

Weakly Supervised Headline Dependency Parsing Open

Adrian Benton, Tianze Shi, Ozan İrsoy, Igor Malioutov · 2023

Computer science Philosophy

English news headlines form a register with unique syntactic properties that have been documented in linguistics literature since the 1930s. However, headlines have received surprisingly little attention from the NLP syntactic parsing comm…

The English Headline Treebank corpus Open

Adrian Benton, Tianze Shi, Ozan İrsoy, Igor Malioutov · 2022

Computer science Philosophy

This repository contains the evaluation sets used in A Benton, T Shi, O İrsoy, and I Malioutov. "Weakly Supervised Headline Dependency Parsing". Findings of EMNLP. 2022. This dataset contains parse annotations for English news headlines an…

Updated Headline Generation: Creating Updated Summaries for Evolving News Stories Open

Sheena Panthaplackel, Adrian Benton, Mark Dredze · 2022

Computer science Business

To study how headlines are updated as underlying news stories evolve, we constructed a parallel dataset of simultaneous news body/headline updates. For this, we extracted selected examples from the NewsEdits corpus (Spangher and May, 2021)…

What Makes Data-to-Text Generation Hard for Pretrained Language Models? Open

Moniba Keymanesh, Adrian Benton, Mark Dredze · 2022

Computer science Biology Economics

Expressing natural language descriptions of structured facts or relations -- data-to-text generation (D2T) -- increases the accessibility of structured knowledge repositories. Previous work shows that pre-trained language models (PLMs) perf…

Updated Headline Generation: Creating Updated Summaries for Evolving News Stories Open

Sheena Panthaplackel, Adrian Benton, Mark Dredze · 2022

Computer science Engineering Philosophy

We propose the task of updated headline generation, in which a system generates a headline for an updated article, considering both the previous article and headline. The system must identify the novel information in the article update, an…

What Makes Data-to-Text Generation Hard for Pretrained Language Models? Open

Moniba Keymanesh, Adrian Benton, Mark Dredze · 2022

Computer science Economics Biology

Expressing natural language descriptions of structured facts or relations – data-to-text generation (D2T) – increases the accessibility of structured knowledge repositories. Previous work shows that pre-trained language models (PLMs) perfo…

Weakly Supervised Headline Dependency Parsing Open

Adrian Benton, Tianze Shi, Ozan İrsoy, Igor Malioutov · 2022

Computer science Philosophy

English news headlines form a register with unique syntactic properties that have been documented in linguistics literature since the 1930s. However, headlines have received surprisingly little attention from the NLP syntactic parsing comm…

C4 kōan CBOW embeddings Open

Ozan İrsoy, Adrian Benton, Karl Stratos · 2021

Computer science Mathematics

These are 2 million 768-dimensional and 300-dimensional CBOW embeddings trained on the English colossal, cleaned common crawl (C4) corpus. They were trained with the corrected CBOW code from kōan: https://github.com/bloomberg/koan with int…

C4 kōan CBOW embeddings Open

Ozan İrsoy, Adrian Benton, Karl Stratos · 2021

Computer science

These are 2 million 768-dimensional and 300-dimensional CBOW embeddings trained on the English colossal, cleaned common crawl (C4) corpus. They were trained with the corrected CBOW code from kōan: https://github.com/bloomberg/koan with int…

Cross-Register Projection for Headline Part of Speech Tagging Open

Adrian Benton, Hanyang Li, Igor Malioutov · 2021

Computer science Philosophy Economics

Part of speech (POS) tagging is a familiar NLP task. State of the art taggers routinely achieve token-level accuracies of over 97% on news body text, evidence that the problem is well understood. However, the register of English news headl…

Comparing Euclidean and Hyperbolic Embeddings on the WordNet Nouns Hypernymy Graph Open

Sameer Bansal, Adrian Benton · 2021

Mathematics Computer science

Nickel and Kiela (2017) present a new method for embedding tree nodes in the Poincaré ball, and suggest that these hyperbolic embeddings are far more effective than Euclidean embeddings at embedding nodes in large, hierarchically structure…

Cross-Register Projection for Headline Part of Speech Tagging Open

Adrian Benton, Hanyang Li, Igor Malioutov · 2021

Computer science Philosophy

POSH: The POS-tagged HeadlIne corpus was created for the paper “Cross-Register Projection for Headline Part of Speech Tagging” published in EMNLP 2021. This dataset contains headlines with gold annotated POS tags. The GSCh evaluation set i…

Cross-Register Projection for Headline Part of Speech Tagging Open

Adrian Benton, Hanyang Li, Igor Malioutov · 2021

Computer science Philosophy

POSH: The POS-tagged HeadlIne corpus was created for the paper “Cross-Register Projection for Headline Part of Speech Tagging” published in EMNLP 2021. This dataset contains headlines with gold annotated POS tags. The GSCh evaluation set i…

Diversity-Aware Batch Active Learning for Dependency Parsing Open

Tianze Shi, Adrian Benton, Igor Malioutov, Ozan İrsoy · 2021

Computer science Sociology

While the predictive performance of modern statistical dependency parsers relies heavily on the availability of expensive expert-annotated treebank data, not all annotations contribute equally to the training of the parsers. In this paper,…

Cross-Register Projection for Headline Part of Speech Tagging Open

Adrian Benton, Hanyang Li, Igor Malioutov · 2021

Computer science Economics Philosophy

Part of speech (POS) tagging is a familiar NLP task. State of the art taggers routinely achieve token-level accuracies of over 97% on news body text, evidence that the problem is well understood. However, the register of English news headl…

Corrected CBOW Performs as well as Skip-gram Open

Ozan İrsoy, Adrian Benton, Karl Stratos · 2021

Computer science Mathematics

Mikolov et al. (2013a) observed that continuous bag-of-words (CBOW) word embeddings tend to underperform Skip-gram (SG) embeddings, and this finding has been reported in subsequent works. We find that these observations are driven not by f…

Towards Realistic Few-Shot Relation Extraction Open

Sam Brody, Sichao Wu, Adrian Benton · 2021

Computer science Engineering Chemistry

In recent years, few-shot models have been applied successfully to a variety of NLP tasks. Han et al. (2018) introduced a few-shot learning framework for relation classification, and since then, several models have surpassed human performa…

Comparing Euclidean and Hyperbolic Embeddings on the WordNet Nouns Hypernymy Graph Open

Sameer Bansal, Adrian Benton · 2021

Computer science Mathematics

Nickel and Kiela (2017) present a new method for embedding tree nodes in the Poincaré ball, and suggest that these hyperbolic embeddings are far more effective than Euclidean embeddings at embedding nodes in large, hierarchically structure…

Diversity-Aware Batch Active Learning for Dependency Parsing Open

Tianze Shi, Adrian Benton, Igor Malioutov, Ozan İrsoy · 2021

Computer science Sociology Philosophy

Tianze Shi, Adrian Benton, Igor Malioutov, Ozan İrsoy. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

kōan: A Corrected CBOW Implementation Open

Ozan İrsoy, Adrian Benton, Karl Stratos · 2020

Computer science Mathematics

It is a common belief in the NLP community that continuous bag-of-words (CBOW) word embeddings tend to underperform skip-gram (SG) embeddings. We find that this belief is founded less on theoretical differences in their training objectives…

Deep Generalized Canonical Correlation Analysis Open

Adrian Benton, Huda Khayrallah, Biman Gujral, Dee Ann Reisinger, Sheng Zhang, et al. · 2019

Computer science Mathematics Political science

We present Deep Generalized Canonical Correlation Analysis (DGCCA) -- a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While…

Roll Call Vote Prediction with Knowledge Augmented Models Open

Pallavi Patil, Kriti Myer, Ronak Zala, Arpit Singh, Sheshera Mysore, et al. · 2019

Computer science

Pallavi Patil, Kriti Myer, Ronak Zala, Arpit Singh, Sheshera Mysore, Andrew McCallum, Adrian Benton, Amanda Stent. Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). 2019.

Learning Representations of Social Media Users Open

Adrian Benton · 2018

Computer science Economics Philosophy

User representations are routinely used in recommendation systems by platform developers, targeted advertisements by marketers, and by public policy researchers to gauge public opinion across demographic groups. Computer scientists conside…

Weaponized Health Communication: Twitter Bots and Russian Trolls Amplify the Vaccine Debate Open

David Broniatowski, Amelia Jamison, SiHua Qi, Lulwah Alkulaib, Tao Chen, et al. · 2018

Political science Computer science Medicine

Objectives. To understand how Twitter bots and trolls (“bots”) promote online health content. Methods. We compared bots’ to average users’ rates of vaccine-relevant messages, which we collected online from July 2014 through September 2017.…

Using Author Embeddings to Improve Tweet Stance Classification Open

Adrian Benton, Mark Dredze · 2018

Computer science

Many social media classification tasks analyze the content of a message, but do not consider the context of the message. For example, in tweet stance classification – where a tweet is categorized according to a viewpoint it espouses – the …

Deep Dirichlet Multinomial Regression Open

Adrian Benton, Mark Dredze · 2018

Computer science Mathematics Philosophy

Adrian Benton, Mark Dredze. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.

Multi-Task Learning for Mental Health using Social Media Text Open

Adrian Benton, Margaret Mitchell, Dirk Hovy · 2017

Computer science Psychology Engineering

We introduce initial groundwork for estimating suicide risk and mental health in a deep learning framework. By modeling multiple conditions, the system learns to make predictions about suicide risk and mental health at a low false positive…
