Lori Levin
Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons
In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning en…
UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies
The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages. However, the UD annotations do not tell the full story. Grammatical constructions that convey meaning throu…
Wav2Gloss: Generating Interlinear Glossed Text from Speech
Thousands of the world's languages are in danger of extinction--a tremendous threat to cultural identities and human language diversity. Interlinear Glossed Text (IGT) is a form of linguistic annotation that can support documentation and r…
GlossLM: A Massively Multilingual Corpus and Pretrained Model for Interlinear Glossed Text
Language documentation projects often involve the creation of annotated text in a format such as interlinear glossed text (IGT), which captures fine-grained morphosyntactic analyses in a morpheme-by-morpheme format. However, there are few …
Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity
Recent advances in large language models have prompted researchers to examine their abilities across a variety of linguistic tasks, but little has been done to investigate how models handle the interactions in meaning across words and larg…
Construction Grammar Provides Unique Insight into Neural Language Models
Construction Grammar (CxG) has recently been used as the basis for probing studies that have investigated the performance of large pretrained language models (PLMs) with respect to the structure and meaning of constructions. In this positi…
SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing
Taiqi He, Lindia Tjuatja, Nathaniel Robinson, Shinji Watanabe, David R. Mortensen, Graham Neubig, Lori Levin. Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology. 2023.
Generalized Glossing Guidelines: An Explicit, Human- and Machine-Readable, Item-and-Process Convention for Morphological Annotation
David R. Mortensen, Ela Gulsen, Taiqi He, Nathaniel Robinson, Jonathan Amith, Lindia Tjuatja, Lori Levin. Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology. 2023.
Language Technologies for Humanitarian Aid
Humanitarian aid missions, whether emergency famine relief, establishment of medical clinics, or missions in conjunction with peace-keeping operations, require on-demand communication with the indigenous population. If such operations take…
Neural Polysynthetic Language Modelling
Research in natural language processing commonly assumes that approaches that work well for English and other widely-used languages are "language agnostic". In high-resource languages, especially those that are analytic, a common appro…
Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations
Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers. Manual production of IGT takes time and requires linguistic expertise. We attempt to addres…
Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings
Cross-lingual word embedding (CWE) algorithms represent words in multiple languages in a unified vector space. Multi-Word Expressions (MWE) are common in every language. When training word embeddings, each component word of an MWE gets its…
An Empirical Exploration of Local Ordering Pre-training for Structured Prediction
Recently, pre-training contextualized encoders with language model (LM) objectives has been shown an effective semi-supervised method for structured prediction. In this work, we empirically explore an alternative pre-training method for co…
A Resource for Computational Experiments on Mapudungun
We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers. We provide 142 hours of culturally significant conversations in the domain of med…
Low-Resource Machine Translation using Interlinear Glosses.
Neural Machine Translation (NMT) does not handle low-resource translation well because NMT is data-hungry and low-resource languages, by their nature, have limited parallel data. Many low-resource languages are morphologically rich, which …
Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation
We demonstrate a new approach to Neural Machine Translation (NMT) for low-resource languages using a ubiquitous linguistic resource, Interlinear Glossed Text (IGT). IGT represents a non-English sentence as a sequence of English lemmas and …
The ARIEL-CMU Systems for LoReHLT18
This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text…
Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations
Much work in Natural Language Processing (NLP) has been for resource-rich languages, making generalization to new, less-resourced languages challenging. We present two approaches for improving generalization to low-resourced languages b…
DeepCx: A transition-based approach for shallow semantic parsing with complex constructional triggers
This paper introduces the surface construction labeling (SCL) task, which expands the coverage of Shallow Semantic Parsing (SSP) to include frames triggered by complex constructions. We present DeepCx, a neural, transition-based system for…
ParaMor: Finding Paradigms across Morphology
Our algorithm, ParaMor, fared well in Morpho Challenge 2007 (Kurimo et al., 2007), a peer operated competition pitting against one another algorithms designed to discover the morphological structure of natural languages from nothing more t…
Automatically Tagging Constructions of Causation and Their Slot-Fillers
This paper explores extending shallow semantic parsing beyond lexical-unit triggers, using causal relations as a test case. Semantic parsing becomes difficult in the face of the wide variety of linguistic realizations that causation can ta…
The BECauSE Corpus 2.0: Annotating Causality and Overlapping Relations
Language of cause and effect captures an essential component of the semantics of a text. However, causal language is also intertwined with other semantic relations, such as temporal precedence and correlation. This makes it difficult to de…
URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors
Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, Lori Levin. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017.
Code-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages
Code-switching has been found to have social motivations in addition to syntactic constraints. In this work, we explore the social effect of code-switching in an online community. We present a task from the Arabic Wikipedia to capture lang…