Michael Ginn
Is linguistically-motivated data augmentation worth it?
Data augmentation, a widely-employed technique for addressing data scarcity, involves generating synthetic data examples which are then used to augment available training data. Researchers have seen surprising success from simple methods, …
Tree Transformers are an Ineffective Model of Syntactic Constituency
Linguists have long held that a key aspect of natural language syntax is the recursive organization of language units into constituent structures, and research has suggested that current state-of-the-art language models lack an inherent bi…
Historia Magistra Vitae: Dynamic Topic Modeling of Roman Literature using Neural Embeddings
Dynamic topic models have been proposed as a tool for historical analysis, but traditional approaches have had limited usefulness, being difficult to configure, interpret, and evaluate. In this work, we experiment with a recent approach fo…
Can we teach language models to gloss endangered languages?
Interlinear glossed text (IGT) is a popular format in language documentation projects, where each morpheme is labeled with a descriptive annotation. Automating the creation of interlinear glossed text would be desirable to reduce annotator…
GlossLM: A Massively Multilingual Corpus and Pretrained Model for Interlinear Glossed Text
Language documentation projects often involve the creation of annotated text in a format such as interlinear glossed text (IGT), which captures fine-grained morphosyntactic analyses in a morpheme-by-morpheme format. However, there are few …
Robust Generalization Strategies for Morpheme Glossing in an Endangered Language Documentation Context
Generalization is of particular importance in resource-constrained settings, where the available training data may represent only a small fraction of the distribution of possible texts. We investigate the ability of morpheme labeling model…
Taxonomic Loss for Morphological Glossing of Low-Resource Languages
Morpheme glossing is a critical task in automated language documentation and can benefit other downstream applications greatly. While state-of-the-art glossing systems perform very well for languages with large amounts of existing data, it…
SIGMORPHON 2023 Shared Task of Interlinear Glossing: Baseline Model
Language documentation is a critical aspect of language preservation, often including the creation of Interlinear Glossed Text (IGT). Creating IGT is time-consuming and tedious, and automating the process can save valuable annotator effort…
Findings of the SIGMORPHON 2023 Shared Task on Interlinear Glossing
Michael Ginn, Sarah Moeller, Alexis Palmer, Anna Stacey, Garrett Nicolai, Mans Hulden, Miikka Silfverberg. Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology. 2023.
Ginn-Khamov at SemEval-2023 Task 6, Subtask B: Legal Named Entities Extraction for Heterogenous Documents
This paper describes our submission to SemEval-2023 Task 6, Subtask B, a shared task on performing Named Entity Recognition in legal documents for specific legal entity types. Documents are divided into the preamble and judgement texts, an…