Marcel Bollmann
How Good is Your Wikipedia? Auditing Data Quality for Low-resource and Multilingual NLP
Wikipedia's perceived high quality and broad language coverage have established it as a fundamental resource in multilingual NLP. In the context of low-resource languages, however, these quality assumptions are increasingly being scrutinis…
CreoleVal: Multilingual Multitask Benchmarks for Creoles
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly resourced languages imply a significant potential fo…
Two Decades of the ACL Anthology: Development, Impact, and Open Challenges
The ACL Anthology is a prime resource for research papers within computational linguistics and natural language processing, while continuing to be an open-source and community-driven project. Since Gildea et al. (2018) reported on its stat…
How far can we get with one GPU in 100 hours? CoAStaL at MultiIndicMT Shared Task
This work shows that competitive translation results can be obtained in a constrained setting by incorporating the latest advances in memory and compute optimization. We train and evaluate large multilingual translation models using a sing…
Error Analysis and the Role of Morphology
We evaluate two common conjectures in error analysis of NLP models: (i) Morphology is predictive of errors; and (ii) the importance of morphology increases with the morphological complexity of a language. We show across four different task…
Moses and the Character-Based Random Babbling Baseline: CoAStaL at AmericasNLP 2021 Shared Task
We evaluated a range of neural machine translation techniques developed specifically for low-resource scenarios. Unsuccessfully. In the end, we submitted two runs: (i) a standard phrase-based model, and (ii) a random babbling baseline usin…
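A character-based random babbling baseline is simple enough to sketch. The abstract is truncated before the details, so the following is an illustrative reconstruction under one plausible reading, not the submitted system: sample output characters from the character distribution of the target-side training text, matching the source length.

```python
import random
from collections import Counter

def train_babbler(target_sentences):
    """Collect a character unigram distribution from target-side training text."""
    counts = Counter(ch for sent in target_sentences for ch in sent)
    chars = list(counts.keys())
    weights = [counts[c] for c in chars]
    return chars, weights

def babble(source_sentence, chars, weights, rng=random):
    """'Translate' by emitting random characters, matched to the source length."""
    return "".join(rng.choices(chars, weights=weights, k=len(source_sentence)))

# Hypothetical usage with a toy target-side corpus:
chars, weights = train_babbler(["ejemplo de texto", "otra frase"])
print(babble("hello world", chars, weights))
```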
On Forgetting to Cite Older Papers: An Analysis of the ACL Anthology
The field of natural language processing is experiencing a period of unprecedented growth, and with it a surge of published papers. This represents an opportunity for us to take stock of how we cite the work of other researchers, and wheth…
Naive Regularizers for Low-Resource Neural Machine Translation
Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. They require large volumes of data and often perform poorly when limited data is available. We show that using naive regul…
Few-Shot and Zero-Shot Learning for Historical Text Normalization
Historical text normalization often relies on small training datasets. Recent work has shown that multi-task learning can lead to significant improvements by exploiting synergies with related datasets, but there has been no systematic stud…
Historical Text Normalization with Delayed Rewards
Training neural sequence-to-sequence models with simple token-level log-likelihood is now a standard approach to historical text normalization, albeit often outperformed by phrase-based models. Policy gradient training enables direct optim…
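The core of policy gradient training with a delayed, sequence-level reward can be written in a few lines. A minimal REINFORCE-style sketch in PyTorch, assuming the reward is a sequence-level score such as negative edit distance to the gold normalization; the paper's concrete reward and baseline may differ.

```python
import torch

def reinforce_loss(sample_log_probs: torch.Tensor,
                   reward: float, baseline: float = 0.0) -> torch.Tensor:
    """REINFORCE objective for one sampled output sequence.

    sample_log_probs: 1-D tensor with log p(y_t | y_<t, x) for each token
        of a sequence sampled from the model.
    reward: scalar sequence-level score, available only once the whole
        sequence has been generated (the delayed reward).
    baseline: subtracted from the reward to reduce gradient variance.
    Minimizing this loss raises the probability of above-baseline sequences.
    """
    return -(reward - baseline) * sample_log_probs.sum()

# Hypothetical usage with stand-in decoder outputs for 5 sampled tokens:
logits = torch.randn(5, 30, requires_grad=True)   # decoder scores per step
token_ids = torch.tensor([3, 1, 4, 1, 5])         # the sampled output tokens
log_probs = torch.log_softmax(logits, dim=-1)[torch.arange(5), token_ids]
loss = reinforce_loss(log_probs, reward=-2.0, baseline=-3.0)
loss.backward()  # gradients flow into the stand-in decoder scores
```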
A Large-Scale Comparison of Historical Text Normalization Systems
There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encode…
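Of the technique families listed, distance-based normalization is compact enough to sketch here: map each historical form to the modern lexicon entry with the smallest Levenshtein distance. This is a minimal illustration of the family, not any particular system from the comparison.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance via two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def normalize(historical: str, lexicon: list[str]) -> str:
    """Map a historical spelling to the closest modern lexicon entry."""
    return min(lexicon, key=lambda w: levenshtein(historical, w))

# Hypothetical toy lexicon:
print(normalize("vnto", ["unto", "under", "about"]))  # -> 'unto'
```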
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Neural part-of-speech (POS) taggers are known to not perform well with little training data. As a step towards overcoming this problem, we present an architecture for learning more robust neural POS taggers by jointly training a hierarchica…
Multi-task learning for historical text normalization: Size matters
Historical text normalization suffers from small datasets that exhibit high variance, and previous work has shown that multi-task learning can be used to leverage data from related problems in order to obtain more robust models. Previous work ha…
Learning attention for historical text normalization by learning to pronounce
Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training data, which is not available for the named …
Improving historical spelling normalization with bi-directional LSTMs and multi-task learning
Natural-language processing of historical documents is complicated by the abundance of variant spellings and lack of annotated data. A common approach is to normalize the spelling of historical words to modern forms. We explore the suitabi…
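The multi-task setup can be sketched compactly. A minimal illustration, assuming normalization is framed as per-character prediction with a shared bi-directional LSTM encoder and one task-specific output layer per task; the layer sizes and the auxiliary task here are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiTaskNormalizer(nn.Module):
    """Shared character BiLSTM with one output layer per task.

    The encoder parameters are shared across tasks (e.g., normalization
    plus an auxiliary dataset); only the final projection is task-specific.
    """
    def __init__(self, vocab_size: int, out_sizes: dict[str, int],
                 emb_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True,
                               batch_first=True)
        self.heads = nn.ModuleDict(
            {task: nn.Linear(2 * hidden, n) for task, n in out_sizes.items()})

    def forward(self, char_ids: torch.Tensor, task: str) -> torch.Tensor:
        states, _ = self.encoder(self.embed(char_ids))
        return self.heads[task](states)  # per-character scores for this task

# Hypothetical usage: alternate batches between tasks during training.
model = MultiTaskNormalizer(vocab_size=60, out_sizes={"normalize": 60, "aux": 40})
batch = torch.randint(0, 60, (8, 12))      # 8 words of 12 characters each
scores = model(batch, task="normalize")    # shape: (8, 12, 60)
```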
Evaluating Inter-Annotator Agreement on Historical Spelling Normalization
This paper deals with means of evaluating inter-annotator agreement for a normalization task. This task differs from common annotation tasks in two important aspects: (i) the class of labels (the normalized wordforms) is open, and (ii) anno…