Chenchen Ding
Transforming memristor noises into computational innovations Open
Memristor-based compute-in-memory (CIM) systems show promise in accelerating various computing tasks with high energy efficiency, while various inherent noises in memristors, generally viewed as non-ideal characteristics, are detrimental t…
A Crucial Parameter for Rank-Frequency Relation in Natural Languages Open
$f \propto r^{-α} \cdot (r+γ)^{-β}$ has been empirically shown more precise than a naïve power law $f\propto r^{-α}$ to model the rank-frequency ($r$-$f$) relation of words in natural languages. This work shows that the only crucial parame…
Robust Neural Machine Translation for Abugidas by Glyph Perturbation Open
Low-resource Multilingual Neural Translation Using Linguistic Feature-based Relevance Mechanisms Open
This article investigates approaches to effectively harness source-side linguistic features for low-resource multilingual neural machine translation (MNMT). Previous works focus on using various features of a word such as lemma, part-of-sp…
Aligned Latin-Myanmar Transliteration Dataset Open
Aligned Latin-Myanmar Transliteration Dataset Chenchen Ding Tue Nov 22 00:00:00 JST 2022 * Introduction This data set is a further refined and annotated version of the data at https://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/western-…
Inputting Writing Systems with Medium Complexity: A Generalized Input Method Editor AKKHARA and Case Study on Myanmar Script Open
In this study, an input method editor called AKKHARA is developed to accommodate writing systems comprising several tens to hundreds of symbols. As an engineering realization, AKKHARA accepts and applies a set of rewrite rules with priorit…
Transliteration of Foreign Words in Burmese Open
This manuscript provides general descriptions on transliteration of foreign words in the Burmese language. Phenomena caused by phonetic and orthographic issues are discussed. Based on this work, we expect to gradually establish prescriptiv…
Towards Tokenization and Part-of-Speech Tagging for Khmer: Data and Discussion Open
As a highly analytic language, Khmer has considerable ambiguities in tokenization and part-of-speech (POS) tagging processing. This topic is investigated in this study. Specifically, a 20,000-sentence Khmer corpus with manual tokenization …
Overview of the 8th Workshop on Asian Translation Open
Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Sadao Kurohashi.…
Constituency Parsing by Cross-Lingual Delexicalization Open
Cross-lingual transfer is an important technique for low-resource language processing. Currently, most research on syntactic parsing focuses on dependency structures. This work investigates cross-lingual parsing on another type of impo…
A Burmese (Myanmar) Treebank Open
A 20,000-sentence Burmese (Myanmar) treebank on news articles has been released under a CC BY-NC-SA license. Complete phrase structure annotation was developed for each sentence from the morphologically annotated data prepared in previous …
Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation Open
In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data. Integrating manually designed or au…
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations Open
A Three-Parameter Rank-Frequency Relation in Natural Languages Open
We show that the rank-frequency relation in textual data follows $f \propto r^{-α} \cdot (r+γ)^{-β}$, where $f$ is the token frequency and $r$ is the rank by frequency, with $(α, β, γ)$ as parameters. The formulation is derived based on the empirical observation…
Burmese (Myanmar) Treebank of Asian Language Treebank Project Open
* Introduction This is the Myanmar ALT of the Asian Language Treebank (ALT) Corpus. Please refer to http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/index.html for an introduction of the ALT project. The process of building the Myanma…
Towards Burmese (Myanmar) Morphological Analysis Open
This article presents a comprehensive study on two primary tasks in Burmese (Myanmar) morphological analysis: tokenization and part-of-speech (POS) tagging. Twenty thousand Burmese sentences of newswire are annotated with two-layer tokeniz…
English-Myanmar Supervised and Unsupervised NMT: NICT’s Machine Translation Systems at WAT-2019 Open
This paper presents NICT's participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically the Myanmar (Burmese)-English task in both translation directions. We built neural machine…
Supervised and Unsupervised Machine Translation for Myanmar-English and Khmer-English Open
This paper presents NICT’s supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks. For all the translation directions, we built state-of-the-art supervised neural (NM…
MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method Open
Chenchen Ding, Masao Utiyama, Eiichiro Sumita. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstr…
Overview of the 6th Workshop on Asian Translation Open
This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta…
NOVA Open
A feasible and flexible annotation system is designed for joint tokenization and part-of-speech (POS) tagging to annotate those languages without natural definitions of words. This design was motivated by the fact that word separators are…
Statistical Khmer Name Romanization Open
We discuss and solve the task of Khmer name Romanization. Although several standard Romanization systems exist for Khmer, conventional transcription methods are applied prevalently in practice. These are inconsistent and complicated in som…
Simplified Abugidas Open
An abugida is a writing system where the consonant letters represent syllables with a default vowel and other vowels are denoted by diacritics. We investigate the feasibility of recovering the original text written in an abugida after omit…
Burmese (Myanmar) Name Romanization: A Sub-syllabic Segmentation Scheme for Statistical Solutions Open
We focus on Burmese name Romanization, a critical task in the translation of Burmese into languages using Latin script. As Burmese is under-researched and not well resourced, we collected and manually annotated 2,335 Romanization instance…
Word Segmentation for Burmese (Myanmar) Open
Experiments on various word segmentation approaches for the Burmese language are conducted and discussed in this note. Specifically, dictionary-based, statistical, and machine learning approaches are tested. Experimental results demonstrat…
Improving fast_align by Reordering Open
fast_align is a simple, fast, and efficient approach for word alignment based on IBM Model 2. fast_align performs well for language pairs with relatively similar word orders; however, it does not perform well for language pairs with dr…