Explanipedia

Transforming memristor noises into computational innovations

Chenchen Ding, Yuan Ren, Zhengwu Liu, Ngai Wong · 2025

Memristor-based compute-in-memory (CIM) systems show promise in accelerating various computing tasks with high energy efficiency, while various inherent noises in memristors, generally viewed as non-ideal characteristics, are detrimental t…

A Crucial Parameter for Rank-Frequency Relation in Natural Languages

Chenchen Ding · 2024

$f \propto r^{-\alpha} \cdot (r+\gamma)^{-\beta}$ has been empirically shown to be more precise than a naïve power law $f \propto r^{-\alpha}$ for modeling the rank-frequency ($r$-$f$) relation of words in natural languages. This work shows that the only crucial parame…
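
The effect of $\gamma$ is easiest to see from the local slope on a log-log plot. The following is a short derivation from the formula above, not taken from the paper; $C$ is a hypothetical normalizing constant:

$$\log f = C - \alpha \log r - \beta \log (r+\gamma)$$

$$\frac{d \log f}{d \log r} = -\alpha - \beta \, \frac{r}{r+\gamma}$$

So the curve behaves like $r^{-\alpha}$ for $r \ll \gamma$ and like $r^{-(\alpha+\beta)}$ for $r \gg \gamma$, with $\gamma$ marking roughly the rank at which the slope changes; setting $\beta = 0$ or $\gamma = 0$ collapses it to a single power law.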

Robust Neural Machine Translation for Abugidas by Glyph Perturbation

Hour Kaing, Chenchen Ding, Hideki Tanaka, Masao Utiyama · 2024

Low-resource Multilingual Neural Translation Using Linguistic Feature-based Relevance Mechanisms

Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2023

This article investigates approaches to effectively harness source-side linguistic features for low-resource multilingual neural machine translation (MNMT). Previous works focus on using various features of a word such as lemma, part-of-sp…

Aligned Latin-Myanmar Transliteration Dataset

Chenchen Ding · 2022

Dated Tuesday, 22 November 2022 (JST). Introduction: This data set is a further refined and annotated version of the data at https://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/western-…

Inputting Writing Systems with Medium Complexity: A Generalized Input Method Editor AKKHARA and Case Study on Myanmar Script

Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2022

In this study, an input method editor called AKKHARA is developed to accommodate writing systems comprising several tens to hundreds of symbols. As an engineering realization, AKKHARA accepts and applies a set of rewrite rules with priorit…

Transliteration of Foreign Words in Burmese

Chenchen Ding · 2021

This manuscript provides general descriptions of the transliteration of foreign words in the Burmese language. Phenomena caused by phonetic and orthographic issues are discussed. Based on this work, we expect to gradually establish prescriptiv…

Towards Tokenization and Part-of-Speech Tagging for Khmer: Data and Discussion

Hour Kaing, Chenchen Ding, Masao Utiyama, Eiichiro Sumita, Sethserey Sam, et al. · 2021

As a highly analytic language, Khmer has considerable ambiguities in tokenization and part-of-speech (POS) tagging. This topic is investigated in this study. Specifically, a 20,000-sentence Khmer corpus with manual tokenization …

Overview of the 8th Workshop on Asian Translation

Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, et al. · 2021

Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Sadao Kurohashi.…

Constituency Parsing by Cross-Lingual Delexicalization

Hour Kaing, Chenchen Ding, Masao Utiyama, Eiichiro Sumita, Katsuhito Sudoh, et al. · 2021

Cross-lingual transfer is an important technique for low-resource language processing. Currently, most research on syntactic parsing focuses on dependency structures. This work investigates cross-lingual parsing on another type of impo…

A Burmese (Myanmar) Treebank

Chenchen Ding, Sann Su Su Yee, Win Pa Pa, Khin Mar Soe, Masao Utiyama, et al. · 2020

A 20,000-sentence Burmese (Myanmar) treebank on news articles has been released under a CC BY-NC-SA license. Complete phrase structure annotation was developed for each sentence from the morphologically annotated data prepared in previous …

Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation

Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2020

In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data. Integrating manually designed or au…

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

David Schlangen, Qun Liu, Huawei Noah's Ark Lab, Hong Kong, Adrián Pastor, et al. · 2020

A Three-Parameter Rank-Frequency Relation in Natural Languages

Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2020

We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency and $r$ is the rank by frequency, with $(\alpha, \beta, \gamma)$ as parameters. The formulation is derived based on the empirical observation…
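
As a rough illustration of how the three parameters might be estimated from a frequency list: for a fixed $\gamma$, $\log f$ is linear in $(c, \alpha, \beta)$, so the sketch solves an ordinary least-squares problem in log space for each $\gamma$ on a user-supplied grid and keeps the best fit. This generic curve-fitting approach is an assumption, not the estimation procedure of the paper; `log_freq`, `fit_fixed_gamma`, and `fit_three_param` are hypothetical names.

```python
import math


def log_freq(r, c, alpha, beta, gamma):
    """log f = c - alpha*log(r) - beta*log(r + gamma), i.e. f ∝ r^-alpha (r+gamma)^-beta."""
    return c - alpha * math.log(r) - beta * math.log(r + gamma)


def _solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, 3):
            factor = m[i][col] / m[col][col]
            for j in range(col, 4):
                m[i][j] -= factor * m[col][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def fit_fixed_gamma(freqs, gamma):
    """Least squares for (c, alpha, beta) with gamma held fixed.

    freqs[k] is the frequency of the rank-(k+1) token. gamma must be > 0;
    with gamma == 0, log(r + gamma) duplicates log(r) and the system is singular.
    Returns (c, alpha, beta, sum of squared log residuals).
    """
    rows = [(1.0, -math.log(r), -math.log(r + gamma)) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    # Normal equations: (X^T X) theta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
    theta = _solve3(xtx, xty)
    sse = sum((y - sum(t * x for t, x in zip(theta, row))) ** 2 for row, y in zip(rows, ys))
    return theta[0], theta[1], theta[2], sse


def fit_three_param(freqs, gamma_grid):
    """Grid search over gamma; return (c, alpha, beta, gamma, sse) with the lowest sse."""
    best = None
    for gamma in gamma_grid:
        c, alpha, beta, sse = fit_fixed_gamma(freqs, gamma)
        if best is None or sse < best[4]:
            best = (c, alpha, beta, gamma, sse)
    return best


# Synthetic check: frequencies generated exactly from alpha=0.5, beta=0.8, gamma=5.
freqs = [math.exp(log_freq(r, 10.0, 0.5, 0.8, 5.0)) for r in range(1, 1001)]
print(fit_three_param(freqs, [1.0, 2.0, 5.0, 10.0, 20.0]))  # gamma=5.0 should win
```

On real corpus counts a finer grid (or a nonlinear optimizer over $\gamma$) would be needed, and fitting in log space weights the long tail of rare tokens heavily, which is itself a modeling choice.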

Burmese (Myanmar) Treebank of Asian Language Treebank Project

Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2019

Introduction: This is the Myanmar ALT of the Asian Language Treebank (ALT) Corpus. Please refer to http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/index.html for an introduction to the ALT project. The process of building the Myanma…

Towards Burmese (Myanmar) Morphological Analysis

Chenchen Ding, Hnin Thu Zar Aye, Win Pa Pa, Khin Thandar Nwet, Khin Mar Soe, et al. · 2019

This article presents a comprehensive study on two primary tasks in Burmese (Myanmar) morphological analysis: tokenization and part-of-speech (POS) tagging. Twenty thousand Burmese sentences of newswire are annotated with two-layer tokeniz…

English-Myanmar Supervised and Unsupervised NMT: NICT’s Machine Translation Systems at WAT-2019

Rui Wang, Haipeng Sun, Kehai Chen, Chenchen Ding, Masao Utiyama, et al. · 2019

This paper presents the NICT's participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically Myanmar (Burmese) - English task in both translation directions. We built neural machine…

Supervised and Unsupervised Machine Translation for Myanmar-English and Khmer-English

Benjamin Marie, Hour Kaing, Aye Myat Mon, Chenchen Ding, Atsushi Fujita, et al. · 2019

This paper presents the NICT’s supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks. For all the translation directions, we built state-of-the-art supervised neural (NM…

MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method

Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2019

Chenchen Ding, Masao Utiyama, Eiichiro Sumita. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstr…

Overview of the 6th Workshop on Asian Translation

Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, et al. · 2019

This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta…

NOVA

Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2018

A feasible and flexible annotation system is designed for joint tokenization and part-of-speech (POS) tagging to annotate languages without natural definitions of words. This design was motivated by the fact that word separators are…

Statistical Khmer Name Romanization

Chenchen Ding, Vichet Chea, Masao Utiyama, Eiichiro Sumita, Sethserey Sam, et al. · 2018

We discuss and solve the task of Khmer name Romanization. Although several standard Romanization systems exist for Khmer, conventional transcription methods are prevalent in practice. These are inconsistent and complicated in som…

Simplified Abugidas

Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2018

An abugida is a writing system where the consonant letters represent syllables with a default vowel and other vowels are denoted by diacritics. We investigate the feasibility of recovering the original text written in an abugida after omit…
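
To make "omitting vowel diacritics" concrete, here is a crude approximation that simply drops every combining mark (Unicode general categories Mn and Mc). This is an illustrative assumption, not the simplification scheme of the paper: it also removes signs that are not vowels, such as the Devanagari virama, and the name `strip_combining_marks` is hypothetical.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Drop every character whose Unicode general category is Mn or Mc.

    Crude stand-in for omitting vowel diacritics in an abugida: it removes
    all combining marks, including non-vowel ones such as virama.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) not in ("Mn", "Mc"))


# Devanagari "namaste": NA, MA, SA, VIRAMA, TA, VOWEL SIGN E
print(strip_combining_marks("\u0928\u092e\u0938\u094d\u0924\u0947"))  # NA MA SA TA
```

Recovering the original text then means predicting the dropped marks from context, which is where the ambiguity studied in the paper comes in.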

Burmese (Myanmar) Name Romanization: A Sub-syllabic Segmentation Scheme for Statistical Solutions

Chenchen Ding, Win Pa Pa, Masao Utiyama, Eiichiro Sumita · 2018

We focus on Burmese name Romanization, a critical task in the translation of Burmese into languages using Latin script. As Burmese is under-researched and under-resourced, we collected and manually annotated 2,335 Romanization instance…

Word Segmentation for Burmese (Myanmar)

Chenchen Ding, Ye Kyaw Thu, Masao Utiyama, Eiichiro Sumita · 2016

Experiments on various word segmentation approaches for the Burmese language are conducted and discussed in this note. Specifically, dictionary-based, statistical, and machine learning approaches are tested. Experimental results demonstrat…
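
As an example of the dictionary-based family mentioned above, forward maximum matching greedily takes the longest lexicon entry at each position. This is a generic textbook version, not necessarily the variant tested in the note; the toy Latin-script lexicon stands in for a Burmese one, and `forward_max_match` is a hypothetical name.

```python
def forward_max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy left-to-right longest-match segmentation.

    Characters not covered by any lexicon entry become single-character tokens.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit
        # or a single character (fallback for unknown material).
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in lexicon:
                tokens.append(piece)
                i += length
                break
    return tokens


print(forward_max_match("thecatsat", {"the", "cat", "cats", "sat", "at"}))
```

The example shows the characteristic failure of greedy matching: "thecatsat" becomes `the | cats | at` rather than `the | cat | sat`, because the longer entry "cats" wins at position 3. Statistical and machine-learning segmenters aim to resolve such ambiguities from context.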

Improving fast_align by Reordering

Chenchen Ding, Masao Utiyama, Eiichiro Sumita · 2015

fast_align is a simple, fast, and efficient approach to word alignment based on IBM Model 2. fast_align performs well for language pairs with relatively similar word orders; however, it does not perform well for language pairs with dr…
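
fast_align's preference for similar word orders comes from its reparameterization of IBM Model 2 (Dyer, Chahuneau, and Smith, 2013): a non-null link from target position $i$ (of $m$) to source position $j$ (of $n$) gets prior weight proportional to $\exp(-\lambda\,|i/m - j/n|)$, so links near the diagonal are favored, and the null link gets a fixed probability $p_0$. A minimal sketch of that prior follows; the function name and the values `lam=4.0`, `p0=0.08` are illustrative assumptions, and the normalizer is summed directly rather than in the closed form fast_align uses.

```python
import math


def diagonal_prior(i, m, n, lam=4.0, p0=0.08):
    """Prior over alignments of target position i (1..m) to source positions.

    Returns [P(null), P(j=1), ..., P(j=n)]. Non-null mass is spread in
    proportion to exp(-lam * |i/m - j/n|), peaking on the diagonal.
    """
    scores = [math.exp(-lam * abs(i / m - j / n)) for j in range(1, n + 1)]
    z = sum(scores)
    return [p0] + [(1.0 - p0) * s / z for s in scores]


probs = diagonal_prior(3, 6, 10)
print(max(range(1, 11), key=lambda j: probs[j]))  # most likely source position
```

Under this prior, a target word whose counterpart sits at the opposite end of the source sentence receives comparatively little prior mass (compare `diagonal_prior(1, 10, 10)[10]` with `diagonal_prior(1, 10, 10)[1]`), which is one way to see why pairs with very different word orders are hard for it.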
