Joint Approach to Deromanization of Code-mixed Texts


Related Concepts

Computer science, Romanization, Transliteration, Scripting language, Natural language processing, Code-mixing, Artificial intelligence, Code (set theory), Task (project management), Language identification, Hindi, Bengali, Joint (building), Identification (biology), Code-switching, Programming language, Natural language, Linguistics, Engineering, Set (abstract data type), Architectural engineering, Biology, Systems engineering, Botany, Philosophy

Rashed Rubby Riyadh, Grzegorz Kondrak

2019 · Open Access · DOI: https://doi.org/10.18653/v1/w19-1403 · OA: W2964084554

The conversion of romanized texts back to the native scripts is a challenging task because of the inconsistent romanization conventions and non-standard language use. This problem is compounded by code-mixing, i.e., using words from more than one language within the same discourse. In this paper, we propose a novel approach for handling these two problems together in a single system. Our approach combines three components: language identification, back-transliteration, and sequence prediction. The results of our experiments on Bengali and Hindi datasets establish the state of the art for the task of deromanization of code-mixed texts.
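
To make the three-component idea concrete, here is a minimal toy sketch of how language identification, back-transliteration, and sequence prediction might be chained. It is not the authors' system: the lexicon, the scores, the bigram bonus table, and the function names (`identify_language`, `candidates`, `deromanize`) are all hypothetical placeholders chosen for illustration, and the sequence step is a simple additive Viterbi search rather than whatever model the paper uses.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources; a real system would learn these from data.
BACK_TRANSLIT: Dict[str, List[Tuple[str, float]]] = {
    "mujhe": [("मुझे", 0.9)],
    "yeh": [("यह", 0.6), ("ये", 0.4)],
    "pasand": [("पसंद", 0.9)],
    "hai": [("है", 0.7), ("हाय", 0.3)],
}
ENGLISH_WORDS = {"movie", "song", "the"}
# Illustrative bonus for word pairs that a language model might prefer.
BIGRAM_BONUS: Dict[Tuple[str, str], float] = {("पसंद", "है"): 0.5}


def identify_language(token: str) -> str:
    """Toy language identification: a plain dictionary lookup."""
    return "en" if token.lower() in ENGLISH_WORDS else "hi"


def candidates(token: str) -> List[Tuple[str, float]]:
    """Back-transliteration candidates; English tokens stay in Latin script."""
    if identify_language(token) == "en":
        return [(token, 1.0)]
    # Unknown tokens fall back to the romanized form with a low score.
    return BACK_TRANSLIT.get(token.lower(), [(token, 0.1)])


def deromanize(tokens: List[str]) -> List[str]:
    """Pick the best candidate sequence with an additive Viterbi search."""
    if not tokens:
        return []
    lattice = [candidates(t) for t in tokens]
    # best[i][j] = (best path score ending at candidate j, backpointer)
    best = [[(score, -1) for _, score in lattice[0]]]
    for i in range(1, len(lattice)):
        row = []
        for word, score in lattice[i]:
            options = [
                (best[i - 1][k][0] + score
                 + BIGRAM_BONUS.get((prev, word), 0.0), k)
                for k, (prev, _) in enumerate(lattice[i - 1])
            ]
            row.append(max(options))
        best.append(row)
    # Backtrack from the highest-scoring final candidate.
    j = max(range(len(best[-1])), key=lambda k: best[-1][k][0])
    output = []
    for i in range(len(lattice) - 1, -1, -1):
        output.append(lattice[i][j][0])
        j = best[i][j][1]
    return output[::-1]


print(" ".join(deromanize(["mujhe", "yeh", "movie", "pasand", "hai"])))
```

In this sketch the English token "movie" is left untouched while the Hindi tokens are mapped to Devanagari, and the bigram bonus helps the ambiguous "hai" resolve in context. A joint system as described in the abstract would presumably replace each toy component with a trained model.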