Joint Approach to Deromanization of Code-mixed Texts
Related Concepts
Computer science
Romanization
Transliteration
Scripting language
Natural language processing
Code-mixing
Artificial intelligence
Code (set theory)
Task (project management)
Language identification
Hindi
Bengali
Joint (building)
Identification (biology)
Code-switching
Programming language
Natural language
Linguistics
Engineering
Set (abstract data type)
Architectural engineering
Biology
Systems engineering
Botany
Philosophy
Rashed Rubby Riyadh, Grzegorz Kondrak
2019 · Open Access · DOI: https://doi.org/10.18653/v1/w19-1403 · OA: W2964084554
The conversion of romanized texts back to the native scripts is a challenging task because of the inconsistent romanization conventions and non-standard language use. This problem is compounded by code-mixing, i.e., using words from more than one language within the same discourse. In this paper, we propose a novel approach for handling these two problems together in a single system. Our approach combines three components: language identification, back-transliteration, and sequence prediction. The results of our experiments on Bengali and Hindi datasets establish the state of the art for the task of deromanization of code-mixed texts.
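As a rough illustration of how the three components named in the abstract could fit together, the following is a minimal sketch and not the authors' system. The word lists, candidate scores, and bigram scores are invented toy values, the function names are hypothetical, and the choice to leave English tokens in Latin script is an assumption made for the example. Language identification is a lexicon lookup, back-transliteration is a table of scored candidates, and sequence prediction is a Viterbi search over those candidates.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources; a real system would learn these from data.
ENGLISH_WORDS = {"movie", "office"}

# Romanized Hindi word -> native-script candidates with made-up scores.
CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "mera": [("मेरा", 0.9), ("मीरा", 0.1)],
    "naam": [("नाम", 0.8), ("नम", 0.2)],
    "hai": [("है", 0.7), ("हाय", 0.3)],
}

# Toy bigram scores over output tokens; unseen pairs get a small default.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("मेरा", "नाम"): 0.9,
    ("नाम", "है"): 0.9,
}
DEFAULT_BIGRAM = 0.1


def identify_language(token: str) -> str:
    """Label a token 'en' if it is in the toy English lexicon, else 'hi'."""
    return "en" if token.lower() in ENGLISH_WORDS else "hi"


def back_transliterate(token: str) -> List[Tuple[str, float]]:
    """Return scored native-script candidates, or the token itself if unknown."""
    return CANDIDATES.get(token.lower(), [(token, 1.0)])


def deromanize(tokens: List[str]) -> List[str]:
    """Choose the highest-scoring output sequence with a Viterbi search."""
    if not tokens:
        return []
    # One candidate list per position; English tokens are kept unchanged
    # (an assumption of this sketch).
    lattice = [
        [(t, 1.0)] if identify_language(t) == "en" else back_transliterate(t)
        for t in tokens
    ]
    # best[i][j] = (score of best path ending in candidate j, backpointer).
    best = [[(score, -1) for _, score in lattice[0]]]
    for i in range(1, len(lattice)):
        column = []
        for word, score in lattice[i]:
            options = [
                (best[i - 1][k][0] * BIGRAM.get((prev, word), DEFAULT_BIGRAM) * score, k)
                for k, (prev, _) in enumerate(lattice[i - 1])
            ]
            column.append(max(options))
        best.append(column)
    # Follow backpointers from the best final candidate.
    j = max(range(len(best[-1])), key=lambda k: best[-1][k][0])
    output = []
    for i in range(len(lattice) - 1, -1, -1):
        output.append(lattice[i][j][0])
        j = best[i][j][1]
    return output[::-1]


if __name__ == "__main__":
    print(" ".join(deromanize(["mera", "naam", "hai"])))
```

Running the three stages in one search lets the sequence scores override a locally preferred but contextually unlikely candidate, which is the general motivation for handling the steps jointly rather than in isolation.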