Telugu ≈ Telugu
Stress and vowel harmony in Telugu
Thesis: S.M. in Linguistics, Massachusetts Institute of Technology, Department of Linguistics and Philosophy, 2016.
MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
Recently, there has been increasing interest in multilingual automatic speech recognition (ASR), where a speech recognition system caters to multiple low-resource languages by taking advantage of small amounts of labeled corpora in multiple languag…
Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset
Genomic variation in Indian populations is of great interest due to the diversity of ancestral components, social stratification, endogamy and complex admixture patterns. With an expanding population of 1.2 billion, India is also a treasur…
Waiting for the state: Gender, citizenship and everyday encounters with bureaucracy in India
This article focuses on practices and meanings of time and waiting experienced by poor, low-class Dalits and Muslims in their routine encounters with the state in India. Drawing on ethnographic research from Tamil Nadu and Uttar Pradesh, i…
Holistic spatial semantics and post-Talmian motion event typology: A case study of Thai and Telugu
Leonard Talmy’s influential binary motion event typology has encountered four main challenges: (a) additional language types; (b) extensive “type-internal” variation; (c) the role of other relevant form classes than verbs and “satellites;”…
Caste, kinship and the realisation of ‘American Dream’: high-skilled Telugu migrants in the U.S.A.
Literature on the Indian diaspora domiciled in the U.S.A. largely portrays the group as educated, highly skilled migrants in pursuit of their American Dream, without critically engaging with the regionally particularised migration trajecto…
Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages
Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that the usage of training data from closely-related languages can improve machine translation quality …
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models…
ACTSA: Annotated Corpus for Telugu Sentiment Analysis
Sentiment analysis deals with the task of determining the polarity of a document or sentence and has received a lot of attention in recent years for the English language. With the rapid growth of social media these days, a lot of data is a…
Natural Language Processing and Sentiment Analysis on Bangla Social Media Comments on Russia–Ukraine War Using Transformers
The Bangla language ranks seventh in the list of most spoken languages, with 265 million native and non-native speakers around the world, and is the second most spoken Indo-Aryan language after Hindi. However, the growth of research for tasks such as sentiment ana…
A Multilingual Parallel Corpora Collection Effort for Indian Languages
We present sentence-aligned parallel corpora across 10 Indian languages (Hindi, Telugu, Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi and Punjabi) and English, many of which are categorized as low-resource. The corpora are compi…
Efficient Neural Machine Translation for Low-Resource Languages via Exploiting Related Languages
A large percentage of the world's population speaks a language of the Indian subcontinent, comprising languages from both Indo-Aryan (e.g. Hindi, Punjabi, Gujarati, etc.) and Dravidian (e.g. Tamil, Telugu, Malayalam, etc.) families. A univ…
Measuring the multilingual reality: lessons from classrooms in Delhi and Hyderabad
India’s linguistic diversity is reflected in classrooms across the country, where multiple languages are used by teachers and learners to negotiate meaning and instruction – a multilingual, multicultural student body is the norm, whether i…
Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages
Language models based on the Transformer architecture have achieved state-of-the-art performance on a wide range of NLP tasks such as text classification, question-answering, and token classification. However, this performance is usuall…
"A Passage to India": Pre-trained Word Embeddings for Indian Languages
Dense word vectors or 'word embeddings' which encode semantic properties of words, have now become integral to NLP tasks like Machine Translation (MT), Question Answering (QA), Word Sense Disambiguation (WSD), and Information Retrieval (IR…
Homogeneity vs Heterogeneity in Indian English: Investigating Influences of L1 on f0 Range
We present an exploratory analysis of several long-term distributional measures of f0 range in the speech of university-educated speakers of Indian English from four L1 backgrounds (Telugu, Tamil, Hindi and Bengali). The aim of this study …
Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction
Progress with supervised Open Information Extraction (OpenIE) has been primarily limited to English due to the scarcity of training data in other languages. In this paper, we explore techniques to automatically convert English text for tra…
A Sentiment Analysis System for the Hindi Language by Integrating Gated Recurrent Unit with Genetic Algorithm
The growing availability and popularity of opinion rich resources such as blogs, shopping websites, review portals, and social media platforms have attracted several researchers to perform the sentiment analysis task. Unlike English, Chine…
SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification
In this paper we present our submission for the EACL 2021-Shared Task on Offensive Language Identification in Dravidian languages. Our final system is an ensemble of mBERT and XLM-RoBERTa models which leverage task-adaptive pre-training of…
How Ready Are Indian Primary School Children for English Medium Instruction? An Analysis of the Relationship between the Reading Skills of Low-SES Children, Their Oral Vocabulary and English Input in the Classroom in Government Schools in India
The aim of the study was to find out to what extent low socio-economic status (SES) children enrolled in government-run primary schools in Hyderabad are ready to receive instruction through the medium of English (English medium instruction…
An Efficient Deep Learning Model with Interrelated Tagging Prototype with Segmentation for Telugu Optical Character Recognition
More than 66 million people in India speak Telugu, a language that dates back thousands of years and is widely spoken in South India. There has not been much progress reported on the advancement of Telugu text Optical Character Recognition…
Reduction of features to identify characters from degraded historical manuscripts
Historical writings have been found on stones, palm leaves, cloth, etc. This paper deals with the identification of Telugu palm-leaf characters by acquiring an additional 3D feature on palm leaves. The background of these manuscripts is iden…
Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling
Social media has effectively become the prime hub of communication and digital marketing. As these platforms enable the free manifestation of thoughts and facts in text, images and video, there is an extensive need to screen them to pro…
Motion event descriptions in Swedish, French, Thai and Telugu: a study in post-Talmian motion event typology
Motion-event typology has moved into a “post-Talmian” terrain of approaches focusing on an open-ended number of patterns across languages and constructions. Following a proposal to distinguish between four typological clusters, we systemat…
Translanguaging in Primary Level ESL Classroom in India: An Exploratory Study
In this paper, a series of ESL classroom observations of a teacher in an Indian primary level government run school are presented to show concrete uses of translanguaging. Translanguaging practices were based on the inputs the teacher rece…
Sentiment Analysis in Code-Mixed Telugu-English Text with Unsupervised Data Normalization
In a multilingual society, people communicate in more than one language, leading to code-mixed data. Sentiment analysis on Code-Mixed Telugu-English Text (CMTET) poses unique challenges. The unstructured nature of the Code-Mixed Data is du…
Deep Ensemble Network for Sentiment Analysis in Bi-lingual Low-resource Languages
Sentiment analysis (SA) is the systematic identification, extraction, quantification, and study of affective states and subjective information using natural language processing. It is widely used for analyzing users’ feedback, such as revi…
Hyperbolic Feature-based Sarcasm Detection in Telugu Conversation Sentences
Recognition of sarcastic statements has been a challenge in sentiment analysis. A sarcastic sentence often uses positive words to convey a negative sentiment. Therefore, it is difficult for automated systems to identify t…
Author Identification using Sequential Minimal Optimization with rule-based Decision Tree on Indian Literature in Marathi
Authorship identification is the task of identifying who wrote a given piece of text from a given set of candidate authors (suspects). The increasingly large volumes of text on the Internet heighten the urgent need for autho…
Multi Variant Handwritten Telugu Character Recognition Using Transfer Learning
Optical Character Recognition (OCR) has become one of the most important techniques in computer vision, given that it can easily obtain information from various images. However, existing OCR techniques cannot recognize Telugu literature …