Explanipedia

Stress and vowel harmony in Telugu

Sudheer Kolachina · 2016

Thesis: S.M. in Linguistics, Massachusetts Institute of Technology, Department of Linguistics and Philosophy, 2016.

MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages

Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, et al. · 2021

Computer science Philosophy Physics

Recently, there has been increasing interest in multilingual automatic speech recognition (ASR), where a speech recognition system caters to multiple low-resource languages by taking advantage of small amounts of labeled corpora in multiple languag…

Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset

Dhriti Sengupta, Ananyo Choudhury, Analabha Basu, Michèle Ramsay · 2016

Biology Geography Sociology

Genomic variation in Indian populations is of great interest due to the diversity of ancestral components, social stratification, endogamy and complex admixture patterns. With an expanding population of 1.2 billion, India is also a treasur…

Waiting for the state: Gender, citizenship and everyday encounters with bureaucracy in India

Grace Carswell, Thomas M. Chambers, Geert De Neve · 2018

Sociology Political science Computer science

This article focuses on practices and meanings of time and waiting experienced by poor, low-class Dalits and Muslims in their routine encounters with the state in India. Drawing on ethnographic research from Tamil Nadu and Uttar Pradesh, i…

Holistic spatial semantics and post-Talmian motion event typology: A case study of Thai and Telugu

Viswanatha Naidu, Jordan Zlatev, Vasanta Duggirala, Joost van de Weijer, Simon Devylder, et al. · 2018

Computer science Geography Philosophy

Leonard Talmy’s influential binary motion event typology has encountered four main challenges: (a) additional language types; (b) extensive “type-internal” variation; (c) the role of other relevant form classes than verbs and “satellites;”…

Caste, kinship and the realisation of ‘American Dream’: high-skilled Telugu migrants in the U.S.A.

Sanam Roohi · 2017

Sociology Geography Political science

Literature on the Indian diaspora domiciled in the U.S.A. largely portrays the group as educated, highly skilled migrants in pursuit of their American Dream, without critically engaging with the regionally particularised migration trajecto…

Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages

Bharathi Raja Chakravarthi, Mihael Arčan, John P. McCrae · 2019

Computer science Philosophy

Under-resourced languages are a significant challenge for statistical approaches to machine translation, and recently it has been shown that the usage of training data from closely-related languages can improve machine translation quality …

AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

Saleh Soltan, Shankar Ananthakrishnan, Jack FitzGerald, Ashutosh Gupta, Wael Hamza, et al. · 2022

Computer science Philosophy

In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models…

ACTSA: Annotated Corpus for Telugu Sentiment Analysis

Sandeep Sricharan Mukku, Radhika Mamidi · 2017

Computer science

Sentiment analysis deals with the task of determining the polarity of a document or sentence and has received a lot of attention in recent years for the English language. With the rapid growth of social media these days, a lot of data is a…

Natural Language Processing and Sentiment Analysis on Bangla Social Media Comments on Russia–Ukraine War Using Transformers

Mahmud Hasan, Labiba Islam, Ismat Jahan, Sabrina Mannan Meem, Rashedur M. Rahman · 2023

Computer science Physics Geography

The Bangla language ranks seventh in the list of most spoken languages, with 265 million native and non-native speakers around the world, and is the second most widely spoken Indo-Aryan language after Hindi. However, the growth of research for tasks such as sentiment ana…

A Multilingual Parallel Corpora Collection Effort for Indian Languages

Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri, C. V. Jawahar · 2020

Computer science Philosophy

We present sentence-aligned parallel corpora across 10 Indian languages (Hindi, Telugu, Tamil, Malayalam, Gujarati, Urdu, Bengali, Oriya, Marathi and Punjabi) and English, many of which are categorized as low resource. The corpora are compi…

Efficient Neural Machine Translation for Low-Resource Languages via Exploiting Related Languages

Vikrant Goyal, Sourav Kumar, Dipti Misra Sharma · 2020

Computer science Philosophy

A large percentage of the world's population speaks a language of the Indian subcontinent, comprising languages from both Indo-Aryan (e.g. Hindi, Punjabi, Gujarati, etc.) and Dravidian (e.g. Tamil, Telugu, Malayalam, etc.) families. A univ…

Measuring the multilingual reality: lessons from classrooms in Delhi and Hyderabad

Amy Lightfoot, Anusha Balasubramanian, Ianthi Maria Tsimpli, Lina Mukhopadhyay, Jeanine Treffers‐Daller · 2021

Sociology Psychology Geography

India’s linguistic diversity is reflected in classrooms across the country, where multiple languages are used by teachers and learners to negotiate meaning and instruction – a multilingual, multicultural student body is the norm, whether i…

Indic-Transformers: An Analysis of Transformer Language Models for Indian Languages

Kushal Jain, Adwait Deshpande, Kumar Shridhar, Felix Laumann, Ayushman Dash · 2020

Computer science Engineering Philosophy

Language models based on the Transformer architecture have achieved state-of-the-art performance on a wide range of NLP tasks such as text classification, question-answering, and token classification. However, this performance is usuall…

"A Passage to India": Pre-trained Word Embeddings for Indian Languages

Saurav Kumar, Saunack Kumar, Diptesh Kanojia, Pushpak Bhattacharyya · 2021

Computer science History Philosophy

Dense word vectors, or 'word embeddings', which encode semantic properties of words, have now become integral to NLP tasks like Machine Translation (MT), Question Answering (QA), Word Sense Disambiguation (WSD), and Information Retrieval (IR…

Homogeneity vs Heterogeneity in Indian English: Investigating Influences of L1 on f0 Range

Olga Maxwell, Elinor Payne, Rosey Billington · 2018

Psychology Mathematics Computer science

We present an exploratory analysis of several long-term distributional measures of f0 range in the speech of university-educated speakers of Indian English from four L1 backgrounds (Telugu, Tamil, Hindi and Bengali). The aim of this study …

Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction

Keshav Kolluru, Muqeeth Mohammed, Shubham Mittal, Soumen Chakrabarti, Mausam · 2022

Computer science Philosophy Psychology

Progress with supervised Open Information Extraction (OpenIE) has been primarily limited to English due to the scarcity of training data in other languages. In this paper, we explore techniques to automatically convert English text for tra…

A Sentiment Analysis System for the Hindi Language by Integrating Gated Recurrent Unit with Genetic Algorithm

Kush Shrivastava, Shishir Kumar · 2020

Computer science Economics Psychology

The growing availability and popularity of opinion rich resources such as blogs, shopping websites, review portals, and social media platforms have attracted several researchers to perform the sentiment analysis task. Unlike English, Chine…

SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification

Sai Muralidhar Jayanthi, Akshat Gupta · 2021

Computer science Engineering Biology

In this paper we present our submission for the EACL 2021 Shared Task on Offensive Language Identification in Dravidian languages. Our final system is an ensemble of mBERT and XLM-RoBERTa models which leverage task-adaptive pre-training of…

How Ready Are Indian Primary School Children for English Medium Instruction? An Analysis of the Relationship between the Reading Skills of Low-SES Children, Their Oral Vocabulary and English Input in the Classroom in Government Schools in India

Jeanine Treffers‐Daller, Lina Mukhopadhyay, Anusha Balasubramanian, Vasim Tamboli, Ianthi Maria Tsimpli · 2022

Psychology Computer science Philosophy

The aim of the study was to find out to what extent low socio-economic status (SES) children enrolled in government-run primary schools in Hyderabad are ready to receive instruction through the medium of English (English medium instruction…

An Efficient Deep Learning Model with Interrelated Tagging Prototype with Segmentation for Telugu Optical Character Recognition

Srinivasa Rao Dhanikonda, P. Sowjanya, M. Laxmidevi Ramanaiah, Rahul Joshi, B. H. Krishna Mohan, et al. · 2022

Computer science Mathematics

More than 66 million people in India speak Telugu, a language that dates back thousands of years and is widely spoken in South India. There has not been much progress reported on the advancement of Telugu text Optical Character Recognition…

Reduction of features to identify characters from degraded historical manuscripts

T. R. Vijaya Lakshmi · 2017

Computer science Mathematics Engineering

Historical writings have been found on stones, palm leaves, cloth, etc. This paper deals with the identification of Telugu palm-leaf characters by acquiring an additional 3D feature on palm leaves. The background of these manuscripts is iden…

Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling

Adeep Hande, Karthik Puranik, Konthala Yasaswini, Ruba Priyadharshini, Sajeetha Thavareesan, et al. · 2021

Computer science Mathematics Philosophy

Social media has effectively become the prime hub of communication and digital marketing. As these platforms enable the free manifestation of thoughts and facts in text, images and video, there is an extensive need to screen them to pro…

Motion event descriptions in Swedish, French, Thai and Telugu: a study in post-Talmian motion event typology

Jordan Zlatev, Johan Blomberg, Simon Devylder, Viswanatha Naidu, Joost van de Weijer · 2021

History Computer science Philosophy

Motion-event typology has moved into a “post-Talmian” terrain of approaches focusing on an open-ended number of patterns across languages and constructions. Following a proposal to distinguish between four typological clusters, we systemat…

Translanguaging in Primary Level ESL Classroom in India: An Exploratory Study

Lina Mukhopadhyay · 2020

Psychology Sociology Philosophy

In this paper, a series of ESL classroom observations of a teacher in an Indian primary level government run school are presented to show concrete uses of translanguaging. Translanguaging practices were based on the inputs the teacher rece…

Sentiment Analysis in Code-Mixed Telugu-English Text with Unsupervised Data Normalization

Kusampudi Siva Subrahamanyam Varma, Preetham Sathineni, Radhika Mamidi · 2021

Computer science Sociology Philosophy

In a multilingual society, people communicate in more than one language, leading to Code-Mixed data. Sentiment analysis on Code-Mixed Telugu-English Text (CMTET) poses unique challenges. The unstructured nature of the Code-Mixed Data is du…

Deep Ensemble Network for Sentiment Analysis in Bi-lingual Low-resource Languages

Pradeep Kumar Roy · 2023

Computer science Philosophy

Sentiment analysis (SA) is the systematic identification, extraction, quantification, and study of affective states and subjective information using natural language processing. It is widely used for analyzing users’ feedback, such as revi…

Hyperbolic Feature-based Sarcasm Detection in Telugu Conversation Sentences

Santosh Kumar Bharti, R. Subramanyam Naidu, Korra Sathya Babu · 2020

Computer science History Philosophy

Recognition of sarcastic statements has been a challenge in the process of sentiment analysis. A sarcastic sentence contains only positive words conveying a negative sentiment. Therefore, it is tough for any automated machine to identify t…

Author Identification using Sequential Minimal Optimization with rule-based Decision Tree on Indian Literature in Marathi

Sunil D. Kale, Rajesh Prasad · 2018

Computer science Biology Philosophy

Authorship Identification is the task of identifying who wrote a given piece of text from a given set of candidate authors (suspects). The increasingly large volumes of texts on the Internet enhance the great yet urgent necessity for autho…

Multi Variant Handwritten Telugu Character Recognition Using Transfer Learning

Tejasree Ganji, Muni Sekhar Velpuru, Raman Dugyala · 2021

Computer science Mathematics Philosophy

Optical Character Recognition (OCR) has become one of the most important techniques in computer vision, given that it can easily obtain information from various images. However, existing OCR techniques cannot recognize Telugu literature …
