Maunendra Sankar Desarkar
DIWALI -- Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context
Large language models (LLMs) are widely used in various tasks and applications. However, despite their wide capabilities, they are shown to lack cultural alignment (Ryan et al., 2024; AlKhamissi et al., 2024) and p…
MorphTok: Morphologically Grounded Tokenization for Indian Languages
Tokenization is a crucial step in NLP, especially with the rise of large language models (LLMs), impacting downstream performance, computational cost, and efficiency. Existing LLMs rely on the classical Byte-pair Encoding (BPE) algorithm f…
NLIP at BEA 2025 Shared Task: Evaluation of Pedagogical Ability of AI Tutors
NLIP_Lab-IITH Multilingual MT System for WAT24 MT Shared Task
This paper describes NLIP Lab's multilingual machine translation system for the WAT24 shared task on multilingual Indic MT, covering 22 scheduled languages belonging to 4 language families. We explore pre-training for Indic languages using …
NLIP_Lab-IITH Low-Resource MT System for WMT24 Indic MT Shared Task
In this paper, we describe our system for the WMT24 shared task on Low-Resource Indic Language Translation. We consider eng ↔ {as, kha, lus, mni} as participating language pairs. In this shared task, we explore the finetun…
DAC: Quantized Optimal Transport Reward-based Reinforcement Learning Approach to Detoxify Query Auto-Completion
Transformer based Multitask Learning for Image Captioning and Object Detection
Image captioning and object detection play a crucial role in obtaining a better visual understanding of the surroundings in several real-world scenarios, such as autonomous navigation and mobility. This work introduces a novel multitask learnin…
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
The standard language modeling (LM) loss by itself has been shown to be inadequate for effective dialogue modeling. As a result, various training approaches, such as auxiliary loss functions and leveraging human feedback, are being adopted…
Trie-NLG: Trie Context Augmentation to Improve Personalized Query Auto-Completion for Short and Unseen Prefixes
Query auto-completion (QAC) aims to suggest plausible completions for a given query prefix. Traditionally, QAC systems have leveraged tries curated from historical query logs to suggest the most popular completions. In this context, there are …
Towards Improvement of Grounded Cross-lingual Natural Language Inference with VisioTextual Attention
Natural Language Inference (NLI) has been one of the fundamental tasks in Natural Language Processing (NLP). Recognizing Textual Entailment (RTE) between two pieces of text is a crucial problem. It adds further challenges when it invol…
CharSpan: Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages
We address the task of machine translation (MT) from an extremely low-resource language (ELRL) to English by leveraging cross-lingual transfer from a 'closely-related' high-resource language (HRL). The development of an MT system for ELRL is ch…
SelectNoise: Unsupervised Noise Injection to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages
In this work, we focus on the task of machine translation (MT) from extremely low-resource languages (ELRLs) to English. The unavailability of parallel data, lack of representation from large multilingual pre-trained models, and limited mon…
Towards Low-resource Language Generation with Limited Supervision
We present a research narrative aimed at enabling language technology for multiple natural language generation (NLG) tasks in low-resource languages (LRLs). With approximately 7,000 languages spoken globally, many lack the resources requir…
On Text Style Transfer via Style-Aware Masked Language Models
Text Style Transfer (TST) can be performed through approaches such as latent space disentanglement, cycle-consistency losses, prototype editing, etc. The prototype editing approach, which is known to be quite successful in TST, involves two k…
DivHSK: Diverse Headline Generation using Self-Attention based Keyword Selection
Diverse headline generation is an NLP task where given a news article, the goal is to generate multiple headlines that are true to the content of the article but are different among themselves. This task aims to exhibit and exploit semanti…
Dial-M: A Masking-based Framework for Dialogue Evaluation
In dialogue systems, automatically evaluating machine-generated responses is critical and challenging. Despite the tremendous progress in dialogue generation research, its evaluation heavily depends on human judgments. The standard word-ov…
ComplAI: Theory of A Unified Framework for Multi-factor Assessment of Black-Box Supervised Machine Learning Models
The advances in Artificial Intelligence are creating new opportunities to improve the lives of people around the world, from business to healthcare, from lifestyle to education. For example, some systems profile users using their demograph…
On Text Style Transfer via Style Masked Language Models
Text Style Transfer (TST) can be performed through approaches such as latent space disentanglement, cycle-consistency losses, prototype editing, etc. The prototype editing approach, which is known to be quite successful in TST, involves two k…
DialoGen: Generalized Long-Range Context Representation for Dialogue Systems
Long-range context modeling is crucial to both dialogue understanding and generation. The most popular method for dialogue context representation is to concatenate the last k utterances in chronological order. However, this method may no…
HyperHawkes: Hypernetwork based Neural Temporal Point Process
Temporal point processes serve as an essential tool for modeling time-to-event data in continuous time space. Despite having massive amounts of event sequence data from various domains like social media, healthcare, etc., real-world applicat…
Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer
Recent studies show that auto-encoder based approaches successfully perform language generation, smooth sentence interpolation, and style transfer over unseen attributes using unlabelled datasets in a zero-shot manner. The latent space geo…
Effective utilization of labeled data from related tasks using graph contrastive pretraining
Contrastive pretraining techniques for text classification have been largely studied in an unsupervised setting. However, oftentimes labeled data from related past datasets which share label semantics with the current task is available. We hypo…
Multi-Context Based Neural Approach for COVID-19 Fake-News Detection
While the world is facing the COVID-19 pandemic, society is also fighting another battle to tackle misinformation. Due to the widespread effect of COVID-19 and increased usage of social media, fake news and rumors about COVID-19 are being s…
Towards Fair Evaluation of Dialogue State Tracking by Flexible Incorporation of Turn-level Performances
Dialogue State Tracking (DST) is primarily evaluated using Joint Goal Accuracy (JGA) defined as the fraction of turns where the ground-truth dialogue state exactly matches the prediction. Generally in DST, the dialogue state or belief stat…
Meta-XNLG: A Meta-Learning Approach Based on Language Clustering for Zero-Shot Cross-Lingual Transfer and Generation
Recently, the NLP community has witnessed a rapid advancement in multilingual and cross-lingual transfer research where the supervision is transferred from high-resource languages (HRLs) to low-resource languages (LRLs). However, the cross…
Graph Neural Network Enhanced Language Models for Efficient Multilingual Text Classification
Online social media works as a source of various valuable and actionable information during disasters. This information might be available in multiple languages due to the nature of user-generated content. An effective system to automatic…
Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer
Sharan Narasimhan, Suvodip Dey, Maunendra Desarkar. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.