Meet Doshi
Granite Embedding R2 Models
We introduce the Granite Embedding R2 models, a comprehensive family of high-performance English encoder-based embedding models engineered for enterprise-scale dense retrieval applications. Building upon our first-generation release, these…
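As a rough illustration of the dense-retrieval usage such embedding models target, the sketch below encodes a query and a few documents and ranks them by cosine similarity. It assumes the R2 checkpoints load through sentence-transformers; the model id is a placeholder guess, not confirmed by the article.

```python
# Dense-retrieval sketch; the model id below is an assumed placeholder, not confirmed above.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")  # assumed checkpoint id

query = "how do I rotate an expired API key?"
docs = [
    "Rotate an API key by revoking the old key and issuing a new one from the console.",
    "Dense retrievers embed queries and documents into a shared vector space.",
]

# Normalised embeddings make the dot product a cosine similarity.
q_emb = model.encode([query], normalize_embeddings=True)
d_emb = model.encode(docs, normalize_embeddings=True)

scores = (q_emb @ d_emb.T)[0]
for i in np.argsort(-scores):          # best-matching document first
    print(f"{scores[i]:.3f}  {docs[i]}")
```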
Mistral-SPLADE: LLMs for better Learned Sparse Retrieval
Learned Sparse Retrievers (LSR) have evolved into an effective retrieval strategy that can bridge the gap between traditional keyword-based sparse retrievers and embedding-based dense retrievers. At their core, learned sparse retrievers try …
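As a rough illustration of the learned-sparse idea (not the paper's Mistral-based model), the SPLADE-style weighting below turns masked-LM logits into a vocabulary-sized sparse vector whose non-zero terms can be indexed like keywords; the checkpoint name is an assumed public SPLADE model.

```python
# SPLADE-style term-weighting sketch; the checkpoint is an assumption, not the paper's model.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

ckpt = "naver/splade-cocondenser-ensembledistil"  # assumed public SPLADE checkpoint
tok = AutoTokenizer.from_pretrained(ckpt)
mlm = AutoModelForMaskedLM.from_pretrained(ckpt).eval()

batch = tok("learned sparse retrieval expands text with related terms", return_tensors="pt")
with torch.no_grad():
    logits = mlm(**batch).logits                  # (1, seq_len, vocab_size)

# One weight per vocabulary term: max over token positions of log(1 + ReLU(logit)).
weights = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)

# The handful of non-zero entries is the sparse representation that would be indexed.
top = weights.topk(10)
for w, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tok.convert_ids_to_tokens(idx)}  {w:.3f}")
```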
How effective is Multi-source pivoting for Translation of Low Resource Indian Languages?
Machine Translation (MT) between linguistically dissimilar languages is challenging, especially due to the scarcity of parallel corpora. Prior works suggest that pivoting through a high-resource language can help translation into a related…
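A minimal sketch of plain pivot translation, the baseline idea the question builds on: the source sentence is first translated into a related high-resource pivot language, and that output is then translated into the low-resource target. The translation callables here are stand-ins for real MT systems; a multi-source variant would instead feed both the original source and the pivot output to one model, which is not shown.

```python
# Pivot-translation sketch with stand-in MT functions; real systems would replace the lambdas.
from typing import Callable

def pivot_translate(
    src_text: str,
    src_to_pivot: Callable[[str], str],   # e.g. English -> high-resource pivot (Hindi)
    pivot_to_tgt: Callable[[str], str],   # e.g. pivot -> low-resource Indian target language
) -> str:
    """Translate in two hops through a related high-resource pivot language."""
    return pivot_to_tgt(src_to_pivot(src_text))

# Toy usage: tags mark which (hypothetical) system produced each hop.
to_hindi = lambda s: f"<hi> {s}"
to_target = lambda s: f"<tgt> {s}"
print(pivot_translate("The river floods every monsoon.", to_hindi, to_target))
```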
Pretraining Language Models Using Translationese
In this paper, we explore the utility of translationese as synthetic data created using machine translation for pre-training language models (LMs) for low-resource languages (LRLs). Our simple methodology consists of translating large amou…
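A small sketch of the synthetic-data step the abstract describes: high-resource documents are machine-translated into the low-resource language and written out as a pretraining corpus. The MT system is left as a caller-supplied function; the helper name and file layout are illustrative, not from the paper.

```python
# Translationese corpus construction sketch; `translate` is a caller-supplied MT function.
from pathlib import Path
from typing import Callable, Iterable

def build_translationese_corpus(
    source_docs: Iterable[str],
    translate: Callable[[str], str],   # high-resource text -> low-resource-language text
    out_path: Path,
) -> int:
    """Write one machine-translated document per line for later LM pretraining."""
    count = 0
    with out_path.open("w", encoding="utf-8") as fh:
        for doc in source_docs:
            fh.write(translate(doc).replace("\n", " ") + "\n")
            count += 1
    return count

# Toy usage with a stand-in MT function, just to show the flow end to end.
docs = ["A short crawled web document.", "Another paragraph of monolingual text."]
build_translationese_corpus(docs, lambda s: "<lrl> " + s, Path("synthetic_lrl.txt"))
```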
PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities
LLMs have demonstrated remarkable capability for understanding semantics, but they often struggle with understanding pragmatics. To demonstrate this fact, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen…
Machine Translation Advancements for Low-Resource Indian Languages in WMT23: CFILT-IITB's Effort for Bridging the Gap
This paper describes the CFILT-IITB team's MT systems submitted to the IndicMT shared task at WMT23. The task focused on MT system development from/to English a…