Avirup Sil
ProST: Progressive Sub-task Training for Pareto-Optimal Multi-agent Systems Using Small Language Models
Multi-agent systems with smaller language models (SLMs) present a viable alternative to single agent systems powered by large language models (LLMs) for addressing complex problems. In this work, we study how these alternatives compare in …
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
Multi-turn problem solving is critical yet challenging, requiring Large Reasoning Models (LRMs) to reflect on their reasoning and revise it from feedback. Existing Reinforcement Learning (RL) methods train large reasoning models on a single-turn par…
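The interaction pattern is simple enough to sketch. Below is a minimal illustration of a multi-turn loop with unary "try again" feedback, where `generate` (one model call over the chat history) and `is_correct` (an answer checker) are hypothetical stand-ins, not the paper's training code:

```python
# Minimal sketch of multi-turn solving with unary "try again" feedback.
# `generate` and `is_correct` are hypothetical placeholders.

def multi_turn_solve(generate, is_correct, question, max_turns=4):
    messages = [{"role": "user", "content": question}]
    answer = None
    for _ in range(max_turns):
        answer = generate(messages)                      # one reasoning turn
        messages.append({"role": "assistant", "content": answer})
        if is_correct(answer):
            break
        # Unary feedback: no hints, no error details, just a retry request.
        messages.append({"role": "user", "content": "Try again."})
    return answer, messages
```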
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges
Recent progress in large language models (LLMs) has enabled substantial advances in solving mathematical problems. However, existing benchmarks often fail to reflect the complexity of real-world problems, which demand open-ended, interdisc…
Agent Trajectory Explorer: Visualizing and Providing Feedback on Agent Trajectories
Agentic systems interleave large language model (LLM) reasoning, tool usage, and tool observations over multiple iterations to tackle complex tasks. The raw data from an agent's problem-solving process (the agent's trajectory) is not an id…
SeaView: Software Engineering Agent Visual Interface for Enhanced Workflow
Auto-regressive LLM-based software engineering (SWE) agents, henceforth SWE agents, have made tremendous progress (>60% on SWE-Bench Verified) on real-world coding challenges including GitHub issue resolution. SWE agents use a combination …
Granite Embedding Models
We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense-retrieval and sparse retrieval architectures, with both English and Multilingual capabilities. This report p…
CLAPnq: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems
Retrieval Augmented Generation (RAG) has become a popular application for large language models. It is preferable that successful RAG systems provide accurate answers that are supported by being grounded in a passage without any hallucinat…
FIRST: Faster Improved Listwise Reranking with Single Token Decoding
Large Language Models (LLMs) have significantly advanced the field of information retrieval, particularly for reranking. Listwise LLM rerankers have showcased superior performance and generalizability compared to existing supervised approa…
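The single-token idea can be sketched as follows, assuming a Hugging Face-style causal LM whose candidate identifiers ("A", "B", "C", ...) each map to a single token; this shows only scoring, not the learning-to-rank training FIRST performs on these logits:

```python
# Sketch of single-token listwise scoring: instead of decoding a full
# ranked list, read the next-token logits over the candidate identifier
# tokens and sort by them. Assumes each identifier is a single token.
import torch

def rank_by_first_token(model, tokenizer, prompt, num_candidates):
    labels = [chr(ord("A") + i) for i in range(num_candidates)]
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]           # next-token logits
    scores = [logits[tokenizer.convert_tokens_to_ids(t)].item() for t in labels]
    return sorted(range(num_candidates), key=lambda i: -scores[i])
```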
Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
We develop a method for training small-scale (under 100M parameter) neural information retrieval models with as few as 10 gold relevance labels. The method depends on generating synthetic queries for documents using a language model (LM), …
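A hedged sketch of the synthetic-query step this describes, with `call_lm` as a hypothetical text-completion function; the prompt wording below is illustrative, whereas the paper's point is that the prompt itself is auto-optimized against the ~10 gold labels, like a training hyperparameter:

```python
# Sketch: prompt an LM to write a plausible search query for each
# document, then treat (query, document) as a positive training pair
# for a small retriever. `call_lm` is a hypothetical placeholder.

def make_training_pairs(call_lm, documents):
    pairs = []
    for doc in documents:
        prompt = (
            "Write a search query that the following passage answers.\n\n"
            f"Passage: {doc}\n\nQuery:"
        )
        query = call_lm(prompt).strip()
        pairs.append((query, doc))   # negatives are mined separately
    return pairs
```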
An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation
We present a large-scale empirical study of how choices of configuration parameters affect performance in knowledge distillation (KD). An example of such a KD parameter is the measure of distance between the predictions of the teacher and …
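For context, one standard choice of teacher-student distance is temperature-scaled KL divergence between softened output distributions; the distance measure and the temperature are exactly the kind of parameters such a study varies. A minimal PyTorch version:

```python
# Temperature-scaled KL divergence between softened teacher and student
# distributions, one common KD distance (Hinton et al., 2015).
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    t = temperature
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    # Rescale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t
```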
Muted: Multilingual Targeted Offensive Speech Identification and Visualization
Offensive language such as hate, abuse, and profanity (HAP) occurs in various content on the web. While previous work has mostly dealt with sentence level annotations, there have been a few recent attempts to identify offensive spans as we…
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an a…
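A control-flow sketch of the self-reflective loop this describes; every method on `lm` below is a placeholder for behavior Self-RAG realizes with learned reflection tokens (retrieve-on-demand, then critiquing candidate continuations), not the released implementation:

```python
# Illustration only: retrieve adaptively per segment, then keep the
# continuation the model's own critic rates as best supported.

def self_rag_generate(lm, retriever, prompt, max_segments=8):
    output = ""
    for _ in range(max_segments):
        if lm.predicts_retrieval_needed(prompt + output):
            passages = retriever(prompt + output)
            candidates = [lm.continue_with(prompt, output, p) for p in passages]
            segment = max(candidates, key=lm.critique_score)
        else:
            segment = lm.continue_with(prompt, output, None)
        output += segment
        if lm.is_finished(output):
            break
    return output
```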
GAAMA 2.0: An Integrated System That Answers Boolean and Extractive Questions
Recent machine reading comprehension datasets include extractive and boolean questions but current approaches do not offer integrated support for answering both question types. We present a front-end demo to a multilingual machine reading …
ReFIT: Relevance Feedback from a Reranker during Inference
Retrieve-and-rerank is a prevalent framework in neural information retrieval, wherein a bi-encoder network initially retrieves a pre-defined number of candidates (e.g., K=100), which are then reranked by a more powerful cross-encoder model…
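The two-stage baseline is easy to sketch; `bi_encoder` and `cross_encoder` below are hypothetical (query, passage) -> score functions. ReFIT's contribution, distilling the reranker's scores into an updated query vector at inference time, is noted but not implemented here:

```python
# Retrieve-and-rerank: a cheap bi-encoder fetches K candidates, then an
# expensive cross-encoder rescores only those K.

def retrieve_and_rerank(query, corpus, bi_encoder, cross_encoder, k=100):
    # Stage 1: dense retrieval over the whole corpus.
    candidates = sorted(corpus, key=lambda p: -bi_encoder(query, p))[:k]
    # Stage 2: cross-encoder rescoring of the top-K only.
    return sorted(candidates, key=lambda p: -cross_encoder(query, p))
```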
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challen…
PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development
The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components suc…
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Sultan, Christopher Potts. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.
Muted: Multilingual Targeted Offensive Speech Identification and Visualization
Christoph Tillmann, Aashka Trivedi, Sara Rosenthal, Santosh Borse, Rong Zhang, Avirup Sil, Bishwaranjan Bhattacharjee. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023.
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the…
SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this parad…
Zero-Shot Dynamic Quantization for Transformer Inference
We introduce a novel run-time method for significantly reducing the accuracy loss associated with quantizing BERT-like models to 8-bit integers. Existing methods for quantizing models either modify the training procedure, or they require an…
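For reference, the baseline such work improves on looks roughly like standard per-tensor dynamic int8 quantization, where the scale comes from the tensor's run-time range rather than calibration data or retraining; a minimal round-trip sketch, not the paper's method:

```python
# Symmetric per-tensor dynamic int8 quantization: scale from the
# observed run-time range, quantize, and dequantize back to float.
import torch

def dynamic_quantize_int8(x: torch.Tensor):
    scale = x.abs().max().clamp(min=1e-8) / 127.0   # run-time range
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale
```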
Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval
We show that supervised neural information retrieval (IR) models are prone to learning sparse attention patterns over passage tokens, which can result in key phrases including named entities receiving low attention weights, eventually l…
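A rough diagnostic in the spirit of this observation: measure the attention mass entity tokens receive, given the attention maps from a Transformer called with `output_attentions=True`. This is an illustrative metric, not necessarily the paper's exact one:

```python
# `attentions`: tuple of [batch, heads, seq, seq] tensors (one per layer),
# as returned by Hugging Face models with output_attentions=True.
# `entity_mask`: float [batch, seq] mask marking entity token positions.
import torch

def entity_attention_mass(attentions, entity_mask):
    att = torch.stack(attentions).mean(dim=(0, 2))   # -> [batch, seq, seq]
    received = att.mean(dim=1)                       # mass each token receives
    return (received * entity_mask).sum(-1) / entity_mask.sum(-1).clamp(min=1)
```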
Improved Text Classification via Contrastive Adversarial Training
We propose a simple and general method to regularize the fine-tuning of Transformer-based encoders for text classification tasks. Specifically, during fine-tuning we generate adversarial examples by perturbing the word embedding matrix of …
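A hedged sketch of the embedding-perturbation step, in the style of FGSM: step along the loss gradient of the embedding matrix, take a second forward pass on the perturbed view, then restore. The contrastive objective the paper pairs with such perturbed views is omitted; `model` (assumed to expose `get_input_embeddings()`, Hugging Face style, with no weight tying) and `loss_fn` are generic placeholders:

```python
# Perturb the word embedding matrix adversarially during fine-tuning.
import torch

def adversarial_embedding_step(model, loss_fn, batch, epsilon=1e-2):
    emb = model.get_input_embeddings().weight
    clean_loss = loss_fn(model(**batch))
    (grad,) = torch.autograd.grad(clean_loss, emb, retain_graph=True)
    delta = epsilon * grad.sign()                # ascend the loss surface
    with torch.no_grad():
        emb.add_(delta)                          # perturb embeddings in place
    adv_loss = loss_fn(model(**batch))           # forward pass on perturbed view
    (clean_loss + adv_loss).backward()           # accumulate parameter grads
    with torch.no_grad():
        emb.sub_(delta)                          # restore original weights
    return clean_loss.item(), adv_loss.item()
```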
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Recently, there has been an increasing interest in building question answering (QA) models that reason across multiple modalities, such as text and images. However, QA using images is often limited to just picking the answer from a pre-def…
GAAMA 2.0: An Integrated System that Answers Boolean and Extractive Questions
Recent machine reading comprehension datasets include extractive and boolean questions but current approaches do not offer integrated support for answering both question types. We present a multilingual machine reading comprehension system…
Task Transfer and Domain Adaptation for Zero-Shot Question Answering
Pretrained language models have shown success in various areas of natural language processing, including reading comprehension tasks. However, when applying machine learning methods to new domains, labeled data may not always be available.…
Not to Overfit or Underfit the Source Domains? An Empirical Study of Domain Generalization in Question Answering
Machine learning models are prone to overfitting their training (source) domains, which is commonly believed to be the reason why they falter in novel target domains. Here we examine the contrasting view that multi-source domain generaliza…