GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time

Divij Handa, Mihir Parmar, Aswin RRV, Md Nayem Uddin, Hamid Palangi, et al. · 2025

Repeated Sampling (RS) is a simple inference-time algorithm that has been shown to improve model performance on complex tasks. Although it is an effective way of scaling inference time, it often struggles to generate diverse solution candi…

BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software

Zehua Zhang, Ati Priya Bajaj, Divij Handa, Siyu Liu, Arvind S. Raj, et al. · 2025

Automatically compiling open-source software (OSS) projects is a vital, labor-intensive, and complex task, which makes it a good challenge for LLM Agents. Existing methods rely on manually curated rules and workflows, which cannot adapt to…

ThinkTuning: Instilling Cognitive Reflections without Distillation

Aswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, et al. · 2025

Recent advances in test-time scaling have led to the emergence of thinking LLMs that exhibit self-reflective behaviors and multi-step reasoning. While RL drives this self-improvement paradigm, a recent study (Gandhi et al., 2025) shows tha…

Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents

Shrinidhi Kumbhar, Venkatesh Mishra, Kevin Coutinho, Divij Handa, Ashif Sikandar Iquebal, et al. · 2025

Computer science · Engineering

Materials discovery and design are essential for advancing technology across various industries by enabling the development of application-specific materials. Recent research has leveraged Large Language Models (LLMs) to accelerate this pr…

UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization

Md Nayem Uddin, Amir Saeidi, Divij Handa, Agastya Seth, Tran Cao Son, et al. · 2024

Psychology · Computer science

This paper introduces UnSeenTimeQA, a novel data contamination-free time-sensitive question-answering (TSQA) benchmark. It differs from existing TSQA benchmarks by avoiding web-searchable queries grounded in the real world. We present a se…

ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints

Divij Handa, P. I. Dolin, Shrinidhi Kumbhar, Chitta Baral, Tran Cao Son · 2024

Computer science · Mathematics · Psychology

Reasoning about Actions and Change (RAC) has historically played a pivotal role in solving foundational AI problems, such as the frame problem. It has driven advancements in AI fields, such as non-monotonic and commonsense reasoning. RAC r…

When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers

Divij Handa, Advait Chirmule, Bimal Gajera, Chitta Baral · 2024

Computer science · Mathematics · Philosophy

Recent advancements in Large Language Model (LLM) safety have primarily focused on mitigating attacks crafted in natural language or common ciphers (e.g. Base64), which are likely integrated into newer models' safety training. However, we …

Can NLP Models Correctly Reason Over Contexts that Break the Common Assumptions?

Neeraj Varshney, Mihir Parmar, Nisarg Patel, Divij Handa, Sayantan Sarkar, et al. · 2023

Computer science · Biology

Pre-training on large corpora of text enables the language models to acquire a vast amount of factual and commonsense knowledge which allows them to achieve remarkable performance on a variety of language understanding tasks. They typicall…