Swaroop Mishra
Towards Robust Mathematical Reasoning
Finding the right north-star metrics is highly critical for advancing the mathematical reasoning capabilities of foundation models, especially given that existing evaluations are either too easy or only focus on getting correct short answe…
Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning
Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, yet their ability to perform structured symbolic planning remains limited, particularly in domains requiring formal representations like the Plann…
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
Recent agent frameworks and inference-time algorithms often struggle with complex planning problems due to limitations in verifying generated plans or reasoning and varying complexity of instances within a single task. Many existing method…
Reverse Thinking Makes LLMs Stronger Reasoners
Reverse thinking plays a crucial role in human reasoning. Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. This often enhances overall reasoning perf…
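The reverse-thinking augmentation described above can be sketched as prompt construction. This is an illustrative outline, not the authors' code: the function names, prompt wording, and the idea of asking a teacher model for a forward solution, a backward question, and backward reasoning are paraphrased from the abstract.

```python
# Hedged sketch of reverse-thinking data augmentation. For each training
# problem, a teacher model would be prompted three ways: solve forward,
# restate the problem in reverse, and reason backward from the answer.

def make_revthink_prompts(question: str, answer: str) -> dict:
    """Build the three illustrative teacher prompts for one training example."""
    return {
        "forward": f"Solve step by step:\n{question}",
        "backward_question": (
            "Rewrite the problem so that the answer becomes a given and one of "
            f"the givens becomes the unknown.\nProblem: {question}\nAnswer: {answer}"
        ),
        "backward_reasoning": (
            f"Starting from the answer {answer}, reason in reverse to check it "
            f"is consistent with the problem:\n{question}"
        ),
    }

prompts = make_revthink_prompts(
    "Emma has 3 apples and buys 5 more. How many now?", "8"
)
```

Each of the three completions would then become extra supervision for the student model.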
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Multi-modal Large Language Models (MLLMs) exhibit impressive problem-solving abilities in various domains, but their visual comprehension and abstract reasoning skills remain under-evaluated. To this end, we present PolyMATH, a challenging…
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval …
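The drafting idea can be sketched as a small control loop: a smaller drafter model produces one candidate answer per retrieved-document subset, and a larger verifier scores the drafts. This is a minimal sketch under assumptions; `draft` and `verify` are placeholders for the two model calls, not the paper's API.

```python
# Illustrative Speculative RAG control flow: draft answers in parallel over
# document subsets with a small model, then let a large model pick the best.
from typing import Callable

def speculative_rag(query: str,
                    doc_subsets: list[list[str]],
                    draft: Callable[[str, list[str]], str],
                    verify: Callable[[str, str], float]) -> str:
    """Draft one answer per document subset; return the highest-scoring draft."""
    drafts = [draft(query, docs) for docs in doc_subsets]
    scores = [verify(query, d) for d in drafts]
    return drafts[scores.index(max(scores))]

# Toy stand-ins: the drafter echoes its evidence; the verifier checks relevance.
best = speculative_rag(
    "capital of France?",
    [["Paris is the capital of France."], ["Lyon is a large city."]],
    draft=lambda q, docs: docs[0],
    verify=lambda q, d: float("Paris" in d),
)
```

Because drafts are independent, the drafter calls could run concurrently, which is where the latency savings would come from.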
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning
We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full informat…
Cutting Through the Noise: Boosting LLM Performance on Math Word Problems
Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates adv…
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses
Large language model (LLM) powered chatbots are primarily text-based today, and impose a large interactional cognitive load, especially for exploratory or sensemaking tasks such as planning a trip or learning about a new city. Because the …
In-Context Principle Learning from Mistakes
In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all ICL-based approaches only learn from correct inpu…
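The idea of learning from incorrect examples can be sketched in two steps: induce natural-language principles from observed mistakes, then prepend those principles to later prompts. The prompt wording and function names below are illustrative assumptions; `llm` is a placeholder for a model call.

```python
# Hedged sketch of in-context principle learning from mistakes: mistakes on
# held-out examples are distilled into principles that guide future answers.

def induce_principles(mistakes: list[tuple[str, str, str]], llm) -> str:
    """Turn (question, wrong answer, correct answer) triples into principles."""
    shown = "\n".join(f"Q: {q}\nWrong: {w}\nCorrect: {c}" for q, w, c in mistakes)
    return llm("State general principles that would avoid these mistakes:\n" + shown)

def answer_with_principles(question: str, principles: str, llm) -> str:
    """Answer a new question with the induced principles prepended."""
    return llm(f"Principles:\n{principles}\n\nQ: {question}\nA:")
```

The principles act like a reusable, task-level correction that ordinary few-shot prompts with only correct examples cannot provide.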
Self-Discover: Large Language Models Self-Compose Reasoning Structures
We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-disc…
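The self-discovery process is described in the paper as composing atomic reasoning modules into a task-specific structure. A minimal sketch of that pipeline, with paraphrased meta-prompts (the exact wording, module list, and `llm` callable are assumptions, not the released prompts):

```python
# Illustrative SELF-DISCOVER pipeline: SELECT relevant reasoning modules,
# ADAPT them to the task, then IMPLEMENT them as a reasoning structure.

REASONING_MODULES = [
    "Break the problem into sub-problems.",
    "Think step by step.",
    "Work backwards from the goal.",
]

def self_discover(task_examples: str, llm) -> str:
    """Compose a task-intrinsic reasoning structure from unlabeled examples."""
    select = llm("Select reasoning modules useful for these tasks:\n"
                 + "\n".join(REASONING_MODULES)
                 + "\nTasks:\n" + task_examples)
    adapt = llm(f"Adapt the selected modules to the task:\n{select}"
                f"\nTasks:\n{task_examples}")
    return llm("Turn the adapted modules into a key-value reasoning "
               f"structure (a plan to fill in):\n{adapt}")
```

The resulting structure is then reused at inference time: each instance is solved by filling in the discovered plan rather than by re-prompting from scratch.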
Instruction-Following Evaluation for Large Language Models
One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while L…
TarGEN: Targeted Data Generation with Large Language Models
The rapid advancement of large language models (LLMs) has sparked interest in data synthesis techniques, aiming to generate diverse and high-quality synthetic datasets. However, these synthetic datasets often suffer from a lack of diversit…
InstructExcel: A Benchmark for Natural Language Instruction in Excel
With the evolution of Large Language Models (LLMs) we can solve increasingly more complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript AP…
AutoMix: Automatically Mixing Language Models
Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and per…
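The cost/performance trade-off above suggests a routing loop: a small model answers first and self-verifies, and only low-confidence queries escalate to a larger model. This is a sketch of that control flow only; the threshold, stubs, and function names are illustrative, not the paper's meta-verifier.

```python
# Illustrative AutoMix-style routing: answer with a cheap model, self-verify,
# and escalate to an expensive model when verification confidence is low.

def automix(query: str, small, large, verifier, threshold: float = 0.5) -> str:
    """Route a query between a small and a large model via self-verification."""
    answer = small(query)
    confidence = verifier(query, answer)  # few-shot self-verification score
    return answer if confidence >= threshold else large(query)

# Toy run: the stub verifier trusts the small model only on short queries.
out = automix("2+2?",
              small=lambda q: "4",
              large=lambda q: "large:4",
              verifier=lambda q, a: 1.0 if len(q) < 10 else 0.0)
```

Tuning the threshold trades API cost against accuracy: a higher threshold escalates more queries to the large model.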
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
We present Step-Back Prompting, a simple prompting technique that enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details. Using the concepts and principles to guide rea…
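The two stages of Step-Back Prompting (abstraction, then grounded reasoning) can be sketched as a pair of chained prompts. The wording below is an assumption for illustration, and `llm` stands in for a model call.

```python
# Minimal sketch of Step-Back Prompting: first ask an abstract "step-back"
# question to surface the governing principle, then answer the original
# question grounded in that principle.

def step_back(question: str, llm) -> str:
    # Stage 1: abstraction.
    concept = llm("What general principle or concept underlies this "
                  f"question?\n{question}")
    # Stage 2: reasoning grounded in the retrieved principle.
    return llm(f"Principle: {concept}\nUsing this principle, answer:\n{question}")
```

The point of the indirection is that models answer the abstract question more reliably than the detailed one, and the principle then constrains the final reasoning.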
How FaR Are Large Language Models From Agents with Theory-of-Mind?
"Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those inferences. Existing question answering benchmarks such as ToMi a…
Large Language Models Cannot Self-Correct Reasoning Yet
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their g…
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Language models still struggle on moral reasoning, despite their impressive performance in many other tasks. In particular, the Moral Scenarios task in MMLU (Multi-task Language Understanding) is among the worst performing tasks for many l…
Instruction Tuned Models are Quick Learners
Instruction tuning of language models has demonstrated the ability to enhance model generalization to unseen tasks via in-context learning using a few examples. However, typical supervised learning still requires a plethora of downstream t…
InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis
We introduce InstructABSA, an instruction learning paradigm for Aspect-Based Sentiment Analysis (ABSA) subtasks. Our method introduces positive, negative, and neutral examples to each training sample, and instruction-tunes the model (Tk-Ins…
Real-Time Visual Feedback to Guide Benchmark Creation: A Human-and-Metric-in-the-Loop Workflow
Recent research has shown that language models exploit 'artifacts' in benchmarks to solve tasks, rather than truly learning them, leading to inflated model performance. In pursuit of creating better benchmarks, we propose VAIDA, a novel be…
“John is 50 years old, can his son be 65?” Evaluating NLP Models’ Understanding of Feasibility
Himanshu Gupta, Neeraj Varshney, Swaroop Mishra, Kuntal Kumar Pal, Saurabh Arjun Sawant, Kevin Scaria, Siddharth Goyal, Chitta Baral. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguisti…
InstructExcel: A Benchmark for Natural Language Instruction in Excel
Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz, Chitta Baral, Subhro Roy, Rasika Chakravarthy, Benjamin Van Durme, Elnaz Nouri. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023.
HELP ME THINK: A Simple Prompting Strategy for Non-experts to Create Customized Content with Models
Controlling the text generated by language models and customizing the content has been a long-standing challenge. Existing prompting techniques proposed in pursuit of providing control are task-specific and lack generality; this provides o…
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.