Premkumar Devanbu
CoDocBench: A Dataset for Code-Documentation Alignment in Software Maintenance
One of the central tasks in software maintenance is understanding and developing code changes. Thus, given a natural language description of the desired new operation of a function, an agent (human or AI) might be asked to generate…
Calibration and Correctness of Language Models for Code (ICSE Artifact)
Machine learning models are widely used, but can also often be wrong. Users would benefit from a reliable indication of whether a given output from a given model should be trusted, so a rational decision can be made whether to use the outp…
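The abstract above asks whether a model's stated confidence can be trusted. A standard way to quantify this (an illustrative sketch of Expected Calibration Error, not the paper's own method; the toy confidence/correctness data below is invented):

```python
# Sketch: Expected Calibration Error (ECE) over (confidence, correct) pairs.
# Predictions are binned by confidence; ECE is the weighted mean gap between
# each bin's average confidence and its empirical accuracy.

def expected_calibration_error(confidences, correct, n_bins=5):
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Indices whose confidence falls in this bin (lower-open, upper-closed).
        in_bin = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        accuracy = sum(correct[i] for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(avg_conf - accuracy)
    return ece

# Toy data: two high-confidence correct outputs, two low-confidence wrong ones.
confs = [0.9, 0.9, 0.1, 0.1]
right = [1, 1, 0, 0]
ece = expected_calibration_error(confs, right)
```

Here each bin is off by 0.1 (the 0.9-confidence bin is actually 100% accurate, the 0.1-confidence bin 0%), giving a small but nonzero ECE.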
Vision Paper: Proof-Carrying Code Completions
Can LLMs Replace Manual Annotation of Software Engineering Artifacts?
Experimental evaluations of software engineering innovations, e.g., tools and processes, often include human-subject studies as a component of a multi-pronged strategy to obtain greater generalizability of the findings. However, human-subj…
Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy
Large language models (LLMs) have provided a lot of exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential secur…
Calibration of Large Language Models on Code Summarization
A brief, fluent, and relevant summary can be helpful during program comprehension; however, such a summary does require significant human effort to produce. Often, good summaries are unavailable in software projects, which makes maintenanc…
Automatic Semantic Augmentation of Language Model Prompts (for Code Summarization)
Large Language Models (LLMs) are a new class of computation engines, "programmed" via prompt engineering. Researchers are still learning how to best "program" these LLMs to help developers. We start with the intuition that developers tend t…
RepairAgent: An Autonomous, LLM-Based Agent for Program Repair
Automated program repair has emerged as a powerful technique to mitigate the impact of software bugs on system reliability and user experience. This paper introduces RepairAgent, the first work to address the program repair challenge throu…
Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code
Large language models for code (LLM4Code), which demonstrate strong performance (e.g., high accuracy) in processing source code, have significantly transformed software engineering. Many studies separately investigate the non-functional pr…
Studying LLM Performance on Closed- and Open-source Data
Large language models (LLMs) are finding wide use in software engineering practice. These models are extremely data-hungry, and are largely trained on open-source (OSS) code distributed with permissive licenses. In terms of actual use howe…
Towards Understanding What Code Language Models Learned
Pre-trained language models are effective in a variety of natural language tasks, but it has been argued their capabilities fall short of fully learning meaning or understanding language. To understand the extent to which language models c…
Better patching using LLM prompting, via Self-Consistency
Large language models (LLMs) can be induced to solve non-trivial problems with "few-shot" prompts that include illustrative problem-solution examples. If the few-shot examples also include "chain of thought" (CoT) explanations, which are of the f…
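Self-consistency, as named in the title above, can be sketched as majority voting over several sampled model outputs (a toy illustration with a stubbed model call; names and canned answers are invented, not the paper's implementation):

```python
# Sketch of self-consistency: sample k candidate answers from an LLM
# (stubbed below) and keep the most frequent one.
from collections import Counter

def self_consistent_answer(sample_fn, prompt, k=5):
    """Draw k samples from sample_fn and return the majority-vote answer."""
    votes = Counter(sample_fn(prompt) for _ in range(k))
    answer, _count = votes.most_common(1)[0]
    return answer

# Stub standing in for a temperature > 0 LLM call; real use would sample
# the model k times and vote over the (normalized) final answers.
_canned = iter(["patch_A", "patch_B", "patch_A", "patch_A", "patch_B"])
answer = self_consistent_answer(lambda p: next(_canned), "fix the bug", k=5)
```

The vote makes the final answer robust to individual low-probability samples: here "patch_A" wins 3 to 2.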
A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques
In this work, we study literature in Explainable AI and Safe AI to understand poisoning of neural models of code. In order to do so, we first establish a novel taxonomy for Trojan AI for code, and present a new aspect-based classification …
AI Safety Subproblems for Software Engineering Researchers
In this 4-page manuscript we discuss the problem of long-term AI Safety from a Software Engineering (SE) research viewpoint. We briefly summarize long-term AI Safety, and the challenge of avoiding harms from AI as systems meet or exceed hu…
Large Language Models and Simple, Stupid Bugs
With the advent of powerful neural language models, AI-based systems to assist developers in coding tasks are becoming widely available; Copilot is one such system. Copilot uses Codex, a large language model (LLM), to complete code conditi…
Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries
Reverse engineering binaries is required to understand and analyse programs for which the source code is unavailable. Decompilers can transform the largely unreadable binaries into a more readable source code-like representation. However, …
CAPYBARA: Decompiled Binary Functions and Related Summaries
CAPYBARA: This dataset is published as part of the paper "Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries". It includes both the training/evaluation data and the raw data. The data…
FlexType: A Plug-and-Play Framework for Type Inference Models
Types in TypeScript play an important role in the correct usage of variables and APIs. Type errors such as variable or function misuse can be avoided with explicit type annotations. In this work, we introduce FlexType, an IDE extension tha…
Few-shot training LLMs for project-specific code-summarization
Very large language models (LLMs), such as GPT-3 and Codex, have achieved state-of-the-art performance on several natural-language tasks, and also show great promise for code. A particularly exciting aspect of LLMs is their knack for few-sh…
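Few-shot prompting of the kind described above can be sketched as follows (a hypothetical illustration: the prompt format, function names, and example pairs are invented, not taken from the paper):

```python
# Sketch: assembling a few-shot prompt for project-specific code
# summarization from same-project (code, summary) example pairs,
# followed by the target function whose summary the model completes.

def build_few_shot_prompt(examples, target_code):
    parts = []
    for code, summary in examples:
        parts.append(f"Code:\n{code}\nSummary: {summary}\n")
    # End with the target and an open "Summary:" for the model to fill in.
    parts.append(f"Code:\n{target_code}\nSummary:")
    return "\n".join(parts)

# Invented same-project examples for illustration.
examples = [
    ("def add(a, b):\n    return a + b", "Adds two numbers."),
    ("def neg(x):\n    return -x", "Negates a number."),
]
prompt = build_few_shot_prompt(examples, "def sub(a, b):\n    return a - b")
```

The intuition is that examples drawn from the same project carry its naming conventions and domain vocabulary, which conditions the model toward project-appropriate summaries.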
NatGen: Generative pre-training by "Naturalizing" source code
Pre-trained generative language models for source code (e.g., PLBART, CodeT5, SPT-Code) have yielded strong results on several tasks in the past few years, including code generation and translation. These models have adopted varying pre-training…
Learning code summarization from a small and local dataset
Foundation models (e.g., CodeBERT, GraphCodeBERT, CodeT5) work well for many software engineering tasks. These models are pre-trained (using self-supervision) with billions of code tokens, and then fine-tuned with hundreds of thousands of …
ManyTypes4TypeScript
In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating machine-learning models for sequence-based type inference in TypeScript. The dataset includes over 9 million type annotations, across 13,953 pr…
Multilingual training for software engineering
Well-trained machine-learning models, which leverage large amounts of open-source software data, have now become an interesting approach to automating many software engineering tasks. Several SE tasks have all been subject to this approach…
ManyTypes4TypeScript: A Comprehensive TypeScript Dataset for Sequence-Based Type Inference
In this paper, we present ManyTypes4TypeScript, a very large corpus for training and evaluating machine-learning models for sequence-based type inference in TypeScript. The dataset includes over 9 million type annotations, across 13,953 pr…
Learning to Find Usages of Library Functions in Optimized Binaries
Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries' behavior can be quite challenging, especially when compiled under higher levels of compiler …