Lingming Zhang
PurpCode: Reasoning for Safer Code Generation
We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learnin…
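The abstract is truncated here, but the two-stage outline (rule learning followed by a further alignment stage) points to reward signals that can be checked automatically. As a loose illustration only, not PurpCode's actual rules or reward, the sketch below flags a few well-known unsafe Python patterns of the kind a verifiable safety reward could penalize; the pattern list and the 0/1 scoring are assumptions.

```python
import re

# Hypothetical, illustrative rule set: each entry pairs a regex for a
# well-known risky Python pattern with a short rationale. These are NOT
# PurpCode's actual rules; they only show what a machine-checkable
# safety signal for generated code could look like.
UNSAFE_PATTERNS = [
    (r"\beval\s*\(", "eval() on untrusted input enables code injection"),
    (r"\bpickle\.loads\s*\(", "unpickling untrusted data can execute code"),
    (r"subprocess\.\w+\([^)]*shell\s*=\s*True", "shell=True invites command injection"),
    (r"\bhashlib\.md5\s*\(", "MD5 is unsuitable for security-sensitive hashing"),
]

def safety_reward(generated_code: str) -> tuple[float, list[str]]:
    """Return a crude pass/fail reward plus the violated rationales."""
    violations = [why for pat, why in UNSAFE_PATTERNS
                  if re.search(pat, generated_code)]
    reward = 1.0 if not violations else 0.0
    return reward, violations

if __name__ == "__main__":
    snippet = "import pickle\ndata = pickle.loads(blob)\n"
    print(safety_reward(snippet))  # (0.0, ['unpickling untrusted data can execute code'])
```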
A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications
Large language models (LLMs) are reshaping automated program repair. We present a unified taxonomy that groups 62 recent LLM-based repair systems into four paradigms defined by parameter adaptation and control authority over the repair loo…
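The truncated sentence names two classification axes, parameter adaptation and control authority over the repair loop. Purely as a reading aid, and without the survey's actual paradigm names, the sketch below shows how crossing two such binary axes yields four buckets for tagging repair systems; all labels are placeholders.

```python
from dataclasses import dataclass
from enum import Enum

# Placeholder axis values; the survey's own paradigm names are not
# visible in the truncated abstract.
class ParameterAdaptation(Enum):
    FROZEN = "frozen LLM (prompting only)"
    TUNED = "fine-tuned / adapted parameters"

class ControlAuthority(Enum):
    PIPELINE = "fixed pipeline drives the repair loop"
    AGENT = "LLM agent drives the repair loop"

@dataclass
class RepairSystem:
    name: str
    adaptation: ParameterAdaptation
    authority: ControlAuthority

    def paradigm(self) -> str:
        # Crossing the two axes yields four possible buckets.
        return f"{self.adaptation.name} x {self.authority.name}"

if __name__ == "__main__":
    s = RepairSystem("ExampleAPR", ParameterAdaptation.FROZEN, ControlAuthority.AGENT)
    print(s.paradigm())  # FROZEN x AGENT
```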
Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging
While existing machine learning (ML) frameworks focus on established platforms, like running CUDA on server-grade GPUs, there have been growing demands to enable emerging AI applications in a broader set of scenarios, such as running Large…
Demystifying LLM-Based Software Engineering Agents
Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practition…
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Rigorous security-focused evaluation of large language model (LLM) agents is imperative for establishing trust in their safe deployment throughout the software development lifecycle. However, existing benchmarks largely rely on synthetic c…
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While DeepSeek-R1 and other follow-up work primarily focu…
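SWE-RL derives rule-based rewards from open software evolution data. A minimal sketch in that spirit is shown below: the reward is the textual similarity between a model-generated patch and the ground-truth patch, with malformed outputs penalized. The -1 penalty and the use of difflib are simplifications, not the paper's exact reward definition.

```python
import difflib

def patch_reward(predicted_patch: str | None, oracle_patch: str) -> float:
    """Rule-based reward sketch: a malformed / unparsable prediction gets -1,
    otherwise the reward is the textual similarity between the predicted
    and ground-truth patches."""
    if predicted_patch is None:          # e.g. the diff failed to parse
        return -1.0
    return difflib.SequenceMatcher(None, predicted_patch, oracle_patch).ratio()

if __name__ == "__main__":
    oracle  = "-    return a - b\n+    return a + b\n"
    good    = "-    return a - b\n+    return a + b\n"
    partial = "+    return a + b\n"
    print(patch_reward(good, oracle))               # 1.0
    print(round(patch_reward(partial, oracle), 2))  # partial credit
    print(patch_reward(None, oracle))               # -1.0
```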
SelfCodeAlign: Self-Alignment for Code Generation
Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for…
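A key step in self-alignment pipelines of this kind is keeping only self-generated instruction-response pairs whose code passes its own tests under execution. The sketch below shows that filtering step in simplified form, with a subprocess plus timeout standing in for a proper sandbox; it is not the paper's exact pipeline.

```python
import subprocess
import sys
import tempfile

def passes_own_tests(solution: str, tests: str, timeout_s: int = 10) -> bool:
    """Execute a candidate solution together with its generated tests in a
    fresh interpreter; keep the pair only if the process exits cleanly.
    (A real pipeline would use a proper sandbox, not a bare subprocess.)"""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n\n" + tests + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

if __name__ == "__main__":
    sol = "def add(a, b):\n    return a + b\n"
    tst = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
    print(passes_own_tests(sol, tst))  # True -> keep this pair for fine-tuning
```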
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models
Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences over the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However,…
Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT
Large Language Model-Based Agents for Software Engineering: A Survey
The recent advance in Large Language Models (LLMs) has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the versatility and expertise of LLMs by enhancing LLMs w…
Evaluating Language Models for Efficient Code Generation
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding benchmarks often fail to provide reliable insights into code e…
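The abstract argues that pass/fail benchmarks reveal little about efficiency. Below is a stripped-down illustration of differential performance measurement: time a candidate against a reference solution on a performance-exercising input and report a relative score. DPE's actual input curation and scoring are more involved, so the helper names and numbers here are assumptions.

```python
import time

def relative_speed(candidate, reference, perf_input, repeats: int = 3) -> float:
    """Return reference_time / candidate_time on a performance-exercising
    input (>1.0 means the candidate is faster). A sketch of differential
    performance measurement, not DPE's scoring."""
    def best_time(fn):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(perf_input)
            best = min(best, time.perf_counter() - start)
        return best
    return best_time(reference) / best_time(candidate)

if __name__ == "__main__":
    def ref_sum_of_squares(n):      # linear-time baseline (intentionally slower)
        return sum(i * i for i in range(n))
    def fast_sum_of_squares(n):     # closed-form candidate
        return (n - 1) * n * (2 * n - 1) // 6
    print(relative_speed(fast_sum_of_squares, ref_sum_of_squares, 1_000_000))
```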
Agentless: Demystifying LLM-based Software Engineering Agents
Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practition…
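Agentless replaces an autonomous agent with a fixed pipeline. The sketch below mirrors that shape in miniature: rank files by overlap with the issue text, ask a model for an edit at the chosen location, then validate the patch by running the test suite. The query_llm stub and the keyword ranking are placeholders, not the paper's localization or repair prompts.

```python
import pathlib
import subprocess

def rank_files(repo: pathlib.Path, issue_text: str, top_k: int = 3) -> list[pathlib.Path]:
    """Crude localization: score each source file by how many issue words it contains."""
    words = {w.lower() for w in issue_text.split() if len(w) > 3}
    scored = []
    for path in repo.rglob("*.py"):
        text = path.read_text(errors="ignore").lower()
        scored.append((sum(w in text for w in words), path))
    return [p for _, p in sorted(scored, reverse=True)[:top_k]]

def query_llm(prompt: str) -> str:
    """Placeholder for the model call that proposes an edit (hypothetical)."""
    raise NotImplementedError

def tests_pass(repo: pathlib.Path) -> bool:
    """Validation step: accept a proposed patch only if the repository's tests pass."""
    return subprocess.run(["python", "-m", "pytest", "-q"], cwd=repo).returncode == 0
```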
RepoQA: Evaluating Long Context Code Understanding
Recent advances have been improving the context windows of Large Language Models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed to test LLMs over a la…
TESTEVAL: Benchmarking Large Language Models for Test Case Generation
Testing plays a crucial role in the software development cycle, enabling the detection of bugs, vulnerabilities, and other undesirable behaviors. To perform software testing, testers need to write code snippets that execute the program und…
UniDebugger: Hierarchical Multi-Agent Framework for Unified Software Debugging
Software debugging is a time-consuming endeavor involving a series of steps, such as fault localization and patch generation, each requiring thorough analysis and a deep understanding of the underlying logic. While large language models (L…
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
We introduce XFT, a simple yet powerful training scheme, by simply merging upcycled Mixture-of-Experts (MoE) to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to i…
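The visible part of the abstract describes merging an upcycled Mixture-of-Experts model back into a dense model. As a purely structural illustration, not XFT's learned merging scheme, the snippet below collapses several expert feed-forward weight matrices into one dense matrix via a weighted average; the uniform mixing weights are an assumption.

```python
import numpy as np

def merge_experts(expert_weights: list[np.ndarray],
                  mixing: list[float] | None = None) -> np.ndarray:
    """Collapse per-expert FFN weight matrices into one dense matrix via a
    weighted average. XFT learns its mixing coefficients; the uniform
    weights used here are just a placeholder."""
    if mixing is None:
        mixing = [1.0 / len(expert_weights)] * len(expert_weights)
    assert abs(sum(mixing) - 1.0) < 1e-6
    return sum(w * m for w, m in zip(expert_weights, mixing))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    experts = [rng.standard_normal((8, 8)) for _ in range(4)]  # 4 upcycled experts
    dense = merge_experts(experts)
    print(dense.shape)  # (8, 8): back to a single dense FFN matrix
```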
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM
LLMs have become the go-to choice for code generation tasks, with an exponential increase in the training, development, and usage of LLMs specifically for code generation. To evaluate the ability of LLMs on code, both academic and industry…
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
Bugs in operating system kernels can affect billions of devices and users all over the world. As a result, a large body of research has been focused on kernel fuzzing, i.e., automatically generating syscall (system call) sequences to detec…
Magicoder: Empowering Code Generation with OSS-Instruct
We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. Magicoder models are trai…
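OSS-Instruct generates instruction data by letting a model draw inspiration from open-source code snippets. The snippet below only sketches the shape of that idea: build a prompt around a seed snippet and delegate to a generate stub. The prompt wording, seed list, and stub are assumptions, not Magicoder's released prompts or data.

```python
import random

# Hypothetical seed snippets; in practice these are mined from open-source code.
SEED_SNIPPETS = [
    "def parse_csv(line):\n    return [f.strip() for f in line.split(',')]",
    "SELECT name, COUNT(*) FROM orders GROUP BY name HAVING COUNT(*) > 10;",
]

PROMPT_TEMPLATE = (
    "Gain inspiration from the following code snippet and create a\n"
    "self-contained programming problem, then provide a correct solution.\n\n"
    "Code snippet:\n{snippet}\n"
)

def generate(prompt: str) -> str:
    """Placeholder for the teacher-model call (hypothetical)."""
    raise NotImplementedError

def synthesize_one_example() -> str:
    """Pick a random seed snippet and ask the model for a problem + solution,
    mirroring the overall shape of OSS-Instruct data synthesis."""
    seed = random.choice(SEED_SNIPPETS)
    return generate(PROMPT_TEMPLATE.format(snippet=seed))
```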
Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair
During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" in …
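The title describes fusing an LLM with a completion engine during patch synthesis. The toy example below isolates that interaction: the LLM's ranked next-token candidates are intersected with the tokens a completion engine considers valid at the cursor, and generation continues only with tokens both sides accept. The candidate lists are hard-coded stand-ins, not outputs of any real engine.

```python
def fuse_step(llm_candidates: dict[str, float], engine_valid: set[str]) -> str | None:
    """Pick the highest-probability LLM token that the completion engine
    also accepts at the current cursor; None means backtrack or re-query."""
    feasible = {tok: p for tok, p in llm_candidates.items() if tok in engine_valid}
    if not feasible:
        return None
    return max(feasible, key=feasible.get)

if __name__ == "__main__":
    # Hard-coded stand-ins: what the LLM proposes vs. what a completion
    # engine says is actually in scope after "buffer.".
    llm_candidates = {"lenght": 0.41, "length": 0.38, "size": 0.21}
    engine_valid = {"length", "capacity", "append"}
    print(fuse_step(llm_candidates, engine_valid))  # "length": the typo is pruned away
```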
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models
Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences. Fuzzing has been studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing arts focus on …
ESEC/FSE'23 Artifact for "NeuRI: Diversifying DNN Generation via Inductive Rule Inference"
This is the artifact for the ESEC/FSE'23 paper "NeuRI: Diversifying DNN Generation via Inductive Rule Inference". Deep Learning (DL) is prevalently used in various industries to improve decision-making and automate processes, driven by the…
Fuzz4All: Universal Fuzzing with Large Language Models
Fuzzing has achieved tremendous success in discovering bugs and vulnerabilities in various software systems. Systems under test (SUTs) that take in programming or formal language as inputs, e.g., compilers, runtime engines, constraint solv…
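Fuzz4All targets any system whose inputs are programs in some language by keeping an LLM inside the fuzzing loop. The skeleton below shows only that loop shape: generate an input from a prompt, run the system under test, record crashes, and fold the latest example back into the prompt. The llm_generate stub and the crash heuristic (a signal-terminated process) are assumptions.

```python
import subprocess

def llm_generate(prompt: str) -> str:
    """Placeholder for the input-generation model call (hypothetical)."""
    raise NotImplementedError

def run_sut(sut_cmd: list[str], test_input: str, timeout_s: int = 5):
    """Feed one generated input to the system under test on stdin."""
    return subprocess.run(sut_cmd, input=test_input.encode(),
                          capture_output=True, timeout=timeout_s)

def fuzz_loop(sut_cmd: list[str], base_prompt: str, iterations: int = 100):
    """Minimal LLM-in-the-loop fuzzing skeleton: generate, execute, update prompt."""
    prompt, crashes = base_prompt, []
    for _ in range(iterations):
        candidate = llm_generate(prompt)
        try:
            result = run_sut(sut_cmd, candidate)
        except subprocess.TimeoutExpired:
            continue
        if result.returncode < 0:            # killed by a signal, e.g. SIGSEGV
            crashes.append(candidate)
        # Prompt updating in miniature: show the latest example to steer diversity.
        prompt = base_prompt + "\n# Previous input:\n" + candidate
    return crashes
```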
ISSTA2023 Artifact for "Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models"
This is the artifact for the ISSTA'2023 paper "Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models". Deep Learning (DL) systems have received exponential growth in popularity and have beco…