Lingming Zhang
PurpCode: Reasoning for Safer Code Generation
We introduce PurpCode, the first post-training recipe for training safe code reasoning models towards generating secure code and defending against malicious cyberactivities. PurpCode trains a reasoning model in two stages: (i) Rule Learnin…
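The abstract is truncated here, but the two-stage outline (rule learning followed by a further alignment stage) points to reward signals that can be checked automatically. As a loose illustration only, not PurpCode's actual rules or reward, the sketch below flags a few well-known unsafe Python patterns of the kind a verifiable safety reward could penalize; the pattern list and the 0/1 scoring are assumptions.

```python
import re

# Hypothetical, illustrative rule set: each entry pairs a regex for a
# well-known risky Python pattern with a short rationale. These are NOT
# PurpCode's actual rules; they only show what a machine-checkable
# safety signal for generated code could look like.
UNSAFE_PATTERNS = [
    (r"\beval\s*\(", "eval() on untrusted input enables code injection"),
    (r"\bpickle\.loads\s*\(", "unpickling untrusted data can execute code"),
    (r"subprocess\.\w+\([^)]*shell\s*=\s*True", "shell=True invites command injection"),
    (r"\bhashlib\.md5\s*\(", "MD5 is unsuitable for security-sensitive hashing"),
]

def safety_reward(generated_code: str) -> tuple[float, list[str]]:
    """Return a crude pass/fail reward plus the violated rationales."""
    violations = [why for pat, why in UNSAFE_PATTERNS
                  if re.search(pat, generated_code)]
    reward = 1.0 if not violations else 0.0
    return reward, violations

if __name__ == "__main__":
    snippet = "import pickle\ndata = pickle.loads(blob)\n"
    print(safety_reward(snippet))  # (0.0, ['unpickling untrusted data can execute code'])
```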
A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications
Large language models (LLMs) are reshaping automated program repair. We present a unified taxonomy that groups 62 recent LLM-based repair systems into four paradigms defined by parameter adaptation and control authority over the repair loo…
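The truncated sentence names two classification axes, parameter adaptation and control authority over the repair loop. Purely as a reading aid, and without the survey's actual paradigm names, the sketch below shows how crossing two such binary axes yields four buckets for tagging repair systems; all labels are placeholders.

```python
from dataclasses import dataclass
from enum import Enum

# Placeholder axis values; the survey's own paradigm names are not
# visible in the truncated abstract.
class ParameterAdaptation(Enum):
    FROZEN = "frozen LLM (prompting only)"
    TUNED = "fine-tuned / adapted parameters"

class ControlAuthority(Enum):
    PIPELINE = "fixed pipeline drives the repair loop"
    AGENT = "LLM agent drives the repair loop"

@dataclass
class RepairSystem:
    name: str
    adaptation: ParameterAdaptation
    authority: ControlAuthority

    def paradigm(self) -> str:
        # Crossing the two axes yields four possible buckets.
        return f"{self.adaptation.name} x {self.authority.name}"

if __name__ == "__main__":
    s = RepairSystem("ExampleAPR", ParameterAdaptation.FROZEN, ControlAuthority.AGENT)
    print(s.paradigm())  # FROZEN x AGENT
```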
Productively Deploying Emerging Models on Emerging Platforms: A Top-Down Approach for Testing and Debugging
While existing machine learning (ML) frameworks focus on established platforms, like running CUDA on server-grade GPUs, there have been growing demands to enable emerging AI applications in a broader set of scenarios, such as running Large…
Demystifying LLM-Based Software Engineering Agents
Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practition…
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Rigorous security-focused evaluation of large language model (LLM) agents is imperative for establishing trust in their safe deployment throughout the software development lifecycle. However, existing benchmarks largely rely on synthetic c…
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
The recent DeepSeek-R1 release has demonstrated the immense potential of reinforcement learning (RL) in enhancing the general reasoning capabilities of large language models (LLMs). While DeepSeek-R1 and other follow-up work primarily focu…
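SWE-RL derives rule-based rewards from open software evolution data. A minimal sketch in that spirit is shown below: the reward is the textual similarity between a model-generated patch and the ground-truth patch, with malformed outputs penalized. The -1 penalty and the use of difflib are simplifications, not the paper's exact reward definition.

```python
import difflib

def patch_reward(predicted_patch: str | None, oracle_patch: str) -> float:
    """Rule-based reward sketch: a malformed / unparsable prediction gets -1,
    otherwise the reward is the textual similarity between the predicted
    and ground-truth patches."""
    if predicted_patch is None:          # e.g. the diff failed to parse
        return -1.0
    return difflib.SequenceMatcher(None, predicted_patch, oracle_patch).ratio()

if __name__ == "__main__":
    oracle  = "-    return a - b\n+    return a + b\n"
    good    = "-    return a - b\n+    return a + b\n"
    partial = "+    return a + b\n"
    print(patch_reward(good, oracle))               # 1.0
    print(round(patch_reward(partial, oracle), 2))  # partial credit
    print(patch_reward(None, oracle))               # -1.0
```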
SelfCodeAlign: Self-Alignment for Code Generation
Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for…
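A key step in self-alignment pipelines of this kind is keeping only self-generated instruction-response pairs whose code passes its own tests under execution. The sketch below shows that filtering step in simplified form, with a subprocess plus timeout standing in for a proper sandbox; it is not the paper's exact pipeline.

```python
import subprocess
import sys
import tempfile

def passes_own_tests(solution: str, tests: str, timeout_s: int = 10) -> bool:
    """Execute a candidate solution together with its generated tests in a
    fresh interpreter; keep the pair only if the process exits cleanly.
    (A real pipeline would use a proper sandbox, not a bare subprocess.)"""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution + "\n\n" + tests + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

if __name__ == "__main__":
    sol = "def add(a, b):\n    return a + b\n"
    tst = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
    print(passes_own_tests(sol, tst))  # True -> keep this pair for fine-tuning
```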
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models
Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences over the software supply chain. In the literature, fuzzing has been extensively studied to uncover compiler defects. However,…
Automated Program Repair via Conversation: Fixing 162 out of 337 Bugs for $0.42 Each using ChatGPT
Large Language Model-Based Agents for Software Engineering: A Survey
The recent advance in Large Language Models (LLMs) has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the versatility and expertise of LLMs by enhancing LLMs w…
Evaluating Language Models for Efficient Code Generation
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding benchmarks often fail to provide reliable insights into code e…
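The abstract argues that pass/fail benchmarks reveal little about efficiency. Below is a stripped-down illustration of differential performance measurement: time a candidate against a reference solution on a performance-exercising input and report a relative score. DPE's actual input curation and scoring are more involved, so the helper names and numbers here are assumptions.

```python
import time

def relative_speed(candidate, reference, perf_input, repeats: int = 3) -> float:
    """Return reference_time / candidate_time on a performance-exercising
    input (>1.0 means the candidate is faster). A sketch of differential
    performance measurement, not DPE's scoring."""
    def best_time(fn):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(perf_input)
            best = min(best, time.perf_counter() - start)
        return best
    return best_time(reference) / best_time(candidate)

if __name__ == "__main__":
    def ref_sum_of_squares(n):      # linear-time baseline (intentionally slower)
        return sum(i * i for i in range(n))
    def fast_sum_of_squares(n):     # closed-form candidate
        return (n - 1) * n * (2 * n - 1) // 6
    print(relative_speed(fast_sum_of_squares, ref_sum_of_squares, 1_000_000))
```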
Agentless: Demystifying LLM-based Software Engineering Agents
Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practition…
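Agentless replaces an autonomous agent with a fixed pipeline. The sketch below mirrors that shape in miniature: rank files by overlap with the issue text, ask a model for an edit at the chosen location, then validate the patch by running the test suite. The query_llm stub and the keyword ranking are placeholders, not the paper's localization or repair prompts.

```python
import pathlib
import subprocess

def rank_files(repo: pathlib.Path, issue_text: str, top_k: int = 3) -> list[pathlib.Path]:
    """Crude localization: score each source file by how many issue words it contains."""
    words = {w.lower() for w in issue_text.split() if len(w) > 3}
    scored = []
    for path in repo.rglob("*.py"):
        text = path.read_text(errors="ignore").lower()
        scored.append((sum(w in text for w in words), path))
    return [p for _, p in sorted(scored, reverse=True)[:top_k]]

def query_llm(prompt: str) -> str:
    """Placeholder for the model call that proposes an edit (hypothetical)."""
    raise NotImplementedError

def tests_pass(repo: pathlib.Path) -> bool:
    """Validation step: accept a proposed patch only if the repository's tests pass."""
    return subprocess.run(["python", "-m", "pytest", "-q"], cwd=repo).returncode == 0
```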
RepoQA: Evaluating Long Context Code Understanding
Recent advances have been improving the context windows of Large Language Models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed to test LLMs over a la…
TESTEVAL: Benchmarking Large Language Models for Test Case Generation
Testing plays a crucial role in the software development cycle, enabling the detection of bugs, vulnerabilities, and other undesirable behaviors. To perform software testing, testers need to write code snippets that execute the program und…
UniDebugger: Hierarchical Multi-Agent Framework for Unified Software Debugging
Software debugging is a time-consuming endeavor involving a series of steps, such as fault localization and patch generation, each requiring thorough analysis and a deep understanding of the underlying logic. While large language models (L…
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
We introduce XFT, a simple yet powerful training scheme, by simply merging upcycled Mixture-of-Experts (MoE) to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to i…
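The visible part of the abstract describes merging an upcycled Mixture-of-Experts model back into a dense model. As a purely structural illustration, not XFT's learned merging scheme, the snippet below collapses several expert feed-forward weight matrices into one dense matrix via a weighted average; the uniform mixing weights are an assumption.

```python
import numpy as np

def merge_experts(expert_weights: list[np.ndarray],
                  mixing: list[float] | None = None) -> np.ndarray:
    """Collapse per-expert FFN weight matrices into one dense matrix via a
    weighted average. XFT learns its mixing coefficients; the uniform
    weights used here are just a placeholder."""
    if mixing is None:
        mixing = [1.0 / len(expert_weights)] * len(expert_weights)
    assert abs(sum(mixing) - 1.0) < 1e-6
    return sum(w * m for w, m in zip(expert_weights, mixing))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    experts = [rng.standard_normal((8, 8)) for _ in range(4)]  # 4 upcycled experts
    dense = merge_experts(experts)
    print(dense.shape)  # (8, 8): back to a single dense FFN matrix
```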
Top Leaderboard Ranking = Top Coding Proficiency, Always? EvoEval: Evolving Coding Benchmarks via LLM
LLMs have become the go-to choice for code generation tasks, with an exponential increase in the training, development, and usage of LLMs specifically for code generation. To evaluate the ability of LLMs on code, both academic and industry…
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models
Bugs in operating system kernels can affect billions of devices and users all over the world. As a result, a large body of research has been focused on kernel fuzzing, i.e., automatically generating syscall (system call) sequences to detec…
Magicoder: Empowering Code Generation with OSS-Instruct
We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. Magicoder models are trai…
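OSS-Instruct generates instruction data by letting a model draw inspiration from open-source code snippets. The snippet below only sketches the shape of that idea: build a prompt around a seed snippet and delegate to a generate stub. The prompt wording, seed list, and stub are assumptions, not Magicoder's released prompts or data.

```python
import random

# Hypothetical seed snippets; in practice these are mined from open-source code.
SEED_SNIPPETS = [
    "def parse_csv(line):\n    return [f.strip() for f in line.split(',')]",
    "SELECT name, COUNT(*) FROM orders GROUP BY name HAVING COUNT(*) > 10;",
]

PROMPT_TEMPLATE = (
    "Gain inspiration from the following code snippet and create a\n"
    "self-contained programming problem, then provide a correct solution.\n\n"
    "Code snippet:\n{snippet}\n"
)

def generate(prompt: str) -> str:
    """Placeholder for the teacher-model call (hypothetical)."""
    raise NotImplementedError

def synthesize_one_example() -> str:
    """Pick a random seed snippet and ask the model for a problem + solution,
    mirroring the overall shape of OSS-Instruct data synthesis."""
    seed = random.choice(SEED_SNIPPETS)
    return generate(PROMPT_TEMPLATE.format(snippet=seed))
```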
Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair
During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" in …
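The title describes fusing an LLM with a completion engine during patch synthesis. The toy example below isolates that interaction: the LLM's ranked next-token candidates are intersected with the tokens a completion engine considers valid at the cursor, and generation continues only with tokens both sides accept. The candidate lists are hard-coded stand-ins, not outputs of any real engine.

```python
def fuse_step(llm_candidates: dict[str, float], engine_valid: set[str]) -> str | None:
    """Pick the highest-probability LLM token that the completion engine
    also accepts at the current cursor; None means backtrack or re-query."""
    feasible = {tok: p for tok, p in llm_candidates.items() if tok in engine_valid}
    if not feasible:
        return None
    return max(feasible, key=feasible.get)

if __name__ == "__main__":
    # Hard-coded stand-ins: what the LLM proposes vs. what a completion
    # engine says is actually in scope after "buffer.".
    llm_candidates = {"lenght": 0.41, "length": 0.38, "size": 0.21}
    engine_valid = {"length", "capacity", "append"}
    print(fuse_step(llm_candidates, engine_valid))  # "length": the typo is pruned away
```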
WhiteFox: White-Box Compiler Fuzzing Empowered by Large Language Models
Compiler correctness is crucial, as miscompilation can falsify program behaviors, leading to serious consequences. Fuzzing has been studied to uncover compiler defects. However, compiler fuzzing remains challenging: Existing arts focus on …
ESEC/FSE'23 Artifact for "NeuRI: Diversifying DNN Generation via Inductive Rule Inference"
This is the artifact for the ESEC/FSE'23 paper "NeuRI: Diversifying DNN Generation via Inductive Rule Inference". Deep Learning (DL) is prevalently used in various industries to improve decision-making and automate processes, driven by the…
Fuzz4All: Universal Fuzzing with Large Language Models
Fuzzing has achieved tremendous success in discovering bugs and vulnerabilities in various software systems. Systems under test (SUTs) that take in programming or formal language as inputs, e.g., compilers, runtime engines, constraint solv…
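Fuzz4All targets any system whose inputs are programs in some language by keeping an LLM inside the fuzzing loop. The skeleton below shows only that loop shape: generate an input from a prompt, run the system under test, record crashes, and fold the latest example back into the prompt. The llm_generate stub and the crash heuristic (a signal-terminated process) are assumptions.

```python
import subprocess

def llm_generate(prompt: str) -> str:
    """Placeholder for the input-generation model call (hypothetical)."""
    raise NotImplementedError

def run_sut(sut_cmd: list[str], test_input: str, timeout_s: int = 5):
    """Feed one generated input to the system under test on stdin."""
    return subprocess.run(sut_cmd, input=test_input.encode(),
                          capture_output=True, timeout=timeout_s)

def fuzz_loop(sut_cmd: list[str], base_prompt: str, iterations: int = 100):
    """Minimal LLM-in-the-loop fuzzing skeleton: generate, execute, update prompt."""
    prompt, crashes = base_prompt, []
    for _ in range(iterations):
        candidate = llm_generate(prompt)
        try:
            result = run_sut(sut_cmd, candidate)
        except subprocess.TimeoutExpired:
            continue
        if result.returncode < 0:            # killed by a signal, e.g. SIGSEGV
            crashes.append(candidate)
        # Prompt updating in miniature: show the latest example to steer diversity.
        prompt = base_prompt + "\n# Previous input:\n" + candidate
    return crashes
```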
ISSTA2023 Artifact for "Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models"
This is the artifact for the ISSTA'2023 paper "Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models". Deep Learning (DL) systems have received exponential growth in popularity and have beco…