Zibin Zheng
A Systematic Literature Review of Code Hallucinations in LLMs: Characterization, Mitigation Methods, Challenges, and Future Directions for Reliable AI
Model hallucination is one of the most critical challenges faced by Large Language Models (LLMs), especially in high-stakes code intelligence tasks. As LLMs become increasingly integrated into software engineering tasks, understanding and …
EffiReasonTrans: RL-Optimized Reasoning for Code Translation
Code translation is a crucial task in software development and maintenance. While recent advancements in large language models (LLMs) have improved automated code translation accuracy, these gains often come at the cost of increased infere…
RiskTagger: An LLM-based Agent for Automatic Annotation of Web3 Crypto Money Laundering Behaviors
While the rapid growth of Web3 has driven the development of decentralized finance, user anonymity and cross-chain asset flows make on-chain laundering behaviors more covert and complex. In this context, constructing high-quality anti-mone…
Research progress of the reliability of DNA data storage
CCIHunter: Enhancing Smart Contract Code-Comment Inconsistencies Detection via Two-Stage Pre-training
Smart contracts are self-executing computer programs on blockchains. With the development of blockchain technology, the number of smart contracts has grown rapidly, as has the concern for their security. Regrettably, inconsistencies betwee…
EvolMathEval: Towards Evolvable Benchmarks for Mathematical Reasoning via Evolutionary Testing
The rapid advancement of Large Language Models (LLMs) poses a significant challenge to existing mathematical reasoning benchmarks. These benchmarks tend to become easier over time as LLMs can learn from the published benchmarks. T…
An Empirical Study on Embodied Artificial Intelligence Robot (EAIR) Software Bugs
Embodied Artificial Intelligence Robots (EAIRs) are an emerging and rapidly evolving technological domain. Ensuring their program correctness is fundamental to their successful deployment. However, a general and in-depth understanding of EAI…
An Empirical Study of Interaction Bugs in ROS-based Software
Modern robotic systems integrate multiple independent software and hardware components, each responsible for distinct functionalities such as perception, decision-making, and execution. These components interact extensively to accomplish c…
Copy-and-Paste? Identifying EVM-Inequivalent Code Smells in Multi-chain Reuse Contracts
With the development of Solidity contracts on Ethereum, more developers are reusing them on other compatible blockchains. However, developers may overlook the differences between the designs of the blockchain systems, such as the Gas Mechani…
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Code generation aims to automatically generate code from input requirements, significantly enhancing development efficiency. Recent large language model (LLM)-based approaches have shown promising results and revolutionized code generati…
OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution
The GitHub issue resolution task aims to resolve issues reported in repositories automatically. With advances in large language models (LLMs), this task has gained increasing attention, and several benchmarks have been proposed to evaluate the i…
CKTyper: Enhancing Type Inference for Java Code Snippets by Leveraging Crowdsourcing Knowledge in Stack Overflow
Code snippets are widely used in technical forums to demonstrate solutions to programming problems. They can be leveraged by developers to accelerate problem-solving. However, code snippets often lack concrete types of the APIs used in the…
Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, w…
An Empirical Study of Code Clones from Commercial AI Code Generators
Deep learning (DL) has revolutionized various software engineering tasks. Particularly, the emergence of AI code generators has pushed the boundaries of automatic programming to synthesize entire programs based on user-defined specificatio…
Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing
Question answering (QA) is a fundamental task of large language models (LLMs), which requires LLMs to automatically answer human-posed questions in natural language. However, LLMs are known to distort facts and make non-factual statements …
MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine
Traditional Chinese Medicine (TCM) is a holistic medical system with millennia of accumulated clinical experience, playing a vital role in global healthcare, particularly across East Asia. However, the implicit reasoning, diverse textual fo…
TSRating: Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
High-quality time series (TS) data are essential for ensuring TS model performance, rendering research on rating TS data quality indispensable. Existing methods have shown promising rating accuracy within individual domains, primarily by e…
Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs
Adversarial attacks aim to generate malicious inputs that mislead deep models, but beyond causing model failure, they cannot provide certain interpretable information such as "What content in inputs makes models more likely to fail…
FinanceFuzz: Fuzzing Smart Contracts with Financial Properties
Hunting in the Dark Forest: A Pre-trained Model for On-chain Attack Transaction Detection in Web3
What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond
Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted…
Characterizing Smart Contract Evolution
Smart contracts are programs that are permanently stored and automatically executed on blockchain systems such as Ethereum. Due to the non-tamperable nature of the underlying blockchain, smart contracts are difficult to update once deployed, …
WakeMint: Detecting Sleepminting Vulnerabilities in NFT Smart Contracts
The non-fungible tokens (NFTs) market has evolved over the past decade, with NFTs serving as unique digital identifiers on a blockchain that certify ownership and authenticity. However, their high value also attracts attackers who exploit …
Are Large Language Models In-Context Graph Learners?
Large language models (LLMs) have demonstrated remarkable in-context reasoning capabilities across a wide range of tasks, particularly with unstructured inputs such as language or images. However, LLMs struggle to handle structured data, s…
Measuring Diversity in Synthetic Datasets
Large language models (LLMs) are widely adopted to generate synthetic datasets for various natural language processing (NLP) tasks, such as text classification and summarization. However, accurately measuring the diversity of these synthet…
Tracezip: Efficient Distributed Tracing via Trace Compression
Distributed tracing serves as a fundamental building block in the monitoring and testing of cloud service systems. To reduce computational and storage overheads, the de facto practice is to capture fewer traces via sampling. However, exist…
Efficient and Trustworthy Block Propagation for Blockchain-enabled Mobile Embodied AI Networks: A Graph Resfusion Approach
By synergistically integrating mobile networks and embodied artificial intelligence (AI), Mobile Embodied AI Networks (MEANETs) represent an advanced paradigm that facilitates autonomous, context-aware, and interactive behaviors within dyn…
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Various benchmarks have been proposed to assess the performance of large language models (LLMs) in different coding scenarios. We refer to them as code-related benchmarks. However, there are no systematic guidelines by which such a benchma…