Zibin Zheng
A Systematic Literature Review of Code Hallucinations in LLMs: Characterization, Mitigation Methods, Challenges, and Future Directions for Reliable AI
Model hallucination is one of the most critical challenges faced by Large Language Models (LLMs), especially in high-stakes code intelligence tasks. As LLMs become increasingly integrated into software engineering tasks, understanding and …
EffiReasonTrans: RL-Optimized Reasoning for Code Translation
Code translation is a crucial task in software development and maintenance. While recent advancements in large language models (LLMs) have improved automated code translation accuracy, these gains often come at the cost of increased infere…
RiskTagger: An LLM-based Agent for Automatic Annotation of Web3 Crypto Money Laundering Behaviors
While the rapid growth of Web3 has driven the development of decentralized finance, user anonymity and cross-chain asset flows make on-chain laundering behaviors more covert and complex. In this context, constructing high-quality anti-mone…
Research progress of the reliability of DNA data storage
CCIHunter: Enhancing Smart Contract Code-Comment Inconsistencies Detection via Two-Stage Pre-training
Smart contracts are self-executing computer programs on blockchains. With the development of blockchain technology, the number of smart contracts has grown rapidly, as has the concern for their security. Regrettably, inconsistencies betwee…
EvolMathEval: Towards Evolvable Benchmarks for Mathematical Reasoning via Evolutionary Testing
The rapid advancement of Large Language Models (LLMs) poses a significant challenge to existing mathematical reasoning benchmarks. These benchmarks tend to become easier over time as LLMs can learn from the published benchmarks. T…
An Empirical Study on Embodied Artificial Intelligence Robot (EAIR) Software Bugs
Embodied Artificial Intelligence Robots (EAIRs) are an emerging and rapidly evolving technological domain. Ensuring their program correctness is fundamental to their successful deployment. However, a general and in-depth understanding of EAI…
An Empirical Study of Interaction Bugs in ROS-based Software
Modern robotic systems integrate multiple independent software and hardware components, each responsible for distinct functionalities such as perception, decision-making, and execution. These components interact extensively to accomplish c…
Copy-and-Paste? Identifying EVM-Inequivalent Code Smells in Multi-chain Reuse Contracts
With the development of Solidity contracts on Ethereum, more developers are reusing them on other compatible blockchains. However, developers may overlook the differences between the designs of the blockchain systems, such as the Gas Mechani…
LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation
Code generation aims to automatically generate code from input requirements, significantly enhancing development efficiency. Recent large language model (LLM)-based approaches have shown promising results and revolutionized code generati…
OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution
The GitHub issue resolution task aims to resolve issues reported in repositories automatically. With advances in large language models (LLMs), this task has gained increasing attention, and several benchmarks have been proposed to evaluate the i…
CKTyper: Enhancing Type Inference for Java Code Snippets by Leveraging Crowdsourcing Knowledge in Stack Overflow
Code snippets are widely used in technical forums to demonstrate solutions to programming problems. They can be leveraged by developers to accelerate problem-solving. However, code snippets often lack concrete types of the APIs used in the…
Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models
Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research mainly focuses on the accuracy of code generation, w…
An Empirical Study of Code Clones from Commercial AI Code Generators
Deep learning (DL) has revolutionized various software engineering tasks. Particularly, the emergence of AI code generators has pushed the boundaries of automatic programming to synthesize entire programs based on user-defined specificatio…
Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing
Question answering (QA) is a fundamental task of large language models (LLMs), which requires LLMs to automatically answer human-posed questions in natural language. However, LLMs are known to distort facts and make non-factual statements …
MTCMB: A Multi-Task Benchmark Framework for Evaluating LLMs on Knowledge, Reasoning, and Safety in Traditional Chinese Medicine
Traditional Chinese Medicine (TCM) is a holistic medical system with millennia of accumulated clinical experience, playing a vital role in global healthcare, particularly across East Asia. However, the implicit reasoning, diverse textual fo…
TSRating: Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
High-quality time series (TS) data are essential for ensuring TS model performance, rendering research on rating TS data quality indispensable. Existing methods have shown promising rating accuracy within individual domains, primarily by e…
Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs
Adversarial attacks aim to generate malicious inputs that mislead deep models, but beyond causing model failure, they cannot provide certain interpretable information such as "What content in inputs makes models more likely to fail…
FinanceFuzz: Fuzzing Smart Contracts with Financial Properties
Hunting in the Dark Forest: A Pre-trained Model for On-chain Attack Transaction Detection in Web3
What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond
Repository-level code generation remains challenging due to complex code dependencies and the limitations of large language models (LLMs) in processing long contexts. While retrieval-augmented generation (RAG) frameworks are widely adopted…
Characterizing Smart Contract Evolution
Smart contracts are programs that are permanently stored and automatically executed on blockchain systems such as Ethereum. Due to the non-tamperable nature of the underlying blockchain, smart contracts are difficult to update once deployed, …
WakeMint: Detecting Sleepminting Vulnerabilities in NFT Smart Contracts
The non-fungible tokens (NFTs) market has evolved over the past decade, with NFTs serving as unique digital identifiers on a blockchain that certify ownership and authenticity. However, their high value also attracts attackers who exploit …
Are Large Language Models In-Context Graph Learners?
Large language models (LLMs) have demonstrated remarkable in-context reasoning capabilities across a wide range of tasks, particularly with unstructured inputs such as language or images. However, LLMs struggle to handle structured data, s…
Measuring Diversity in Synthetic Datasets
Large language models (LLMs) are widely adopted to generate synthetic datasets for various natural language processing (NLP) tasks, such as text classification and summarization. However, accurately measuring the diversity of these synthet…
Tracezip: Efficient Distributed Tracing via Trace Compression
Distributed tracing serves as a fundamental building block in the monitoring and testing of cloud service systems. To reduce computational and storage overheads, the de facto practice is to capture fewer traces via sampling. However, exist…
Efficient and Trustworthy Block Propagation for Blockchain-enabled Mobile Embodied AI Networks: A Graph Resfusion Approach
By synergistically integrating mobile networks and embodied artificial intelligence (AI), Mobile Embodied AI Networks (MEANETs) represent an advanced paradigm that facilitates autonomous, context-aware, and interactive behaviors within dyn…
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Various benchmarks have been proposed to assess the performance of large language models (LLMs) in different coding scenarios. We refer to them as code-related benchmarks. However, there are no systematic guidelines by which such a benchma…