Rongyu Cao
CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment
While Large Language Models (LLMs) excel at code generation by learning from vast code corpora, a fundamental semantic gap remains between their training on textual patterns and the goal of functional correctness, which is governed by form…
Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
Diffusion language models (DLMs) are emerging as a powerful and promising alternative to the dominant autoregressive paradigm, offering inherent advantages in parallel generation and bidirectional context modeling. However, the performance…
Format-Adapter: Improving Reasoning Capability of LLMs by Adapting Suitable Format
Generating and voting multiple answers is an effective method to mitigate reasoning inconsistencies of large language models (LLMs). Prior works have shown that multiple reasoning formats outperform a single format when generating multiple…
SWE-GPT: A Process-Centric Language Model for Automated Software Improvement
Large language models (LLMs) have demonstrated remarkable performance in code generation, significantly enhancing the coding efficiency of developers. Recent advancements in LLM-based agents have led to significant progress in end-to-end a…
Do Code LLMs Understand Design Patterns?
Code Large Language Models (LLMs) demonstrate great versatility in adapting to various downstream tasks, including code generation and completion, as well as bug detection and fixing. However, Code LLMs often fail to capture existing codin…
LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues
Reproducing buggy code is the first and crucially important step in issue resolving, as it aids in identifying the underlying problems and validating that generated patches resolve the problem. While numerous approaches have been proposed …
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement
Recent advancements in LLM-based agents have led to significant progress in automatic software engineering, particularly in software maintenance and evolution. Despite these encouraging advances, current research faces two major challenges…
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?
Code completion, a key downstream task in code generation, is one of the most frequent and impactful methods for enhancing developer productivity in software development. As intelligent completion tools evolve, we need a robust evaluation …
In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks
In-context learning (ICL) is an effective approach to help large language models (LLMs) adapt to various tasks by providing demonstrations of the target task. Considering the high cost of labeling demonstrations, many methods propose synth…
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Deployed in TONGYI Lingma, an IDE-based coding assi…
CircAGFG1 Promotes Ovarian Cancer Progression Through the miR-409-3p/ZEB1 Axis
Objectives Circular RNAs (circRNAs) serve a crucial regulatory role in ovarian cancer (OC). Circular RNA ArfGAP with FG repeats 1 (circAGFG1) has been shown to be involved in promoting the progression of several cancers, containing triple-…
Benefit distribution of integrated regional energy systems under carbon trading mechanisms based on improved Shapley value methods
Carbon trading mechanisms and the development of integrated energy systems are important ways to realize the “carbon peaking and carbon neutrality” goal, and the problem of benefit distribution is of paramount importance to achieving the g…
CATS: A Pragmatic Chinese Answer-to-Sequence Dataset with Large Scale and High Quality
There are three problems existing in the popular data-to-text datasets. First, the large-scale datasets either contain noise or lack real application scenarios. Second, the datasets close to real applications are relatively small in size. …
Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs
Text-to-SQL parsing, which aims at converting natural language instructions into executable SQLs, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results in this task. However, most o…
CATS: A Pragmatic Chinese Answer-to-Sequence Dataset with Large Scale and High Quality
Liang Li, Ruiying Geng, Chengyang Fang, Bing Li, Can Ma, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
Application of long non-coding RNA RBAT1 in improving diagnosis and prognosis of ovarian carcinoma
Tumorigenesis of bladder cancer and retinoblastoma is correlated with long non-coding RNA (lncRNA) RBAT1. However, the role of RBAT1 in ovarian carcinoma (OC) is unclear. Thus, the study explored the role of RBAT1 in OC. This research enro…
A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions
Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidences provided by relational dat…
Extracting Variable-Depth Logical Document Hierarchy from Long Documents: Method, Evaluation, and Application
Extracting Zero-shot Structured Information from Form-like Documents: Pretraining with Keys and Triggers
In this paper, we revisit the problem of extracting the values of a given set of key fields from form-like documents. It is the vital step to support many downstream applications, such as knowledge base construction, question answering, do…
LncRNA DLGAP1-AS2 Suppresses the Maturation of miR-16 to Suppress Cell Invasion and Migration of Ovarian Cancer Cells
Background: This study aimed to explore the role of lncRNA DLGAP1-AS2 in ovarian cancer (OC). Methods: Expression of DLGAP1-AS2, mature miR-16 and miR-16 precursor in paired OC tissues and non-tumor tissues collected from 62 OC patients wa…
Hierarchical Neural Network for Extracting Knowledgeable Snippets and Documents
In this study, we focus on extracting knowledgeable snippets and annotating knowledgeable documents from Web corpus, consisting of the documents from social media and We-media. Informally, knowledgeable snippets refer to the text describin…
Tree-Structured Neural Machine for Linguistics-Aware Sentence Generation
Different from other sequential data, sentences in natural language are structured by linguistic grammars. Previous generative conversational models with chain-structured decoder ignore this structure in human language and might generate p…
Generative Neural Machine for Tree Structures
Tree structures are commonly used in the tasks of semantic analysis and understanding over the data of different modalities, such as natural language, 2D or 3D graphics and images, or Web pages. Previous studies model the structures in a …
Mechanism-Aware Neural Machine for Dialogue Response Generation
To the same utterance, people's responses in everyday dialogue may be diverse largely in terms of content semantics, speaking styles, communication intentions and so on. Previous generative conversational models ignore these 1-to-n relatio…
Robust Indoor Human Activity Recognition Using Wireless Signals
Wireless signals–based activity detection and recognition technology may be complementary to the existing vision-based methods, especially under the circumstance of occlusions, viewpoint change, complex background, lighting condition chang…