Lei Cao
QUEST: Query Optimization in Unstructured Document Analysis
Most recently, researchers have started building data systems powered by large language models (LLMs) that allow users to analyze unstructured text documents as if working with a database, because LLMs are very effective in extracting attribute…
City-Level Foreign Direct Investment Prediction with Tabular Learning on Judicial Data
To advance the United Nations Sustainable Development Goal on promoting sustained, inclusive, and sustainable economic growth, foreign direct investment (FDI) plays a crucial role in catalyzing economic expansion and fostering innovation. …
UniCell: Towards a Unified Solution for Cell Annotation, Nomenclature Harmonization, Atlas Construction in Single-Cell Transcriptomics
Standardizing cell type annotations across single-cell RNA-seq datasets remains a major challenge due to inconsistencies in nomenclature, variation in annotation granularity, and the presence of rare or previously unseen populations. We pr…
Clinical Trial Nursing Experience with a Knee Rehabilitation Device for Elderly Patients
Objective: To explore the clinical application effect of a new rehabilitation training device for elderly knee joints, and to provide a basis for its promotion and the optimization of elderly knee joint rehabilitation nursing. Methods: Eig…
IncepFormerNet: A multi-scale multi-head attention network for SSVEP classification
In recent years, deep learning (DL) models have shown outstanding performance in EEG classification tasks, particularly in Steady-State Visually Evoked Potential (SSVEP)-based Brain-Computer Interface (BCI) systems. DL methods have been succ…
A large annotated cervical cytology images dataset for AI models to aid cervical cancer screening
DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management
Not Another Dual Attention UNet Transformer (NNDA-UNETR): a plug-and-play parallel dual attention block in U-Net with enhanced residual blocks for medical image segmentation
NNDA-UNETR offers a robust solution for accurate segmentation in multi-organ tasks, particularly where organ adhesion poses challenges. Its lightweight design also makes it well-suited for deployment in real-world medical environments with…
An ultrasonography of thyroid nodules dataset with pathological diagnosis annotation for deep learning
ST-GEARS: Advancing 3D downstream research through accurate spatial information recovery
Three-dimensional Spatial Transcriptomics has revolutionized our understanding of tissue regionalization, organogenesis, and development. However, existing approaches overlook either spatial information or experiment-induced distortions, l…
CascadeServe: Unlocking Model Cascades for Inference Serving
Machine learning (ML) models are increasingly deployed to production, calling for efficient inference serving systems. Efficient inference serving is complicated by two challenges: (i) ML models incur high computational costs, and (ii) the…
A Declarative System for Optimizing AI Workloads
A long-standing goal of data management systems has been to build systems which can compute quantitative insights over large corpora of unstructured data in a cost-effective manner. Until recently, it was difficult and expensive to extract…
RITA: Group Attention is All You Need for Timeseries Analytics
Timeseries analytics is important in many real-world applications. Recently, the Transformer model, popular in natural language processing, has been leveraged to learn high quality feature embeddings from timeseries: embeddings are key to …
Editorial: Exploration of the non-invasive brain-computer interface and neurorehabilitation
Keywords: brain-computer interface (BCI), electroencephalogram (EEG), stroke, rehabilitation, algorithm
A Novel Variable Neighborhood Search Approach for Cell Clustering for Spatial Transcriptomics
This paper introduces a new approach to cell clustering using the Variable Neighborhood Search (VNS) metaheuristic. The purpose of this method is to cluster cells based on both gene expression and spatial coordinates. Initially, we confron…
Multiple cosmic strings in Chern-Simons-Higgs theory with gravity
In this paper, we consider the self-dual equation arising from Abelian Chern-Simons-Higgs theory coupled to the Einstein equations over the plane $\mathbb{R}^2$ and a compact surface $S$. We prove the existence of symmetric topological sol…
A multi-feature stock price prediction model based on multi-feature calculation, LASSO feature selection, and Ca-LSTM network
This paper addresses the crucial realm of stock price prediction, highly coveted by individual investors and institutions for its substantial economic implications. The inherent non-stationary and intricate nature of stock market fluctuati…
Automatic Data Transformation Using Large Language Model - An Experimental Study on Building Energy Data
Existing approaches to automatic data transformation are insufficient to meet the requirements in many real-world scenarios, such as the building sector. First, there is no convenient interface for domain experts to provide domain knowledg…
SEED: Domain-Specific Data Curation With Large Language Models
Data curation tasks that prepare data for analytics are critical for turning data into actionable insights. However, due to the diverse requirements of applications in different domains, generic off-the-shelf tools are typically insufficie…
Cosmic strings in a generalized linear formulation of gauge field theory
In this note we construct self-dual cosmic strings from a gauge field theory with a generalized linear formation of potential energy density. By integrating the Einstein equation, we obtain a nonlinear elliptic equation which is equal with…
Short-term surgical outcomes of spontaneous intracerebral hemorrhage in China from 2019 to 2021: a retrospective cohort study
Domain Wall Solution Arising in Abelian Higgs Model Subject to Born-Infeld Theory of Electrodynamics
In this note we study the Abelian Higgs model subject to the Born-Infeld theory of electrodynamics, for which the BPS equations can be reduced to a quasi-linear differential equation. We show that the equation admits a unique solution …
Lingua Manga: A Generic Large Language Model Centric System for Data Curation
Data curation is a wide-ranging area which contains many critical but time-consuming data processing tasks. However, the diversity of such tasks makes it challenging to develop a general-purpose data curation system. To address this issue,…
RoTaR: Efficient Row-Based Table Representation Learning via Teacher-Student Training
We propose RoTaR, a row-based table representation learning method, to address the efficiency and scalability issues faced by existing table representation learning methods. The key idea of RoTaR is to generate query-agnostic row represent…
Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation
Zero-shot NL2SQL is crucial in achieving natural language to SQL that is adaptive to new environments (e.g., new databases, new linguistic phenomena or SQL structures) with zero annotated NL2SQL samples from such environments. Existing app…
RITA: Group Attention is All You Need for Timeseries Analytics
Timeseries analytics is of great importance in many real-world applications. Recently, the Transformer model, popular in natural language processing, has been leveraged to learn high quality feature embeddings from timeseries, core to the …
AutoOD: Automatic Outlier Detection
Outlier detection is critical in the real world. Because many outlier detection techniques exist and often return different results for the same data set, users have to address the problem of determining which among these techn…
Extract-Transform-Load for Video Streams
Social media, self-driving cars, and traffic cameras produce video streams at large scale and low cost. However, storing and querying video at such scale is prohibitively expensive. We propose to treat large-scale video analytics as a …
Application of AR virtual implantation technology based on deep learning and emotional technology in the creation of interactive picture books
In recent years, the field of deep learning has flourished, not only solving many problems that are difficult for traditional algorithms but also showing greater vitality when combined with other fields…
A Stable Large-Scale Multiobjective Optimization Algorithm with Two Alternative Optimization Methods
For large-scale multiobjective evolutionary algorithms based on the grouping of decision variables, the challenge is to design a stable grouping strategy to balance convergence and population diversity. This paper proposes a large-scale mu…