Explanipedia

Towards Efficient Random-Order Enumeration for Join Queries Open

Pengyu Chen, Zizheng Guo, Jianwei Yang, Dongjing Miao · 2025

In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful repr…

Cost-effective Missing Value Imputation for Data-effective Machine Learning Open

Chengliang Chai, Kaisen Jin, Nan Tang, Ju Fan, Dongjing Miao, et al. · 2025

Computer science

Given a dataset with incomplete data (e.g., missing values), training a machine learning model over the incomplete data requires two steps. First, it requires a data-effective step that cleans the data in order to improve the data quality …

An Unsupervised Learning Framework Combined with Heuristics for the Maximum Minimal Cut Problem Open

Huaiyuan Liu, Xianzhang Liu, Donghua Yang, Hongzhi Wang, Yingchi Long, et al. · 2024

Computer science Mathematics Psychology

The Maximum Minimal Cut Problem (MMCP), an NP-hard combinatorial optimization (CO) problem, has not received much attention due to the demanding and challenging bi-connectivity constraint. Moreover, as a CO problem, it is also a daunting…

Data Debugging is NP-hard for Classifiers Trained with SGD Open

Zizheng Guo, Pengyu Chen, Yanyu Fu, Dongjing Miao · 2024

Computer science

Data debugging is the task of finding a subset of the training data such that the model obtained by retraining on the subset has better accuracy. A number of heuristic approaches have been proposed; however, none of them is guaranteed to solve this proble…

QUEST: An Efficient Query Evaluation Scheme Towards Scan-Intensive Cross-Model Analysis Open

Jianfeng Huang, Dongjing Miao, Xin Liu · 2023

Computer science Mathematics

Modern data-driven applications require that databases support fast cross-model analytical queries. Achieving fast analytical queries in a database system is challenging since they are usually scan-intensive (i.e., they need to intensively…

GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data Open

Chengliang Chai, Jiabin Liu, Nan Tang, Ju Fan, Dongjing Miao, et al. · 2023

Computer science Engineering

Given a dataset with incomplete data (e.g., missing values), training a machine learning model over the incomplete data requires two steps. First, it requires a data-effective step that cleans the data in order to improve the data quality …

Computing All Restricted Skyline Probabilities on Uncertain Datasets Open

Xiangyu Gao, Jianzhong Li, Dongjing Miao · 2023

Computer science

Restricted skyline (rskyline) query is widely used in multi-criteria decision making. It generalizes the skyline query by additionally considering a set of personalized scoring functions F. Since uncertainty is inherent in datasets for mul…

Random-Order Enumeration for Self-Reducible NP-Problems Open

Pengyu Chen, Dongjing Miao, Weitian Tong, Zizheng Guo, Jianzhong Li, et al. · 2023

Mathematics Computer science Physics

In plenty of data analysis tasks, a basic and time-consuming process is to produce a large number of solutions and feed them into downstream processing. Various enumeration algorithms have been developed for this purpose. An enumeration al…

Computational Complexity And Algorithms For Dirty Data Evaluation And Repairing Open

Dongjing Miao · 2022

Computer science

In this dissertation, we study the dirty data evaluation and repairing problem in relational databases. Dirty data is usually inconsistent, inaccurate, incomplete and stale. Existing methods and theories of consistency describe using integr…

Sublinear-time Reductions for Big Data Computing Open

Xiangyu Gao, Jianzhong Li, Dongjing Miao · 2021

Computer science Mathematics Biology

With the rapid popularization of big data, the dichotomy between tractable and intractable problems in big data computing has been shifted. Sublinear time, rather than polynomial time, has recently been regarded as the new standard of trac…

Dynamic Approximate Maximum Independent Set on Massive Graphs Open

Xiangyu Gao, Jianzhong Li, Dongjing Miao · 2020

Computer science Mathematics

Computing a maximum independent set (MaxIS) is a fundamental NP-hard problem in graph theory, which has important applications in a wide spectrum of fields. Since graphs in many applications are changing frequently over time, the problem o…

Dynamic Near Maximum Independent Set with Time Independent of Graph Size Open

Xiangyu Gao, Jianzhong Li, Dongjing Miao, Xianmin Liu · 2020

Computer science Mathematics

The Maximum Independent Set (MaxIS) problem is a fundamental problem in graph theory and is NP-hard. Since the underlying graphs are always changing in numerous applications, computing a MaxIS over dynamic graphs has received increasing…

Fully Dynamic Approximate Maximum Independent Set on Massive Graphs Open

Xiangyu Gao, Jianzhong Li, Dongjing Miao · 2020

Computer science Mathematics Economics

Computing a maximum independent set (MaxIS) is a fundamental NP-hard problem in graph theory, which has important applications in a wide range of areas such as social network analysis, graphical information systems and coding theory. Since…

Complexity and Efficient Algorithms for Data Inconsistency Evaluating and Repairing Open

Dongjing Miao, Zhipeng Cai, Jianzhong Li, Xiangyu Gao, Xianmin Liu · 2020

Mathematics Computer science Physics

Data inconsistency evaluating and repairing are major concerns in data quality management. As the basic computing task, optimal subset repair is not only applied for cost estimation during the progress of database repairing, but also direc…

Recognizing the Tractability in Big Data Computing Open

Xiangyu Gao, Jianzhong Li, Dongjing Miao, Xianmin Liu · 2019

Computer science Mathematics

Due to the limited computational power of existing computers, polynomial time does not work for identifying the tractable problems in big data computing. This paper adopts sublinear time as the new tractability standard to reco…

The hardness of resilience for nested aggregation query Open

Dongjing Miao, Jiguo Yu, Zhipeng Cai · 2019

Computer science Mathematics Physics

The resilience problem is defined on a database d: given a Boolean query q where q(d) is initially true, and an integer k, the task is to find the tuple set d′ of smallest size such that the query result q(d∖d′) becomes false. As a potential explana…
