Dongjing Miao
Towards Efficient Random-Order Enumeration for Join Queries Open
In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful repr…
Cost-effective Missing Value Imputation for Data-effective Machine Learning Open
Given a dataset with incomplete data (e.g., missing values), training a machine learning model over the incomplete data requires two steps. First, it requires a data-effective step that cleans the data in order to improve the data quality …
An Unsupervised Learning Framework Combined with Heuristics for the Maximum Minimal Cut Problem Open
The Maximum Minimal Cut Problem (MMCP), an NP-hard combinatorial optimization (CO) problem, has not received much attention due to the demanding and challenging bi-connectivity constraint. Moreover, as a CO problem, it is also a daunting…
Data Debugging is NP-hard for Classifiers Trained with SGD Open
Data debugging is the task of finding a subset of the training data such that the model obtained by retraining on the subset has better accuracy. A number of heuristic approaches have been proposed; however, none of them is guaranteed to solve this proble…
QUEST: An Efficient Query Evaluation Scheme Towards Scan-Intensive Cross-Model Analysis Open
Modern data-driven applications require that databases support fast cross-model analytical queries. Achieving fast analytical queries in a database system is challenging since they are usually scan-intensive (i.e., they need to intensively…
GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data Open
Given a dataset with incomplete data (e.g., missing values), training a machine learning model over the incomplete data requires two steps. First, it requires a data-effective step that cleans the data in order to improve the data quality …
Computing All Restricted Skyline Probabilities on Uncertain Datasets Open
The restricted skyline (rskyline) query is widely used in multi-criteria decision making. It generalizes the skyline query by additionally considering a set of personalized scoring functions F. Since uncertainty is inherent in datasets for mul…
Random-Order Enumeration for Self-Reducible NP-Problems Open
In plenty of data analysis tasks, a basic and time-consuming process is to produce a large number of solutions and feed them into downstream processing. Various enumeration algorithms have been developed for this purpose. An enumeration al…
Computational Complexity And Algorithms For Dirty Data Evaluation And Repairing Open
In this dissertation, we study the problem of evaluating and repairing dirty data in relational databases. Dirty data is usually inconsistent, inaccurate, incomplete and stale. Existing methods and theories of consistency describe using integr…
Sublinear-time Reductions for Big Data Computing Open
With the rapid popularization of big data, the dichotomy between tractable and intractable problems in big data computing has shifted. Sublinear time, rather than polynomial time, has recently been regarded as the new standard of trac…
Dynamic Approximate Maximum Independent Set on Massive Graphs Open
Computing a maximum independent set (MaxIS) is a fundamental NP-hard problem in graph theory, which has important applications in a wide spectrum of fields. Since graphs in many applications are changing frequently over time, the problem o…
Dynamic Near Maximum Independent Set with Time Independent of Graph Size. Open
The Maximum Independent Set (MaxIS) problem is a fundamental NP-hard problem in graph theory. Since the underlying graphs are always changing in numerous applications, computing a MaxIS over dynamic graphs has received increasing…
Fully Dynamic Approximate Maximum Independent Set on Massive Graphs Open
Computing a maximum independent set (MaxIS) is a fundamental NP-hard problem in graph theory, which has important applications in a wide range of areas such as social network analysis, graphical information systems and coding theory. Since…
Complexity and Efficient Algorithms for Data Inconsistency Evaluating and Repairing Open
Evaluating and repairing data inconsistency are major concerns in data quality management. As the basic computing task, optimal subset repair is not only applied for cost estimation during the process of database repairing, but also direc…
Recognizing the Tractability in Big Data Computing Open
Due to the limited computational power of existing computers, polynomial time does not work for identifying the tractable problems in big data computing. This paper adopts sublinear time as the new tractable standard to reco…
The hardness of resilience for nested aggregation query Open
The resilience problem is defined on a database d: given a Boolean query q where q(d) is initially true, and an integer k, the task is to find the tuple set d′ of smallest size such that the query result q(d∖d′) becomes false. As a potential explana…