Explanipedia

Approximating Opaque Top-k Queries Open

Jiwon Chang, Fatemeh Nargesian · 2025

Combining query answering and data science workloads has become prevalent. An important class of such workloads is top-k queries with a scoring function implemented as an opaque UDF - a black box whose internal structure and scores on the …

Causal Dataset Discovery with Large Language Models Open

Junfei Liu, Shaotong Sun, Fatemeh Nargesian · 2024

Computer science

Causal data discovery is crucial in scientific research by uncovering causal links among a variety of observed variables. Causal dataset discovery is the task of identifying datasets that contain columns that have causal relationships with…

PLUTUS: Understanding Data Distribution Tailoring for Machine Learning Open

Jiwon Chang, Christina Dionysio, Fatemeh Nargesian, Matthias Boehm · 2024

Computer science Mathematics

Existing data debugging tools allow users to trace model performance problems all the way to the data by efficiently identifying slices (conjunctions of features and values) for which a trained model performs significantly worse than the e…

TrustLOG: The Second Workshop on Trustworthy Learning on Graphs Open

Jingrui He, Jian Kang, Fatemeh Nargesian, Haohui Wang, An Zhang, et al. · 2024

Computer science

Learning on graphs (LOG) has a profound impact on various high-impact domains, such as information retrieval, social network analysis, computational chemistry and transportation. Despite decades of theoretical development, algorithmic adva…

FairEM360: A Suite for Responsible Entity Matching Open

Nima Shahbazi, Mahdi Erfanian, Abolfazl Asudeh, Fatemeh Nargesian, Divesh Srivastava · 2024

Computer science Business Political science

Entity matching is one of the earliest tasks that occur in the big data pipeline and is alarmingly exposed to unintentional biases that affect the quality of data. Identifying and mitigating the biases that exist in the data or are introduced…

Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching Open

Nima Shahbazi, Nikola Danevski, Fatemeh Nargesian, Abolfazl Asudeh, Divesh Srivastava · 2023

Computer science Mathematics Economics

Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on the…

KOIOS: Top-k Semantic Overlap Set Search Open

Pranay Mundra, Jianhao Zhang, Fatemeh Nargesian, Nikolaus Augsten · 2023

Computer science Mathematics

We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to incre…

Sampling over Union of Joins Open

Yurong Liu, Yunlong Xu, Fatemeh Nargesian · 2023

Computer science Mathematics Geography

Data scientists often draw on multiple relational data sources for analysis. A standard assumption in learning and approximate query answering is that the data is a uniform and independent sample of the underlying distribution. To avoid th…

Pylon: Semantic Table Union Search in Data Lakes Open

Tianji Cong, Fatemeh Nargesian, H. V. Jagadish · 2023

Computer science Economics Political science

The large size and fast growth of data repositories, such as data lakes, have spurred the need for data discovery to help analysts find related data. The problem has become challenging as (i) a user typically does not know what datasets exi…

TSUBASA: Climate Network Construction on Historical and Real-Time Data Open

Yunlong Xu, Jinshu Liu, Fatemeh Nargesian · 2022

Computer science Engineering Geology

A climate network represents the global climate system by the interactions of a set of anomaly time-series. Network science has been applied to climate data to study the dynamics of a climate network. The core task and first step to enable…

Data Lake Organization Open

Fatemeh Nargesian, Ken Q. Pu, Bahar Ghadiri Bashardoost, Erkang Zhu, Renée J. Miller · 2022

Computer science Geology

We consider the problem of creating a navigation structure that allows a user to most effectively navigate a data lake. We define an organization as a graph that contains nodes representing sets of attributes within a data lake and edges i…

Tailoring data source distributions for fairness-aware data integration Open

Fatemeh Nargesian, Abolfazl Asudeh, H. V. Jagadish · 2021

Computer science Biology Political science

Data scientists often develop data sets for analysis by drawing upon sources of data available to them. A major challenge is to ensure that the data set used for analysis has an appropriate representation of relevant (demographic) groups: …

AWLCO: All-Window Length Co-Occurrence Open

Joshua Sobel, Noah Bertram, Chen Ding, Fatemeh Nargesian, Daniel Gildea · 2021

Computer science Biology

Analyzing patterns in a sequence of events has applications in text analysis, computer programming, and genomics research. In this paper, we consider the all-window-length analysis model which analyzes a sequence of events with respect to …

Knowledge Translation: Extended Technical Report Open

Bahar Ghadiri Bashardoost, Renée J. Miller, Kelly Lyons, Fatemeh Nargesian · 2020

Computer science Chemistry

We introduce Kensho, a tool for generating mapping rules between two Knowledge Bases (KBs). To create the mapping rules, Kensho starts with a set of correspondences and enriches them with additional semantic information automatically ident…

JOSIE Open

Erkang Zhu, Dong Deng, Fatemeh Nargesian, Renée J. Miller · 2019

Computer science Mathematics Engineering

We present a new solution for finding joinable tables in massive data lakes: given a table and one join column, find tables that can be joined with the given table on the largest number of distinct values. The problem can be formulated as …

Optimizing Organizations for Navigating Data Lakes. Open

Fatemeh Nargesian, Ken Q. Pu, Erkang Zhu, Bahar Ghadiri Bashardoost, Renée J. Miller · 2018

Computer science Geology

We consider the problem of creating a navigation structure that allows a user to most effectively navigate a data lake. We define an organization as a graph that contains nodes representing sets of attributes within a data lake and edges i…

Dataset Evolver: An Interactive Feature Engineering Notebook Open

Fatemeh Nargesian, Udayan Khurana, Tejaswini Pedapati, Horst Samulowitz, Deepak S. Turaga · 2018

Computer science Engineering Philosophy

We present DATASET EVOLVER, an interactive Jupyter notebook-based tool that helps data scientists perform feature engineering for classification tasks. It provides users with suggestions on new features to construct, based on automated fea…

Learning Feature Engineering for Classification Open

Fatemeh Nargesian, Horst Samulowitz, Udayan Khurana, Elias B. Khalil, Deepak S. Turaga · 2017

Computer science Philosophy Chemistry

Feature engineering is the task of improving predictive modelling performance on a dataset by transforming its feature space. Existing approaches to automate this process rely on either transformed feature space exploration through evaluat…

LSH Ensemble: Internet-Scale Domain Search Open

Erkang Zhu, Fatemeh Nargesian, Ken Q. Pu, Renée J. Miller · 2016

Computer science Geography Mathematics

We study the problem of domain search where a domain is a set of distinct values from an unspecified universe. We use Jaccard set containment, defined as $|Q \cap X|/|Q|$, as the relevance measure of a domain $X$ to a query domain $Q$. Our…
