Babak Salimi
KAIROS: Scalable Model-Agnostic Data Valuation
Training data increasingly shapes not only model accuracy but also regulatory compliance and market valuation of AI assets. Yet existing valuation methods remain inadequate: model-based techniques depend on a single fitted model and inheri…
Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems
Evaluating retrieval-ranking systems is crucial for developing high-performing models. While online A/B testing is the gold standard, its high cost and risks to user experience require effective offline methods. However, relying on histori…
Using Causal Inference to Explore Government Policy Impact on Computer Usage
We explore the causal relationship between COVID-19 lockdown policies and changes in personal computer usage. In particular, we examine how lockdown policies affected average daily computer usage, as well as how it affected usage patterns …
A Lightweight Method to Disrupt Memorized Sequences in LLM
As language models scale, their performance improves dramatically across a wide range of tasks, but so does their tendency to memorize and regurgitate parts of their training data verbatim. This tradeoff poses serious legal, ethical, and s…
Scalable Out-of-distribution Robustness in the Presence of Unobserved Confounders
We consider the task of out-of-distribution (OOD) generalization, where the distribution shift is due to an unobserved confounder ($Z$) affecting both the covariates ($X$) and the labels ($Y$). This confounding introduces heterogeneity in …
Learning from Uncertain Data: From Possible Worlds to Possible Models
We introduce an efficient method for learning linear models from uncertain data, where uncertainty is represented as a set of possible variations in the data, leading to predictive multiplicity. Our approach leverages abstract interpretati…
Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation
Conditional independence (CI) constraints are critical for defining and evaluating fairness in machine learning, as well as for learning unconfounded or causal representations. Traditional methods for ensuring fairness either blindly learn…
Graph Machine Learning based Doubly Robust Estimator for Network Causal Effects
We address the challenge of inferring causal effects in social network data. This results in challenges due to interference -- where a unit's outcome is affected by neighbors' treatments -- and network-induced confounding factors. While th…
OTClean: Data Cleaning for Conditional Independence Violations using Optimal Transport
Ensuring Conditional Independence (CI) constraints is pivotal for the development of fair and trustworthy machine learning models. In this paper, we introduce OTClean, a framework that harnesses optimal transport theory for data repair under …
NEXUS: On Explaining Confounding Bias
When analyzing large datasets, analysts are often interested in the explanations for unexpected results produced by their queries. In this work, we focus on aggregate SQL queries that expose correlations in the data. A major challenge that…
Causal Data Integration
Causal inference is fundamental to empirical scientific discoveries in natural and social sciences; however, in the process of conducting causal inference, data management problems can lead to false discoveries. Two such problems are (i) n…
Consistent Range Approximation for Fair Predictive Modeling
This paper proposes a novel framework for certifying the fairness of predictive models trained on biased data. It draws from query answering for incomplete and inconsistent databases to formulate the problem of consistent range approximati…
On Explaining Confounding Bias
When analyzing large datasets, analysts are often interested in the explanations for surprising or unexpected results produced by their queries. In this work, we focus on aggregate SQL queries that expose correlations in the data. A major …
Combining Counterfactuals With Shapley Values To Explain Image Models
With the widespread use of sophisticated machine learning models in sensitive applications, understanding their decision-making has become an essential task. Models trained on tabular data have witnessed significant progress in explanation…
Interpretable Data-Based Explanations for Fairness Debugging
A wide variety of fairness metrics and eXplainable Artificial Intelligence (XAI) approaches have been proposed in the literature to identify bias in machine learning models that are used in critical real-life contexts. However, merely repo…
Explainable AI: Foundations, Applications, Opportunities for Data Management Research
Algorithmic decision-making systems are successfully being adopted in a wide range of domains for diverse tasks. While the potential benefits of algorithmic decision-making are many, the importance of trusting these systems has only recent…
Explaining Image Classifiers Using Contrastive Counterfactuals in Generative Latent Spaces
Despite their high accuracies, modern complex image classifiers cannot be trusted for sensitive tasks due to their unknown decision-making process and potential biases. Counterfactual explanations are very effective in providing transparen…
Generating Interpretable Data-Based Explanations for Fairness Debugging using Gopher
Machine learning (ML) models, while increasingly being used to make life-altering decisions, are known to reinforce systemic bias and discrimination. Consequently, practitioners and model developers need tools to facilitate debugging for b…
HypeR: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach
What-if (provisioning for an update to a database) and how-to (how to modify the database to achieve a goal) analyses provide insights to users who wish to examine hypothetical scenarios without making actual changes to a database and ther…
Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals
There has been a recent resurgence of interest in explainable artificial intelligence (XAI) that aims to reduce the opaqueness of AI-based decision-making systems, allowing humans to scrutinize and trust them. Prior work in this context ha…
Heterogeneous Treatment Effects in Social Networks
We study treatment effect modifiers for causal analysis in a social network, where neighbors' characteristics or network structure may affect the outcome of a unit, and the goal is to identify sub-populations with varying treatment effects…
Detecting Treatment Effect Modifiers in Social Networks.
We study treatment effect modifiers for causal analysis in a social network, where neighbors' characteristics or network structure may affect the outcome of a unit, and the goal is to identify sub-populations with varying treatment effects…
Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification
Classification, a heavily-studied data-driven machine learning task, drives an increasing number of prediction systems involving critical human decisions such as loan approval and criminal risk assessment. However, classifiers often demons…
Mining Approximate Acyclic Schemes from Relations
Acyclic schemes have numerous applications in databases and in machine learning, such as improved design, more efficient storage, and increased performance for queries and machine learning algorithms. Multivalued dependencies (MVDs) are th…
Causal Relational Learning
Causal inference is at the heart of empirical research in natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; …
Data Management for Causal Algorithmic Fairness
Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflects discrimination, suggesting a data management problem. In thi…