Michael Mathioudakis
Cost-aware retraining for machine learning
Retraining a machine learning (ML) model is essential for maintaining its performance as the data change over time. However, retraining is also costly, as it typically requires re-processing the entire dataset. As a result, a trade-off ari…
Optimizing a Data Science System for Text Reuse Analysis
Text reuse is a methodological element of fundamental importance in humanities research: pieces of text that re-appear across different documents, verbatim or paraphrased, provide invaluable information about the historical spread and evol…
Cost-Effective Retraining of Machine Learning Models
It is important to retrain a machine learning (ML) model in order to maintain its performance as the data changes over time. However, this can be costly as it usually requires processing the entire dataset again. This creates a trade-off b…
Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways (Extended Abstract)
Recommender systems typically suggest to users content similar to what they consumed in the past. A user who happens to be exposed to strongly polarized content might be steered towards more and more radicalized content by subsequent re…
Fair Max–Min Diversity Maximization in Streaming and Sliding-Window Models
Diversity maximization is a fundamental problem with broad applications in data summarization, web search, and recommender systems. Given a set X of n elements, the problem asks for a subset S of k≪n elements with maximum diversity, as qua…
Graph Summarization via Node Grouping: A Spectral Algorithm
Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is mi…
Max-Min Diversification with Fairness Constraints: Exact and Approximation Algorithms
Diversity maximization aims to select a diverse and representative subset of items from a large dataset. It is a fundamental optimization task that finds applications in data summarization, feature selection, web search, recommender system…
Robustness of Sketched Linear Classifiers to Adversarial Attacks
Linear classifiers are well known to be vulnerable to adversarial attacks: they may predict incorrect labels for input data that are adversarially modified with small perturbations. However, this phenomenon has not been properly understood…
Designing Affirmative Action Policies under Uncertainty
We study university admissions under a centralized system that uses grades and standardized test scores to match applicants to university programs. In the context of this system, we explore affirmative action policies that seek to narrow t…
Scalably Using Node Attributes and Graph Structure for Node Classification
The task of node classification concerns a network where nodes are associated with labels, but labels are known only for some of the nodes. The task consists of inferring the unknown labels given the known node labels, the structure of the…
Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study
Machine unlearning is the task of updating machine learning (ML) models after a subset of their training data is deleted. Methods for the task should combine effectiveness and efficiency (i.e., they should effect…
Streaming Algorithms for Diversity Maximization with Fairness Constraints
Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set X of n elements, it asks to select a subset S of k≪n elements with maximum…
Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways
Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards mor…
Workload-Aware Materialization of Junction Trees
Bayesian networks are popular probabilistic models that capture the conditional dependencies among a set of variables. Inference in Bayesian networks is a fundamental task for answering probabilistic queries over a subset of variables in t…
Minimum Coresets for Maxima Representation of Multidimensional Data
Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization tech…
Workload-aware Materialization for Efficient Variable Elimination on Bayesian Networks
Bayesian networks are general, well-studied probabilistic models that capture dependencies among a set of variables. Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we prop…
Joint Use of Node Attributes and Proximity for Semi-Supervised Classification on Graphs
The task of node classification is to infer unknown node labels, given the labels for some of the nodes along with the network structure and other node attributes. Typically, approaches for this task assume homophily, whereby neighboring n…
Intersectional Affirmative Action Policies for Top-k Candidates Selection
We study the problem of selecting the top-k candidates from a pool of applicants, where each candidate is associated with a score indicating his/her aptitude. Depending on the specific scenario, such as job search or college admissions, th…
GRMR: Generalized Regret-Minimizing Representatives
Extracting a small subset of representative tuples from a large database is an important task in multi-criteria decision making. The regret-minimizing set (RMS) problem was recently proposed for representative discovery from databases. Spec…
Towards Data-Driven Affirmative Action Policies under Uncertainty
In this paper, we study university admissions under a centralized system that uses grades and standardized test scores to match applicants to university programs. We consider affirmative action policies that seek to increase the number of …
Affirmative action policies for top-k candidates selection
We consider the problem of designing affirmative action policies for selecting the top-k candidates from a pool of applicants. We assume that for each candidate we have socio-demographic attributes and a series of variables that serve a…
Affirmative Action Policies for Top-k Candidates Selection, With an Application to the Design of Policies for University Admissions
We consider the problem of designing affirmative action policies for selecting the top-k candidates from a pool of applicants. We assume that for each candidate we have socio-demographic attributes and a series of variables that serve as i…
Query the model: precomputations for efficient inference with Bayesian Networks
Variable Elimination is a fundamental algorithm for probabilistic inference over Bayesian networks. In this paper, we propose a novel materialization method for Variable Elimination, which can lead to significant efficiency gains when answ…
Reducing Controversy by Connecting Opposing Views
Controversial issues often split the population into groups with opposing views. When such issues emerge on social media, we often observe the creation of "echo chambers," i.e., situations where like-minded people reinforce each other’s op…
Markov Chain Monitoring
In networking applications, one often wishes to obtain estimates about the number of objects at different parts of the network (e.g., the number of cars at an intersection of a road network or the number of packets expected to reach a node…