Explanipedia

Multi-IaC-Eval: Benchmarking Cloud Infrastructure as Code Across Multiple Formats Open

Susan B. Davidson, Li Sun, Bharat Bhasker, Laurent Callot, Anoop Deoras · 2025

Infrastructure as Code (IaC) is fundamental to modern cloud computing, enabling teams to define and manage infrastructure through machine-readable configuration files. However, different cloud service providers utilize diverse IaC formats.…

Developing Domain-Specific Language Models for Legal Terminology Using the Vosk Speech Recognition Toolkit Open

Owen Graham, Susan B. Davidson · 2025

This study investigates the creation of domain-specific language models for legal terminology using the Vosk speech recognition toolkit. As the legal field increasingly adopts technology for transcription and documentation, the need for ac…

SHARQ: Explainability Framework for Association Rules on Relational Data Open

Hadar Ben-Efraim, Susan B. Davidson, Amit Somech · 2024

Computer science Psychology

Association rules are an important technique for gaining insights over large relational datasets consisting of tuples of elements (i.e. attribute-value pairs). However, it is difficult to explain the relative importance of data elements wi…

ASQP-RL Demo: Learning Approximation Sets for Exploratory Queries Open

Susan B. Davidson, Tova Milo, Kathy Razmadze, Gal Zeevi · 2024

Computer science Sociology

We demonstrate the Approximate Selection Query Processing (ASQP-RL) system, which uses Reinforcement Learning to select a subset of a large external dataset to process locally in a notebook during data exploration. Given a query workload o…

Learning Approximation Sets for Exploratory Queries Open

Susan B. Davidson, Tova Milo, Kathy Razmadze, Gal Zeevi · 2024

Computer science Sociology

In data exploration, executing complex non-aggregate queries over large databases can be time-consuming. Our paper introduces a novel approach to address this challenge, focusing on finding an optimized subset of data, referred to as the a…

Credit distribution in relational scientific databases Open

Dennis Dosso, Susan B. Davidson, Gianmaria Silvello · 2022

Computer science Mathematics

Digital data is a basic form of research product for which citation, and the generation of credit or recognition for authors, are still not well understood. The notion of data credit has therefore recently emerged as a new measure, defined…

Selecting Sub-tables for Data Exploration Open

Kathy Razmadze, Yael Amsterdamer, Amit Somech, Susan B. Davidson, Tova Milo · 2022

Computer science

We present a framework for creating small, informative sub-tables of large data tables to facilitate the first step of data science: data exploration. Given a large data table T, the goal is to create a sub-table of small, fixed dime…

Solon: Communication-efficient Byzantine-resilient Distributed Training via Redundant Gradients Open

Lingjiao Chen, Leshang Chen, Hongyi Wang, Susan B. Davidson, Edgar Dobriban · 2021

Computer science Chemistry History

There has been a growing need to provide Byzantine-resilience in distributed model training. Existing robust distributed learning algorithms focus on developing sophisticated robust aggregators at the parameter servers, but pay less attent…

Chef: a cheap and fast pipeline for iteratively cleaning label uncertainties Open

Yinjun Wu, James Weimer, Susan B. Davidson · 2021

Computer science Chemistry Philosophy

High-quality labels are expensive to obtain for many machine learning tasks, such as medical image classification tasks. Therefore, probabilistic (weak) labels produced by weak supervision tools are used to seed a process in which influent…

Dynamic Gaussian Mixture based Deep Generative Model For Robust Forecasting on Sparse Multivariate Time Series Open

Yinjun Wu, Jingchao Ni, Wei Cheng, Bo Zong, Dongjin Song, et al. · 2021

Computer science Physics Biology

Forecasting on sparse multivariate time series (MTS) aims to model the predictors of future values of time series given their incomplete past, which is important for many emerging applications. However, most existing methods process MTS’s …

Web-based access to data for >600 disinfection by-products via the EPA CompTox Chemicals Dashboard Open

Antony Williams, Chris Grulke, Susan B. Davidson · 2021

Computer science Chemistry Engineering

The US EPA’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) is a freely available web-based application providing access to data for ~900,000 chemical substances, the majority of these represented as chemical structures. T…

DeltaGrad: Rapid retraining of machine learning models Open

Yinjun Wu, Edgar Dobriban, Susan B. Davidson · 2020

Computer science Chemistry Business

Machine learning models are not static and may need to be retrained on slightly changed datasets, for instance, with the addition or deletion of a set of data points. This has many applications, including privacy, robustness, bias reductio…

Data Provenance for Attributes: Attribute Lineage. Open

Dennis Dosso, Susan B. Davidson, Gianmaria Silvello · 2020

Computer science Geography Biology

In this paper we define a new kind of data provenance for database management systems, called attribute lineage for SPJRU queries, building on previous works on data provenance for tuples. We take inspiration from the classical lineag…

Alawinia/Provclustering: Discovering Similar Workflows Via Provenance Clustering Open

Abdussalam Alawini, Leshang Chen, Stephen Fisher, Susan B. Davidson, Junhyong Kim · 2018

Computer science Biology

Several workflow management systems and scripting languages have adopted provenance tracking, yet many researchers choose to manually capture or instrument their processing scripts to write provenance information to files. The Next Generat…

DBLP-NSF dataset SQL dump Open

Abdussalam Alawini, Susan B. Davidson, Shivendra Kumar Pandey, Gianmaria Silvello, Yinjun Wu · 2018

Computer science

This dataset, called DBLP-NSF, is a PostgreSQL database dump file that connects computer science publications—extracted from DBLP—to their NSF funding grants—extracted from the National Science Foundation grant dataset. This datase…

Automating data citation: the eagle-i experience. Open

Abdussalam Alawini, Leshang Chen, Susan B. Davidson, Natan Portilho Da Silva, Gianmaria Silvello · 2017

Computer science Biology

Data citation is of growing concern for owners of curated databases, who wish to give credit to the contributors and curators responsible for portions of the dataset and enable the data retrieved by a query to be later examined. While seve…

Data Citation Open

Susan B. Davidson, Peter Buneman, Daniel Deutch, Tova Milo, Gianmaria Silvello · 2017

Computer science Geology

Data citation is an interesting computational challenge, whose solution draws on several well-studied problems in database theory: query answering using views, and provenance. We describe the problem, suggest an approach to its solution, a…

Why data citation is a computational problem Open

Peter Buneman, Susan B. Davidson, James Frew · 2016

Computer science

Using database views to define citable units is the key to specifying and generating citations to data.

PROX: Approximated Summarization of Data Provenance Open

Eleanor Ainy, Pierre Bourhis, Susan B. Davidson, Daniel Deutch, Tova Milo · 2016

Computer science Geology

Many modern applications involve collecting large amounts of data from multiple sources, and then aggregating and manipulating it in intricate ways. The complexity of such applications, combined with the size of the collected data, makes i…

Susan B. Davidson