Bryan Wilder
Real-time forecasting of data revisions in epidemic surveillance streams
Epidemic data streams undergo frequent revisions due to reporting delays (“backfill”) and other factors. Relying on tentative surveillance values can seriously degrade the quality of situational awareness, forecasting accuracy and decision…
Predicting Language Models' Success at Zero-Shot Probabilistic Prediction
Recent work has investigated the capabilities of large language models (LLMs) as zero-shot models for generating individual-level characteristics (e.g., to serve as risk models or augment survey datasets). However, when should a user have …
Valid Inference with Imperfect Synthetic Data
Predictions and generations from large language models are increasingly being explored as an aid in limited data regimes, such as in computational social science and human subjects research. While prior technical work has mainly explored t…
Bridging Prediction and Intervention Problems in Social Systems
Many automated decision systems (ADS) are designed to solve prediction problems, where the goal is to learn patterns from a sample of the population and apply them to individuals from the same population. In reality, these prediction sys…
Can LLMs Reconcile Knowledge Conflicts in Counterfactual Reasoning
Large language models (LLMs) have been shown to contain extensive world knowledge in their parameters, enabling impressive performance on many knowledge-intensive tasks. However, when deployed in novel settings, LLMs often encounter situations wh…
Dependent Randomized Rounding for Budget Constrained Experimental Design
Policymakers in resource-constrained settings require experimental designs that satisfy strict budget limits while ensuring precise estimation of treatment effects. We propose a framework that applies a dependent randomized rounding proced…
An AI-Based Public Health Data Monitoring System
Public health experts need scalable approaches to monitor large volumes of health data (e.g., cases, hospitalizations, deaths) for outbreaks or data quality issues. Traditional alert-based monitoring systems struggle with modern public hea…
Explaining Concept Shift with Interpretable Feature Attribution
Regardless of the amount of data a machine learning (ML) model is trained on, there will inevitably be data that differs from its training set, lowering model performance. Concept shift occurs when the distribution of labels conditioned on …
Real-time Forecasting of Data Revisions in Epidemic Surveillance Streams
Epidemic data streams undergo frequent revisions due to reporting delays (“backfill”) and other factors. Relying on tentative surveillance values can seriously degrade the quality of situational awareness, forecasting accuracy and decision…
Federated epidemic surveillance
Epidemic surveillance is a challenging task, especially when crucial data is fragmented across institutions and data custodians are unable or unwilling to share it. This study aims to explore the feasibility of a simple federated surveilla…
Reinforcement learning with combinatorial actions for coupled restless bandits
Reinforcement learning (RL) has increasingly been applied to solve real-world planning problems, with progress in handling large state spaces and time horizons. However, a key bottleneck in many domains is that RL methods cannot accommodat…
Nowcasting reported covid-19 hospitalizations using de-identified, aggregated medical insurance claims data
We propose, implement, and evaluate a method for nowcasting the daily number of new COVID-19 hospitalizations, at the level of individual US states, based on de-identified, aggregated medical insurance claims data. Our analysis proceeds un…
Expert Routing with Synthetic Data for Continual Learning
In many real-world settings, regulations and economic incentives permit the sharing of models but not data across institutional boundaries. In such scenarios, practitioners might hope to adapt models to new domains, without losing performa…
Comparing Targeting Strategies for Maximizing Social Welfare with Limited Resources
Machine learning is increasingly used to select which individuals receive limited-resource interventions in domains such as human services, education, development, and more. However, it is often not apparent what the right quantity is for …
Failure Modes of LLMs for Causal Reasoning on Narratives
The ability to robustly identify causal relationships is essential for autonomous decision-making and adaptation to novel scenarios. However, accurately inferring causal structure requires integrating both world knowledge and abstract logi…
Accounting for Missing Covariates in Heterogeneous Treatment Estimation
Many applications of causal inference require using treatment effects estimated on a study population to make decisions in a separate target population. We consider the challenging setting where there are covariates that are observed in th…
Data-driven Design of Randomized Control Trials with Guaranteed Treatment Effects
Randomized controlled trials (RCTs) can be used to generate guarantees on treatment effects. However, RCTs often spend unnecessary resources exploring sub-optimal treatments, which can reduce the power of treatment guarantees. To address t…
Utility-Directed Conformal Prediction: A Decision-Aware Framework for Actionable Uncertainty Quantification
Interest has been growing in decision-focused machine learning methods which train models to account for how their predictions are used in downstream optimization problems. Doing so can often improve performance on subsequent decision prob…
Decision-Focused Evaluation of Worst-Case Distribution Shift
Distribution shift is a key challenge for predictive models in practice, creating the need to identify potentially harmful shifts in advance of deployment. Existing work typically defines these worst-case shifts as ones that most degrade t…
Preliminary Study of the Impact of AI-Based Interventions on Health and Behavioral Outcomes in Maternal Health Programs
Automated voice calls are an effective method of delivering maternal and child health information to mothers in underserved communities. One way to counter dwindling listenership is an intervention in which health workers make liv…
Predicting first time depression onset in pregnancy: applying machine learning methods to patient-reported data
Purpose: To develop a machine learning algorithm, using patient-reported data from early pregnancy, to predict later onset of first time moderate-to-severe depression. Methods: A sample of 944 U.S. patient participants from a larger longitud…
Leaving the Nest: Going beyond Local Loss Functions for Predict-Then-Optimize
Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, "How can we use the structure of a decision-making task to tailor ML models for that sp…
Outlier Ranking for Large-Scale Public Health Data
Disease control experts inspect public health data streams daily for outliers worth investigating, like those corresponding to data quality issues or disease outbreaks. However, they can only examine a few of the thousands of maximally-tie…
Auditing Fairness under Unobserved Confounding
Many definitions of fairness or inequity involve unobservable causal quantities that cannot be directly estimated without strong assumptions. For instance, it is particularly difficult to estimate notions of fairness that rely on hard-to-m…
Outlier Ranking in Large-Scale Public Health Streams
Disease control experts inspect public health data streams daily for outliers worth investigating, like those corresponding to data quality issues or disease outbreaks. However, they can only examine a few of the thousands of maximally-tie…
Nowcasting Reported COVID-19 Hospitalizations Using De-Identified, Aggregated Medical Insurance Claims Data
We propose, implement, and evaluate a method for nowcasting the daily number of new COVID-19 hospitalizations, at the level of individual US states, based on de-identified, aggregated medical insurance claims data. Our analysis proceeds un…
Complex Contagion Influence Maximization: A Reinforcement Learning Approach
In influence maximization (IM), the goal is to find a set of seed nodes in a social network that maximizes the influence spread. While most IM problems focus on classical influence cascades (e.g., Independent Cascade and Linear Threshold) …
Computationally Assisted Quality Control for Public Health Data Streams
Irregularities in public health data streams (like COVID-19 cases) hamper data-driven decision-making for public health stakeholders. A real-time, computer-generated list of the most important, outlying data points from thousands of public…