Chengchun Shi
PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing
Reinforcement learning (RL) aims to learn and evaluate a sequential decision rule, often referred to as a "policy", that maximizes the population-level benefit in an environment across possibly infinitely many time steps. However, the sequ…
AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees
We study the problem of determining whether a piece of text has been authored by a human or by a large language model (LLM). Existing state-of-the-art logits-based detectors make use of statistics derived from the log-probability of the ob…
A Two-armed Bandit Framework for A/B Testing
A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and reinfor…
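The bandit view of A/B testing described above can be illustrated with a generic toy sketch (this is a standard epsilon-greedy allocation, not the framework proposed in the paper; the function and parameter names are hypothetical):

```python
import random

def run_two_armed_test(reward_fns, horizon, epsilon=0.1, seed=0):
    """Epsilon-greedy allocation between control (arm 0) and treatment (arm 1).

    A toy stand-in for adaptive experimentation: mostly pull the arm with
    the higher empirical mean, but explore with probability `epsilon`.
    Returns pull counts and empirical mean rewards per arm.
    """
    rng = random.Random(seed)
    counts, sums = [0, 0], [0.0, 0.0]
    for _ in range(horizon):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(2)  # explore, or ensure both arms get pulled
        else:
            arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        counts[arm] += 1
        sums[arm] += reward_fns[arm](rng)
    return counts, [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

With a clearly better treatment arm, the allocation concentrates on it, which is the efficiency argument for bandit-style designs over fixed 50/50 splits.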
Doubly Robust Alignment for Large Language Models
This paper studies reinforcement learning from human feedback (RLHF) for aligning large language models with human preferences. While RLHF has demonstrated promising results, many algorithms are highly sensitive to misspecifications in the…
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
This paper studies off-policy evaluation (OPE) in reinforcement learning with a focus on behavior policy estimation for importance sampling. Prior work has shown empirically that estimating a history-dependent behavior policy can lead to l…
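For readers unfamiliar with the estimator being analyzed, a minimal sketch of generic per-step importance sampling for OPE follows (a textbook illustration only, not the paper's history-dependent estimator; the trajectory format and probability functions are hypothetical):

```python
import numpy as np

def is_estimate(trajectories, target_prob, behavior_prob, gamma=0.99):
    """Importance sampling estimate of a target policy's value.

    Each trajectory is a list of (state, action, reward) tuples;
    `target_prob` and `behavior_prob` return pi(a | s) under the target
    policy and the (possibly estimated) behavior policy, respectively.
    """
    values = []
    for traj in trajectories:
        ratio, value, discount = 1.0, 0.0, 1.0
        for state, action, reward in traj:
            # Cumulative likelihood ratio reweights behavior-policy data
            # toward the target policy's action distribution.
            ratio *= target_prob(state, action) / behavior_prob(state, action)
            value += discount * ratio * reward
            discount *= gamma
        values.append(value)
    return float(np.mean(values))
```

The paradox studied in the paper concerns what happens when `behavior_prob` is estimated (possibly with history dependence) rather than known.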
Semi-pessimistic Reinforcement Learning
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected data. However, it faces challenges of distributional shift, where the learned policy may encounter unseen scenarios not covered in the offline data. Add…
Statistics and AI: A Fireside Conversation
A 3-hour webinar titled “Statistics and AI – A Fireside Conversation” was held on Sunday, March 17, 2024, attracting an online audience of approximately 1,000. The event featured three sessions aimed at engaging the statistical community o…
HCDPD: A Heterogeneous Causal Framework for Disease Pattern Detection in Medical Imaging
Understanding the causal effects of diseases on body organs through medical imaging is crucial for advancing research and improving clinical outcomes. This paper introduces a novel causal inference framework, Heterogeneous Causal Disease P…
Deep Distributional Learning with Non-crossing Quantile Network
In this paper, we introduce a non-crossing quantile (NQ) network for conditional distribution learning. By leveraging non-negative activation functions, the NQ network ensures that the learned distributions remain monotonic, effectively ad…
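The core monotonicity device the abstract describes can be sketched in a few lines (a generic illustration under the stated idea, assuming a base quantile plus non-negative increments; this is not the NQ network's actual architecture, and all names are hypothetical):

```python
import numpy as np

def softplus(x):
    # Non-negative activation: log(1 + exp(x)), numerically stabilized.
    return np.logaddexp(0.0, x)

def noncrossing_quantiles(base, raw_increments):
    """Map unconstrained outputs to a monotone sequence of quantiles.

    `base` is the lowest quantile estimate; each subsequent quantile adds
    a softplus-transformed (hence strictly positive) increment, so the
    estimated quantile curve can never cross itself.
    """
    increments = softplus(np.asarray(raw_increments, dtype=float))
    return np.concatenate(([base], base + np.cumsum(increments)))
```

Because the increments are strictly positive regardless of the raw network outputs, monotonicity holds by construction rather than by penalty or post-hoc sorting.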
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the Bradley-Te…
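The Bradley-Terry model referenced above has a simple closed form worth recalling: the probability that response A is preferred over response B is a sigmoid of the reward gap. A minimal sketch (standard textbook formulation, not the paper's robust variant; the helper names are hypothetical):

```python
import math

def bradley_terry_prob(reward_a, reward_b):
    """P(A preferred over B) under the Bradley-Terry model:
    sigmoid of the reward difference r(A) - r(B)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

def preference_nll(pairs):
    """Negative log-likelihood of observed preferences, where each pair
    holds (winner_reward, loser_reward) under a candidate reward model."""
    return -sum(math.log(bradley_terry_prob(w, l)) for w, l in pairs)
```

Reward learning in standard RLHF amounts to minimizing `preference_nll` over the reward model's parameters; sensitivity to misspecification of this model is precisely what robustness-oriented work targets.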
Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing
When applied in healthcare, reinforcement learning (RL) seeks to dynamically match the right interventions to subjects to maximize population benefit. However, the learned policy may disproportionately allocate efficacious actions to one s…
Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning
This paper studies off-policy evaluation (OPE) in the presence of unmeasured confounders. Inspired by the two-way fixed effects regression model widely used in the panel data literature, we propose a two-way unmeasured confounding assumpti…
Dual Active Learning for Reinforcement Learning from Human Feedback
Aligning large language models (LLMs) with human preferences is critical to recent advances in generative artificial intelligence. Reinforcement learning from human feedback (RLHF) is widely applied to achieve this objective. A key step in…
ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments
Online experiments in which experimental units receive a sequence of treatments over time are frequently employed in many technological companies to evaluate the performance of a newly developed policy, product, or treatment relative to a…
Off-policy Evaluation with Deeply-abstracted States
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions -- originally des…
Combining Experimental and Historical Data for Policy Evaluation
This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data …
Recirculated transport mechanism aggravates ozone pollution over the mountainous coastal region: Increased contribution from vertical mixing
In coastal areas, where many highly developed metropolises and city agglomerations are located, high ozone (O3) concentrations are among the most severe environmental problems and are gravely impacted by the sea-land breeze (SLB) circulatio…
Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments
A/B testing has become the gold standard for policy evaluation in modern technological industries. Motivated by the widespread use of switchback experiments in A/B testing, this paper conducts a comprehensive comparative analysis of variou…
Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data
In real-world scenarios, datasets collected from randomized experiments are often constrained by size, due to limitations in time and budget. As a result, leveraging large observational datasets becomes a more attractive option for achievi…
Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing
Many modern tech companies, such as Google, Uber, and Didi, use online experiments (also known as A/B testing) to evaluate new policies against existing ones. While most studies concentrate on average treatment effects, situations with ske…
Dynamic noise estimation: A generalized method for modeling noise fluctuations in decision-making
Computational cognitive modeling is an important tool for understanding the processes supporting human and animal decision-making. Choice data in decision-making tasks are inherently noisy, and separating noise from signal can improve the …
On Efficient Inference of Causal Effects with Multiple Mediators
This paper provides robust estimators and efficient inference of causal effects involving multiple interacting mediators. Most existing works either impose a linear model assumption among the mediators or are restricted to handle condition…
Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making
A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained fro…
Robust Offline Reinforcement Learning with Heavy-Tailed Rewards
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, …
Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework
Mediation analysis is an important analytic tool commonly used in a broad range of scientific applications. In this article, we study the problem of mediation analysis when there are multivariate and conditionally dependent mediators, and …
Testing for the Markov property in time series via deep conditional generative learning
The Markov property is widely imposed in analysis of time series data. Correspondingly, testing the Markov property, and relatedly, inferring the order of a Markov model, are of paramount importance. In this article, we propose a nonparame…
Off-policy Evaluation in Doubly Inhomogeneous Environments
This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions, temporal stationarity and individual homogeneity, are both violated. To handle the "double inhomogeneities", we pr…