Chengchun Shi
PyCFRL: A Python library for counterfactually fair offline reinforcement learning via sequential data preprocessing
Reinforcement learning (RL) aims to learn and evaluate a sequential decision rule, often referred to as a "policy", that maximizes the population-level benefit in an environment across possibly infinitely many time steps. However, the sequ…
AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees
We study the problem of determining whether a piece of text has been authored by a human or by a large language model (LLM). Existing state-of-the-art logits-based detectors make use of statistics derived from the log-probability of the ob…
A Two-armed Bandit Framework for A/B Testing
A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and reinfor…
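The bandit view of A/B testing described above can be illustrated with a generic toy sketch (this is a standard epsilon-greedy allocation, not the framework proposed in the paper; the function and parameter names are hypothetical):

```python
import random

def run_two_armed_test(reward_fns, horizon, epsilon=0.1, seed=0):
    """Epsilon-greedy allocation between control (arm 0) and treatment (arm 1).

    A toy stand-in for adaptive experimentation: mostly pull the arm with
    the higher empirical mean, but explore with probability `epsilon`.
    Returns pull counts and empirical mean rewards per arm.
    """
    rng = random.Random(seed)
    counts, sums = [0, 0], [0.0, 0.0]
    for _ in range(horizon):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(2)  # explore, or ensure both arms get pulled
        else:
            arm = 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1
        counts[arm] += 1
        sums[arm] += reward_fns[arm](rng)
    return counts, [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

With a clearly better treatment arm, the allocation concentrates on it, which is the efficiency argument for bandit-style designs over fixed 50/50 splits.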
Doubly Robust Alignment for Large Language Models
This paper studies reinforcement learning from human feedback (RLHF) for aligning large language models with human preferences. While RLHF has demonstrated promising results, many algorithms are highly sensitive to misspecifications in the…
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
This paper studies off-policy evaluation (OPE) in reinforcement learning with a focus on behavior policy estimation for importance sampling. Prior work has shown empirically that estimating a history-dependent behavior policy can lead to l…
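For readers unfamiliar with the estimator being analyzed, a minimal sketch of generic per-step importance sampling for OPE follows (a textbook illustration only, not the paper's history-dependent estimator; the trajectory format and probability functions are hypothetical):

```python
import numpy as np

def is_estimate(trajectories, target_prob, behavior_prob, gamma=0.99):
    """Importance sampling estimate of a target policy's value.

    Each trajectory is a list of (state, action, reward) tuples;
    `target_prob` and `behavior_prob` return pi(a | s) under the target
    policy and the (possibly estimated) behavior policy, respectively.
    """
    values = []
    for traj in trajectories:
        ratio, value, discount = 1.0, 0.0, 1.0
        for state, action, reward in traj:
            # Cumulative likelihood ratio reweights behavior-policy data
            # toward the target policy's action distribution.
            ratio *= target_prob(state, action) / behavior_prob(state, action)
            value += discount * ratio * reward
            discount *= gamma
        values.append(value)
    return float(np.mean(values))
```

The paradox studied in the paper concerns what happens when `behavior_prob` is estimated (possibly with history dependence) rather than known.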
Semi-pessimistic Reinforcement Learning
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected data. However, it faces challenges of distributional shift, where the learned policy may encounter unseen scenarios not covered in the offline data. Add…
Statistics and AI: A Fireside Conversation
A 3-hour webinar titled “Statistics and AI – A Fireside Conversation” was held on Sunday, March 17, 2024, attracting an online audience of approximately 1,000. The event featured three sessions aimed at engaging the statistical community o…
HCDPD: A Heterogeneous Causal Framework for Disease Pattern Detection in Medical Imaging
Understanding the causal effects of diseases on body organs through medical imaging is crucial for advancing research and improving clinical outcomes. This paper introduces a novel causal inference framework, Heterogeneous Causal Disease P…
Deep Distributional Learning with Non-crossing Quantile Network
In this paper, we introduce a non-crossing quantile (NQ) network for conditional distribution learning. By leveraging non-negative activation functions, the NQ network ensures that the learned distributions remain monotonic, effectively ad…
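The core monotonicity device the abstract describes can be sketched in a few lines (a generic illustration under the stated idea, assuming a base quantile plus non-negative increments; this is not the NQ network's actual architecture, and all names are hypothetical):

```python
import numpy as np

def softplus(x):
    # Non-negative activation: log(1 + exp(x)), numerically stabilized.
    return np.logaddexp(0.0, x)

def noncrossing_quantiles(base, raw_increments):
    """Map unconstrained outputs to a monotone sequence of quantiles.

    `base` is the lowest quantile estimate; each subsequent quantile adds
    a softplus-transformed (hence strictly positive) increment, so the
    estimated quantile curve can never cross itself.
    """
    increments = softplus(np.asarray(raw_increments, dtype=float))
    return np.concatenate(([base], base + np.cumsum(increments)))
```

Because the increments are strictly positive regardless of the raw network outputs, monotonicity holds by construction rather than by penalty or post-hoc sorting.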
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the Bradley-Te…
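The Bradley-Terry model referenced above has a simple closed form worth recalling: the probability that response A is preferred over response B is a sigmoid of the reward gap. A minimal sketch (standard textbook formulation, not the paper's robust variant; the helper names are hypothetical):

```python
import math

def bradley_terry_prob(reward_a, reward_b):
    """P(A preferred over B) under the Bradley-Terry model:
    sigmoid of the reward difference r(A) - r(B)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

def preference_nll(pairs):
    """Negative log-likelihood of observed preferences, where each pair
    holds (winner_reward, loser_reward) under a candidate reward model."""
    return -sum(math.log(bradley_terry_prob(w, l)) for w, l in pairs)
```

Reward learning in standard RLHF amounts to minimizing `preference_nll` over the reward model's parameters; sensitivity to misspecification of this model is precisely what robustness-oriented work targets.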
Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing
When applied in healthcare, reinforcement learning (RL) seeks to dynamically match the right interventions to subjects to maximize population benefit. However, the learned policy may disproportionately allocate efficacious actions to one s…
Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning
This paper studies off-policy evaluation (OPE) in the presence of unmeasured confounders. Inspired by the two-way fixed effects regression model widely used in the panel data literature, we propose a two-way unmeasured confounding assumpti…
Dual Active Learning for Reinforcement Learning from Human Feedback
Aligning large language models (LLMs) with human preferences is critical to recent advances in generative artificial intelligence. Reinforcement learning from human feedback (RLHF) is widely applied to achieve this objective. A key step in…
ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments
Online experiments in which experimental units receive a sequence of treatments over time are frequently employed in many technological companies to evaluate the performance of a newly developed policy, product, or treatment relative to a…
Off-policy Evaluation with Deeply-abstracted States
Off-policy evaluation (OPE) is crucial for assessing a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions -- originally des…
Combining Experimental and Historical Data for Policy Evaluation
This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data …
Recirculated transport mechanism aggravates ozone pollution over the mountainous coastal region: Increased contribution from vertical mixing
In coastal areas, where many highly developed metropolises and city agglomerations are located, high ozone (O3) concentrations are among the most severe environmental problems and are gravely impacted by the sea-land breeze (SLB) circulatio…
Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments
A/B testing has become the gold standard for policy evaluation in modern technological industries. Motivated by the widespread use of switchback experiments in A/B testing, this paper conducts a comprehensive comparative analysis of variou…
Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data
In real-world scenarios, datasets collected from randomized experiments are often constrained by size, due to limitations in time and budget. As a result, leveraging large observational datasets becomes a more attractive option for achievi…
Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing
Many modern tech companies, such as Google, Uber, and Didi, use online experiments (also known as A/B testing) to evaluate new policies against existing ones. While most studies concentrate on average treatment effects, situations with ske…
Dynamic noise estimation: A generalized method for modeling noise fluctuations in decision-making
Computational cognitive modeling is an important tool for understanding the processes supporting human and animal decision-making. Choice data in decision-making tasks are inherently noisy, and separating noise from signal can improve the …
On Efficient Inference of Causal Effects with Multiple Mediators
This paper provides robust estimators and efficient inference of causal effects involving multiple interacting mediators. Most existing works either impose a linear model assumption among the mediators or are restricted to handle condition…
Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making
A/B testing is critical for modern technological companies to evaluate the effectiveness of newly developed products against standard baselines. This paper studies optimal designs that aim to maximize the amount of information obtained fro…
Robust Offline Reinforcement Learning with Heavy-Tailed Rewards
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, …
Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework
Mediation analysis is an important analytic tool commonly used in a broad range of scientific applications. In this article, we study the problem of mediation analysis when there are multivariate and conditionally dependent mediators, and …
Testing for the Markov property in time series via deep conditional generative learning
The Markov property is widely imposed in analysis of time series data. Correspondingly, testing the Markov property, and relatedly, inferring the order of a Markov model, are of paramount importance. In this article, we propose a nonparame…
Off-policy Evaluation in Doubly Inhomogeneous Environments
This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions, temporal stationarity and individual homogeneity, are both violated. To handle the "double inhomogeneities", we pr…