Alex Deng
Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking
Evaluation plays a crucial role in the development of ranking algorithms on search and recommender systems. It enables online platforms to create user-friendly features that drive commercial success in a steady and effective manner. The on…
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data
Prior research on training grounded factuality classification models to detect hallucinations in large language models (LLMs) has relied on public natural language inference (NLI) data and synthetic data. However, conventional NLI datasets…
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
Large language models (LLMs) are highly capable but face latency challenges in real-time applications, such as conducting online hallucination detection. To overcome this issue, we propose a novel framework that leverages a small language …
From Augmentation to Decomposition: A New Look at CUPED in 2023
Ten years ago, CUPED (Controlled Experiments Utilizing Pre-Experiment Data) mainstreamed the idea of variance reduction leveraging pre-experiment covariates. Since its introduction, it has been implemented, extended, and modernized by majo…
Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology
The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the past two decades, organizations such as Airb…
Continuous Attribution of Episodical Outcomes for More Efficient and Targeted Online Measurement
Online experimentation platforms collect user feedback at low cost and large scale. Some systems even support real-time or near real-time data processing, and can update metrics and statistics continuously. Many commonly used metrics, such…
Collaborative Systems Thinking Culture: A Path to Success for Complex Projects
The world is filled with hard and complex problems, oftentimes requiring involved solutions. In large organizations attempting to solve these types of problems, a mindset shift and key candidate methodologies centered on collaborative syst…
Zero to Hero: Exploiting Null Effects to Achieve Variance Reduction in Experiments with One-sided Triggering
In online experiments where the intervention is only exposed, or "triggered", for a small subset of the population, it is critical to use variance reduction techniques to estimate treatment effects with sufficient precision to inform busin…
On Post-selection Inference in A/B Testing
When interpreting A/B tests, we typically focus only on the statistically significant results and take them by face value. This practice, termed post-selection inference in the statistical literature, may negatively affect both point estim…
The equivalence of the Delta method and the cluster-robust variance estimator for the analysis of clustered randomized experiments
It often happens that the same problem presents itself to different communities and the solutions proposed or adopted by those communities are different. We take the case of the variance estimation of the population average treatment effec…
A/B Testing at Scale: Accelerating Software Innovation
The Internet and the general digitalization of products and operations provides an unprecedented opportunity to accelerate innovation while applying a rigorous and trustworthy methodology for supporting key product decisions. Developers of…
A note on Type S/M errors in hypothesis testing
Motivated by the recent replication and reproducibility crisis, Gelman and Carlin (2014, Perspect. Psychol. Sci., 9, 641) advocated focusing on controlling for Type S/M errors, instead of the classic Type I/II errors, when conducting hy…
Applying the Delta method in metric analytics: A practical guide with novel ideas
During the last decade, the information technology industry has adopted a data-driven culture, relying on online metrics to measure and monitor business performance. Under the setting of big data, the majority of such metrics approximately…
A note on type S/M errors in hypothesis testing
Motivated by the recent replication and reproducibility crisis, Gelman and Carlin (2014) advocated focusing on controlling for type S/M errors, instead of the classic type I/II errors, when conducting hypothesis testing. In this paper, we …
On randomization-based causal inference for matched-pair factorial designs
Under the potential outcomes framework, we introduce matched-pair factorial designs, and propose the matched-pair estimator of the factorial effects. We also calculate the randomization-based covariance matrix of the matched-pair estimator…
Concise Summarization of Heterogeneous Treatment Effect Using Total Variation Regularized Regression
Randomized controlled experiments have long been accepted as the gold standard for establishing causal links and estimating causal effects in various scientific fields. Average treatment effect is often used to summarize the effect estimatio…
Continuous Monitoring of A/B Tests without Pain: Optional Stopping in Bayesian Testing
A/B testing is one of the most successful applications of statistical theory in the modern Internet age. One problem of Null Hypothesis Statistical Testing (NHST), the backbone of A/B testing methodology, is that experimenters are not allowed …
Demystifying the Bias from Selective Inference: a Revisit to Dawid's Treatment Selection Problem
We extend the heuristic discussion in Senn (2008) on the bias from selective inference for the treatment selection problem (Dawid 1994), by deriving the closed-form expression for the selection bias. We illustrate the advantages of our the…