Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems
2025 · Open Access
DOI: https://doi.org/10.48550/arxiv.2504.03997
Evaluating retrieval-ranking systems is crucial for developing high-performing models. While online A/B testing is the gold standard, its high cost and risks to user experience make effective offline methods necessary. However, relying on historical interaction data introduces biases, such as selection, exposure, conformity, and position biases, that distort evaluation metrics. Driven by the Missing-Not-At-Random (MNAR) nature of user interactions, these biases favor popular or frequently exposed items over true user preferences. We propose a novel framework for robust offline evaluation of retrieval-ranking systems that transforms MNAR data into Missing-At-Random (MAR) data through reweighting combined with black-box optimization, guided by neural estimation of information-theoretic metrics. Our contributions include (1) a causal formulation of offline evaluation biases, (2) a system-agnostic debiasing framework, and (3) empirical validation of its effectiveness. This framework enables more accurate, fair, and generalizable evaluations, enhancing model assessment before deployment.
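The abstract names the ingredients of the approach (reweighting MNAR interactions toward MAR, black-box optimization, neural estimation of information-theoretic metrics) but does not spell out the estimator. The sketch below illustrates only the reweighting idea, assuming a standard inverse-propensity-scoring (IPS) correction of DCG under a simple 1/rank position-bias model; the function names, the choice of DCG, and the propensity model are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: correcting a ranking metric computed on MNAR
# interaction logs by inverse propensity scoring (IPS). The paper's
# exact estimator, propensity model, and optimization loop are not
# given in the abstract; everything here is an illustrative assumption.
import numpy as np


def naive_dcg(relevances: np.ndarray) -> float:
    """Standard DCG over the observed (exposure-biased) interactions."""
    ranks = np.arange(1, len(relevances) + 1)
    return float(np.sum(relevances / np.log2(ranks + 1)))


def ips_dcg(relevances: np.ndarray, propensities: np.ndarray) -> float:
    """DCG with each observed interaction up-weighted by 1/p(exposed).

    Under an MNAR exposure mechanism with known propensities, this
    reweighting yields (in expectation) the metric that would be
    computed on MAR data.
    """
    ranks = np.arange(1, len(relevances) + 1)
    # Clip small propensities to bound the weights and control variance.
    weights = 1.0 / np.clip(propensities, 1e-3, 1.0)
    return float(np.sum(weights * relevances / np.log2(ranks + 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated ranked list: binary true relevance plus position-biased
    # exposure, so lower-ranked relevant items are often unobserved.
    true_rel = rng.integers(0, 2, size=10).astype(float)
    positions = np.arange(1, 11)
    exposure_p = 1.0 / positions            # classic position-bias model
    observed = rng.random(10) < exposure_p
    observed_rel = true_rel * observed      # interactions only where exposed

    print("naive DCG:", naive_dcg(observed_rel))   # biased toward top ranks
    print("IPS DCG:  ", ips_dcg(observed_rel, exposure_p))
    print("true DCG: ", naive_dcg(true_rel))       # oracle reference
```

The clipping threshold trades bias for variance: larger thresholds tame the high-variance weights of rarely exposed items at the cost of residual bias, which is one reason a framework like the one described might tune the reweighting via black-box optimization rather than fix it by hand.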
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2504.03997
- PDF: https://arxiv.org/pdf/2504.03997
- OA Status: green
- OpenAlex ID: https://openalex.org/W4416125100