Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2504.03997
Evaluating retrieval-ranking systems is crucial for developing high-performing models. While online A/B testing is the gold standard, its high cost and risks to user experience require effective offline methods. However, relying on historical interaction data introduces biases, such as selection, exposure, conformity, and position biases, that distort evaluation metrics. These biases are driven by the Missing-Not-At-Random (MNAR) nature of user interactions and favor popular or frequently exposed items over true user preferences. We propose a novel framework for robust offline evaluation of retrieval-ranking systems that transforms MNAR data into Missing-At-Random (MAR) data through reweighting combined with black-box optimization, guided by neural estimation of information-theoretic metrics. Our contributions include (1) a causal formulation for addressing offline evaluation biases, (2) a system-agnostic debiasing framework, and (3) empirical validation of its effectiveness. This framework enables more accurate, fair, and generalizable evaluations, enhancing model assessment before deployment.
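To make the reweighting idea in the abstract concrete, here is a minimal sketch of generic self-normalized inverse-propensity weighting on simulated MNAR logs. It is not the authors' method: the abstract describes weights found by black-box optimization guided by neural estimates of information-theoretic metrics, whereas this sketch assumes the logging propensities are known. The function names (`naive_mean`, `snips_mean`, `simulate`) and all numeric settings are hypothetical choices made for illustration.

```python
import random


def naive_mean(observed):
    """Average logged reward, ignoring how the interactions were selected."""
    return sum(reward for reward, _ in observed) / len(observed)


def snips_mean(observed):
    """Self-normalized inverse-propensity estimate of the mean reward.

    Each observation is (reward, propensity), where propensity is the
    probability that the interaction was logged (assumed known here).
    """
    weights = [1.0 / propensity for _, propensity in observed]
    weighted_sum = sum(w * reward for w, (reward, _) in zip(weights, observed))
    return weighted_sum / sum(weights)


def simulate(n=100_000, seed=0):
    """Simulate MNAR logging: relevant items are logged far more often."""
    rng = random.Random(seed)
    total_reward = 0.0
    observed = []
    for _ in range(n):
        reward = 1.0 if rng.random() < 0.3 else 0.0  # true user preference
        # Assumption: logging probability depends on the reward itself (MNAR).
        propensity = 0.8 if reward == 1.0 else 0.2
        total_reward += reward
        if rng.random() < propensity:
            observed.append((reward, propensity))
    return total_reward / n, observed


true_mean, logged = simulate()
naive = naive_mean(logged)      # biased upward by selective logging
debiased = snips_mean(logged)   # reweighted toward the full population
print(f"true={true_mean:.3f} naive={naive:.3f} reweighted={debiased:.3f}")
```

In this toy setting the naive average overstates the true mean reward because relevant items are over-represented in the log, while the reweighted estimate lands close to it. In practice the propensities are unknown and must be estimated or learned, which is the harder problem the framework targets.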
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2504.03997, https://arxiv.org/pdf/2504.03997
- OA Status: green
- OpenAlex ID: https://openalex.org/W4416125100
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416125100 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2504.03997 (Digital Object Identifier)
- Title: Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-04-04 (full publication date if available)
- Authors: Ruomeng Xu, Babak Salimi (list of authors in order)
- Landing page: https://arxiv.org/abs/2504.03997 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2504.03997 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2504.03997 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4416125100 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2504.03997 |
| ids.doi | https://doi.org/10.48550/arxiv.2504.03997 |
| ids.openalex | https://openalex.org/W4416125100 |
| fwci | |
| type | preprint |
| title | Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2504.03997 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2504.03997 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2504.03997 |
| locations[1].id | doi:10.48550/arxiv.2504.03997 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2504.03997 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5066485633 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-6155-4582 |
| authorships[0].author.display_name | Ruomeng Xu |
| authorships[0].author_position | middle |
| authorships[0].raw_author_name | Xu, Ruomeng |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5103209063 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-2485-9533 |
| authorships[1].author.display_name | Babak Salimi |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Salimi, Babak |
| authorships[1].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2504.03997 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Towards Robust Offline Evaluation: A Causal and Information Theoretic Framework for Debiasing Ranking Systems |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T05:44:28.285651 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2504.03997 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2504.03997 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2504.03997 |
| primary_location.id | pmh:oai:arXiv.org:2504.03997 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2504.03997 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2504.03997 |
| publication_date | 2025-04-04 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 69, 102, 111 |
| abstract_inverted_index.We | 67 |
| abstract_inverted_index.as | 37 |
| abstract_inverted_index.by | 48, 92 |
| abstract_inverted_index.is | 3, 13 |
| abstract_inverted_index.of | 53, 76, 95, 119 |
| abstract_inverted_index.on | 31 |
| abstract_inverted_index.or | 59 |
| abstract_inverted_index.to | 22 |
| abstract_inverted_index.(1) | 101 |
| abstract_inverted_index.(2) | 110 |
| abstract_inverted_index.(3) | 116 |
| abstract_inverted_index.A/B | 11 |
| abstract_inverted_index.Our | 98 |
| abstract_inverted_index.and | 20, 41, 56, 115, 128 |
| abstract_inverted_index.for | 5, 72, 105 |
| abstract_inverted_index.its | 17, 120 |
| abstract_inverted_index.the | 14, 49 |
| abstract_inverted_index.MNAR | 80 |
| abstract_inverted_index.This | 122 |
| abstract_inverted_index.cost | 19 |
| abstract_inverted_index.data | 34, 81 |
| abstract_inverted_index.gold | 15 |
| abstract_inverted_index.high | 18 |
| abstract_inverted_index.into | 82 |
| abstract_inverted_index.more | 125 |
| abstract_inverted_index.over | 63 |
| abstract_inverted_index.true | 64 |
| abstract_inverted_index.user | 23, 54, 65 |
| abstract_inverted_index.with | 88 |
| abstract_inverted_index.(MAR) | 84 |
| abstract_inverted_index.While | 9 |
| abstract_inverted_index.fair, | 127 |
| abstract_inverted_index.items | 62 |
| abstract_inverted_index.model | 132 |
| abstract_inverted_index.novel | 70 |
| abstract_inverted_index.risks | 21 |
| abstract_inverted_index.(MNAR) | 51 |
| abstract_inverted_index.before | 134 |
| abstract_inverted_index.causal | 103 |
| abstract_inverted_index.driven | 47 |
| abstract_inverted_index.guided | 91 |
| abstract_inverted_index.nature | 52 |
| abstract_inverted_index.neural | 93 |
| abstract_inverted_index.online | 10 |
| abstract_inverted_index.robust | 73 |
| abstract_inverted_index.biases, | 109 |
| abstract_inverted_index.crucial | 4 |
| abstract_inverted_index.distort | 44 |
| abstract_inverted_index.enables | 124 |
| abstract_inverted_index.exposed | 61 |
| abstract_inverted_index.include | 100 |
| abstract_inverted_index.models. | 8 |
| abstract_inverted_index.offline | 27, 74, 107 |
| abstract_inverted_index.popular | 58 |
| abstract_inverted_index.propose | 68 |
| abstract_inverted_index.relying | 30 |
| abstract_inverted_index.require | 25 |
| abstract_inverted_index.systems | 2 |
| abstract_inverted_index.testing | 12 |
| abstract_inverted_index.through | 85 |
| abstract_inverted_index.However, | 29 |
| abstract_inverted_index.combined | 87 |
| abstract_inverted_index.favoring | 57 |
| abstract_inverted_index.methods. | 28 |
| abstract_inverted_index.metrics, | 46 |
| abstract_inverted_index.metrics. | 97 |
| abstract_inverted_index.position | 42 |
| abstract_inverted_index.systems, | 78 |
| abstract_inverted_index.accurate, | 126 |
| abstract_inverted_index.black-box | 89 |
| abstract_inverted_index.debiasing | 113 |
| abstract_inverted_index.effective | 26 |
| abstract_inverted_index.empirical | 117 |
| abstract_inverted_index.enhancing | 131 |
| abstract_inverted_index.exposure, | 39 |
| abstract_inverted_index.framework | 71, 123 |
| abstract_inverted_index.standard, | 16 |
| abstract_inverted_index.Evaluating | 0 |
| abstract_inverted_index.addressing | 106 |
| abstract_inverted_index.assessment | 133 |
| abstract_inverted_index.developing | 6 |
| abstract_inverted_index.estimation | 94 |
| abstract_inverted_index.evaluation | 45, 75, 108 |
| abstract_inverted_index.experience | 24 |
| abstract_inverted_index.framework, | 114 |
| abstract_inverted_index.frequently | 60 |
| abstract_inverted_index.historical | 32 |
| abstract_inverted_index.introduces | 35 |
| abstract_inverted_index.selection, | 38 |
| abstract_inverted_index.validation | 118 |
| abstract_inverted_index.biases-such | 36 |
| abstract_inverted_index.biases-that | 43 |
| abstract_inverted_index.conformity, | 40 |
| abstract_inverted_index.deployment. | 135 |
| abstract_inverted_index.formulation | 104 |
| abstract_inverted_index.interaction | 33 |
| abstract_inverted_index.reweighting | 86 |
| abstract_inverted_index.evaluations, | 130 |
| abstract_inverted_index.interactions | 55 |
| abstract_inverted_index.preferences. | 66 |
| abstract_inverted_index.transforming | 79 |
| abstract_inverted_index.contributions | 99 |
| abstract_inverted_index.generalizable | 129 |
| abstract_inverted_index.optimization, | 90 |
| abstract_inverted_index.effectiveness. | 121 |
| abstract_inverted_index.high-performing | 7 |
| abstract_inverted_index.system-agnostic | 112 |
| abstract_inverted_index.Missing-At-Random | 83 |
| abstract_inverted_index.retrieval-ranking | 1, 77 |
| abstract_inverted_index.Missing-Not-At-Random | 50 |
| abstract_inverted_index.information-theoretic | 96 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile | |
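
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the word positions where it occurs. A minimal sketch of how such a map can be turned back into running text follows; the helper name `rebuild_text` is hypothetical, and the sample dictionary is restricted to positions 0 through 8 (the first sentence), with values copied from the table.

```python
def rebuild_text(inverted_index):
    """Rebuild the token sequence from a {token: [positions]} mapping."""
    token_at = {}
    for token, positions in inverted_index.items():
        for position in positions:
            token_at[position] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(token_at[i] for i in sorted(token_at))


# Positions 0-8 only; tokens such as "is" and "for" also occur later in
# the full index, and those later positions are omitted in this sample.
sample = {
    "Evaluating": [0],
    "retrieval-ranking": [1],
    "systems": [2],
    "is": [3],
    "crucial": [4],
    "for": [5],
    "developing": [6],
    "high-performing": [7],
    "models.": [8],
}

first_sentence = rebuild_text(sample)
print(first_sentence)
# prints: Evaluating retrieval-ranking systems is crucial for developing high-performing models.
```

Applying the same function to all of the `abstract_inverted_index.*` rows should reproduce the full abstract, since punctuation is stored as part of each token.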