Evaluating Variance Estimates with Relative Efficiency
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2511.15961
Experimentation platforms in industry must often deal with customer trust issues. Platforms must prove the validity of their claims as well as catch issues that arise. As a central quantity estimated by experimentation platforms, the validity of confidence intervals is of particular concern. To ensure confidence intervals are reliable, we must understand and diagnose when our variance estimates are biased or noisy, or when the confidence intervals may be incorrect. A common method for this is A/A testing, in which both the control and test arms receive the same treatment. One can then test if the empirical false positive rate (FPR) deviates substantially from the target FPR over many tests. However, this approach turns each A/A test into a simple binary random variable. It is an inefficient estimate of the FPR as it throws away information about the magnitude of each experiment result. We show how to empirically evaluate the effectiveness of statistics that monitor the variance estimates that partly dictate a platform's statistical reliability. We also show that statistics other than empirical FPR are more effective at detecting issues. In particular, we propose a $t^2$-statistic that is more sample efficient.
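The contrast drawn in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: the abstract does not define the $t^2$-statistic, so the code assumes it is the average of squared A/A t-statistics, which should be close to 1 when variance estimates are correct. The function names, the `var_scale` parameter (a hypothetical multiplicative bias on the variance estimate), and the normal approximations used for the z-scores are all assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def simulate_aa_t_stats(n_tests, n_per_arm, var_scale=1.0, seed=0):
    """Simulate A/A tests and return one t-statistic per test.

    var_scale is a hypothetical knob: the estimated variance is multiplied
    by it, so var_scale < 1 mimics a platform that underestimates variance.
    """
    rng = np.random.default_rng(seed)
    # Both arms draw from the same distribution (same "treatment").
    control = rng.normal(size=(n_tests, n_per_arm))
    test = rng.normal(size=(n_tests, n_per_arm))
    diff = test.mean(axis=1) - control.mean(axis=1)
    var_hat = (test.var(axis=1, ddof=1) + control.var(axis=1, ddof=1)) / n_per_arm
    return diff / np.sqrt(var_scale * var_hat)


def fpr_z_score(t_stats, alpha=0.05):
    """Empirical FPR and its z-score against the target FPR alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Each test is reduced to a binary outcome: rejected or not.
    fpr = np.mean(np.abs(t_stats) > z_crit)
    se = np.sqrt(alpha * (1 - alpha) / len(t_stats))
    return float(fpr), float((fpr - alpha) / se)


def t2_z_score(t_stats):
    """Mean squared t-statistic and its z-score against the null value 1.

    Assumes t^2 is approximately chi-square with 1 degree of freedom under
    the null, so its variance is approximately 2.
    """
    mean_t2 = np.mean(t_stats ** 2)
    se = np.sqrt(2.0 / len(t_stats))
    return float(mean_t2), float((mean_t2 - 1.0) / se)


if __name__ == "__main__":
    # Variance underestimated by 20% (assumed scenario).
    t_stats = simulate_aa_t_stats(n_tests=5000, n_per_arm=100, var_scale=0.8)
    print("empirical FPR, z-score:", fpr_z_score(t_stats))
    print("mean t^2, z-score:", t2_z_score(t_stats))
```

Under these assumptions, the mean squared t-statistic uses the magnitude of every A/A result rather than only whether it crossed the rejection threshold, which is the information the abstract says the empirical FPR discards.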
- Type: preprint
- Landing Page: http://arxiv.org/abs/2511.15961, https://arxiv.org/pdf/2511.15961
- OA Status: green
- OpenAlex ID: https://openalex.org/W4416550183
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416550183
- DOI: https://doi.org/10.48550/arxiv.2511.15961
- Title: Evaluating Variance Estimates with Relative Efficiency
- Type: preprint
- Publication year: 2025
- Publication date: 2025-11-20
- Authors: Kedar Karhadkar, Jack Klys, Daniel Shu Wei Ting, Artem Vorozhtsov
- Landing page: https://arxiv.org/abs/2511.15961
- PDF URL: https://arxiv.org/pdf/2511.15961
- Open access: Yes
- OA status: green
- OA URL: https://arxiv.org/pdf/2511.15961
- Cited by: 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4416550183 |
| doi | https://doi.org/10.48550/arxiv.2511.15961 |
| ids.doi | https://doi.org/10.48550/arxiv.2511.15961 |
| ids.openalex | https://openalex.org/W4416550183 |
| fwci | |
| type | preprint |
| title | Evaluating Variance Estimates with Relative Efficiency |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | pmh:oai:arXiv.org:2511.15961 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2511.15961 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2511.15961 |
| locations[1].id | doi:10.48550/arxiv.2511.15961 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2511.15961 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5120513671 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Kedar Karhadkar |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Karhadkar, Kedar |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5084796787 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Jack Klys |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Klys, Jack |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5032625133 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-2264-7174 |
| authorships[2].author.display_name | Daniel Shu Wei Ting |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Ting, Daniel |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5120373767 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Artem Vorozhtsov |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Vorozhtsov, Artem |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2511.15961 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-23T00:00:00 |
| display_name | Evaluating Variance Estimates with Relative Efficiency |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T17:07:48.141936 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2511.15961 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2511.15961 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2511.15961 |
| primary_location.id | pmh:oai:arXiv.org:2511.15961 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2511.15961 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2511.15961 |
| publication_date | 2025-11-20 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |