A Bernstein polynomial approach for the estimation of cumulative distribution functions in the presence of missing data
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2510.07235
We study nonparametric estimation of univariate cumulative distribution functions (CDFs) pertaining to data missing at random. The proposed estimators smooth the inverse probability weighted (IPW) empirical CDF with the Bernstein operator, yielding monotone, $[0,1]$-valued curves that automatically adapt to bounded supports. We analyze two versions: a pseudo estimator that uses known propensities and a feasible estimator that uses propensities estimated nonparametrically from discrete auxiliary variables, the latter scenario being much more common in practice. For both, we derive pointwise bias and variance expansions, establish the optimal polynomial degree $m$ with respect to the mean integrated squared error, and prove the asymptotic normality. A key finding is that the feasible estimator has a smaller variance than the pseudo estimator by an explicit nonnegative correction term. We also develop an efficient degree selection procedure via least-squares cross-validation. Monte Carlo experiments demonstrate that, for moderate to large sample sizes, the Bernstein-smoothed feasible estimator outperforms both its unsmoothed counterpart and an integrated version of the IPW kernel density estimator proposed by Dubnicka (2009) in the same context. A real-data application to fasting plasma glucose from the 2017-2018 NHANES survey illustrates the method in a practical setting. All code needed to reproduce our analyses is readily accessible on GitHub.
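
The construction described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the responses are assumed to be rescaled to $[0,1]$, the IPW empirical CDF is written in its normalized (weights summing to one) form, which the abstract does not specify, and the propensities are estimated by simple response frequencies within each level of a discrete auxiliary variable. The function names `estimate_propensities`, `ipw_ecdf`, and `bernstein_ipw_cdf` and the simulated data are hypothetical.

```python
import math
import random


def estimate_propensities(delta, x_aux):
    """Response frequency within each level of a discrete auxiliary variable."""
    totals, responders = {}, {}
    for d, x in zip(delta, x_aux):
        totals[x] = totals.get(x, 0) + 1
        responders[x] = responders.get(x, 0) + d
    return [responders[x] / totals[x] for x in x_aux]


def ipw_ecdf(t, y, delta, pi):
    """Normalized IPW empirical CDF at t (assumed form; uses observed cases only)."""
    num = sum(1.0 / p for yi, d, p in zip(y, delta, pi) if d == 1 and yi <= t)
    den = sum(1.0 / p for d, p in zip(delta, pi) if d == 1)
    return num / den


def bernstein_ipw_cdf(x, y, delta, pi, m):
    """Apply the degree-m Bernstein operator to the IPW empirical CDF on [0, 1]."""
    grid = [ipw_ecdf(k / m, y, delta, pi) for k in range(m + 1)]
    return sum(
        f * math.comb(m, k) * x**k * (1 - x) ** (m - k)
        for k, f in enumerate(grid)
    )


# Simulated example (hypothetical data): missingness depends only on x_aux.
random.seed(0)
n = 500
x_aux = [random.randint(0, 1) for _ in range(n)]
y_full = [random.betavariate(2, 3) if x == 0 else random.betavariate(3, 2)
          for x in x_aux]
true_pi = {0: 0.5, 1: 0.8}
delta = [1 if random.random() < true_pi[x] else 0 for x in x_aux]
y = [yi if d == 1 else None for yi, d in zip(y_full, delta)]  # None = missing

pi_hat = estimate_propensities(delta, x_aux)  # "feasible" propensities
curve = [bernstein_ipw_cdf(j / 10, y, delta, pi_hat, m=30) for j in range(11)]
```

Passing the known `true_pi` values instead of `pi_hat` would correspond to the pseudo estimator. Because the Bernstein operator maps a nondecreasing step function to a nondecreasing polynomial, `curve` starts at 0, ends at 1, and is monotone. The degree `m=30` is an arbitrary choice here; the paper selects it by least-squares cross-validation.
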
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2510.07235, https://arxiv.org/pdf/2510.07235
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415318180
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415318180 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2510.07235 (Digital Object Identifier)
- Title: A Bernstein polynomial approach for the estimation of cumulative distribution functions in the presence of missing data (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025 (year of publication)
- Publication date: 2025-10-08 (full publication date if available)
- Authors: Ridha Gharbi, Wissem Jedidi, Salah Khardani, Frédéric Ouimet (list of authors in order)
- Landing page: https://arxiv.org/abs/2510.07235 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2510.07235 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2510.07235 (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415318180 |
| doi | https://doi.org/10.48550/arxiv.2510.07235 |
| ids.doi | https://doi.org/10.48550/arxiv.2510.07235 |
| ids.openalex | https://openalex.org/W4415318180 |
| fwci | |
| type | preprint |
| title | A Bernstein polynomial approach for the estimation of cumulative distribution functions in the presence of missing data |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11186 |
| topics[0].field.id | https://openalex.org/fields/23 |
| topics[0].field.display_name | Environmental Science |
| topics[0].score | 0.9650999903678894 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2306 |
| topics[0].subfield.display_name | Global and Planetary Change |
| topics[0].display_name | Hydrology and Drought Analysis |
| topics[1].id | https://openalex.org/T11052 |
| topics[1].field.id | https://openalex.org/fields/22 |
| topics[1].field.display_name | Engineering |
| topics[1].score | 0.9560999870300293 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2208 |
| topics[1].subfield.display_name | Electrical and Electronic Engineering |
| topics[1].display_name | Energy Load and Power Forecasting |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2510.07235 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2510.07235 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2510.07235 |
| locations[1].id | doi:10.48550/arxiv.2510.07235 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2510.07235 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5005280659 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Ridha Gharbi |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Gharbi, Rihab |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5063658754 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-3153-3948 |
| authorships[1].author.display_name | Wissem Jedidi |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Jedidi, Wissem |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5078024341 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Salah Khardani |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Khardani, Salah |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5010318685 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-7933-5265 |
| authorships[3].author.display_name | Frédéric Ouimet |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Ouimet, Frédéric |
| authorships[3].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2510.07235 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-18T00:00:00 |
| display_name | A Bernstein polynomial approach for the estimation of cumulative distribution functions in the presence of missing data |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11186 |
| primary_topic.field.id | https://openalex.org/fields/23 |
| primary_topic.field.display_name | Environmental Science |
| primary_topic.score | 0.9650999903678894 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2306 |
| primary_topic.subfield.display_name | Global and Planetary Change |
| primary_topic.display_name | Hydrology and Drought Analysis |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2510.07235 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2510.07235 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2510.07235 |
| primary_location.id | pmh:oai:arXiv.org:2510.07235 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2510.07235 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2510.07235 |
| publication_date | 2025-10-08 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.A | 102, 173 |
| abstract_inverted_index.a | 45, 53, 111, 189 |
| abstract_inverted_index.We | 0, 41, 124 |
| abstract_inverted_index.an | 119, 127, 156 |
| abstract_inverted_index.at | 14 |
| abstract_inverted_index.by | 118, 166 |
| abstract_inverted_index.in | 72, 169, 188 |
| abstract_inverted_index.is | 105, 199 |
| abstract_inverted_index.of | 4, 159 |
| abstract_inverted_index.on | 202 |
| abstract_inverted_index.to | 11, 38, 91, 142, 176, 195 |
| abstract_inverted_index.we | 76 |
| abstract_inverted_index.$m$ | 88 |
| abstract_inverted_index.All | 192 |
| abstract_inverted_index.CDF | 26 |
| abstract_inverted_index.For | 74 |
| abstract_inverted_index.IPW | 161 |
| abstract_inverted_index.The | 16 |
| abstract_inverted_index.and | 52, 80, 97, 155 |
| abstract_inverted_index.for | 140 |
| abstract_inverted_index.has | 110 |
| abstract_inverted_index.its | 152 |
| abstract_inverted_index.key | 103 |
| abstract_inverted_index.our | 197 |
| abstract_inverted_index.the | 20, 28, 65, 84, 92, 99, 107, 115, 146, 160, 170, 181, 186 |
| abstract_inverted_index.two | 43 |
| abstract_inverted_index.via | 132 |
| abstract_inverted_index.also | 125 |
| abstract_inverted_index.bias | 79 |
| abstract_inverted_index.both | 151 |
| abstract_inverted_index.code | 193 |
| abstract_inverted_index.data | 12 |
| abstract_inverted_index.from | 61, 180 |
| abstract_inverted_index.mean | 93 |
| abstract_inverted_index.more | 70 |
| abstract_inverted_index.much | 69 |
| abstract_inverted_index.same | 171 |
| abstract_inverted_index.than | 114 |
| abstract_inverted_index.that | 35, 48, 56, 106 |
| abstract_inverted_index.uses | 49, 57 |
| abstract_inverted_index.with | 27, 89 |
| abstract_inverted_index.(IPW) | 24 |
| abstract_inverted_index.Carlo | 136 |
| abstract_inverted_index.Monte | 135 |
| abstract_inverted_index.adapt | 37 |
| abstract_inverted_index.being | 68 |
| abstract_inverted_index.both, | 75 |
| abstract_inverted_index.known | 50 |
| abstract_inverted_index.large | 143 |
| abstract_inverted_index.prove | 98 |
| abstract_inverted_index.study | 1 |
| abstract_inverted_index.term. | 123 |
| abstract_inverted_index.that, | 139 |
| abstract_inverted_index.(2009) | 168 |
| abstract_inverted_index.(CDFs) | 9 |
| abstract_inverted_index.NHANES | 183 |
| abstract_inverted_index.common | 71 |
| abstract_inverted_index.curves | 34 |
| abstract_inverted_index.degree | 87, 129 |
| abstract_inverted_index.derive | 77 |
| abstract_inverted_index.error, | 96 |
| abstract_inverted_index.kernel | 162 |
| abstract_inverted_index.latter | 66 |
| abstract_inverted_index.method | 187 |
| abstract_inverted_index.needed | 194 |
| abstract_inverted_index.plasma | 178 |
| abstract_inverted_index.pseudo | 46, 116 |
| abstract_inverted_index.sample | 144 |
| abstract_inverted_index.sizes, | 145 |
| abstract_inverted_index.smooth | 19 |
| abstract_inverted_index.survey | 184 |
| abstract_inverted_index.GitHub. | 203 |
| abstract_inverted_index.analyze | 42 |
| abstract_inverted_index.bounded | 39 |
| abstract_inverted_index.density | 163 |
| abstract_inverted_index.develop | 126 |
| abstract_inverted_index.fasting | 177 |
| abstract_inverted_index.finding | 104 |
| abstract_inverted_index.glucose | 179 |
| abstract_inverted_index.inverse | 21 |
| abstract_inverted_index.missing | 13 |
| abstract_inverted_index.optimal | 85 |
| abstract_inverted_index.random. | 15 |
| abstract_inverted_index.readily | 200 |
| abstract_inverted_index.respect | 90 |
| abstract_inverted_index.smaller | 112 |
| abstract_inverted_index.squared | 95 |
| abstract_inverted_index.version | 158 |
| abstract_inverted_index.Dubnicka | 167 |
| abstract_inverted_index.analyses | 198 |
| abstract_inverted_index.context. | 172 |
| abstract_inverted_index.discrete | 62 |
| abstract_inverted_index.explicit | 120 |
| abstract_inverted_index.feasible | 54, 108, 148 |
| abstract_inverted_index.moderate | 141 |
| abstract_inverted_index.proposed | 17, 165 |
| abstract_inverted_index.scenario | 67 |
| abstract_inverted_index.setting. | 191 |
| abstract_inverted_index.variance | 81, 113 |
| abstract_inverted_index.weighted | 23 |
| abstract_inverted_index.yielding | 31 |
| abstract_inverted_index.2017-2018 | 182 |
| abstract_inverted_index.Bernstein | 29 |
| abstract_inverted_index.auxiliary | 63 |
| abstract_inverted_index.efficient | 128 |
| abstract_inverted_index.empirical | 25 |
| abstract_inverted_index.establish | 83 |
| abstract_inverted_index.estimated | 59 |
| abstract_inverted_index.estimator | 47, 55, 109, 117, 149, 164 |
| abstract_inverted_index.functions | 8 |
| abstract_inverted_index.monotone, | 32 |
| abstract_inverted_index.operator, | 30 |
| abstract_inverted_index.pointwise | 78 |
| abstract_inverted_index.practical | 190 |
| abstract_inverted_index.practice. | 73 |
| abstract_inverted_index.procedure | 131 |
| abstract_inverted_index.real-data | 174 |
| abstract_inverted_index.reproduce | 196 |
| abstract_inverted_index.selection | 130 |
| abstract_inverted_index.supports. | 40 |
| abstract_inverted_index.versions: | 44 |
| abstract_inverted_index.accessible | 201 |
| abstract_inverted_index.asymptotic | 100 |
| abstract_inverted_index.correction | 122 |
| abstract_inverted_index.cumulative | 6 |
| abstract_inverted_index.estimation | 3 |
| abstract_inverted_index.estimators | 18 |
| abstract_inverted_index.integrated | 94, 157 |
| abstract_inverted_index.normality. | 101 |
| abstract_inverted_index.pertaining | 10 |
| abstract_inverted_index.polynomial | 86 |
| abstract_inverted_index.univariate | 5 |
| abstract_inverted_index.unsmoothed | 153 |
| abstract_inverted_index.variables, | 64 |
| abstract_inverted_index.application | 175 |
| abstract_inverted_index.counterpart | 154 |
| abstract_inverted_index.demonstrate | 138 |
| abstract_inverted_index.expansions, | 82 |
| abstract_inverted_index.experiments | 137 |
| abstract_inverted_index.illustrates | 185 |
| abstract_inverted_index.nonnegative | 121 |
| abstract_inverted_index.outperforms | 150 |
| abstract_inverted_index.probability | 22 |
| abstract_inverted_index.distribution | 7 |
| abstract_inverted_index.propensities | 51, 58 |
| abstract_inverted_index.automatically | 36 |
| abstract_inverted_index.least-squares | 133 |
| abstract_inverted_index.nonparametric | 2 |
| abstract_inverted_index.$[0,1]$-valued | 33 |
| abstract_inverted_index.cross-validation. | 134 |
| abstract_inverted_index.nonparametrically | 60 |
| abstract_inverted_index.Bernstein-smoothed | 147 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile | |
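
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each token to the zero-based positions where it occurs. A minimal sketch of how such a map can be turned back into running text follows; the helper name `rebuild_abstract` is hypothetical, and the sample dictionary holds only the first ten positions taken from the rows above.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} map by sorting on position."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# First ten positions of the index listed in the payload table.
sample_index = {
    "We": [0],
    "study": [1],
    "nonparametric": [2],
    "estimation": [3],
    "of": [4],
    "univariate": [5],
    "cumulative": [6],
    "distribution": [7],
    "functions": [8],
    "(CDFs)": [9],
}

text = rebuild_abstract(sample_index)
print(text)
```

Applied to the full set of rows, the same function would reproduce the abstract shown near the top of the page, with punctuation preserved because it is stored as part of each token.
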