Optimal Sampling for Generalized Linear Models Under Measurement Constraints
2021 · Open Access
DOI: https://doi.org/10.6084/m9.figshare.12448871
Under “measurement constraints,” responses are expensive to measure and initially unavailable for most records in the dataset, but the covariates are available for the entire dataset. Our goal is to sample a relatively small portion of the dataset on which the expensive responses will be measured, such that the resulting sampling estimator is statistically efficient. Measurement constraints require that the sampling probabilities depend on only a very small set of the responses. A sampling procedure that uses responses on at most a small pilot sample will be called “response-free.” We propose a response-free sampling procedure, optimal sampling under measurement constraints (OSUMC), for generalized linear models. Under the A-optimality criterion, that is, minimizing the trace of the asymptotic variance, the resulting estimator is statistically efficient within a class of sampling estimators. We establish the unconditional asymptotic distribution of a general class of response-free sampling estimators. This result is novel compared with the existing conditional results obtained by conditioning on both covariates and responses. Under our unconditional framework, the subsamples are no longer independent, and new martingale techniques are developed for our asymptotic theory. We further derive the A-optimal response-free sampling distribution. Since this distribution depends on population-level quantities, we propose the OSUMC algorithm to approximate the theoretically optimal sampling. Finally, we conduct an intensive empirical study to demonstrate the advantages of the OSUMC algorithm over existing methods from both statistical and computational perspectives. We find that OSUMC’s performance is comparable to that of sampling algorithms that use complete responses. This shows that, provided an efficient algorithm such as OSUMC is used, there is little or no loss in accuracy due to the unavailability of responses because of measurement constraints.
Supplementary materials for this article are available online.
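The abstract describes a three-step workflow: measure responses on a small pilot sample, use the pilot fit together with all covariates to build sampling probabilities, then measure responses only on the drawn subsample and fit a weighted estimator. The following is a minimal sketch of that workflow for logistic regression, not the authors' implementation. The probability formula used here, proportional to sqrt(p(1-p)) times the norm of the inverse information matrix applied to x, is an assumption obtained by minimizing the trace of an inverse-probability-weighted variance; the function names and the `measure` callback, which stands in for the expensive response measurement, are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, w=None, iters=50):
    """Weighted logistic-regression MLE via Newton's method."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    beta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        # Tiny ridge term keeps the Hessian invertible (illustrative choice).
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(d)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def response_free_probs(X, beta_pilot):
    """Sampling probabilities built from covariates and a pilot estimate only.

    Assumed form: pi_i proportional to sqrt(v_i) * ||Phi^{-1} x_i||, where
    v_i = p_i (1 - p_i) and Phi is the average of v_i x_i x_i^T.
    """
    n = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-X @ beta_pilot))
    v = p * (1.0 - p)
    phi = (X * v[:, None]).T @ X / n
    scores = np.sqrt(v) * np.linalg.norm(np.linalg.solve(phi, X.T).T, axis=1)
    return scores / scores.sum()

def subsample_estimate(X, measure, r0, r, rng):
    """Pilot fit, response-free sampling, then a weighted fit on the subsample."""
    n = X.shape[0]
    pilot = rng.choice(n, size=r0, replace=False)       # uniform pilot sample
    beta_pilot = fit_logistic(X[pilot], measure(pilot))
    pi = response_free_probs(X, beta_pilot)
    idx = rng.choice(n, size=r, replace=True, p=pi)     # draw with replacement
    weights = 1.0 / (n * pi[idx])                       # inverse-probability weights
    return fit_logistic(X[idx], measure(idx), w=weights), pi

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    beta_true = np.array([0.5, -1.0, 1.0])
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
    # `measure` only reveals the responses it is asked for.
    beta_hat, _ = subsample_estimate(X, lambda i: y[i], r0=300, r=1000, rng=rng)
    print(beta_hat)
```

In this sketch the full response vector is never passed to the sampler: responses enter only through `measure`, which is called once for the pilot indices and once for the drawn subsample.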
- Type: dataset
- Language: en
- Landing Page: https://doi.org/10.6084/m9.figshare.12448871
- OA Status: gold
- Cited By: 1
- References: 5
- Related Works: 10
- OpenAlex ID: https://openalex.org/W2962581334
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W2962581334 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.6084/m9.figshare.12448871 (Digital Object Identifier)
- Title: Optimal Sampling for Generalized Linear Models Under Measurement Constraints (Work title)
- Type: dataset (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2021 (Year of publication)
- Publication date: 2021-01-01 (Full publication date if available)
- Authors: Tao Zhang, Yang Ning, David Ruppert (List of authors in order)
- Landing page: https://doi.org/10.6084/m9.figshare.12448871 (Publisher landing page)
- Open access: Yes (Whether a free full text is available)
- OA status: gold (Open access status per OpenAlex)
- OA URL: https://doi.org/10.6084/m9.figshare.12448871 (Direct OA link when available)
- Concepts: Sampling (signal processing), Mathematics, Generalized linear model, Statistics, Computer science, Applied mathematics, Mathematical optimization, Telecommunications, Detector (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 1 (Total citation count in OpenAlex)
- Citations by year (recent): 2022: 1 (Per-year citation counts (last 5 years))
- References (count): 5 (Number of works referenced by this work)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W2962581334 |
|---|---|
| doi | https://doi.org/10.6084/m9.figshare.12448871 |
| ids.doi | https://doi.org/10.6084/m9.figshare.12448871 |
| ids.mag | 2962581334 |
| ids.openalex | https://openalex.org/W2962581334 |
| fwci | |
| type | dataset |
| title | Optimal Sampling for Generalized Linear Models Under Measurement Constraints |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11443 |
| topics[0].field.id | https://openalex.org/fields/18 |
| topics[0].field.display_name | Decision Sciences |
| topics[0].score | 0.9020000100135803 |
| topics[0].domain.id | https://openalex.org/domains/2 |
| topics[0].domain.display_name | Social Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1804 |
| topics[0].subfield.display_name | Statistics, Probability and Uncertainty |
| topics[0].display_name | Advanced Statistical Process Monitoring |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C140779682 |
| concepts[0].level | 3 |
| concepts[0].score | 0.5730942487716675 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q210868 |
| concepts[0].display_name | Sampling (signal processing) |
| concepts[1].id | https://openalex.org/C33923547 |
| concepts[1].level | 0 |
| concepts[1].score | 0.4696573317050934 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[1].display_name | Mathematics |
| concepts[2].id | https://openalex.org/C41587187 |
| concepts[2].level | 2 |
| concepts[2].score | 0.443445086479187 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1501882 |
| concepts[2].display_name | Generalized linear model |
| concepts[3].id | https://openalex.org/C105795698 |
| concepts[3].level | 1 |
| concepts[3].score | 0.4259685277938843 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[3].display_name | Statistics |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.4195018708705902 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C28826006 |
| concepts[5].level | 1 |
| concepts[5].score | 0.3611105978488922 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q33521 |
| concepts[5].display_name | Applied mathematics |
| concepts[6].id | https://openalex.org/C126255220 |
| concepts[6].level | 1 |
| concepts[6].score | 0.35361260175704956 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[6].display_name | Mathematical optimization |
| concepts[7].id | https://openalex.org/C76155785 |
| concepts[7].level | 1 |
| concepts[7].score | 0.07280683517456055 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q418 |
| concepts[7].display_name | Telecommunications |
| concepts[8].id | https://openalex.org/C94915269 |
| concepts[8].level | 2 |
| concepts[8].score | 0.0 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q1834857 |
| concepts[8].display_name | Detector |
| keywords[0].id | https://openalex.org/keywords/sampling |
| keywords[0].score | 0.5730942487716675 |
| keywords[0].display_name | Sampling (signal processing) |
| keywords[1].id | https://openalex.org/keywords/mathematics |
| keywords[1].score | 0.4696573317050934 |
| keywords[1].display_name | Mathematics |
| keywords[2].id | https://openalex.org/keywords/generalized-linear-model |
| keywords[2].score | 0.443445086479187 |
| keywords[2].display_name | Generalized linear model |
| keywords[3].id | https://openalex.org/keywords/statistics |
| keywords[3].score | 0.4259685277938843 |
| keywords[3].display_name | Statistics |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.4195018708705902 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/applied-mathematics |
| keywords[5].score | 0.3611105978488922 |
| keywords[5].display_name | Applied mathematics |
| keywords[6].id | https://openalex.org/keywords/mathematical-optimization |
| keywords[6].score | 0.35361260175704956 |
| keywords[6].display_name | Mathematical optimization |
| keywords[7].id | https://openalex.org/keywords/telecommunications |
| keywords[7].score | 0.07280683517456055 |
| keywords[7].display_name | Telecommunications |
| language | en |
| locations[0].id | doi:10.6084/m9.figshare.12448871 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | dataset |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.6084/m9.figshare.12448871 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5100375723 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-7689-4184 |
| authorships[0].author.display_name | Tao Zhang |
| authorships[0].countries | US |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I205783295 |
| authorships[0].affiliations[0].raw_affiliation_string | cornell University |
| authorships[0].institutions[0].id | https://openalex.org/I205783295 |
| authorships[0].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[0].institutions[0].country_code | US |
| authorships[0].institutions[0].display_name | Cornell University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Tao Zhang |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | cornell University |
| authorships[1].author.id | https://openalex.org/A5070406375 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-6877-9231 |
| authorships[1].author.display_name | Yang Ning |
| authorships[1].countries | US |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I205783295 |
| authorships[1].affiliations[0].raw_affiliation_string | cornell University |
| authorships[1].institutions[0].id | https://openalex.org/I205783295 |
| authorships[1].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[1].institutions[0].country_code | US |
| authorships[1].institutions[0].display_name | Cornell University |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Yang Ning |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | cornell University |
| authorships[2].author.id | https://openalex.org/A5061514978 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-6713-2257 |
| authorships[2].author.display_name | David Ruppert |
| authorships[2].countries | US |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I205783295 |
| authorships[2].affiliations[0].raw_affiliation_string | cornell University |
| authorships[2].institutions[0].id | https://openalex.org/I205783295 |
| authorships[2].institutions[0].ror | https://ror.org/05bnh6r87 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I205783295 |
| authorships[2].institutions[0].country_code | US |
| authorships[2].institutions[0].display_name | Cornell University |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | David Ruppert |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | cornell University |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.6084/m9.figshare.12448871 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Optimal Sampling for Generalized Linear Models Under Measurement Constraints |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11443 |
| primary_topic.field.id | https://openalex.org/fields/18 |
| primary_topic.field.display_name | Decision Sciences |
| primary_topic.score | 0.9020000100135803 |
| primary_topic.domain.id | https://openalex.org/domains/2 |
| primary_topic.domain.display_name | Social Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1804 |
| primary_topic.subfield.display_name | Statistics, Probability and Uncertainty |
| primary_topic.display_name | Advanced Statistical Process Monitoring |
| related_works | https://openalex.org/W2284300159, https://openalex.org/W2527940101, https://openalex.org/W1482399408, https://openalex.org/W1922851888, https://openalex.org/W3107697994, https://openalex.org/W2898939701, https://openalex.org/W2019364264, https://openalex.org/W2290516599, https://openalex.org/W2128624067, https://openalex.org/W2188923992 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2022 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.6084/m9.figshare.12448871 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | dataset |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.6084/m9.figshare.12448871 |
| primary_location.id | doi:10.6084/m9.figshare.12448871 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | dataset |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.6084/m9.figshare.12448871 |
| publication_date | 2021-01-01 |
| publication_year | 2021 |
| referenced_works | https://openalex.org/W3144951960, https://openalex.org/W1977792067, https://openalex.org/W3125855424, https://openalex.org/W2971235269, https://openalex.org/W1580334350 |
| referenced_works_count | 5 |
| abstract_inverted_index.A | 71 |
| abstract_inverted_index.a | 32, 64, 81, 91, 124, 136 |
| abstract_inverted_index.We | 89, 129, 181, 231 |
| abstract_inverted_index.an | 211, 251 |
| abstract_inverted_index.as | 255 |
| abstract_inverted_index.at | 77 |
| abstract_inverted_index.be | 44, 86 |
| abstract_inverted_index.by | 154 |
| abstract_inverted_index.in | 15, 225, 265 |
| abstract_inverted_index.is | 29, 51, 120, 145, 236, 257, 260 |
| abstract_inverted_index.no | 168, 263 |
| abstract_inverted_index.of | 13, 36, 68, 113, 126, 135, 139, 219, 240, 271, 274 |
| abstract_inverted_index.on | 11, 63, 80, 156, 193 |
| abstract_inverted_index.or | 262 |
| abstract_inverted_index.to | 6, 30, 202, 215, 238, 268 |
| abstract_inverted_index.we | 197, 209 |
| abstract_inverted_index.Our | 27 |
| abstract_inverted_index.and | 8, 46, 159, 171, 228 |
| abstract_inverted_index.are | 4, 21, 167, 175, 282 |
| abstract_inverted_index.but | 18 |
| abstract_inverted_index.can | 60 |
| abstract_inverted_index.due | 267 |
| abstract_inverted_index.for | 23, 101, 177, 279 |
| abstract_inverted_index.is, | 110 |
| abstract_inverted_index.new | 172 |
| abstract_inverted_index.our | 162, 178 |
| abstract_inverted_index.set | 67 |
| abstract_inverted_index.the | 16, 19, 24, 37, 40, 47, 57, 69, 106, 111, 114, 117, 131, 149, 165, 184, 199, 204, 217, 269 |
| abstract_inverted_index.use | 244 |
| abstract_inverted_index.This | 143, 247 |
| abstract_inverted_index.both | 157, 226 |
| abstract_inverted_index.find | 232 |
| abstract_inverted_index.goal | 28 |
| abstract_inverted_index.loss | 264 |
| abstract_inverted_index.most | 12, 78 |
| abstract_inverted_index.only | 61, 79 |
| abstract_inverted_index.over | 222 |
| abstract_inverted_index.such | 254 |
| abstract_inverted_index.that | 74, 109, 233, 239, 243 |
| abstract_inverted_index.this | 190, 280 |
| abstract_inverted_index.uses | 75 |
| abstract_inverted_index.very | 65 |
| abstract_inverted_index.will | 43, 85 |
| abstract_inverted_index.with | 148 |
| abstract_inverted_index.OSUMC | 200, 220, 256 |
| abstract_inverted_index.Since | 189 |
| abstract_inverted_index.Under | 0, 161 |
| abstract_inverted_index.Using | 105 |
| abstract_inverted_index.class | 125, 138 |
| abstract_inverted_index.level | 195 |
| abstract_inverted_index.novel | 146 |
| abstract_inverted_index.pilot | 83 |
| abstract_inverted_index.shows | 248 |
| abstract_inverted_index.small | 34, 66, 82 |
| abstract_inverted_index.study | 214 |
| abstract_inverted_index.that, | 249 |
| abstract_inverted_index.there | 259 |
| abstract_inverted_index.trace | 112 |
| abstract_inverted_index.under | 97 |
| abstract_inverted_index.used, | 258 |
| abstract_inverted_index.where | 39 |
| abstract_inverted_index.called | 87 |
| abstract_inverted_index.depend | 62 |
| abstract_inverted_index.derive | 183 |
| abstract_inverted_index.entire | 25 |
| abstract_inverted_index.linear | 103 |
| abstract_inverted_index.little | 261 |
| abstract_inverted_index.longer | 169 |
| abstract_inverted_index.result | 144 |
| abstract_inverted_index.sample | 31, 84 |
| abstract_inverted_index.within | 123 |
| abstract_inverted_index.(OSUMC) | 100 |
| abstract_inverted_index.article | 281 |
| abstract_inverted_index.because | 273 |
| abstract_inverted_index.conduct | 210 |
| abstract_inverted_index.dataset | 38 |
| abstract_inverted_index.depends | 192 |
| abstract_inverted_index.further | 182 |
| abstract_inverted_index.general | 137 |
| abstract_inverted_index.measure | 7 |
| abstract_inverted_index.methods | 224 |
| abstract_inverted_index.models. | 104 |
| abstract_inverted_index.online. | 284 |
| abstract_inverted_index.optimal | 95, 206 |
| abstract_inverted_index.portion | 35 |
| abstract_inverted_index.propose | 90, 198 |
| abstract_inverted_index.records | 14 |
| abstract_inverted_index.require | 56 |
| abstract_inverted_index.results | 152 |
| abstract_inverted_index.theory. | 180 |
| abstract_inverted_index.Finally, | 208 |
| abstract_inverted_index.accuracy | 266 |
| abstract_inverted_index.compared | 147 |
| abstract_inverted_index.complete | 245 |
| abstract_inverted_index.dataset, | 17 |
| abstract_inverted_index.dataset. | 26 |
| abstract_inverted_index.existing | 150, 223 |
| abstract_inverted_index.measured | 45 |
| abstract_inverted_index.obtained | 153 |
| abstract_inverted_index.provided | 250 |
| abstract_inverted_index.sampling | 49, 58, 72, 93, 96, 127, 141, 187, 241 |
| abstract_inverted_index.A-optimal | 185 |
| abstract_inverted_index.OSUMC’s | 234 |
| abstract_inverted_index.algorithm | 201, 221, 253 |
| abstract_inverted_index.available | 22, 283 |
| abstract_inverted_index.developed | 176 |
| abstract_inverted_index.efficient | 122, 252 |
| abstract_inverted_index.empirical | 213 |
| abstract_inverted_index.establish | 130 |
| abstract_inverted_index.estimator | 50, 119 |
| abstract_inverted_index.expensive | 5, 41 |
| abstract_inverted_index.initially | 9 |
| abstract_inverted_index.intensive | 212 |
| abstract_inverted_index.materials | 278 |
| abstract_inverted_index.procedure | 73, 94 |
| abstract_inverted_index.responses | 3, 42, 76, 272 |
| abstract_inverted_index.resultant | 48, 118 |
| abstract_inverted_index.sampling. | 207 |
| abstract_inverted_index.variance, | 116 |
| abstract_inverted_index.advantages | 218 |
| abstract_inverted_index.algorithms | 242 |
| abstract_inverted_index.asymptotic | 115, 133, 179 |
| abstract_inverted_index.comparable | 237 |
| abstract_inverted_index.covariates | 20, 158 |
| abstract_inverted_index.criterion, | 108 |
| abstract_inverted_index.efficient. | 53 |
| abstract_inverted_index.framework, | 164 |
| abstract_inverted_index.martingale | 173 |
| abstract_inverted_index.population | 194 |
| abstract_inverted_index.relatively | 33 |
| abstract_inverted_index.responses. | 70, 160, 246 |
| abstract_inverted_index.subsamples | 166 |
| abstract_inverted_index.techniques | 174 |
| abstract_inverted_index.Measurement | 54 |
| abstract_inverted_index.approximate | 203 |
| abstract_inverted_index.conditional | 151 |
| abstract_inverted_index.constraints | 55, 99 |
| abstract_inverted_index.demonstrate | 216 |
| abstract_inverted_index.estimators. | 128, 142 |
| abstract_inverted_index.generalized | 102 |
| abstract_inverted_index.independent | 170 |
| abstract_inverted_index.measurement | 98, 275 |
| abstract_inverted_index.performance | 235 |
| abstract_inverted_index.quantities, | 196 |
| abstract_inverted_index.statistical | 227 |
| abstract_inverted_index.theoretical | 205 |
| abstract_inverted_index.unavailable | 10 |
| abstract_inverted_index.A-optimality | 107 |
| abstract_inverted_index.conditioning | 155 |
| abstract_inverted_index.constraints. | 276 |
| abstract_inverted_index.distribution | 134, 191 |
| abstract_inverted_index.Supplementary | 277 |
| abstract_inverted_index.computational | 229 |
| abstract_inverted_index.distribution. | 188 |
| abstract_inverted_index.perspectives. | 230 |
| abstract_inverted_index.probabilities | 59 |
| abstract_inverted_index.response-free | 92, 140, 186 |
| abstract_inverted_index.statistically | 52, 121 |
| abstract_inverted_index.unconditional | 132, 163 |
| abstract_inverted_index.unavailability | 270 |
| abstract_inverted_index.“measurement | 1 |
| abstract_inverted_index.constraints,” | 2 |
| abstract_inverted_index.“response-free.” | 88 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
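The `abstract_inverted_index.*` rows in the payload above store the abstract as a map from each token to the positions where it occurs. A small sketch of how the running text can be rebuilt from that structure follows. The helper name `rebuild_abstract` is hypothetical, and the `excerpt` dictionary copies only the first eight positions from the table, assuming the underlying JSON holds each position list as a list of integers.

```python
def rebuild_abstract(inverted_index):
    """Rebuild running text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))

# Excerpt of the payload: only positions 0 through 7 are included here.
excerpt = {
    "Under": [0],
    "“measurement": [1],
    "constraints,”": [2],
    "responses": [3],
    "are": [4],
    "expensive": [5],
    "to": [6],
    "measure": [7],
}

print(rebuild_abstract(excerpt))
# prints: Under “measurement constraints,” responses are expensive to measure
```

Punctuation stays attached to its token, so joining on spaces is enough to recover the original sentence.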