Estimating the true number of principal components under the random design
Principal component analysis (PCA) is frequently employed as a dimension reduction tool when the number of covariates is large. However, the number of principal components to retain is typically chosen in a researcher-dependent manner. To mitigate this subjectivity, this paper proposes a data-driven testing procedure to estimate the number of underlying principal components. While existing work such as G'Sell et al. (2016), Taylor et al. (2016) and Choi et al. (2017) discusses similar tests under a fixed design, this paper extends their framework to a more general econometric setup with a random design. The proposed test is proved to achieve asymptotically exact type I error control under a locally defined null hypothesis, and simulation examples support the asymptotic validity of the test.
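The idea of turning a sequence of tests into an estimate of the number of components can be illustrated with a minimal sketch. It assumes, hypothetically, that p-values are already available for the ordered questions "is the k-th component needed?" (k = 1, 2, ...), and it applies a simple stop-at-first-non-rejection rule. The function name, the stopping rule, and the example p-values are illustrative assumptions and are not taken from the paper, which should be consulted for the actual test statistic and procedure.

```python
def estimate_num_components(p_values, alpha=0.05):
    """Count leading rejections in an ordered sequence of p-values.

    p_values[k-1] is assumed (hypothetically) to test whether the k-th
    principal component is needed. The estimate is the number of
    consecutive rejections at level alpha before the first non-rejection.
    """
    count = 0
    for p in p_values:
        if p > alpha:
            break  # first non-rejection: stop and ignore later p-values
        count += 1
    return count


# Made-up p-values for illustration only.
example = [0.001, 0.01, 0.20, 0.03]
print(estimate_num_components(example, alpha=0.05))  # prints 2
```

Note that the fourth p-value is below the level but is not counted, because the rule stops at the first non-rejection.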
- Type: preprint
- Landing Page: https://doi.org/10.48550/arxiv.2511.10419
- OA Status: green
- OpenAlex ID: https://openalex.org/W7105751269
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W7105751269
- DOI: https://doi.org/10.48550/arxiv.2511.10419
- Title: Estimating the true number of principal components under the random design
- Type: preprint
- Publication year: 2025
- Publication date: 2025-11-13
- Authors: Matsumura Yasuyuki
- Landing page: https://doi.org/10.48550/arxiv.2511.10419
- Open access: Yes
- OA status: green
- OA URL: https://doi.org/10.48550/arxiv.2511.10419
- Concepts: Mathematics, Principal component analysis, Extension (predicate logic), Dimensionality reduction, Dimension (graph theory), Statistics, Covariate, Random variable, Statistical hypothesis testing, Applied mathematics, Principal (computer security), Reduction (mathematics), Component (thermodynamics), Sufficient dimension reduction, Algorithm, Null (SQL), Work (physics), Mathematical optimization, Null hypothesis, Relation (database), Econometrics, Effective dimension, Feature (linguistics), Asymptotic distribution
- Cited by: 0
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W7105751269 |
| doi | https://doi.org/10.48550/arxiv.2511.10419 |
| ids.doi | https://doi.org/10.48550/arxiv.2511.10419 |
| ids.openalex | https://openalex.org/W7105751269 |
| fwci | 0.0 |
| type | preprint |
| title | Estimating the true number of principal components under the random design |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C33923547 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7218084931373596 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[0].display_name | Mathematics |
| concepts[1].id | https://openalex.org/C27438332 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7057698369026184 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q2873 |
| concepts[1].display_name | Principal component analysis |
| concepts[2].id | https://openalex.org/C2778029271 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6007524132728577 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q5421931 |
| concepts[2].display_name | Extension (predicate logic) |
| concepts[3].id | https://openalex.org/C70518039 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5287440419197083 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q16000077 |
| concepts[3].display_name | Dimensionality reduction |
| concepts[4].id | https://openalex.org/C33676613 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5171465873718262 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q13415176 |
| concepts[4].display_name | Dimension (graph theory) |
| concepts[5].id | https://openalex.org/C105795698 |
| concepts[5].level | 1 |
| concepts[5].score | 0.47640013694763184 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[5].display_name | Statistics |
| concepts[6].id | https://openalex.org/C119043178 |
| concepts[6].level | 2 |
| concepts[6].score | 0.47546443343162537 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q320723 |
| concepts[6].display_name | Covariate |
| concepts[7].id | https://openalex.org/C122123141 |
| concepts[7].level | 2 |
| concepts[7].score | 0.45940959453582764 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q176623 |
| concepts[7].display_name | Random variable |
| concepts[8].id | https://openalex.org/C87007009 |
| concepts[8].level | 2 |
| concepts[8].score | 0.3804243206977844 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q210832 |
| concepts[8].display_name | Statistical hypothesis testing |
| concepts[9].id | https://openalex.org/C28826006 |
| concepts[9].level | 1 |
| concepts[9].score | 0.3643846809864044 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q33521 |
| concepts[9].display_name | Applied mathematics |
| concepts[10].id | https://openalex.org/C144559511 |
| concepts[10].level | 2 |
| concepts[10].score | 0.3524273633956909 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q2986279 |
| concepts[10].display_name | Principal (computer security) |
| concepts[11].id | https://openalex.org/C111335779 |
| concepts[11].level | 2 |
| concepts[11].score | 0.3385755121707916 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q3454686 |
| concepts[11].display_name | Reduction (mathematics) |
| concepts[12].id | https://openalex.org/C168167062 |
| concepts[12].level | 2 |
| concepts[12].score | 0.3231130540370941 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q1117970 |
| concepts[12].display_name | Component (thermodynamics) |
| concepts[13].id | https://openalex.org/C27931671 |
| concepts[13].level | 3 |
| concepts[13].score | 0.3205665349960327 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q7634497 |
| concepts[13].display_name | Sufficient dimension reduction |
| concepts[14].id | https://openalex.org/C11413529 |
| concepts[14].level | 1 |
| concepts[14].score | 0.3201236426830292 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q8366 |
| concepts[14].display_name | Algorithm |
| concepts[15].id | https://openalex.org/C203763787 |
| concepts[15].level | 2 |
| concepts[15].score | 0.3143513798713684 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q371029 |
| concepts[15].display_name | Null (SQL) |
| concepts[16].id | https://openalex.org/C18762648 |
| concepts[16].level | 2 |
| concepts[16].score | 0.3059838116168976 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q42213 |
| concepts[16].display_name | Work (physics) |
| concepts[17].id | https://openalex.org/C126255220 |
| concepts[17].level | 1 |
| concepts[17].score | 0.2967955768108368 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q141495 |
| concepts[17].display_name | Mathematical optimization |
| concepts[18].id | https://openalex.org/C191988596 |
| concepts[18].level | 2 |
| concepts[18].score | 0.2893105745315552 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q628374 |
| concepts[18].display_name | Null hypothesis |
| concepts[19].id | https://openalex.org/C25343380 |
| concepts[19].level | 2 |
| concepts[19].score | 0.2874661684036255 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q277521 |
| concepts[19].display_name | Relation (database) |
| concepts[20].id | https://openalex.org/C149782125 |
| concepts[20].level | 1 |
| concepts[20].score | 0.27066805958747864 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q160039 |
| concepts[20].display_name | Econometrics |
| concepts[21].id | https://openalex.org/C115311070 |
| concepts[21].level | 3 |
| concepts[21].score | 0.2607801854610443 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q5347255 |
| concepts[21].display_name | Effective dimension |
| concepts[22].id | https://openalex.org/C2776401178 |
| concepts[22].level | 2 |
| concepts[22].score | 0.2593456506729126 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q12050496 |
| concepts[22].display_name | Feature (linguistics) |
| concepts[23].id | https://openalex.org/C65778772 |
| concepts[23].level | 3 |
| concepts[23].score | 0.25155407190322876 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q12345341 |
| concepts[23].display_name | Asymptotic distribution |
| keywords[0].id | https://openalex.org/keywords/principal-component-analysis |
| keywords[0].score | 0.7057698369026184 |
| keywords[0].display_name | Principal component analysis |
| keywords[1].id | https://openalex.org/keywords/extension |
| keywords[1].score | 0.6007524132728577 |
| keywords[1].display_name | Extension (predicate logic) |
| keywords[2].id | https://openalex.org/keywords/dimensionality-reduction |
| keywords[2].score | 0.5287440419197083 |
| keywords[2].display_name | Dimensionality reduction |
| keywords[3].id | https://openalex.org/keywords/dimension |
| keywords[3].score | 0.5171465873718262 |
| keywords[3].display_name | Dimension (graph theory) |
| keywords[4].id | https://openalex.org/keywords/covariate |
| keywords[4].score | 0.47546443343162537 |
| keywords[4].display_name | Covariate |
| keywords[5].id | https://openalex.org/keywords/random-variable |
| keywords[5].score | 0.45940959453582764 |
| keywords[5].display_name | Random variable |
| keywords[6].id | https://openalex.org/keywords/statistical-hypothesis-testing |
| keywords[6].score | 0.3804243206977844 |
| keywords[6].display_name | Statistical hypothesis testing |
| keywords[7].id | https://openalex.org/keywords/principal |
| keywords[7].score | 0.3524273633956909 |
| keywords[7].display_name | Principal (computer security) |
| language | |
| locations[0].id | doi:10.48550/arxiv.2511.10419 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | article |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.48550/arxiv.2511.10419 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A3171345319 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Matsumura Yasuyuki |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Matsumura, Yasuyuki |
| authorships[0].is_corresponding | True |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.48550/arxiv.2511.10419 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-15T00:00:00 |
| display_name | Estimating the true number of principal components under the random design |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-15T23:16:52.776844 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.48550/arxiv.2511.10419 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | article |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.48550/arxiv.2511.10419 |
| primary_location.id | doi:10.48550/arxiv.2511.10419 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | article |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.48550/arxiv.2511.10419 |
| publication_date | 2025-11-13 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.1 | 110 |
| abstract_inverted_index.a | 8, 34, 46, 91, 114 |
| abstract_inverted_index.To | 37 |
| abstract_inverted_index.an | 85, 123 |
| abstract_inverted_index.as | 7, 62 |
| abstract_inverted_index.be | 26 |
| abstract_inverted_index.et | 64, 68, 73 |
| abstract_inverted_index.in | 28, 33, 41 |
| abstract_inverted_index.is | 4, 17, 30, 103 |
| abstract_inverted_index.of | 15, 22, 54, 87, 126 |
| abstract_inverted_index.to | 25, 50, 90, 105 |
| abstract_inverted_index.PCA | 29 |
| abstract_inverted_index.The | 100 |
| abstract_inverted_index.al. | 65, 69, 74 |
| abstract_inverted_index.and | 71 |
| abstract_inverted_index.our | 127 |
| abstract_inverted_index.the | 13, 20, 39, 52, 97 |
| abstract_inverted_index.Choi | 72 |
| abstract_inverted_index.PCA, | 42 |
| abstract_inverted_index.more | 92 |
| abstract_inverted_index.null | 117 |
| abstract_inverted_index.such | 61 |
| abstract_inverted_index.test | 102 |
| abstract_inverted_index.this | 43, 82 |
| abstract_inverted_index.tool | 11 |
| abstract_inverted_index.type | 109 |
| abstract_inverted_index.when | 12 |
| abstract_inverted_index.with | 96, 119 |
| abstract_inverted_index.work | 60 |
| abstract_inverted_index.(PCA) | 3 |
| abstract_inverted_index.While | 58 |
| abstract_inverted_index.error | 111 |
| abstract_inverted_index.exact | 108 |
| abstract_inverted_index.fixed | 80 |
| abstract_inverted_index.paper | 44, 83 |
| abstract_inverted_index.setup | 95 |
| abstract_inverted_index.test. | 128 |
| abstract_inverted_index.tests | 78 |
| abstract_inverted_index.their | 88 |
| abstract_inverted_index.under | 79, 113 |
| abstract_inverted_index.(2016) | 70 |
| abstract_inverted_index.(2017) | 75 |
| abstract_inverted_index.G'Sell | 63 |
| abstract_inverted_index.Taylor | 67 |
| abstract_inverted_index.large. | 18 |
| abstract_inverted_index.number | 14, 21, 53 |
| abstract_inverted_index.proved | 104 |
| abstract_inverted_index.random | 98 |
| abstract_inverted_index.(2016), | 66 |
| abstract_inverted_index.achieve | 106 |
| abstract_inverted_index.defined | 116 |
| abstract_inverted_index.design, | 81 |
| abstract_inverted_index.design. | 99 |
| abstract_inverted_index.discuss | 76 |
| abstract_inverted_index.general | 93 |
| abstract_inverted_index.locally | 115 |
| abstract_inverted_index.manner. | 36 |
| abstract_inverted_index.similar | 77 |
| abstract_inverted_index.testing | 48 |
| abstract_inverted_index.However, | 19 |
| abstract_inverted_index.analysis | 2 |
| abstract_inverted_index.controls | 112 |
| abstract_inverted_index.employed | 6 |
| abstract_inverted_index.estimate | 51 |
| abstract_inverted_index.examples | 121 |
| abstract_inverted_index.existing | 59 |
| abstract_inverted_index.mitigate | 38 |
| abstract_inverted_index.proposed | 101 |
| abstract_inverted_index.proposes | 45 |
| abstract_inverted_index.retained | 27 |
| abstract_inverted_index.validity | 125 |
| abstract_inverted_index.Principal | 0 |
| abstract_inverted_index.component | 1 |
| abstract_inverted_index.dimension | 9 |
| abstract_inverted_index.extension | 86 |
| abstract_inverted_index.framework | 89 |
| abstract_inverted_index.principal | 23, 56 |
| abstract_inverted_index.procedure | 49 |
| abstract_inverted_index.reduction | 10 |
| abstract_inverted_index.typically | 31 |
| abstract_inverted_index.asymptotic | 124 |
| abstract_inverted_index.components | 24 |
| abstract_inverted_index.covariates | 16 |
| abstract_inverted_index.determined | 32 |
| abstract_inverted_index.frequently | 5 |
| abstract_inverted_index.indicating | 122 |
| abstract_inverted_index.simulation | 120 |
| abstract_inverted_index.underlying | 55 |
| abstract_inverted_index.components. | 57 |
| abstract_inverted_index.data-driven | 47 |
| abstract_inverted_index.econometric | 94 |
| abstract_inverted_index.hypothesis, | 118 |
| abstract_inverted_index.investigates | 84 |
| abstract_inverted_index.subjectivity | 40 |
| abstract_inverted_index.asymptotically | 107 |
| abstract_inverted_index.researcher-dependent | 35 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 1 |
| citation_normalized_percentile | |
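
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the 0-based positions where it occurs. A minimal sketch of how such a mapping can be turned back into running text is given below. The function name `rebuild_abstract` is hypothetical, and the sample dictionary is deliberately truncated to the first twelve positions from the table (later positions of repeated tokens such as "is", "as" and "a" are dropped for brevity).

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a mapping of token -> list of 0-based positions."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Positions 0-11 copied from the table; the sample is truncated on purpose.
sample = {
    "Principal": [0],
    "component": [1],
    "analysis": [2],
    "(PCA)": [3],
    "is": [4],
    "frequently": [5],
    "employed": [6],
    "as": [7],
    "a": [8],
    "dimension": [9],
    "reduction": [10],
    "tool": [11],
}

print(rebuild_abstract(sample))
# prints: Principal component analysis (PCA) is frequently employed as a dimension reduction tool
```

Applied to the full set of rows, the same function would be expected to reproduce the abstract at the top of this record, since punctuation is stored as part of each token.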