Enhanced variable selection for boosting sparser and less complex models in distributional copula regression
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2406.03900
Structured additive distributional copula regression makes it possible to model the joint distribution of multivariate outcomes by relating all distribution parameters to covariates. Estimation via statistical boosting can account for high-dimensional data and incorporates data-driven variable selection, both of which are useful given the complexity of this model class. However, as is known from univariate (distributional) regression, the standard boosting algorithm tends to select too many variables of minor importance, particularly with large sample sizes, which leads to complex models that are difficult to interpret. To counteract this behavior and to avoid selecting base-learners with only a negligible impact, we combined the ideas of probing, stability selection and a new deselection approach with statistical boosting for distributional copula regression. In a simulation study and an application to the joint modelling of the weight and length of newborns, we found that all proposed methods enhance variable selection by reducing the number of false positives. However, only stability selection and the deselection approach yielded predictive performance similar to that of classical boosting. Finally, the deselection approach scales better to larger datasets and led to competitive predictive performance, which we further illustrated in a genomic cohort study from the UK Biobank by modelling the joint genetic predisposition for two phenotypes.
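To make the pruning ideas concrete, here is a rough sketch of two of them, probing and a deselection step, applied to plain componentwise L2 boosting for a single Gaussian outcome with one linear base-learner per covariate. This is an assumption-laden simplification and not the authors' implementation. Their method boosts all distribution parameters of a copula model and uses structured additive base-learners. The function names `boost`, `deselection` and `probing`, the threshold `tau=0.01`, the step length `nu=0.1` and the fixed 200 iterations are all hypothetical choices made for illustration. Stability selection is not shown.

```python
import numpy as np


def boost(X, y, n_iter=200, nu=0.1, first_probe=None):
    """Componentwise L2 boosting, one linear least-squares base-learner per column.

    Returns (coef, intercept, risk_red), where risk_red[j] is the empirical
    squared-error risk reduction accumulated by base-learner j.
    If first_probe is given, columns with index >= first_probe are probes and
    boosting stops as soon as one of them would be selected.
    (Illustrative sketch: n_iter is fixed here; in practice it would be tuned.)
    """
    Xc = X - X.mean(axis=0)              # centred columns: base-learners need no intercept
    ss = (Xc ** 2).sum(axis=0)
    intercept = y.mean()
    fit = np.full(len(y), intercept)
    coef = np.zeros(X.shape[1])
    risk_red = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = y - fit                              # negative gradient of 0.5 * squared error
        betas = Xc.T @ u / ss                    # fit every base-learner to u
        sse = ((u[:, None] - Xc * betas) ** 2).sum(axis=0)
        j = int(np.argmin(sse))                  # best-fitting base-learner wins
        if first_probe is not None and j >= first_probe:
            break                                # probing: stop at the first probe
        fit = fit + nu * betas[j] * Xc[:, j]     # weak update with step length nu
        coef[j] += nu * betas[j]
        risk_red[j] += np.mean(u ** 2) - np.mean((y - fit) ** 2)
    return coef, intercept, risk_red


def deselection(X, y, tau=0.01, **kw):
    """Refit using only base-learners whose share of the total risk reduction is >= tau."""
    _, _, risk_red = boost(X, y, **kw)
    keep = np.flatnonzero(risk_red / risk_red.sum() >= tau)
    coef_keep, intercept, _ = boost(X[:, keep], y, **kw)
    coef = np.zeros(X.shape[1])
    coef[keep] = coef_keep                       # deselected columns stay exactly zero
    return coef, intercept, keep


def probing(X, y, seed=0, **kw):
    """Append a permuted ('shadow') copy of each column; keep variables chosen before any probe."""
    rng = np.random.default_rng(seed)
    probes = np.column_stack([rng.permutation(col) for col in X.T])
    coef, _, _ = boost(np.hstack([X, probes]), y, first_probe=X.shape[1], **kw)
    return np.flatnonzero(coef[: X.shape[1]])


# Usage on simulated data: only columns 0 and 1 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=500)
coef, _, keep = deselection(X, y, tau=0.01)
print("kept after deselection:", keep)
print("selected before first probe:", probing(X, y))
```

A note on the design: in this sketch, deselection reuses the risk reduction that boosting already tracks and needs only one extra boosting run on the reduced set of base-learners. Stability selection, by contrast, refits on many subsamples. That difference is one plausible reason for the scalability advantage the abstract reports for deselection.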
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2406.03900
- https://arxiv.org/pdf/2406.03900
- OA Status
- green
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4399454902
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4399454902 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2406.03900 (Digital Object Identifier)
- Title: Enhanced variable selection for boosting sparser and less complex models in distributional copula regression (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-06-06 (full publication date if available)
- Authors: Annika Strömer, Nadja Klein, Christian Staerk, Florian Faschingbauer, Hannah Klinkhammer, Andreas Mayr (list of authors in order)
- Landing page: https://arxiv.org/abs/2406.03900 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2406.03900 (direct link to full-text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2406.03900 (direct OA link when available)
- Concepts: Copula (linguistics), Boosting (machine learning), Feature selection, Econometrics, Regression, Model selection, Computer science, Statistics, Artificial intelligence, Machine learning, Mathematics (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4399454902 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2406.03900 |
| ids.doi | https://doi.org/10.48550/arxiv.2406.03900 |
| ids.openalex | https://openalex.org/W4399454902 |
| fwci | |
| type | preprint |
| title | Enhanced variable selection for boosting sparser and less complex models in distributional copula regression |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10876 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9246000051498413 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2207 |
| topics[0].subfield.display_name | Control and Systems Engineering |
| topics[0].display_name | Fault Detection and Control Systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C17618745 |
| concepts[0].level | 2 |
| concepts[0].score | 0.832364559173584 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q207509 |
| concepts[0].display_name | Copula (linguistics) |
| concepts[1].id | https://openalex.org/C46686674 |
| concepts[1].level | 2 |
| concepts[1].score | 0.6897190809249878 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q466303 |
| concepts[1].display_name | Boosting (machine learning) |
| concepts[2].id | https://openalex.org/C148483581 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5971605777740479 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q446488 |
| concepts[2].display_name | Feature selection |
| concepts[3].id | https://openalex.org/C149782125 |
| concepts[3].level | 1 |
| concepts[3].score | 0.5606313943862915 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q160039 |
| concepts[3].display_name | Econometrics |
| concepts[4].id | https://openalex.org/C83546350 |
| concepts[4].level | 2 |
| concepts[4].score | 0.4837932288646698 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1139051 |
| concepts[4].display_name | Regression |
| concepts[5].id | https://openalex.org/C93959086 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4208650290966034 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q6888345 |
| concepts[5].display_name | Model selection |
| concepts[6].id | https://openalex.org/C41008148 |
| concepts[6].level | 0 |
| concepts[6].score | 0.4115992784500122 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[6].display_name | Computer science |
| concepts[7].id | https://openalex.org/C105795698 |
| concepts[7].level | 1 |
| concepts[7].score | 0.39096304774284363 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q12483 |
| concepts[7].display_name | Statistics |
| concepts[8].id | https://openalex.org/C154945302 |
| concepts[8].level | 1 |
| concepts[8].score | 0.37379008531570435 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[8].display_name | Artificial intelligence |
| concepts[9].id | https://openalex.org/C119857082 |
| concepts[9].level | 1 |
| concepts[9].score | 0.3728068470954895 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[9].display_name | Machine learning |
| concepts[10].id | https://openalex.org/C33923547 |
| concepts[10].level | 0 |
| concepts[10].score | 0.34420618414878845 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[10].display_name | Mathematics |
| keywords[0].id | https://openalex.org/keywords/copula |
| keywords[0].score | 0.832364559173584 |
| keywords[0].display_name | Copula (linguistics) |
| keywords[1].id | https://openalex.org/keywords/boosting |
| keywords[1].score | 0.6897190809249878 |
| keywords[1].display_name | Boosting (machine learning) |
| keywords[2].id | https://openalex.org/keywords/feature-selection |
| keywords[2].score | 0.5971605777740479 |
| keywords[2].display_name | Feature selection |
| keywords[3].id | https://openalex.org/keywords/econometrics |
| keywords[3].score | 0.5606313943862915 |
| keywords[3].display_name | Econometrics |
| keywords[4].id | https://openalex.org/keywords/regression |
| keywords[4].score | 0.4837932288646698 |
| keywords[4].display_name | Regression |
| keywords[5].id | https://openalex.org/keywords/model-selection |
| keywords[5].score | 0.4208650290966034 |
| keywords[5].display_name | Model selection |
| keywords[6].id | https://openalex.org/keywords/computer-science |
| keywords[6].score | 0.4115992784500122 |
| keywords[6].display_name | Computer science |
| keywords[7].id | https://openalex.org/keywords/statistics |
| keywords[7].score | 0.39096304774284363 |
| keywords[7].display_name | Statistics |
| keywords[8].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[8].score | 0.37379008531570435 |
| keywords[8].display_name | Artificial intelligence |
| keywords[9].id | https://openalex.org/keywords/machine-learning |
| keywords[9].score | 0.3728068470954895 |
| keywords[9].display_name | Machine learning |
| keywords[10].id | https://openalex.org/keywords/mathematics |
| keywords[10].score | 0.34420618414878845 |
| keywords[10].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2406.03900 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2406.03900 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2406.03900 |
| locations[1].id | doi:10.48550/arxiv.2406.03900 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2406.03900 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5016578580 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-1284-3318 |
| authorships[0].author.display_name | Annika Strömer |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Strömer, Annika |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5009563869 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-5072-5347 |
| authorships[1].author.display_name | Nadja Klein |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Klein, Nadja |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5067210732 |
| authorships[2].author.orcid | https://orcid.org/0000-0003-0526-0189 |
| authorships[2].author.display_name | Christian Staerk |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Staerk, Christian |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5010909333 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Florian Faschingbauer |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Faschingbauer, Florian |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5021345672 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-3752-1275 |
| authorships[4].author.display_name | Hannah Klinkhammer |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Klinkhammer, Hannah |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5085125500 |
| authorships[5].author.orcid | https://orcid.org/0000-0001-7106-9732 |
| authorships[5].author.display_name | Andreas Mayr |
| authorships[5].author_position | last |
| authorships[5].raw_author_name | Mayr, Andreas |
| authorships[5].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2406.03900 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-06-08T00:00:00 |
| display_name | Enhanced variable selection for boosting sparser and less complex models in distributional copula regression |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10876 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9246000051498413 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2207 |
| primary_topic.subfield.display_name | Control and Systems Engineering |
| primary_topic.display_name | Fault Detection and Control Systems |
| related_works | https://openalex.org/W2125652721, https://openalex.org/W1540371141, https://openalex.org/W4231274751, https://openalex.org/W1549363203, https://openalex.org/W2154063878, https://openalex.org/W2556012038, https://openalex.org/W1489772951, https://openalex.org/W1538046993, https://openalex.org/W4239293476, https://openalex.org/W2145188897 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2406.03900 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2406.03900 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2406.03900 |
| primary_location.id | pmh:oai:arXiv.org:2406.03900 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2406.03900 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2406.03900 |
| publication_date | 2024-06-06 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 6 |
| citation_normalized_percentile | |