Improving prediction models by incorporating external data with weights based on similarity
2024 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2405.07631
In clinical settings, we often face the challenge of building prediction models based on small observational data sets. For example, such a data set might be from a medical center in a multi-center study. Differences between centers might be large, thus requiring specific models based on the data set from the target center. Still, we want to borrow information from the external centers, to deal with small sample sizes. There are approaches that either assign weights to each external data set or each external observation. To incorporate information on differences between data sets and observations, we propose an approach that combines both into weights that can be incorporated into a likelihood for fitting regression models. Specifically, we suggest weights at the data set level that incorporate information on how well the models that provide the observation weights distinguish between data sets. Technically, this takes the form of inverse probability weighting. We explore different scenarios where covariates and outcomes differ among data sets, informing our simulation design for method evaluation. The concept of effective sample size is used for understanding the effectiveness of our subgroup modeling approach. We demonstrate our approach through a clinical application, predicting applied radiotherapy doses for cancer patients. Generally, the proposed approach provides improved prediction performance when external data sets are similar. We thus provide a method for quantifying similarity of external data sets to the target data set and use this similarity to include external observations for improving performance in a target data set prediction modeling task with small data.
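The weighting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it assumes (hypothetically) that the membership model is a logistic regression on the covariates, that the observation weight is the estimated odds of belonging to the target data set, and that the data-set-level weight shrinks towards zero as the membership model discriminates better, here via the illustrative choice `2 * (1 - AUC)`. The effective sample size uses Kish's formula, and all function names are made up for this example.

```python
import numpy as np

def fit_logistic(X, y, n_iter=2000, lr=0.1, l2=1e-2):
    """Ridge-penalised logistic regression fitted by plain gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta -= lr * (Xb.T @ (p - y) / len(y) + l2 * beta)
    return beta

def predict_proba(beta, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def similarity_weights(X_target, X_external):
    """Return (weights for external observations, data-set-level weight)."""
    X = np.vstack([X_target, X_external])
    is_target = np.r_[np.ones(len(X_target)), np.zeros(len(X_external))]
    beta = fit_logistic(X, is_target)
    p_ext = np.clip(predict_proba(beta, X_external), 1e-6, 1 - 1e-6)
    # Observation level: odds of target membership, corrected for the size ratio.
    obs_w = p_ext / (1.0 - p_ext) * (len(X_external) / len(X_target))
    # Data-set level (assumed form): 1 if indistinguishable (AUC 0.5), 0 if separable.
    set_w = float(np.clip(2.0 * (1.0 - auc(predict_proba(beta, X), is_target)), 0.0, 1.0))
    return set_w * obs_w, set_w

def effective_sample_size(w):
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

def weighted_least_squares(X, y, w):
    """Maximise a weighted Gaussian likelihood, i.e. solve weighted normal equations."""
    Xb = np.column_stack([np.ones(len(X)), X])
    XtW = Xb.T * w
    return np.linalg.solve(XtW @ Xb, XtW @ y)

# Simulated example: a small target data set and a larger, shifted external one.
rng = np.random.default_rng(0)
X_t = rng.normal(0.0, 1.0, size=(30, 2))
X_e = rng.normal(0.5, 1.0, size=(200, 2))
coef = np.array([1.0, -2.0])
y_t = X_t @ coef + rng.normal(0.0, 1.0, 30)
y_e = X_e @ coef + rng.normal(0.0, 1.0, 200)

w_e, set_w = similarity_weights(X_t, X_e)
w = np.r_[np.ones(len(X_t)), w_e]  # target observations keep weight 1
beta_hat = weighted_least_squares(np.vstack([X_t, X_e]), np.r_[y_t, y_e], w)
ess = effective_sample_size(w)
```

In this sketch, `ess` indicates roughly how many equally weighted observations the weighted sample is worth; it stays near the target sample size when the external data set is easy to tell apart from the target and grows as the two become more similar.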
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2405.07631, https://arxiv.org/pdf/2405.07631
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4396913872
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4396913872 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2405.07631
- Title: Improving prediction models by incorporating external data with weights based on similarity
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024
- Publication date: 2024-05-13
- Authors (in order): Max Behrens, Maryam Farhadizadeh, Angelika Rohde, Alexander Rühle, Nils H. Nicolay, Harald Binder, Daniela Zöller
- Landing page: https://arxiv.org/abs/2405.07631
- PDF URL: https://arxiv.org/pdf/2405.07631
- Open access: Yes (a free full text is available)
- OA status: green (per OpenAlex)
- OA URL: https://arxiv.org/pdf/2405.07631
- Concepts (top concepts attached by OpenAlex): Similarity (geometry), Computer science, Data mining, Artificial intelligence, Image (mathematics)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (last 5 years): 2025: 1
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4396913872 |
| doi | https://doi.org/10.48550/arxiv.2405.07631 |
| ids.doi | https://doi.org/10.48550/arxiv.2405.07631 |
| ids.openalex | https://openalex.org/W4396913872 |
| fwci | |
| type | preprint |
| title | Improving prediction models by incorporating external data with weights based on similarity |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10320 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.2328999936580658 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Neural Networks and Applications |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C103278499 |
| concepts[0].level | 3 |
| concepts[0].score | 0.6761802434921265 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q254465 |
| concepts[0].display_name | Similarity (geometry) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.5098130106925964 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C124101348 |
| concepts[2].level | 1 |
| concepts[2].score | 0.4298703670501709 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[2].display_name | Data mining |
| concepts[3].id | https://openalex.org/C154945302 |
| concepts[3].level | 1 |
| concepts[3].score | 0.37119606137275696 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[3].display_name | Artificial intelligence |
| concepts[4].id | https://openalex.org/C115961682 |
| concepts[4].level | 2 |
| concepts[4].score | 0.0 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[4].display_name | Image (mathematics) |
| keywords[0].id | https://openalex.org/keywords/similarity |
| keywords[0].score | 0.6761802434921265 |
| keywords[0].display_name | Similarity (geometry) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.5098130106925964 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/data-mining |
| keywords[2].score | 0.4298703670501709 |
| keywords[2].display_name | Data mining |
| keywords[3].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[3].score | 0.37119606137275696 |
| keywords[3].display_name | Artificial intelligence |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2405.07631 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2405.07631 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2405.07631 |
| locations[1].id | doi:10.48550/arxiv.2405.07631 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2405.07631 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5039306067 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-4185-4702 |
| authorships[0].author.display_name | Max Behrens |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Behrens, Max |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5098623209 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Maryam Farhadizadeh |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Farhadizadeh, Maryam |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5109225224 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Angelika Rohde |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Rohde, Angelika |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5047091127 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-2022-897X |
| authorships[3].author.display_name | Alexander Rühle |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Rühle, Alexander |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5090066209 |
| authorships[4].author.orcid | https://orcid.org/0000-0003-2550-1410 |
| authorships[4].author.display_name | Nils H. Nicolay |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Nicolay, Nils H. |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5011534196 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-5666-8662 |
| authorships[5].author.display_name | Harald Binder |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Binder, Harald |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5008241596 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-9929-7403 |
| authorships[6].author.display_name | Daniela Zöller |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Zöller, Daniela |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2405.07631 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-05-15T00:00:00 |
| display_name | Improving prediction models by incorporating external data with weights based on similarity |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10320 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.2328999936580658 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Neural Networks and Applications |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052, https://openalex.org/W2382290278, https://openalex.org/W4395014643 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2405.07631 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2405.07631 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2405.07631 |
| primary_location.id | pmh:oai:arXiv.org:2405.07631 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2405.07631 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2405.07631 |
| publication_date | 2024-05-13 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 21, 27, 31, 109, 191, 218, 244 |
| abstract_inverted_index.In | 0 |
| abstract_inverted_index.To | 85 |
| abstract_inverted_index.We | 150, 186, 215 |
| abstract_inverted_index.an | 97 |
| abstract_inverted_index.at | 119 |
| abstract_inverted_index.be | 25, 38, 106 |
| abstract_inverted_index.in | 30, 243 |
| abstract_inverted_index.is | 175 |
| abstract_inverted_index.of | 8, 146, 171, 181, 223 |
| abstract_inverted_index.on | 13, 45, 88, 127 |
| abstract_inverted_index.or | 81 |
| abstract_inverted_index.to | 56, 63, 76, 227, 236 |
| abstract_inverted_index.we | 3, 54, 95, 116 |
| abstract_inverted_index.For | 18 |
| abstract_inverted_index.The | 169 |
| abstract_inverted_index.and | 93, 156, 232 |
| abstract_inverted_index.are | 70, 213 |
| abstract_inverted_index.can | 105 |
| abstract_inverted_index.for | 111, 166, 177, 198, 220, 240 |
| abstract_inverted_index.how | 128 |
| abstract_inverted_index.our | 163, 182, 188 |
| abstract_inverted_index.set | 23, 48, 80, 122, 231, 247 |
| abstract_inverted_index.the | 6, 46, 50, 60, 120, 130, 134, 144, 179, 202, 228 |
| abstract_inverted_index.use | 233 |
| abstract_inverted_index.both | 101 |
| abstract_inverted_index.data | 16, 22, 47, 79, 91, 121, 139, 160, 211, 225, 230, 246 |
| abstract_inverted_index.deal | 64 |
| abstract_inverted_index.each | 77, 82 |
| abstract_inverted_index.face | 5 |
| abstract_inverted_index.form | 145 |
| abstract_inverted_index.from | 26, 49, 59 |
| abstract_inverted_index.into | 102, 108 |
| abstract_inverted_index.sets | 92, 212, 226 |
| abstract_inverted_index.size | 174 |
| abstract_inverted_index.such | 20 |
| abstract_inverted_index.task | 250 |
| abstract_inverted_index.that | 72, 99, 104, 124, 132 |
| abstract_inverted_index.this | 142, 234 |
| abstract_inverted_index.thus | 40, 216 |
| abstract_inverted_index.used | 176 |
| abstract_inverted_index.want | 55 |
| abstract_inverted_index.well | 129 |
| abstract_inverted_index.when | 209 |
| abstract_inverted_index.with | 65, 251 |
| abstract_inverted_index.There | 69 |
| abstract_inverted_index.among | 159 |
| abstract_inverted_index.based | 12, 44 |
| abstract_inverted_index.data. | 253 |
| abstract_inverted_index.doses | 197 |
| abstract_inverted_index.level | 123 |
| abstract_inverted_index.might | 24, 37 |
| abstract_inverted_index.often | 4 |
| abstract_inverted_index.sets, | 161 |
| abstract_inverted_index.sets. | 17, 140 |
| abstract_inverted_index.small | 14, 66, 252 |
| abstract_inverted_index.takes | 143 |
| abstract_inverted_index.where | 154 |
| abstract_inverted_index.Still, | 53 |
| abstract_inverted_index.assign | 74 |
| abstract_inverted_index.borrow | 57 |
| abstract_inverted_index.cancer | 199 |
| abstract_inverted_index.center | 29 |
| abstract_inverted_index.design | 165 |
| abstract_inverted_index.differ | 158 |
| abstract_inverted_index.either | 73 |
| abstract_inverted_index.large, | 39 |
| abstract_inverted_index.method | 167, 219 |
| abstract_inverted_index.models | 11, 43, 131 |
| abstract_inverted_index.sample | 67, 173 |
| abstract_inverted_index.sizes. | 68 |
| abstract_inverted_index.study. | 33 |
| abstract_inverted_index.target | 51, 229, 245 |
| abstract_inverted_index.applied | 195 |
| abstract_inverted_index.between | 35, 90, 138 |
| abstract_inverted_index.center. | 52 |
| abstract_inverted_index.centers | 36 |
| abstract_inverted_index.concept | 170 |
| abstract_inverted_index.explore | 151 |
| abstract_inverted_index.fitting | 112 |
| abstract_inverted_index.include | 237 |
| abstract_inverted_index.inverse | 147 |
| abstract_inverted_index.medical | 28 |
| abstract_inverted_index.models. | 114 |
| abstract_inverted_index.propose | 96 |
| abstract_inverted_index.provide | 133, 217 |
| abstract_inverted_index.suggest | 117 |
| abstract_inverted_index.through | 190 |
| abstract_inverted_index.weights | 75, 103, 118, 136 |
| abstract_inverted_index.approach | 98, 189, 204 |
| abstract_inverted_index.building | 9 |
| abstract_inverted_index.centers, | 62 |
| abstract_inverted_index.clinical | 1, 192 |
| abstract_inverted_index.combines | 100 |
| abstract_inverted_index.example, | 19 |
| abstract_inverted_index.external | 61, 78, 83, 210, 224, 238 |
| abstract_inverted_index.improved | 206 |
| abstract_inverted_index.modeling | 184, 249 |
| abstract_inverted_index.outcomes | 157 |
| abstract_inverted_index.proposed | 203 |
| abstract_inverted_index.provides | 205 |
| abstract_inverted_index.similar. | 214 |
| abstract_inverted_index.specific | 42 |
| abstract_inverted_index.subgroup | 183 |
| abstract_inverted_index.approach. | 185 |
| abstract_inverted_index.challenge | 7 |
| abstract_inverted_index.different | 152 |
| abstract_inverted_index.effective | 172 |
| abstract_inverted_index.improving | 241 |
| abstract_inverted_index.informing | 162 |
| abstract_inverted_index.patients. | 200 |
| abstract_inverted_index.requiring | 41 |
| abstract_inverted_index.scenarios | 153 |
| abstract_inverted_index.settings, | 2 |
| abstract_inverted_index.Generally, | 201 |
| abstract_inverted_index.approaches | 71 |
| abstract_inverted_index.covariates | 155 |
| abstract_inverted_index.likelihood | 110 |
| abstract_inverted_index.predicting | 194 |
| abstract_inverted_index.prediction | 10, 207, 248 |
| abstract_inverted_index.regression | 113 |
| abstract_inverted_index.similarity | 222, 235 |
| abstract_inverted_index.simulation | 164 |
| abstract_inverted_index.weighting. | 149 |
| abstract_inverted_index.Differences | 34 |
| abstract_inverted_index.demonstrate | 187 |
| abstract_inverted_index.differences | 89 |
| abstract_inverted_index.distinguish | 137 |
| abstract_inverted_index.evaluation. | 168 |
| abstract_inverted_index.incorporate | 86, 125 |
| abstract_inverted_index.information | 58, 87, 126 |
| abstract_inverted_index.observation | 135 |
| abstract_inverted_index.performance | 208, 242 |
| abstract_inverted_index.probability | 148 |
| abstract_inverted_index.quantifying | 221 |
| abstract_inverted_index.Technically, | 141 |
| abstract_inverted_index.application, | 193 |
| abstract_inverted_index.incorporated | 107 |
| abstract_inverted_index.multi-center | 32 |
| abstract_inverted_index.observation. | 84 |
| abstract_inverted_index.observations | 239 |
| abstract_inverted_index.radiotherapy | 196 |
| abstract_inverted_index.Specifically, | 115 |
| abstract_inverted_index.effectiveness | 180 |
| abstract_inverted_index.observational | 15 |
| abstract_inverted_index.observations, | 94 |
| abstract_inverted_index.understanding | 178 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |