High-dimensional Biomarker Identification for Interpretable Disease Prediction via Machine Learning Models
2025 · Open Access
· DOI: https://doi.org/10.17615/akpc-2t51
MOTIVATION: Omics features, often measured by high-throughput technologies, combined with clinical features, significantly impact the understanding of many complex human diseases. Integrating key omics biomarkers with clinical risk factors is essential for elucidating disease mechanisms, advancing early diagnosis, and enhancing precision medicine. However, the high dimensionality and intricate associations between disease outcomes and omics profiles present substantial analytical challenges.

RESULTS: We propose a High-dimensional Feature Importance Test (HiFIT) framework to address these challenges. Specifically, we develop an ensemble data-driven biomarker identification tool, Hybrid Feature Screening (HFS), to construct a candidate feature set for downstream machine learning models. The pre-screened candidate features from HFS are further refined using a computationally efficient permutation-based feature importance test employing machine learning methods to flexibly model the potential complex associations between disease outcomes and molecular biomarkers. Through extensive numerical simulation studies and practical applications to microbiome-associated weight changes following bariatric surgery, as well as the examination of gene-expression-associated kidney pan-cancer survival data, we demonstrate HiFIT's superior performance in both outcome prediction and feature importance identification.

AVAILABILITY AND IMPLEMENTATION: An R package implementing the HiFIT algorithm is available on GitHub (https://github.com/BZou-lab/HiFIT).

SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
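
The two-stage idea in the abstract (screen a candidate set, then test each candidate by permutation) can be illustrated with a small, self-contained Python sketch. This is not the HiFIT R package or its API. Everything here is an assumption made for illustration: marginal Pearson correlation stands in for HFS, a k-nearest-neighbour regressor stands in for the machine learning model, the data are synthetic, and the function names (`screen`, `permutation_test`, and so on) are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def screen(X, y, k):
    """Stand-in for a screening step: keep the k features with largest |correlation|."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])[:k]


def knn_predict(X_train, y_train, x, cols, k=5):
    """Average the outcomes of the k nearest training rows, using only `cols`."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in cols), yi)
        for row, yi in zip(X_train, y_train)
    )
    return sum(yi for _, yi in dists[:k]) / k


def mse(X_train, y_train, X_test, y_test, cols):
    """Held-out mean squared error of the k-NN model."""
    errs = [(knn_predict(X_train, y_train, x, cols) - yi) ** 2
            for x, yi in zip(X_test, y_test)]
    return sum(errs) / len(errs)


def permutation_test(X_train, y_train, X_test, y_test, cols, n_perm, rng):
    """For each screened feature, shuffle its held-out column and compare losses.

    Returns {feature: (mean loss increase, one-sided permutation p-value)}.
    """
    base = mse(X_train, y_train, X_test, y_test, cols)
    results = {}
    for j in cols:
        losses = []
        for _ in range(n_perm):
            shuffled = [row[j] for row in X_test]
            rng.shuffle(shuffled)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, shuffled)]
            losses.append(mse(X_train, y_train, X_perm, y_test, cols))
        importance = sum(losses) / n_perm - base
        p_value = (1 + sum(loss <= base for loss in losses)) / (n_perm + 1)
        results[j] = (importance, p_value)
    return results


# Synthetic data: 30 features, only features 0 and 1 affect the outcome.
rng = random.Random(0)
n, p = 160, 30
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]
selected = screen(X_train, y_train, k=4)
results = permutation_test(X_train, y_train, X_test, y_test, selected, n_perm=30, rng=rng)

for j in selected:
    imp, pv = results[j]
    print(f"feature {j}: importance={imp:.3f}, p-value={pv:.3f}")
```

Screening first keeps the permutation loop short: only the handful of retained features are tested, rather than every column. A real analysis would need a screening rule that can also detect nonlinear effects, which plain correlation cannot.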
Related Topics
- Type: article
- Language: en
- Landing Page: https://doi.org/10.17615/akpc-2t51
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415201140
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415201140 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.17615/akpc-2t51
- Title: High-dimensional Biomarker Identification for Interpretable Disease Prediction via Machine Learning Models
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-05-08
- Authors: Yifan Dai, Di Wu, Ian M. Carroll, Fei Zou, Baiming Zou (in order)
- Landing page: https://doi.org/10.17615/akpc-2t51 (publisher landing page)
- Open access: Yes (a free full text is available)
- OA status: green (per OpenAlex)
- OA URL: https://doi.org/10.17615/akpc-2t51 (direct OA link)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4415201140 |
| doi | https://doi.org/10.17615/akpc-2t51 |
| ids.doi | https://doi.org/10.17615/akpc-2t51 |
| ids.openalex | https://openalex.org/W4415201140 |
| fwci | 0.0 |
| type | article |
| title | High-dimensional Biomarker Identification for Interpretable Disease Prediction via Machine Learning Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10862 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.5475999712944031 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | AI in cancer detection |
| topics[1].id | https://openalex.org/T11396 |
| topics[1].field.id | https://openalex.org/fields/36 |
| topics[1].field.display_name | Health Professions |
| topics[1].score | 0.5149999856948853 |
| topics[1].domain.id | https://openalex.org/domains/4 |
| topics[1].domain.display_name | Health Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/3605 |
| topics[1].subfield.display_name | Health Information Management |
| topics[1].display_name | Artificial Intelligence in Healthcare |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | doi:10.17615/akpc-2t51 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S7407051488 |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | UNC Libraries |
| locations[0].source.host_organization | |
| locations[0].source.host_organization_name | |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | article-journal |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.17615/akpc-2t51 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5102524107 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-7897-5228 |
| authorships[0].author.display_name | Yifan Dai |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Dai, Yifan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101454129 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-8331-2357 |
| authorships[1].author.display_name | Di Wu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Wu, Di |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5042869525 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8615-5086 |
| authorships[2].author.display_name | Ian M. Carroll |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Carroll, Ian |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5100782494 |
| authorships[3].author.orcid | https://orcid.org/0000-0002-6637-3593 |
| authorships[3].author.display_name | Fei Zou |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Zou, Fei |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5066472367 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-7879-9460 |
| authorships[4].author.display_name | Baiming Zou |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Zou, Baiming |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.17615/akpc-2t51 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-15T00:00:00 |
| display_name | High-dimensional Biomarker Identification for Interpretable Disease Prediction via Machine Learning Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10862 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.5475999712944031 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | AI in cancer detection |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.17615/akpc-2t51 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S7407051488 |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | UNC Libraries |
| best_oa_location.source.host_organization | |
| best_oa_location.source.host_organization_name | |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | article-journal |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.17615/akpc-2t51 |
| primary_location.id | doi:10.17615/akpc-2t51 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S7407051488 |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | UNC Libraries |
| primary_location.source.host_organization | |
| primary_location.source.host_organization_name | |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | article-journal |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.17615/akpc-2t51 |
| publication_date | 2025-05-08 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.R | 174 |
| abstract_inverted_index.a | 62, 88, 107 |
| abstract_inverted_index.An | 173 |
| abstract_inverted_index.We | 60 |
| abstract_inverted_index.an | 76 |
| abstract_inverted_index.as | 146, 148 |
| abstract_inverted_index.at | 191 |
| abstract_inverted_index.by | 5 |
| abstract_inverted_index.in | 162 |
| abstract_inverted_index.is | 29, 180 |
| abstract_inverted_index.of | 16, 151 |
| abstract_inverted_index.on | 182 |
| abstract_inverted_index.to | 69, 86, 118, 139 |
| abstract_inverted_index.we | 74, 157 |
| abstract_inverted_index.AND | 171 |
| abstract_inverted_index.HFS | 102 |
| abstract_inverted_index.The | 97 |
| abstract_inverted_index.and | 38, 46, 52, 128, 136, 166 |
| abstract_inverted_index.are | 103, 189 |
| abstract_inverted_index.for | 31, 92 |
| abstract_inverted_index.key | 22 |
| abstract_inverted_index.set | 91 |
| abstract_inverted_index.the | 14, 43, 121, 149, 177 |
| abstract_inverted_index.Test | 66 |
| abstract_inverted_index.both | 163 |
| abstract_inverted_index.from | 101 |
| abstract_inverted_index.high | 44 |
| abstract_inverted_index.many | 17 |
| abstract_inverted_index.risk | 27 |
| abstract_inverted_index.test | 113 |
| abstract_inverted_index.well | 147 |
| abstract_inverted_index.with | 9, 25 |
| abstract_inverted_index.HiFIT | 178 |
| abstract_inverted_index.Omics | 1 |
| abstract_inverted_index.data, | 156 |
| abstract_inverted_index.early | 36 |
| abstract_inverted_index.human | 19 |
| abstract_inverted_index.model | 120 |
| abstract_inverted_index.often | 3 |
| abstract_inverted_index.omics | 23, 53 |
| abstract_inverted_index.these | 71 |
| abstract_inverted_index.tool, | 81 |
| abstract_inverted_index.using | 106 |
| abstract_inverted_index.(HFS), | 85 |
| abstract_inverted_index.GitHub | 183 |
| abstract_inverted_index.Hybrid | 82 |
| abstract_inverted_index.impact | 13 |
| abstract_inverted_index.kidney | 153 |
| abstract_inverted_index.weight | 141 |
| abstract_inverted_index.(HiFIT) | 67 |
| abstract_inverted_index.Feature | 64, 83 |
| abstract_inverted_index.HiFIT's | 159 |
| abstract_inverted_index.Through | 131 |
| abstract_inverted_index.address | 70 |
| abstract_inverted_index.between | 49, 125 |
| abstract_inverted_index.changes | 142 |
| abstract_inverted_index.complex | 18, 123 |
| abstract_inverted_index.develop | 75 |
| abstract_inverted_index.disease | 33, 50, 126 |
| abstract_inverted_index.factors | 28 |
| abstract_inverted_index.feature | 90, 111, 167 |
| abstract_inverted_index.further | 104 |
| abstract_inverted_index.machine | 94, 115 |
| abstract_inverted_index.methods | 117 |
| abstract_inverted_index.models. | 96 |
| abstract_inverted_index.online. | 193 |
| abstract_inverted_index.outcome | 164 |
| abstract_inverted_index.package | 175 |
| abstract_inverted_index.present | 55 |
| abstract_inverted_index.propose | 61 |
| abstract_inverted_index.refined | 105 |
| abstract_inverted_index.studies | 135 |
| abstract_inverted_index.However, | 42 |
| abstract_inverted_index.RESULTS: | 59 |
| abstract_inverted_index.clinical | 10, 26 |
| abstract_inverted_index.combined | 8 |
| abstract_inverted_index.ensemble | 77 |
| abstract_inverted_index.features | 100 |
| abstract_inverted_index.flexibly | 119 |
| abstract_inverted_index.learning | 95, 116 |
| abstract_inverted_index.measured | 4 |
| abstract_inverted_index.outcomes | 51, 127 |
| abstract_inverted_index.profiles | 54 |
| abstract_inverted_index.superior | 160 |
| abstract_inverted_index.surgery, | 145 |
| abstract_inverted_index.survival | 155 |
| abstract_inverted_index.Screening | 84 |
| abstract_inverted_index.advancing | 35 |
| abstract_inverted_index.algorithm | 179 |
| abstract_inverted_index.available | 181, 190 |
| abstract_inverted_index.bariatric | 144 |
| abstract_inverted_index.biomarker | 79 |
| abstract_inverted_index.candidate | 89, 99 |
| abstract_inverted_index.construct | 87 |
| abstract_inverted_index.diseases. | 20 |
| abstract_inverted_index.efficient | 109 |
| abstract_inverted_index.employing | 114 |
| abstract_inverted_index.enhancing | 39 |
| abstract_inverted_index.essential | 30 |
| abstract_inverted_index.extensive | 132 |
| abstract_inverted_index.features, | 2, 11 |
| abstract_inverted_index.following | 143 |
| abstract_inverted_index.framework | 68 |
| abstract_inverted_index.intricate | 47 |
| abstract_inverted_index.materials | 188 |
| abstract_inverted_index.medicine. | 41 |
| abstract_inverted_index.molecular | 129 |
| abstract_inverted_index.numerical | 133 |
| abstract_inverted_index.potential | 122 |
| abstract_inverted_index.practical | 137 |
| abstract_inverted_index.precision | 40 |
| abstract_inverted_index.Importance | 65 |
| abstract_inverted_index.analytical | 57 |
| abstract_inverted_index.biomarkers | 24 |
| abstract_inverted_index.diagnosis, | 37 |
| abstract_inverted_index.downstream | 93 |
| abstract_inverted_index.importance | 112, 168 |
| abstract_inverted_index.pan-cancer | 154 |
| abstract_inverted_index.prediction | 165 |
| abstract_inverted_index.simulation | 134 |
| abstract_inverted_index.Integrating | 21 |
| abstract_inverted_index.MOTIVATION: | 0 |
| abstract_inverted_index.biomarkers. | 130 |
| abstract_inverted_index.challenges. | 58, 72 |
| abstract_inverted_index.data-driven | 78 |
| abstract_inverted_index.demonstrate | 158 |
| abstract_inverted_index.elucidating | 32 |
| abstract_inverted_index.examination | 150 |
| abstract_inverted_index.mechanisms, | 34 |
| abstract_inverted_index.performance | 161 |
| abstract_inverted_index.substantial | 56 |
| abstract_inverted_index.AVAILABILITY | 170 |
| abstract_inverted_index.INFORMATION: | 186 |
| abstract_inverted_index.applications | 138 |
| abstract_inverted_index.associations | 48, 124 |
| abstract_inverted_index.implementing | 176 |
| abstract_inverted_index.pre-screened | 98 |
| abstract_inverted_index.SUPPLEMENTARY | 185 |
| abstract_inverted_index.Specifically, | 73 |
| abstract_inverted_index.Supplementary | 187 |
| abstract_inverted_index.significantly | 12 |
| abstract_inverted_index.technologies, | 7 |
| abstract_inverted_index.understanding | 15 |
| abstract_inverted_index.Bioinformatics | 192 |
| abstract_inverted_index.dimensionality | 45 |
| abstract_inverted_index.identification | 80 |
| abstract_inverted_index.IMPLEMENTATION: | 172 |
| abstract_inverted_index.computationally | 108 |
| abstract_inverted_index.high-throughput | 6 |
| abstract_inverted_index.identification. | 169 |
| abstract_inverted_index.High-dimensional | 63 |
| abstract_inverted_index.permutation-based | 110 |
| abstract_inverted_index.microbiome-associated | 140 |
| abstract_inverted_index.gene-expression-associated | 152 |
| abstract_inverted_index.(https://github.com/BZou-lab/HiFIT). | 184 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| citation_normalized_percentile.value | 0.22201909 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
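
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the positions where it occurs. As a minimal sketch, the text can be rebuilt by inverting that map and joining tokens in position order. The helper name `rebuild_abstract` is hypothetical, and `sample` hand-copies only positions 0 to 11 from the table, so the output is just the opening of the abstract.

```python
def rebuild_abstract(inverted_index):
    """Turn {token: [positions]} back into a space-joined string."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return " ".join(by_position[pos] for pos in sorted(by_position))


# First twelve positions, copied from the payload rows above.
sample = {
    "MOTIVATION:": [0],
    "Omics": [1],
    "features,": [2, 11],
    "often": [3],
    "measured": [4],
    "by": [5],
    "high-throughput": [6],
    "technologies,": [7],
    "combined": [8],
    "with": [9],
    "clinical": [10],
}

opening = rebuild_abstract(sample)
print(opening)
```

Applied to the full index from the raw JSON record, the same function would yield the complete abstract, with whitespace normalized to single spaces.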