Uncovering Overconfident Failures in CXR Models via Augmentation-Sensitivity Risk Scoring
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2510.01683
Deep learning models achieve strong performance in chest radiograph (CXR) interpretation, yet fairness and reliability concerns persist. Models often show uneven accuracy across patient subgroups, leading to hidden failures not reflected in aggregate metrics. Existing error detection approaches -- based on confidence calibration or out-of-distribution (OOD) detection -- struggle with subtle within-distribution errors, while image- and representation-level consistency-based methods remain underexplored in medical imaging. We propose an augmentation-sensitivity risk scoring (ASRS) framework to identify error-prone CXR cases. ASRS applies clinically plausible rotations ($\pm 15^\circ$/$\pm 30^\circ$) and measures embedding shifts with the RAD-DINO encoder. Sensitivity scores stratify samples into stability quartiles, where highly sensitive cases show substantially lower recall ($-0.2$ to $-0.3$) despite high AUROC and confidence. ASRS provides a label-free means for selective prediction and clinician review, improving fairness and safety in medical AI.
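The scoring and stratification steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes embeddings for the original and rotated images have already been computed (by RAD-DINO in the paper; here they are synthetic arrays), and the cosine distance, the mean over rotation angles, and the function names `cosine_distance`, `sensitivity_scores`, `stability_quartiles`, and `recall_by_quartile` are all assumptions made for the example.

```python
import numpy as np

def cosine_distance(a, b):
    """Row-wise cosine distance between two (n, d) arrays."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a_n * b_n, axis=1)

def sensitivity_scores(orig_emb, aug_embs):
    """Mean embedding shift per sample across augmentations.

    orig_emb: (n, d) embeddings of the unrotated images.
    aug_embs: dict mapping rotation angle -> (n, d) embeddings.
    Averaging over angles is an assumption of this sketch.
    """
    shifts = [cosine_distance(orig_emb, e) for e in aug_embs.values()]
    return np.mean(shifts, axis=0)

def stability_quartiles(scores):
    """Bin scores into quartiles: 0 = most stable, 3 = most sensitive."""
    edges = np.quantile(scores, [0.25, 0.5, 0.75])
    return np.searchsorted(edges, scores, side="left")

def recall_by_quartile(y_true, y_pred, bins):
    """Recall (true-positive rate) within each quartile; NaN if no positives."""
    out = {}
    for q in range(4):
        pos = (bins == q) & (y_true == 1)
        out[q] = float(np.mean(y_pred[pos] == 1)) if pos.any() else float("nan")
    return out

# Synthetic stand-in for encoder outputs: 8 samples, 16-dim embeddings.
rng = np.random.default_rng(0)
orig = rng.normal(size=(8, 16))
augs = {
    angle: orig + rng.normal(scale=0.1 * (i + 1), size=orig.shape)
    for i, angle in enumerate((-30, -15, 15, 30))
}
scores = sensitivity_scores(orig, augs)
bins = stability_quartiles(scores)
```

No labels are needed to compute `scores` or `bins`, which is what makes the approach label-free; labels enter only when evaluating per-quartile recall with `recall_by_quartile`.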
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2510.01683, https://arxiv.org/pdf/2510.01683
- OA Status: green
- OpenAlex ID: https://openalex.org/W4414818360
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4414818360 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2510.01683 (Digital Object Identifier)
- Title: Uncovering Overconfident Failures in CXR Models via Augmentation-Sensitivity Risk Scoring (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-10-02
- Authors: H. Shu, Wei-Ning Chiu, Shun-Ting Chang, Maoyi Huang, Takeshi Tohyama, Ahram Han, Po‐Chih Kuo (in author order)
- Landing page: https://arxiv.org/abs/2510.01683 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2510.01683 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2510.01683 (direct OA link)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4414818360 |
| doi | https://doi.org/10.48550/arxiv.2510.01683 |
| ids.doi | https://doi.org/10.48550/arxiv.2510.01683 |
| ids.openalex | https://openalex.org/W4414818360 |
| fwci | |
| type | preprint |
| title | Uncovering Overconfident Failures in CXR Models via Augmentation-Sensitivity Risk Scoring |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10876 |
| topics[0].field.id | https://openalex.org/fields/22 |
| topics[0].field.display_name | Engineering |
| topics[0].score | 0.9460999965667725 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2207 |
| topics[0].subfield.display_name | Control and Systems Engineering |
| topics[0].display_name | Fault Detection and Control Systems |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2510.01683 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://arxiv.org/pdf/2510.01683 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2510.01683 |
| locations[1].id | doi:10.48550/arxiv.2510.01683 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2510.01683 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5100582040 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | H. Shu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Shu, Han-Jay |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5101286955 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Wei-Ning Chiu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Chiu, Wei-Ning |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5108925542 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Shun-Ting Chang |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Chang, Shun-Ting |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5048767215 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-9154-9485 |
| authorships[3].author.display_name | Maoyi Huang |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Huang, Meng-Ping |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5026341995 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-7300-8180 |
| authorships[4].author.display_name | Takeshi Tohyama |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Tohyama, Takeshi |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5112676676 |
| authorships[5].author.orcid | |
| authorships[5].author.display_name | Ahram Han |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Han, Ahram |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5103017921 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-4020-3147 |
| authorships[6].author.display_name | Po‐Chih Kuo |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | Kuo, Po-Chih |
| authorships[6].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2510.01683 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Uncovering Overconfident Failures in CXR Models via Augmentation-Sensitivity Risk Scoring |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10876 |
| primary_topic.field.id | https://openalex.org/fields/22 |
| primary_topic.field.display_name | Engineering |
| primary_topic.score | 0.9460999965667725 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2207 |
| primary_topic.subfield.display_name | Control and Systems Engineering |
| primary_topic.display_name | Fault Detection and Control Systems |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2510.01683 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2510.01683 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2510.01683 |
| primary_location.id | pmh:oai:arXiv.org:2510.01683 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://arxiv.org/pdf/2510.01683 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2510.01683 |
| publication_date | 2025-10-02 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 118 |
| abstract_inverted_index.-- | 38, 47 |
| abstract_inverted_index.We | 64 |
| abstract_inverted_index.an | 66 |
| abstract_inverted_index.in | 6, 31, 61, 131 |
| abstract_inverted_index.on | 40 |
| abstract_inverted_index.or | 43 |
| abstract_inverted_index.to | 26, 72, 109 |
| abstract_inverted_index.AI. | 133 |
| abstract_inverted_index.CXR | 75 |
| abstract_inverted_index.and | 13, 55, 85, 114, 124, 129 |
| abstract_inverted_index.for | 121 |
| abstract_inverted_index.not | 29 |
| abstract_inverted_index.the | 90 |
| abstract_inverted_index.yet | 11 |
| abstract_inverted_index.ASRS | 77, 116 |
| abstract_inverted_index.Deep | 0 |
| abstract_inverted_index.high | 112 |
| abstract_inverted_index.into | 97 |
| abstract_inverted_index.risk | 68 |
| abstract_inverted_index.show | 19, 104 |
| abstract_inverted_index.with | 49, 89 |
| abstract_inverted_index.($\pm | 82 |
| abstract_inverted_index.(CXR) | 9 |
| abstract_inverted_index.(OOD) | 45 |
| abstract_inverted_index.AUROC | 113 |
| abstract_inverted_index.based | 39 |
| abstract_inverted_index.cases | 103 |
| abstract_inverted_index.chest | 7 |
| abstract_inverted_index.error | 35 |
| abstract_inverted_index.lower | 106 |
| abstract_inverted_index.means | 120 |
| abstract_inverted_index.often | 18 |
| abstract_inverted_index.where | 100 |
| abstract_inverted_index.while | 53 |
| abstract_inverted_index.(ASRS) | 70 |
| abstract_inverted_index.Models | 17 |
| abstract_inverted_index.across | 22 |
| abstract_inverted_index.cases. | 76 |
| abstract_inverted_index.hidden | 27 |
| abstract_inverted_index.highly | 101 |
| abstract_inverted_index.image- | 54 |
| abstract_inverted_index.models | 2 |
| abstract_inverted_index.recall | 107 |
| abstract_inverted_index.remain | 59 |
| abstract_inverted_index.safety | 130 |
| abstract_inverted_index.scores | 94 |
| abstract_inverted_index.shifts | 88 |
| abstract_inverted_index.strong | 4 |
| abstract_inverted_index.subtle | 50 |
| abstract_inverted_index.uneven | 20 |
| abstract_inverted_index.$-0.3$) | 110 |
| abstract_inverted_index.($-0.2$ | 108 |
| abstract_inverted_index.achieve | 3 |
| abstract_inverted_index.applies | 78 |
| abstract_inverted_index.despite | 111 |
| abstract_inverted_index.errors, | 52 |
| abstract_inverted_index.leading | 25 |
| abstract_inverted_index.medical | 62, 132 |
| abstract_inverted_index.methods | 58 |
| abstract_inverted_index.patient | 23 |
| abstract_inverted_index.propose | 65 |
| abstract_inverted_index.review, | 126 |
| abstract_inverted_index.samples | 96 |
| abstract_inverted_index.scoring | 69 |
| abstract_inverted_index.Existing | 34 |
| abstract_inverted_index.RAD-DINO | 91 |
| abstract_inverted_index.accuracy | 21 |
| abstract_inverted_index.concerns | 15 |
| abstract_inverted_index.encoder. | 92 |
| abstract_inverted_index.failures | 28 |
| abstract_inverted_index.fairness | 12, 128 |
| abstract_inverted_index.identify | 73 |
| abstract_inverted_index.imaging. | 63 |
| abstract_inverted_index.learning | 1 |
| abstract_inverted_index.measures | 86 |
| abstract_inverted_index.metrics. | 33 |
| abstract_inverted_index.persist. | 16 |
| abstract_inverted_index.provides | 117 |
| abstract_inverted_index.stratify | 95 |
| abstract_inverted_index.struggle | 48 |
| abstract_inverted_index.aggregate | 32 |
| abstract_inverted_index.clinician | 125 |
| abstract_inverted_index.detection | 36, 46 |
| abstract_inverted_index.embedding | 87 |
| abstract_inverted_index.framework | 71 |
| abstract_inverted_index.improving | 127 |
| abstract_inverted_index.plausible | 80 |
| abstract_inverted_index.reflected | 30 |
| abstract_inverted_index.rotations | 81 |
| abstract_inverted_index.selective | 122 |
| abstract_inverted_index.sensitive | 102 |
| abstract_inverted_index.stability | 98 |
| abstract_inverted_index.30^\circ$) | 84 |
| abstract_inverted_index.approaches | 37 |
| abstract_inverted_index.clinically | 79 |
| abstract_inverted_index.confidence | 41 |
| abstract_inverted_index.label-free | 119 |
| abstract_inverted_index.prediction | 123 |
| abstract_inverted_index.quartiles, | 99 |
| abstract_inverted_index.radiograph | 8 |
| abstract_inverted_index.subgroups, | 24 |
| abstract_inverted_index.Sensitivity | 93 |
| abstract_inverted_index.calibration | 42 |
| abstract_inverted_index.confidence. | 115 |
| abstract_inverted_index.error-prone | 74 |
| abstract_inverted_index.performance | 5 |
| abstract_inverted_index.reliability | 14 |
| abstract_inverted_index.substantially | 105 |
| abstract_inverted_index.underexplored | 60 |
| abstract_inverted_index.15^\circ$/$\pm | 83 |
| abstract_inverted_index.interpretation, | 10 |
| abstract_inverted_index.consistency-based | 57 |
| abstract_inverted_index.out-of-distribution | 44 |
| abstract_inverted_index.within-distribution | 51 |
| abstract_inverted_index.representation-level | 56 |
| abstract_inverted_index.augmentation-sensitivity | 67 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile | |
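The `abstract_inverted_index` rows in the payload store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of turning such a mapping back into running text is shown below. The helper name `rebuild_abstract` is hypothetical, and `sample_index` is a hand-copied subset of the payload restricted to positions 0 through 10, so the output is only the opening of the abstract.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> list-of-positions mapping."""
    positions = {}
    for token, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = token
    # Emit tokens in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))

# Subset of the payload rows above (positions 0-10 only).
sample_index = {
    "Deep": [0],
    "learning": [1],
    "models": [2],
    "achieve": [3],
    "strong": [4],
    "performance": [5],
    "in": [6],
    "chest": [7],
    "radiograph": [8],
    "(CXR)": [9],
    "interpretation,": [10],
}

abstract_start = rebuild_abstract(sample_index)
print(abstract_start)
# prints: Deep learning models achieve strong performance in chest radiograph (CXR) interpretation,
```

Applied to the full index, the same function should reproduce the abstract shown near the top of the record, up to whitespace.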