Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections
2025 · Open Access
· DOI: https://doi.org/10.48550/arxiv.2504.16612
Purpose: In this study, we investigate the training of foundation models using federated learning to address data-sharing limitations and enable collaborative model training without data transfer for minimally invasive surgery.

Methods: Inspired by the EndoViT study, we adapt the Masked Autoencoder for federated learning, enhancing it with adaptive Sharpness-Aware Minimization (FedSAM) and Stochastic Weight Averaging (SWA). Our model is pretrained on the Endo700k dataset collection and later fine-tuned and evaluated for tasks such as Semantic Segmentation, Action Triplet Recognition, and Surgical Phase Recognition.

Results: Our findings demonstrate that integrating adaptive FedSAM into the federated MAE approach improves pretraining, leading to a reduction in reconstruction loss per patch. The application of FL-EndoViT in surgical downstream tasks results in performance comparable to CEN-EndoViT. Furthermore, FL-EndoViT exhibits advantages over CEN-EndoViT in surgical scene segmentation when data is limited and in action triplet recognition when large datasets are used.

Conclusion: These findings highlight the potential of federated learning for privacy-preserving training of surgical foundation models, offering a robust and generalizable solution for surgical data science. Effective collaboration requires adapting federated learning methods, such as the integration of FedSAM, which can accommodate the inherent data heterogeneity across institutions. In the future, exploring FL in video-based models may enhance these capabilities by incorporating spatiotemporal dynamics crucial for real-world surgical environments.
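The Methods paragraph names three ingredients: federated training, adaptive Sharpness-Aware Minimization on the clients, and Stochastic Weight Averaging. The toy sketch below shows how these pieces can fit together in principle. It is not the authors' implementation: the quadratic client loss, the function names, the hyperparameters, and the choice to apply SWA to the server-side global weights in the later rounds are all assumptions made for illustration. The adaptive perturbation follows one common ASAM-style formulation, with the perturbation scaled elementwise by the weight magnitude.

```python
import math

def grad(w, center):
    """Gradient of a toy client loss 0.5 * ||w - center||^2 (stand-in for the MAE loss)."""
    return [wi - ci for wi, ci in zip(w, center)]

def asam_step(w, center, lr=0.1, rho=0.05, eta=0.01):
    """One adaptive-SAM step: perturb toward higher loss, then descend with that gradient."""
    g = grad(w, center)
    t = [abs(wi) + eta for wi in w]  # elementwise scaling by weight magnitude
    norm = math.sqrt(sum((ti * gi) ** 2 for ti, gi in zip(t, g)))
    if norm == 0.0:
        return list(w)  # already at a stationary point of the toy loss
    w_adv = [wi + rho * ti * ti * gi / norm for wi, ti, gi in zip(w, t, g)]
    g_adv = grad(w_adv, center)  # gradient at the perturbed weights
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def local_train(w_global, center, steps=5):
    """Client-side training starting from the current global weights."""
    w = list(w_global)
    for _ in range(steps):
        w = asam_step(w, center)
    return w

def fed_avg(client_weights, client_sizes):
    """Server-side average of client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

def federated_pretrain(centers, sizes, rounds=40, swa_start=30):
    """Run federated rounds; keep a running SWA mean of global weights from swa_start on."""
    w_global = [0.0] * len(centers[0])
    swa, swa_count = None, 0
    for r in range(rounds):
        updates = [local_train(w_global, c) for c in centers]
        w_global = fed_avg(updates, sizes)
        if r >= swa_start:
            swa_count += 1
            if swa is None:
                swa = list(w_global)
            else:
                swa = [s + (g - s) / swa_count for s, g in zip(swa, w_global)]
    return w_global, swa

# Three hypothetical institutions with heterogeneous optima and dataset sizes.
centers = [[1.0, -2.0], [3.0, 0.0], [-1.0, 4.0]]
sizes = [100, 300, 600]
w_final, w_swa = federated_pretrain(centers, sizes)
print(w_final, w_swa)
```

In this toy setting only model weights leave each client, which mirrors the "without data transfer" constraint in the abstract; the heterogeneous `centers` stand in for the data heterogeneity across institutions.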
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2504.16612, https://arxiv.org/pdf/2504.16612
- OA Status: green
- OpenAlex ID: https://openalex.org/W4415066388
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4415066388 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2504.16612 (Digital Object Identifier)
- Title: Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-04-23
- Authors: Max Kirchner, Alexander C. Jenke, Sebastian Bodenstedt, Fiona R. Kolbinger, Oliver Lester Saldanha, Jakob Nikolas Kather, Martin Wagner, Stefanie Speidel (in order)
- Landing page: https://arxiv.org/abs/2504.16612 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2504.16612 (direct link to full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2504.16612 (direct OA link)
- Cited by: 0 (total citation count in OpenAlex)
Full payload
| id | https://openalex.org/W4415066388 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2504.16612 |
| ids.doi | https://doi.org/10.48550/arxiv.2504.16612 |
| ids.openalex | https://openalex.org/W4415066388 |
| fwci | |
| type | preprint |
| title | Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10764 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.8439000248908997 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Privacy-Preserving Technologies in Data |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2504.16612 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | cc-by-nc-nd |
| locations[0].pdf_url | https://arxiv.org/pdf/2504.16612 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | https://openalex.org/licenses/cc-by-nc-nd |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2504.16612 |
| locations[1].id | doi:10.48550/arxiv.2504.16612 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2504.16612 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5036243537 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-8798-2446 |
| authorships[0].author.display_name | Max Kirchner |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Kirchner, Max |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5056857657 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-4675-417X |
| authorships[1].author.display_name | Alexander C. Jenke |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Jenke, Alexander C. |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5034437528 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-2203-9729 |
| authorships[2].author.display_name | Sebastian Bodenstedt |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Bodenstedt, Sebastian |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5059363211 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-2265-4809 |
| authorships[3].author.display_name | Fiona R. Kolbinger |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Kolbinger, Fiona R. |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5076609492 |
| authorships[4].author.orcid | https://orcid.org/0000-0002-3594-7590 |
| authorships[4].author.display_name | Oliver Lester Saldanha |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Saldanha, Oliver L. |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5073483894 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-3730-5348 |
| authorships[5].author.display_name | Jakob Nikolas Kather |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Kather, Jakob N. |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5027275363 |
| authorships[6].author.orcid | https://orcid.org/0000-0002-9831-9110 |
| authorships[6].author.display_name | Martin Wagner |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Wagner, Martin |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5003648994 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-4590-1908 |
| authorships[7].author.display_name | Stefanie Speidel |
| authorships[7].author_position | last |
| authorships[7].raw_author_name | Speidel, Stefanie |
| authorships[7].is_corresponding | False |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2504.16612 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-11T00:00:00 |
| display_name | Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10764 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.8439000248908997 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Privacy-Preserving Technologies in Data |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2504.16612 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | cc-by-nc-nd |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2504.16612 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2504.16612 |
| primary_location.id | pmh:oai:arXiv.org:2504.16612 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | cc-by-nc-nd |
| primary_location.pdf_url | https://arxiv.org/pdf/2504.16612 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | https://openalex.org/licenses/cc-by-nc-nd |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2504.16612 |
| publication_date | 2025-04-23 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 100, 162 |
| abstract_inverted_index.FL | 196 |
| abstract_inverted_index.In | 1, 193 |
| abstract_inverted_index.as | 73, 179 |
| abstract_inverted_index.by | 32, 204 |
| abstract_inverted_index.in | 102, 111, 116, 127, 136, 197 |
| abstract_inverted_index.is | 58, 133 |
| abstract_inverted_index.it | 45 |
| abstract_inverted_index.of | 8, 109, 151, 157, 182 |
| abstract_inverted_index.on | 60 |
| abstract_inverted_index.to | 14, 99, 119 |
| abstract_inverted_index.we | 4, 36 |
| abstract_inverted_index.MAE | 94 |
| abstract_inverted_index.Our | 56, 84 |
| abstract_inverted_index.The | 107 |
| abstract_inverted_index.and | 18, 51, 65, 68, 79, 135, 164 |
| abstract_inverted_index.are | 143 |
| abstract_inverted_index.can | 185 |
| abstract_inverted_index.for | 26, 41, 70, 154, 167, 209 |
| abstract_inverted_index.may | 200 |
| abstract_inverted_index.per | 105 |
| abstract_inverted_index.the | 6, 33, 38, 61, 92, 149, 180, 187 |
| abstract_inverted_index.data | 24, 132, 169, 189 |
| abstract_inverted_index.into | 91 |
| abstract_inverted_index.loss | 104 |
| abstract_inverted_index.over | 125 |
| abstract_inverted_index.such | 72, 178 |
| abstract_inverted_index.that | 87 |
| abstract_inverted_index.this | 2 |
| abstract_inverted_index.when | 131, 140 |
| abstract_inverted_index.with | 46 |
| abstract_inverted_index.Phase | 81 |
| abstract_inverted_index.These | 146 |
| abstract_inverted_index.adapt | 37 |
| abstract_inverted_index.large | 141 |
| abstract_inverted_index.later | 66 |
| abstract_inverted_index.model | 21, 57 |
| abstract_inverted_index.scene | 129 |
| abstract_inverted_index.tasks | 71, 114 |
| abstract_inverted_index.these | 202 |
| abstract_inverted_index.used. | 144 |
| abstract_inverted_index.using | 11 |
| abstract_inverted_index.which | 184 |
| abstract_inverted_index.(SWA). | 55 |
| abstract_inverted_index.Action | 76 |
| abstract_inverted_index.FedSAM | 90 |
| abstract_inverted_index.Masked | 39 |
| abstract_inverted_index.Weight | 53 |
| abstract_inverted_index.across | 191 |
| abstract_inverted_index.action | 137 |
| abstract_inverted_index.enable | 19 |
| abstract_inverted_index.models | 10, 199 |
| abstract_inverted_index.patch. | 106 |
| abstract_inverted_index.robust | 163 |
| abstract_inverted_index.study, | 3, 35 |
| abstract_inverted_index.EndoViT | 34 |
| abstract_inverted_index.FedSAM, | 183 |
| abstract_inverted_index.Triplet | 77 |
| abstract_inverted_index.address | 15 |
| abstract_inverted_index.crucial | 208 |
| abstract_inverted_index.dataset | 63 |
| abstract_inverted_index.enhance | 201 |
| abstract_inverted_index.future, | 194 |
| abstract_inverted_index.leading | 98 |
| abstract_inverted_index.limited | 134 |
| abstract_inverted_index.models, | 160 |
| abstract_inverted_index.results | 115 |
| abstract_inverted_index.triplet | 138 |
| abstract_inverted_index.without | 23 |
| abstract_inverted_index.(FedSAM) | 50 |
| abstract_inverted_index.Endo700k | 62 |
| abstract_inverted_index.Inspired | 31 |
| abstract_inverted_index.Methods: | 30 |
| abstract_inverted_index.Purpose: | 0 |
| abstract_inverted_index.Results: | 83 |
| abstract_inverted_index.Semantic | 74 |
| abstract_inverted_index.Surgical | 80 |
| abstract_inverted_index.adapting | 174 |
| abstract_inverted_index.adaptive | 47, 89 |
| abstract_inverted_index.approach | 95 |
| abstract_inverted_index.datasets | 142 |
| abstract_inverted_index.dynamics | 207 |
| abstract_inverted_index.exhibits | 123 |
| abstract_inverted_index.findings | 85, 147 |
| abstract_inverted_index.improves | 96 |
| abstract_inverted_index.inherent | 188 |
| abstract_inverted_index.invasive | 28 |
| abstract_inverted_index.learning | 13, 153, 176 |
| abstract_inverted_index.methods, | 177 |
| abstract_inverted_index.offering | 161 |
| abstract_inverted_index.requires | 173 |
| abstract_inverted_index.science. | 170 |
| abstract_inverted_index.solution | 166 |
| abstract_inverted_index.surgery. | 29 |
| abstract_inverted_index.surgical | 112, 128, 158, 168, 211 |
| abstract_inverted_index.training | 7, 22, 156 |
| abstract_inverted_index.transfer | 25 |
| abstract_inverted_index.Averaging | 54 |
| abstract_inverted_index.Effective | 171 |
| abstract_inverted_index.enhancing | 44 |
| abstract_inverted_index.evaluated | 69 |
| abstract_inverted_index.exploring | 195 |
| abstract_inverted_index.federated | 12, 42, 93, 152, 175 |
| abstract_inverted_index.highlight | 148 |
| abstract_inverted_index.learning, | 43 |
| abstract_inverted_index.minimally | 27 |
| abstract_inverted_index.potential | 150 |
| abstract_inverted_index.reduction | 101 |
| abstract_inverted_index.FL-EndoViT | 110, 122 |
| abstract_inverted_index.Stochastic | 52 |
| abstract_inverted_index.advantages | 124 |
| abstract_inverted_index.collection | 64 |
| abstract_inverted_index.comparable | 118 |
| abstract_inverted_index.downstream | 113 |
| abstract_inverted_index.fine-tuned | 67 |
| abstract_inverted_index.foundation | 9, 159 |
| abstract_inverted_index.pretrained | 59 |
| abstract_inverted_index.real-world | 210 |
| abstract_inverted_index.Autoencoder | 40 |
| abstract_inverted_index.CEN-EndoViT | 126 |
| abstract_inverted_index.Conclusion: | 145 |
| abstract_inverted_index.accommodate | 186 |
| abstract_inverted_index.application | 108 |
| abstract_inverted_index.demonstrate | 86 |
| abstract_inverted_index.integrating | 88 |
| abstract_inverted_index.integration | 181 |
| abstract_inverted_index.investigate | 5 |
| abstract_inverted_index.limitations | 17 |
| abstract_inverted_index.performance | 117 |
| abstract_inverted_index.recognition | 139 |
| abstract_inverted_index.video-based | 198 |
| abstract_inverted_index.CEN-EndoViT. | 120 |
| abstract_inverted_index.Furthermore, | 121 |
| abstract_inverted_index.Minimization | 49 |
| abstract_inverted_index.Recognition, | 78 |
| abstract_inverted_index.Recognition. | 82 |
| abstract_inverted_index.capabilities | 203 |
| abstract_inverted_index.data-sharing | 16 |
| abstract_inverted_index.pretraining, | 97 |
| abstract_inverted_index.segmentation | 130 |
| abstract_inverted_index.Segmentation, | 75 |
| abstract_inverted_index.collaboration | 172 |
| abstract_inverted_index.collaborative | 20 |
| abstract_inverted_index.environments. | 212 |
| abstract_inverted_index.generalizable | 165 |
| abstract_inverted_index.heterogeneity | 190 |
| abstract_inverted_index.incorporating | 205 |
| abstract_inverted_index.institutions. | 192 |
| abstract_inverted_index.reconstruction | 103 |
| abstract_inverted_index.spatiotemporal | 206 |
| abstract_inverted_index.Sharpness-Aware | 48 |
| abstract_inverted_index.privacy-preserving | 155 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 8 |
| citation_normalized_percentile | |
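
The `abstract_inverted_index.*` rows in the table store the abstract as a map from each token to the word positions where it occurs. As a minimal sketch of how such an index can be turned back into running text, the snippet below inverts the mapping and joins tokens in position order. The function name `rebuild_abstract` is hypothetical, and the `excerpt` dictionary contains only the tokens at positions 0 to 7 from the rows above, with later positions of "In" and "the" left out for brevity.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a token -> list-of-positions mapping."""
    positions = {}
    for token, indexes in inverted_index.items():
        for i in indexes:
            positions[i] = token
    # Join tokens in ascending position order.
    return " ".join(positions[i] for i in sorted(positions))

# Excerpt of the inverted index shown in the table (positions 0-7 only).
excerpt = {
    "Purpose:": [0],
    "In": [1],
    "this": [2],
    "study,": [3],
    "we": [4],
    "investigate": [5],
    "the": [6],
    "training": [7],
}

print(rebuild_abstract(excerpt))
# prints: Purpose: In this study, we investigate the training
```

Applying the same function to the full set of rows should reproduce the abstract given at the top of this record, since punctuation is stored as part of each token.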