Structured Taxonomy and Framework for Developing Medical Benchmark in Large Language Models Derived from Scoping Review
2025 · Open Access · DOI: https://doi.org/10.21203/rs.3.rs-7927940/v1
With the rapid advancement of large language model (LLM) technology, numerous studies have explored its application in the medical field. Robust evaluation is crucial for ensuring reliability and safety, which has led to the development of diverse benchmark datasets. In this study, we propose a structured taxonomy to give researchers practical guidance for benchmark selection. Furthermore, we introduce READY, a development framework built on five principles (Reliable, Ethical, Annotated, Diverse, Yield-validated) to support the systematic design of medical benchmarks and strengthen future evaluation practices. To establish the taxonomy and framework, we systematically reviewed benchmark datasets designed for evaluating LLMs in medical contexts. A comprehensive literature search yielded 55 relevant studies. Each benchmark was analyzed using a structured framework encompassing dataset construction and evaluation methodology. We anticipate that this research will promote more rigorous and ethical LLM evaluation, paving the way for the safe application of LLMs in clinical settings.
Related Topics
- Type: article
- Landing Page: https://doi.org/10.21203/rs.3.rs-7927940/v1
- PDF URL: https://www.researchsquare.com/article/rs-7927940/latest.pdf
- OA Status: gold
- References: 18
- OpenAlex ID: https://openalex.org/W4416225454
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4416225454 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.21203/rs.3.rs-7927940/v1 (Digital Object Identifier)
- Title: Structured Taxonomy and Framework for Developing Medical Benchmark in Large Language Models Derived from Scoping Review (work title)
- Type: article (OpenAlex work type)
- Publication year: 2025 (year of publication)
- Publication date: 2025-11-14 (full publication date if available)
- Authors: Junbok Lee, Jaeyong Shin (list of authors in order)
- Landing page: https://doi.org/10.21203/rs.3.rs-7927940/v1 (publisher landing page)
- PDF URL: https://www.researchsquare.com/article/rs-7927940/latest.pdf (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://www.researchsquare.com/article/rs-7927940/latest.pdf (direct OA link when available)
- Cited by: 0 (total citation count in OpenAlex)
- References (count): 18 (number of works referenced by this work)
Full payload
| id | https://openalex.org/W4416225454 |
|---|---|
| doi | https://doi.org/10.21203/rs.3.rs-7927940/v1 |
| ids.doi | https://doi.org/10.21203/rs.3.rs-7927940/v1 |
| ids.openalex | https://openalex.org/W4416225454 |
| fwci | |
| type | article |
| title | Structured Taxonomy and Framework for Developing Medical Benchmark in Large Language Models Derived from Scoping Review |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| language | |
| locations[0].id | doi:10.21203/rs.3.rs-7927940/v1 |
| locations[0].is_oa | True |
| locations[0].source | |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://www.researchsquare.com/article/rs-7927940/latest.pdf |
| locations[0].version | acceptedVersion |
| locations[0].raw_type | posted-content |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.21203/rs.3.rs-7927940/v1 |
| indexed_in | crossref |
| authorships[0].author.id | https://openalex.org/A5062223061 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2472-9790 |
| authorships[0].author.display_name | Junbok Lee |
| authorships[0].countries | KR |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I193775966 |
| authorships[0].affiliations[0].raw_affiliation_string | Yonsei University |
| authorships[0].institutions[0].id | https://openalex.org/I193775966 |
| authorships[0].institutions[0].ror | https://ror.org/01wjejq96 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I193775966 |
| authorships[0].institutions[0].country_code | KR |
| authorships[0].institutions[0].display_name | Yonsei University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Junbok Lee |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Yonsei University |
| authorships[1].author.id | https://openalex.org/A5000615161 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-2955-6382 |
| authorships[1].author.display_name | Jaeyong Shin |
| authorships[1].countries | KR |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I193775966 |
| authorships[1].affiliations[0].raw_affiliation_string | Yonsei University College of Medicine |
| authorships[1].institutions[0].id | https://openalex.org/I193775966 |
| authorships[1].institutions[0].ror | https://ror.org/01wjejq96 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I193775966 |
| authorships[1].institutions[0].country_code | KR |
| authorships[1].institutions[0].display_name | Yonsei University |
| authorships[1].author_position | last |
| authorships[1].raw_author_name | Jaeyong Shin |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Yonsei University College of Medicine |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://www.researchsquare.com/article/rs-7927940/latest.pdf |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-11-14T00:00:00 |
| display_name | Structured Taxonomy and Framework for Developing Medical Benchmark in Large Language Models Derived from Scoping Review |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-28T08:13:58.905533 |
| primary_topic | |
| cited_by_count | 0 |
| locations_count | 1 |
| best_oa_location.id | doi:10.21203/rs.3.rs-7927940/v1 |
| best_oa_location.is_oa | True |
| best_oa_location.source | |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://www.researchsquare.com/article/rs-7927940/latest.pdf |
| best_oa_location.version | acceptedVersion |
| best_oa_location.raw_type | posted-content |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.21203/rs.3.rs-7927940/v1 |
| primary_location.id | doi:10.21203/rs.3.rs-7927940/v1 |
| primary_location.is_oa | True |
| primary_location.source | |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://www.researchsquare.com/article/rs-7927940/latest.pdf |
| primary_location.version | acceptedVersion |
| primary_location.raw_type | posted-content |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.21203/rs.3.rs-7927940/v1 |
| publication_date | 2025-11-14 |
| publication_year | 2025 |
| referenced_works | https://openalex.org/W4404523826, https://openalex.org/W4391301614, https://openalex.org/W4391069573, https://openalex.org/W4395052272, https://openalex.org/W4391921245, https://openalex.org/W4400897152, https://openalex.org/W4400964148, https://openalex.org/W4389069416, https://openalex.org/W2891378911, https://openalex.org/W3162922479, https://openalex.org/W4384071683, https://openalex.org/W4318391587, https://openalex.org/W2903314293, https://openalex.org/W2913352150, https://openalex.org/W3042207670, https://openalex.org/W3090073303, https://openalex.org/W4293233500, https://openalex.org/W4313887409 |
| referenced_works_count | 18 |
| abstract_inverted_index.- | 65, 71 |
| abstract_inverted_index.A | 103 |
| abstract_inverted_index.a | 42, 58, 116 |
| abstract_inverted_index.55 | 108 |
| abstract_inverted_index.In | 37 |
| abstract_inverted_index.To | 85 |
| abstract_inverted_index.We | 126 |
| abstract_inverted_index.in | 16, 100, 148 |
| abstract_inverted_index.is | 22 |
| abstract_inverted_index.of | 5, 33, 77, 146 |
| abstract_inverted_index.on | 62 |
| abstract_inverted_index.to | 30, 45, 72 |
| abstract_inverted_index.we | 40, 55, 91 |
| abstract_inverted_index.LLM | 137 |
| abstract_inverted_index.and | 27, 80, 89, 123, 135 |
| abstract_inverted_index.for | 24, 51, 97, 142 |
| abstract_inverted_index.its | 14 |
| abstract_inverted_index.the | 2, 17, 31, 74, 87, 120, 140, 143 |
| abstract_inverted_index.was | 113 |
| abstract_inverted_index.way | 141 |
| abstract_inverted_index.Each | 111 |
| abstract_inverted_index.LLMs | 99, 147 |
| abstract_inverted_index.With | 1 |
| abstract_inverted_index.five | 63 |
| abstract_inverted_index.have | 12 |
| abstract_inverted_index.more | 133 |
| abstract_inverted_index.safe | 144 |
| abstract_inverted_index.that | 128 |
| abstract_inverted_index.this | 38, 129 |
| abstract_inverted_index.will | 131 |
| abstract_inverted_index.with | 48 |
| abstract_inverted_index.built | 61 |
| abstract_inverted_index.large | 6 |
| abstract_inverted_index.model | 8 |
| abstract_inverted_index.rapid | 3 |
| abstract_inverted_index.using | 115 |
| abstract_inverted_index.READY, | 57 |
| abstract_inverted_index.Robust | 20 |
| abstract_inverted_index.design | 76 |
| abstract_inverted_index.field. | 19 |
| abstract_inverted_index.future | 82 |
| abstract_inverted_index.paving | 139 |
| abstract_inverted_index.search | 106 |
| abstract_inverted_index.study, | 39 |
| abstract_inverted_index.crucial | 23 |
| abstract_inverted_index.dataset | 121 |
| abstract_inverted_index.diverse | 34 |
| abstract_inverted_index.ethical | 136 |
| abstract_inverted_index.leading | 29 |
| abstract_inverted_index.medical | 18, 78, 101 |
| abstract_inverted_index.promote | 132 |
| abstract_inverted_index.propose | 41 |
| abstract_inverted_index.provide | 46 |
| abstract_inverted_index.safety, | 28 |
| abstract_inverted_index.studies | 11 |
| abstract_inverted_index.support | 73 |
| abstract_inverted_index.yielded | 107 |
| abstract_inverted_index.Diverse, | 69 |
| abstract_inverted_index.Ethical, | 67 |
| abstract_inverted_index.analyzed | 114 |
| abstract_inverted_index.clinical | 149 |
| abstract_inverted_index.context. | 102 |
| abstract_inverted_index.datasets | 95 |
| abstract_inverted_index.designed | 96 |
| abstract_inverted_index.ensuring | 25 |
| abstract_inverted_index.explored | 13 |
| abstract_inverted_index.guidance | 50 |
| abstract_inverted_index.language | 7 |
| abstract_inverted_index.numerous | 10 |
| abstract_inverted_index.relevant | 109 |
| abstract_inverted_index.research | 130 |
| abstract_inverted_index.reviewed | 93 |
| abstract_inverted_index.rigorous | 134 |
| abstract_inverted_index.studies. | 110 |
| abstract_inverted_index.taxonomy | 44, 88 |
| abstract_inverted_index.Reliable, | 66 |
| abstract_inverted_index.benchmark | 35, 52, 94, 112 |
| abstract_inverted_index.datasets. | 36 |
| abstract_inverted_index.establish | 86 |
| abstract_inverted_index.framework | 60, 118 |
| abstract_inverted_index.introduce | 56 |
| abstract_inverted_index.practical | 49 |
| abstract_inverted_index.settings. | 150 |
| abstract_inverted_index.Annotated, | 68 |
| abstract_inverted_index.anticipate | 127 |
| abstract_inverted_index.benchmarks | 79 |
| abstract_inverted_index.evaluating | 98 |
| abstract_inverted_index.evaluation | 21, 83, 124 |
| abstract_inverted_index.framework, | 90 |
| abstract_inverted_index.literature | 105 |
| abstract_inverted_index.practices. | 84 |
| abstract_inverted_index.principles | 64 |
| abstract_inverted_index.selection. | 53 |
| abstract_inverted_index.strengthen | 81 |
| abstract_inverted_index.structured | 43, 117 |
| abstract_inverted_index.systematic | 75 |
| abstract_inverted_index.advancement | 4 |
| abstract_inverted_index.application | 15, 145 |
| abstract_inverted_index.development | 32, 59 |
| abstract_inverted_index.evaluation, | 138 |
| abstract_inverted_index.reliability | 26 |
| abstract_inverted_index.researchers | 47 |
| abstract_inverted_index.technology, | 9 |
| abstract_inverted_index.Furthermore, | 54 |
| abstract_inverted_index.construction | 122 |
| abstract_inverted_index.encompassing | 119 |
| abstract_inverted_index.methodology. | 125 |
| abstract_inverted_index.comprehensive | 104 |
| abstract_inverted_index.systematically | 92 |
| abstract_inverted_index.Yield-validated | 70 |
| abstract_inverted_index.<title>Abstract</title> | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 1 |
| institutions_distinct_count | 2 |
| citation_normalized_percentile | |
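
The `abstract_inverted_index.*` rows above store the abstract as a mapping from each token to the word positions where it occurs. A minimal sketch of how such a mapping could be turned back into running text follows. The function name `rebuild_abstract`, its `drop_tokens` parameter, and the `sample_index` dictionary are illustrative assumptions, not part of OpenAlex or any client library; the sample holds only the entries for positions 0 to 9 from the table, so tokens such as `the` and `of` list fewer positions here than in the full payload.

```python
from typing import Dict, Iterable, List


def rebuild_abstract(
    inverted_index: Dict[str, List[int]],
    drop_tokens: Iterable[str] = ("<title>Abstract</title>",),
) -> str:
    """Rebuild running text from a token -> positions mapping.

    Hypothetical helper: tokens listed in drop_tokens (here the markup
    token stored at position 0) are skipped before reassembly.
    """
    dropped = set(drop_tokens)
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        if token in dropped:
            continue
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Subset of the table above, restricted to positions 0-9 (assumed sample).
sample_index = {
    "<title>Abstract</title>": [0],
    "With": [1],
    "the": [2],
    "rapid": [3],
    "advancement": [4],
    "of": [5],
    "large": [6],
    "language": [7],
    "model": [8],
    "technology,": [9],
}

print(rebuild_abstract(sample_index))
# prints: With the rapid advancement of large language model technology,
```

Because punctuation stays attached to its token (for example `technology,`), joining on single spaces is enough to recover the original sentence spacing in this sketch.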