Analyzing evaluation methods for large language models in the medical field: a scoping review
2024 · Open Access · DOI: https://doi.org/10.1186/s12911-024-02709-7
Background: Owing to the rapid growth in the popularity of Large Language Models (LLMs), various performance evaluation studies have been conducted to confirm their applicability in the medical field. However, there is still no clear framework for evaluating LLMs.

Objective: This study reviews studies on LLM evaluations in the medical field and analyzes the research methods used in these studies. It aims to provide a reference for future researchers designing LLM studies.

Methods & materials: We conducted a scoping review of three databases (PubMed, Embase, and MEDLINE) to identify LLM-related articles published between January 1, 2023, and September 30, 2023. We analyzed the types of methods, number of questions (queries), evaluators, repeat measurements, additional analysis methods, use of prompt engineering, and metrics other than accuracy.

Results: A total of 142 articles met the inclusion criteria. LLM evaluation was primarily categorized as either providing test examinations (n = 53, 37.3%) or being evaluated by a medical professional (n = 80, 56.3%), with some hybrid cases (n = 5, 3.5%) or a combination of the two (n = 4, 2.8%). Most studies had 100 or fewer questions (n = 18, 29.0%), 15 (24.2%) performed repeated measurements, 18 (29.0%) performed additional analyses, and 8 (12.9%) used prompt engineering. For medical assessment, most studies used 50 or fewer queries (n = 54, 64.3%), had two evaluators (n = 43, 48.3%), and 14 (14.7%) used prompt engineering.

Conclusions: More research is required regarding the application of LLMs in healthcare. Although previous studies have evaluated performance, future studies will likely focus on improving performance. A well-structured methodology is required for these studies to be conducted systematically.
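The four category shares in the Results can be checked against the stated total of 142 articles. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and do not come from the paper. The later percentages (for example, 29.0% for n = 18) are not relative to all 142 articles, and the abstract does not state their denominators, so they are left out of the check.

```python
# Counts reported in the abstract for the four evaluation categories.
# The dictionary keys are shorthand labels chosen for this sketch.
category_counts = {
    "test examination": 53,
    "medical professional evaluation": 80,
    "hybrid": 5,
    "combination of the two": 4,
}

# The four categories should add up to the 142 included articles.
total = sum(category_counts.values())


def share(count: int, denominator: int) -> float:
    """Return count as a percentage of denominator, rounded to one decimal."""
    return round(100 * count / denominator, 1)


shares = {name: share(n, total) for name, n in category_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {category_counts[name]}/{total} = {pct}%")
```

The computed shares match the percentages quoted in the abstract, which suggests that the four categories partition the included articles.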
- Type: review
- Language: en
- Landing Page: https://doi.org/10.1186/s12911-024-02709-7, https://bmcmedinformdecismak.biomedcentral.com/counter/pdf/10.1186/s12911-024-02709-7
- OA Status: gold
- Cited By: 22
- References: 148
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404837143
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4404837143 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1186/s12911-024-02709-7 (Digital Object Identifier)
- Title: Analyzing evaluation methods for large language models in the medical field: a scoping review (work title)
- Type: review (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-11-29 (full publication date if available)
- Authors: Junbok Lee, Sungkyung Park, Jaeyong Shin, Belong Cho (list of authors in order)
- Landing page: https://doi.org/10.1186/s12911-024-02709-7 (publisher landing page)
- PDF URL: https://bmcmedinformdecismak.biomedcentral.com/counter/pdf/10.1186/s12911-024-02709-7 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://bmcmedinformdecismak.biomedcentral.com/counter/pdf/10.1186/s12911-024-02709-7 (direct OA link when available)
- Concepts: Popularity, Health informatics, Inclusion (mineral), English language, MEDLINE, Field (mathematics), Medicine, Computer science, Medical education, Medical physics, Public health, Psychology, Pathology, Pure mathematics, Political science, Mathematics, Social psychology, Mathematics education, Law (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 22 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 22 (per-year citation counts, last 5 years)
- References (count): 148 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4404837143 |
|---|---|
| doi | https://doi.org/10.1186/s12911-024-02709-7 |
| ids.doi | https://doi.org/10.1186/s12911-024-02709-7 |
| ids.pmid | https://pubmed.ncbi.nlm.nih.gov/39614219 |
| ids.openalex | https://openalex.org/W4404837143 |
| fwci | 5.35479361 |
| mesh[0].qualifier_ui | |
| mesh[0].descriptor_ui | D006801 |
| mesh[0].is_major_topic | False |
| mesh[0].qualifier_name | |
| mesh[0].descriptor_name | Humans |
| mesh[1].qualifier_ui | |
| mesh[1].descriptor_ui | D009323 |
| mesh[1].is_major_topic | True |
| mesh[1].qualifier_name | |
| mesh[1].descriptor_name | Natural Language Processing |
| mesh[2].qualifier_ui | |
| mesh[2].descriptor_ui | D005069 |
| mesh[2].is_major_topic | False |
| mesh[2].qualifier_name | |
| mesh[2].descriptor_name | Evaluation Studies as Topic |
| mesh[3].qualifier_ui | |
| mesh[3].descriptor_ui | D006801 |
| mesh[3].is_major_topic | False |
| mesh[3].qualifier_name | |
| mesh[3].descriptor_name | Humans |
| mesh[4].qualifier_ui | |
| mesh[4].descriptor_ui | D009323 |
| mesh[4].is_major_topic | True |
| mesh[4].qualifier_name | |
| mesh[4].descriptor_name | Natural Language Processing |
| mesh[5].qualifier_ui | |
| mesh[5].descriptor_ui | D005069 |
| mesh[5].is_major_topic | False |
| mesh[5].qualifier_name | |
| mesh[5].descriptor_name | Evaluation Studies as Topic |
| mesh[6].qualifier_ui | |
| mesh[6].descriptor_ui | D006801 |
| mesh[6].is_major_topic | False |
| mesh[6].qualifier_name | |
| mesh[6].descriptor_name | Humans |
| mesh[7].qualifier_ui | |
| mesh[7].descriptor_ui | D009323 |
| mesh[7].is_major_topic | True |
| mesh[7].qualifier_name | |
| mesh[7].descriptor_name | Natural Language Processing |
| mesh[8].qualifier_ui | |
| mesh[8].descriptor_ui | D005069 |
| mesh[8].is_major_topic | False |
| mesh[8].qualifier_name | |
| mesh[8].descriptor_name | Evaluation Studies as Topic |
| mesh[9].qualifier_ui | |
| mesh[9].descriptor_ui | D006801 |
| mesh[9].is_major_topic | False |
| mesh[9].qualifier_name | |
| mesh[9].descriptor_name | Humans |
| mesh[10].qualifier_ui | |
| mesh[10].descriptor_ui | D009323 |
| mesh[10].is_major_topic | True |
| mesh[10].qualifier_name | |
| mesh[10].descriptor_name | Natural Language Processing |
| mesh[11].qualifier_ui | |
| mesh[11].descriptor_ui | D005069 |
| mesh[11].is_major_topic | False |
| mesh[11].qualifier_name | |
| mesh[11].descriptor_name | Evaluation Studies as Topic |
| type | review |
| title | Analyzing evaluation methods for large language models in the medical field: a scoping review |
| biblio.issue | 1 |
| biblio.volume | 24 |
| biblio.last_page | 366 |
| biblio.first_page | 366 |
| topics[0].id | https://openalex.org/T11636 |
| topics[0].field.id | https://openalex.org/fields/27 |
| topics[0].field.display_name | Medicine |
| topics[0].score | 0.9993000030517578 |
| topics[0].domain.id | https://openalex.org/domains/4 |
| topics[0].domain.display_name | Health Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2718 |
| topics[0].subfield.display_name | Health Informatics |
| topics[0].display_name | Artificial Intelligence in Healthcare and Education |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9645000100135803 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| is_xpac | False |
| apc_list.value | 1570 |
| apc_list.currency | GBP |
| apc_list.value_usd | 1925 |
| apc_paid.value | 1570 |
| apc_paid.currency | GBP |
| apc_paid.value_usd | 1925 |
| concepts[0].id | https://openalex.org/C2780586970 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7716507911682129 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q1357284 |
| concepts[0].display_name | Popularity |
| concepts[1].id | https://openalex.org/C145642194 |
| concepts[1].level | 3 |
| concepts[1].score | 0.6018387675285339 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q870895 |
| concepts[1].display_name | Health informatics |
| concepts[2].id | https://openalex.org/C109359841 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5979759097099304 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q728944 |
| concepts[2].display_name | Inclusion (mineral) |
| concepts[3].id | https://openalex.org/C2987496018 |
| concepts[3].level | 2 |
| concepts[3].score | 0.5438365340232849 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1860 |
| concepts[3].display_name | English language |
| concepts[4].id | https://openalex.org/C2779473830 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5435677170753479 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q1540899 |
| concepts[4].display_name | MEDLINE |
| concepts[5].id | https://openalex.org/C9652623 |
| concepts[5].level | 2 |
| concepts[5].score | 0.4770911633968353 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q190109 |
| concepts[5].display_name | Field (mathematics) |
| concepts[6].id | https://openalex.org/C71924100 |
| concepts[6].level | 0 |
| concepts[6].score | 0.4378270208835602 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11190 |
| concepts[6].display_name | Medicine |
| concepts[7].id | https://openalex.org/C41008148 |
| concepts[7].level | 0 |
| concepts[7].score | 0.39374876022338867 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[7].display_name | Computer science |
| concepts[8].id | https://openalex.org/C509550671 |
| concepts[8].level | 1 |
| concepts[8].score | 0.3926224410533905 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q126945 |
| concepts[8].display_name | Medical education |
| concepts[9].id | https://openalex.org/C19527891 |
| concepts[9].level | 1 |
| concepts[9].score | 0.322098970413208 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1120908 |
| concepts[9].display_name | Medical physics |
| concepts[10].id | https://openalex.org/C138816342 |
| concepts[10].level | 2 |
| concepts[10].score | 0.28300708532333374 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q189603 |
| concepts[10].display_name | Public health |
| concepts[11].id | https://openalex.org/C15744967 |
| concepts[11].level | 0 |
| concepts[11].score | 0.19170483946800232 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q9418 |
| concepts[11].display_name | Psychology |
| concepts[12].id | https://openalex.org/C142724271 |
| concepts[12].level | 1 |
| concepts[12].score | 0.1551288366317749 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q7208 |
| concepts[12].display_name | Pathology |
| concepts[13].id | https://openalex.org/C202444582 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q837863 |
| concepts[13].display_name | Pure mathematics |
| concepts[14].id | https://openalex.org/C17744445 |
| concepts[14].level | 0 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[14].display_name | Political science |
| concepts[15].id | https://openalex.org/C33923547 |
| concepts[15].level | 0 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[15].display_name | Mathematics |
| concepts[16].id | https://openalex.org/C77805123 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q161272 |
| concepts[16].display_name | Social psychology |
| concepts[17].id | https://openalex.org/C145420912 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q853077 |
| concepts[17].display_name | Mathematics education |
| concepts[18].id | https://openalex.org/C199539241 |
| concepts[18].level | 1 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[18].display_name | Law |
| keywords[0].id | https://openalex.org/keywords/popularity |
| keywords[0].score | 0.7716507911682129 |
| keywords[0].display_name | Popularity |
| keywords[1].id | https://openalex.org/keywords/health-informatics |
| keywords[1].score | 0.6018387675285339 |
| keywords[1].display_name | Health informatics |
| keywords[2].id | https://openalex.org/keywords/inclusion |
| keywords[2].score | 0.5979759097099304 |
| keywords[2].display_name | Inclusion (mineral) |
| keywords[3].id | https://openalex.org/keywords/english-language |
| keywords[3].score | 0.5438365340232849 |
| keywords[3].display_name | English language |
| keywords[4].id | https://openalex.org/keywords/medline |
| keywords[4].score | 0.5435677170753479 |
| keywords[4].display_name | MEDLINE |
| keywords[5].id | https://openalex.org/keywords/field |
| keywords[5].score | 0.4770911633968353 |
| keywords[5].display_name | Field (mathematics) |
| keywords[6].id | https://openalex.org/keywords/medicine |
| keywords[6].score | 0.4378270208835602 |
| keywords[6].display_name | Medicine |
| keywords[7].id | https://openalex.org/keywords/computer-science |
| keywords[7].score | 0.39374876022338867 |
| keywords[7].display_name | Computer science |
| keywords[8].id | https://openalex.org/keywords/medical-education |
| keywords[8].score | 0.3926224410533905 |
| keywords[8].display_name | Medical education |
| keywords[9].id | https://openalex.org/keywords/medical-physics |
| keywords[9].score | 0.322098970413208 |
| keywords[9].display_name | Medical physics |
| keywords[10].id | https://openalex.org/keywords/public-health |
| keywords[10].score | 0.28300708532333374 |
| keywords[10].display_name | Public health |
| keywords[11].id | https://openalex.org/keywords/psychology |
| keywords[11].score | 0.19170483946800232 |
| keywords[11].display_name | Psychology |
| keywords[12].id | https://openalex.org/keywords/pathology |
| keywords[12].score | 0.1551288366317749 |
| keywords[12].display_name | Pathology |
| language | en |
| locations[0].id | doi:10.1186/s12911-024-02709-7 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S107516304 |
| locations[0].source.issn | 1472-6947 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 1472-6947 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | True |
| locations[0].source.display_name | BMC Medical Informatics and Decision Making |
| locations[0].source.host_organization | https://openalex.org/P4310320256 |
| locations[0].source.host_organization_name | BioMed Central |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320256, https://openalex.org/P4310319965 |
| locations[0].source.host_organization_lineage_names | BioMed Central, Springer Nature |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://bmcmedinformdecismak.biomedcentral.com/counter/pdf/10.1186/s12911-024-02709-7 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | BMC Medical Informatics and Decision Making |
| locations[0].landing_page_url | https://doi.org/10.1186/s12911-024-02709-7 |
| locations[1].id | pmid:39614219 |
| locations[1].is_oa | False |
| locations[1].source.id | https://openalex.org/S4306525036 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | PubMed |
| locations[1].source.host_organization | https://openalex.org/I1299303238 |
| locations[1].source.host_organization_name | National Institutes of Health |
| locations[1].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | publishedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | True |
| locations[1].is_published | True |
| locations[1].raw_source_name | BMC medical informatics and decision making |
| locations[1].landing_page_url | https://pubmed.ncbi.nlm.nih.gov/39614219 |
| locations[2].id | pmh:oai:pubmedcentral.nih.gov:11606129 |
| locations[2].is_oa | True |
| locations[2].source.id | https://openalex.org/S2764455111 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | False |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | PubMed Central |
| locations[2].source.host_organization | https://openalex.org/I1299303238 |
| locations[2].source.host_organization_name | National Institutes of Health |
| locations[2].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[2].license | other-oa |
| locations[2].pdf_url | |
| locations[2].version | publishedVersion |
| locations[2].raw_type | Text |
| locations[2].license_id | https://openalex.org/licenses/other-oa |
| locations[2].is_accepted | True |
| locations[2].is_published | True |
| locations[2].raw_source_name | BMC Med Inform Decis Mak |
| locations[2].landing_page_url | https://www.ncbi.nlm.nih.gov/pmc/articles/11606129 |
| locations[3].id | pmh:oai:doaj.org/article:cf779519175b464995d3c83bac9da465 |
| locations[3].is_oa | False |
| locations[3].source.id | https://openalex.org/S4306401280 |
| locations[3].source.issn | |
| locations[3].source.type | repository |
| locations[3].source.is_oa | False |
| locations[3].source.issn_l | |
| locations[3].source.is_core | False |
| locations[3].source.is_in_doaj | False |
| locations[3].source.display_name | DOAJ (DOAJ: Directory of Open Access Journals) |
| locations[3].source.host_organization | |
| locations[3].source.host_organization_name | |
| locations[3].license | |
| locations[3].pdf_url | |
| locations[3].version | submittedVersion |
| locations[3].raw_type | article |
| locations[3].license_id | |
| locations[3].is_accepted | False |
| locations[3].is_published | False |
| locations[3].raw_source_name | BMC Medical Informatics and Decision Making, Vol 24, Iss 1, Pp 1-11 (2024) |
| locations[3].landing_page_url | https://doaj.org/article/cf779519175b464995d3c83bac9da465 |
| indexed_in | crossref, doaj, pubmed |
| authorships[0].author.id | https://openalex.org/A5062223061 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2472-9790 |
| authorships[0].author.display_name | Junbok Lee |
| authorships[0].countries | KR |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I193775966 |
| authorships[0].affiliations[0].raw_affiliation_string | Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea |
| authorships[0].institutions[0].id | https://openalex.org/I193775966 |
| authorships[0].institutions[0].ror | https://ror.org/01wjejq96 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I193775966 |
| authorships[0].institutions[0].country_code | KR |
| authorships[0].institutions[0].display_name | Yonsei University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Junbok Lee |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Institute for Innovation in Digital Healthcare, Yonsei University, Seoul, Republic of Korea |
| authorships[1].author.id | https://openalex.org/A5088060647 |
| authorships[1].author.orcid | https://orcid.org/0000-0003-1171-5020 |
| authorships[1].author.display_name | Sungkyung Park |
| authorships[1].countries | KR |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I118373667 |
| authorships[1].affiliations[0].raw_affiliation_string | Department of Bigdata AI Management Information, Seoul National University of Science and Technology, Seoul, Republic of Korea |
| authorships[1].institutions[0].id | https://openalex.org/I118373667 |
| authorships[1].institutions[0].ror | https://ror.org/00chfja07 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I118373667 |
| authorships[1].institutions[0].country_code | KR |
| authorships[1].institutions[0].display_name | Seoul National University of Science and Technology |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Sungkyung Park |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Department of Bigdata AI Management Information, Seoul National University of Science and Technology, Seoul, Republic of Korea |
| authorships[2].author.id | https://openalex.org/A5000615161 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-2955-6382 |
| authorships[2].author.display_name | Jaeyong Shin |
| authorships[2].countries | KR |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I193775966 |
| authorships[2].affiliations[0].raw_affiliation_string | Department of Preventive Medicine and Public Health, Yonsei University College of Medicine, 50-1, Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea |
| authorships[2].institutions[0].id | https://openalex.org/I193775966 |
| authorships[2].institutions[0].ror | https://ror.org/01wjejq96 |
| authorships[2].institutions[0].type | education |
| authorships[2].institutions[0].lineage | https://openalex.org/I193775966 |
| authorships[2].institutions[0].country_code | KR |
| authorships[2].institutions[0].display_name | Yonsei University |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Jaeyong Shin |
| authorships[2].is_corresponding | True |
| authorships[2].raw_affiliation_strings | Department of Preventive Medicine and Public Health, Yonsei University College of Medicine, 50-1, Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea |
| authorships[3].author.id | https://openalex.org/A5108128071 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Belong Cho |
| authorships[3].countries | KR |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I139264467 |
| authorships[3].affiliations[0].raw_affiliation_string | Department of Human Systems Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea |
| authorships[3].institutions[0].id | https://openalex.org/I139264467 |
| authorships[3].institutions[0].ror | https://ror.org/04h9pn542 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I139264467 |
| authorships[3].institutions[0].country_code | KR |
| authorships[3].institutions[0].display_name | Seoul National University |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Belong Cho |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | Department of Human Systems Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea |
| has_content.pdf | True |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://bmcmedinformdecismak.biomedcentral.com/counter/pdf/10.1186/s12911-024-02709-7 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Analyzing evaluation methods for large language models in the medical field: a scoping review |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-23T05:10:03.516525 |
| primary_topic.id | https://openalex.org/T11636 |
| primary_topic.field.id | https://openalex.org/fields/27 |
| primary_topic.field.display_name | Medicine |
| primary_topic.score | 0.9993000030517578 |
| primary_topic.domain.id | https://openalex.org/domains/4 |
| primary_topic.domain.display_name | Health Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2718 |
| primary_topic.subfield.display_name | Health Informatics |
| primary_topic.display_name | Artificial Intelligence in Healthcare and Education |
| related_works | https://openalex.org/W2368605798, https://openalex.org/W2518037665, https://openalex.org/W2348524959, https://openalex.org/W2477036161, https://openalex.org/W2368049389, https://openalex.org/W2384861574, https://openalex.org/W4294565801, https://openalex.org/W2405487481, https://openalex.org/W198576020, https://openalex.org/W2146348055 |
| cited_by_count | 22 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 22 |
| locations_count | 4 |
| best_oa_location.id | doi:10.1186/s12911-024-02709-7 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S107516304 |
| best_oa_location.source.issn | 1472-6947 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 1472-6947 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | True |
| best_oa_location.source.display_name | BMC Medical Informatics and Decision Making |
| best_oa_location.source.host_organization | https://openalex.org/P4310320256 |
| best_oa_location.source.host_organization_name | BioMed Central |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310320256, https://openalex.org/P4310319965 |
| best_oa_location.source.host_organization_lineage_names | BioMed Central, Springer Nature |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://bmcmedinformdecismak.biomedcentral.com/counter/pdf/10.1186/s12911-024-02709-7 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | BMC Medical Informatics and Decision Making |
| best_oa_location.landing_page_url | https://doi.org/10.1186/s12911-024-02709-7 |
| primary_location.id | doi:10.1186/s12911-024-02709-7 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S107516304 |
| primary_location.source.issn | 1472-6947 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 1472-6947 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | True |
| primary_location.source.display_name | BMC Medical Informatics and Decision Making |
| primary_location.source.host_organization | https://openalex.org/P4310320256 |
| primary_location.source.host_organization_name | BioMed Central |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320256, https://openalex.org/P4310319965 |
| primary_location.source.host_organization_lineage_names | BioMed Central, Springer Nature |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://bmcmedinformdecismak.biomedcentral.com/counter/pdf/10.1186/s12911-024-02709-7 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | BMC Medical Informatics and Decision Making |
| primary_location.landing_page_url | https://doi.org/10.1186/s12911-024-02709-7 |
| publication_date | 2024-11-29 |
| publication_year | 2024 |
| referenced_works | https://openalex.org/W4384561707, https://openalex.org/W4320495408, https://openalex.org/W4376866715, https://openalex.org/W4386423073, https://openalex.org/W4386973901, https://openalex.org/W4381550225, https://openalex.org/W4367595583, https://openalex.org/W4319779057, https://openalex.org/W4389508866, https://openalex.org/W4366769280, https://openalex.org/W4387144848, https://openalex.org/W4392642871, https://openalex.org/W4377197101, https://openalex.org/W4378214315, https://openalex.org/W4376872703, https://openalex.org/W4382930233, https://openalex.org/W2075950485, https://openalex.org/W2091538045, https://openalex.org/W2901669506, https://openalex.org/W2891378911, https://openalex.org/W4365137614, https://openalex.org/W4385827730, https://openalex.org/W4317910576, https://openalex.org/W4376640725, https://openalex.org/W4385491575, https://openalex.org/W4386157637, https://openalex.org/W4386438954, https://openalex.org/W4383371705, https://openalex.org/W4377030810, https://openalex.org/W4380356334, https://openalex.org/W4385612923, https://openalex.org/W4380786006, https://openalex.org/W4319460874, https://openalex.org/W4385988139, https://openalex.org/W4386346510, https://openalex.org/W4367668444, https://openalex.org/W4379599010, https://openalex.org/W4384484700, https://openalex.org/W4360840406, https://openalex.org/W4386457251, https://openalex.org/W4368360859, https://openalex.org/W4386272304, https://openalex.org/W4385597618, https://openalex.org/W4367678030, https://openalex.org/W4386553312, https://openalex.org/W4385563364, https://openalex.org/W4384024640, https://openalex.org/W4378976945, https://openalex.org/W4385269545, https://openalex.org/W4386448461, https://openalex.org/W4384695580, https://openalex.org/W4367175039, https://openalex.org/W4381743186, https://openalex.org/W4380303971, https://openalex.org/W4372047097, https://openalex.org/W4385751377, https://openalex.org/W4379093714, https://openalex.org/W4385812326, https://openalex.org/W4386407038, https://openalex.org/W4379377506, https://openalex.org/W4385351447, https://openalex.org/W4384337881, https://openalex.org/W4384305044, https://openalex.org/W4383311938, https://openalex.org/W4377220156, https://openalex.org/W4378528433, https://openalex.org/W4380685958, https://openalex.org/W4386596026, https://openalex.org/W4364378939, https://openalex.org/W4385334680, https://openalex.org/W4385568225, https://openalex.org/W4385898137, https://openalex.org/W4376133327, https://openalex.org/W4385295268, https://openalex.org/W4381427645, https://openalex.org/W4367310920, https://openalex.org/W4385476863, https://openalex.org/W4383501093, https://openalex.org/W4386151807, https://openalex.org/W4386390690, https://openalex.org/W4386046428, https://openalex.org/W4386196128, https://openalex.org/W4384922275, https://openalex.org/W4366603014, https://openalex.org/W4390063623, https://openalex.org/W4386392833, https://openalex.org/W4386692532, https://openalex.org/W4388294257, https://openalex.org/W4382929886, https://openalex.org/W4383710066, https://openalex.org/W4386207798, https://openalex.org/W4386726071, https://openalex.org/W4385331619, https://openalex.org/W4386865118, https://openalex.org/W4382282792, https://openalex.org/W4360822110, https://openalex.org/W4320920036, https://openalex.org/W4386200227, https://openalex.org/W4383749364, https://openalex.org/W4386346796, https://openalex.org/W4386757039, https://openalex.org/W4322622443, https://openalex.org/W4385667643, https://openalex.org/W4381838911, https://openalex.org/W4383618574, https://openalex.org/W4382198789, https://openalex.org/W4386998832, https://openalex.org/W4386033569, https://openalex.org/W4386757283, https://openalex.org/W4379511585, https://openalex.org/W4379231355, https://openalex.org/W4386110374, https://openalex.org/W4366548437, https://openalex.org/W4380302792, https://openalex.org/W4377010595, https://openalex.org/W4384499300, https://openalex.org/W4379209901, https://openalex.org/W4386843688, https://openalex.org/W4386053290, https://openalex.org/W4380423243, https://openalex.org/W4381480701, https://openalex.org/W4386045865, https://openalex.org/W4366332570, https://openalex.org/W4384337834, https://openalex.org/W4386735567, https://openalex.org/W4386287921, https://openalex.org/W4386414742, https://openalex.org/W4381682643, https://openalex.org/W4385299173, https://openalex.org/W4386644975, https://openalex.org/W4386448866, https://openalex.org/W4386096098, https://openalex.org/W4384626331, https://openalex.org/W4377563830, https://openalex.org/W4385997381, https://openalex.org/W4366462753, https://openalex.org/W4382395040, https://openalex.org/W4384564606, https://openalex.org/W4376636500, https://openalex.org/W4377157938, https://openalex.org/W4386288022, https://openalex.org/W4360976361, https://openalex.org/W4384821243, https://openalex.org/W4372347944, https://openalex.org/W4379467554, https://openalex.org/W4319662928, https://openalex.org/W4364363895, https://openalex.org/W4386510404 |
| referenced_works_count | 148 |
| abstract_inverted_index.( | 146, 158, 167, 178, 190, 221, 229 |
| abstract_inverted_index.8 | 206 |
| abstract_inverted_index.= | 148, 160, 169, 180, 192, 223, 231 |
| abstract_inverted_index.A | 127, 266 |
| abstract_inverted_index.a | 65, 78, 155, 173 |
| abstract_inverted_index.n | 147, 159, 168, 179, 191, 222, 230 |
| abstract_inverted_index.1, | 95 |
| abstract_inverted_index.14 | 235 |
| abstract_inverted_index.15 | 195 |
| abstract_inverted_index.18 | 200 |
| abstract_inverted_index.4, | 181 |
| abstract_inverted_index.5, | 170 |
| abstract_inverted_index.50 | 217 |
| abstract_inverted_index.It | 61 |
| abstract_inverted_index.We | 76, 101 |
| abstract_inverted_index.as | 141 |
| abstract_inverted_index.be | 275 |
| abstract_inverted_index.by | 154 |
| abstract_inverted_index.in | 7, 26, 48, 58, 250 |
| abstract_inverted_index.is | 32, 243, 269 |
| abstract_inverted_index.no | 34 |
| abstract_inverted_index.of | 10, 81, 105, 108, 118, 129, 175, 248 |
| abstract_inverted_index.on | 45, 263 |
| abstract_inverted_index.or | 151, 172, 187, 218 |
| abstract_inverted_index.to | 3, 22, 63, 88, 274 |
| abstract_inverted_index.100 | 186 |
| abstract_inverted_index.142 | 130 |
| abstract_inverted_index.18, | 193 |
| abstract_inverted_index.30, | 99 |
| abstract_inverted_index.43, | 232 |
| abstract_inverted_index.53, | 149 |
| abstract_inverted_index.54, | 224 |
| abstract_inverted_index.80, | 161 |
| abstract_inverted_index.For | 211 |
| abstract_inverted_index.LLM | 46, 71, 136 |
| abstract_inverted_index.and | 52, 86, 97, 121, 205, 234 |
| abstract_inverted_index.for | 37, 67, 271 |
| abstract_inverted_index.had | 185, 226 |
| abstract_inverted_index.met | 132 |
| abstract_inverted_index.the | 4, 8, 27, 49, 54, 103, 133, 176, 246 |
| abstract_inverted_index.two | 177, 227 |
| abstract_inverted_index.use | 117 |
| abstract_inverted_index.was | 138 |
| abstract_inverted_index.LLMs | 249 |
| abstract_inverted_index.More | 241 |
| abstract_inverted_index.Most | 183 |
| abstract_inverted_index.This | 41 |
| abstract_inverted_index.aims | 62 |
| abstract_inverted_index.been | 20 |
| abstract_inverted_index.have | 19, 255 |
| abstract_inverted_index.most | 214 |
| abstract_inverted_index.some | 164 |
| abstract_inverted_index.test | 144 |
| abstract_inverted_index.than | 124 |
| abstract_inverted_index.used | 57, 208, 216, 237 |
| abstract_inverted_index.will | 260 |
| abstract_inverted_index.with | 163 |
| abstract_inverted_index.& | 74 |
| abstract_inverted_index.2023, | 96 |
| abstract_inverted_index.2023. | 100 |
| abstract_inverted_index.3.5%) | 171 |
| abstract_inverted_index.LLMs. | 39 |
| abstract_inverted_index.Large | 11 |
| abstract_inverted_index.Owing | 2 |
| abstract_inverted_index.being | 152 |
| abstract_inverted_index.cases | 166 |
| abstract_inverted_index.clear | 35 |
| abstract_inverted_index.fewer | 188, 219 |
| abstract_inverted_index.field | 51 |
| abstract_inverted_index.focus | 262 |
| abstract_inverted_index.other | 123 |
| abstract_inverted_index.rapid | 5 |
| abstract_inverted_index.still | 33 |
| abstract_inverted_index.study | 42 |
| abstract_inverted_index.their | 24 |
| abstract_inverted_index.there | 31 |
| abstract_inverted_index.these | 59, 272 |
| abstract_inverted_index.three | 82 |
| abstract_inverted_index.total | 128 |
| abstract_inverted_index.types | 104 |
| abstract_inverted_index.2.8%). | 182 |
| abstract_inverted_index.37.3%) | 150 |
| abstract_inverted_index.Models | 13 |
| abstract_inverted_index.either | 142 |
| abstract_inverted_index.field. | 29 |
| abstract_inverted_index.future | 68, 258 |
| abstract_inverted_index.growth | 6 |
| abstract_inverted_index.hybrid | 165 |
| abstract_inverted_index.likely | 261 |
| abstract_inverted_index.number | 107 |
| abstract_inverted_index.prompt | 119, 209, 238 |
| abstract_inverted_index.repeat | 112 |
| abstract_inverted_index.review | 80 |
| abstract_inverted_index.(12.9%) | 207 |
| abstract_inverted_index.(14.7%) | 236 |
| abstract_inverted_index.(24.2%) | 196 |
| abstract_inverted_index.(29.0%) | 201 |
| abstract_inverted_index.(LLMs), | 14 |
| abstract_inverted_index.29.0%), | 194 |
| abstract_inverted_index.48.3%), | 233 |
| abstract_inverted_index.56.3%), | 162 |
| abstract_inverted_index.64.3%), | 225 |
| abstract_inverted_index.Embase, | 85 |
| abstract_inverted_index.January | 94 |
| abstract_inverted_index.Methods | 73 |
| abstract_inverted_index.Results | 126 |
| abstract_inverted_index.between | 93 |
| abstract_inverted_index.confirm | 23 |
| abstract_inverted_index.medical | 28, 50, 156, 212 |
| abstract_inverted_index.methods | 56 |
| abstract_inverted_index.metrics | 122 |
| abstract_inverted_index.provide | 64 |
| abstract_inverted_index.queries | 220 |
| abstract_inverted_index.reviews | 43 |
| abstract_inverted_index.scoping | 79 |
| abstract_inverted_index.studies | 18, 44, 184, 215, 254, 259, 273 |
| abstract_inverted_index.various | 15 |
| abstract_inverted_index.(PubMed, | 84 |
| abstract_inverted_index.Abstract | 0 |
| abstract_inverted_index.Although | 252 |
| abstract_inverted_index.However, | 30 |
| abstract_inverted_index.Language | 12 |
| abstract_inverted_index.MEDLINE) | 87 |
| abstract_inverted_index.analysis | 115 |
| abstract_inverted_index.analyzed | 102 |
| abstract_inverted_index.analyzes | 53 |
| abstract_inverted_index.articles | 91, 131 |
| abstract_inverted_index.identify | 89 |
| abstract_inverted_index.methods, | 106, 116 |
| abstract_inverted_index.previous | 253 |
| abstract_inverted_index.repeated | 198 |
| abstract_inverted_index.required | 244, 270 |
| abstract_inverted_index.research | 55, 242 |
| abstract_inverted_index.studies. | 60, 72 |
| abstract_inverted_index.Objective | 40 |
| abstract_inverted_index.September | 98 |
| abstract_inverted_index.accuracy. | 125 |
| abstract_inverted_index.analyses, | 204 |
| abstract_inverted_index.conducted | 21, 77, 276 |
| abstract_inverted_index.criteria. | 135 |
| abstract_inverted_index.databases | 83 |
| abstract_inverted_index.designing | 70 |
| abstract_inverted_index.evaluated | 153, 256 |
| abstract_inverted_index.framework | 36 |
| abstract_inverted_index.improving | 264 |
| abstract_inverted_index.inclusion | 134 |
| abstract_inverted_index.materials | 75 |
| abstract_inverted_index.performed | 197, 202 |
| abstract_inverted_index.primarily | 139 |
| abstract_inverted_index.providing | 143 |
| abstract_inverted_index.published | 92 |
| abstract_inverted_index.questions | 109, 189 |
| abstract_inverted_index.reference | 66 |
| abstract_inverted_index.regarding | 245 |
| abstract_inverted_index.(queries), | 110 |
| abstract_inverted_index.Background | 1 |
| abstract_inverted_index.additional | 114, 203 |
| abstract_inverted_index.evaluating | 38 |
| abstract_inverted_index.evaluation | 17, 137 |
| abstract_inverted_index.evaluators | 228 |
| abstract_inverted_index.popularity | 9 |
| abstract_inverted_index.Conclusions | 240 |
| abstract_inverted_index.LLM-related | 90 |
| abstract_inverted_index.application | 247 |
| abstract_inverted_index.assessment, | 213 |
| abstract_inverted_index.categorized | 140 |
| abstract_inverted_index.combination | 174 |
| abstract_inverted_index.evaluations | 47 |
| abstract_inverted_index.evaluators, | 111 |
| abstract_inverted_index.healthcare. | 251 |
| abstract_inverted_index.methodology | 268 |
| abstract_inverted_index.performance | 16 |
| abstract_inverted_index.researchers | 69 |
| abstract_inverted_index.engineering, | 120 |
| abstract_inverted_index.engineering. | 210, 239 |
| abstract_inverted_index.examinations | 145 |
| abstract_inverted_index.performance, | 257 |
| abstract_inverted_index.performance. | 265 |
| abstract_inverted_index.professional | 157 |
| abstract_inverted_index.applicability | 25 |
| abstract_inverted_index.measurements, | 113, 199 |
| abstract_inverted_index.systematically. | 277 |
| abstract_inverted_index.well-structured | 267 |
| cited_by_percentile_year.max | 100 |
| cited_by_percentile_year.min | 99 |
| corresponding_author_ids | https://openalex.org/A5000615161 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 4 |
| corresponding_institution_ids | https://openalex.org/I193775966 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.4399999976158142 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile.value | 0.92980437 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
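
The `abstract_inverted_index.*` rows above store the abstract as a map from each token to the word positions where it occurs (for example, `the` appears at positions 4, 8, 27, ...). As a minimal sketch, the readable text can be recovered by inverting that map and joining the tokens in position order. The function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, not part of the record. The sample keeps only the first ten positions taken from the rows above, with the longer position lists truncated.

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Join tokens in ascending position order; gaps are simply skipped.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Hypothetical excerpt: positions 0-9 only, copied from the table rows above
# (the real lists for "to", "the", and "in" are longer).
sample = {
    "Abstract": [0],
    "Background": [1],
    "Owing": [2],
    "to": [3],
    "the": [4, 8],
    "rapid": [5],
    "growth": [6],
    "in": [7],
    "popularity": [9],
}

print(rebuild_abstract(sample))
```

Applied to the full set of rows, the same approach should yield the complete abstract, with the section labels ("Background", "Objective", and so on) inline, since they are indexed as ordinary tokens.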