Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation
2024 · Open Access · DOI: https://doi.org/10.2196/57674
Background: Large language models (LLMs) have achieved great progress in natural language processing tasks and demonstrated the potential for use in clinical applications. Despite their capabilities, LLMs in the medical domain are prone to generating hallucinations (not fully reliable responses). Hallucinations in LLMs’ responses create substantial risks, potentially threatening patients’ physical safety. Thus, to perceive and prevent this safety risk, it is essential to evaluate LLMs in the medical domain and build a systematic evaluation.

Objective: We developed a comprehensive evaluation system, MedGPTEval, composed of criteria, medical data sets in Chinese, and publicly available benchmarks.

Methods: First, a set of evaluation criteria was designed based on a comprehensive literature review. Second, existing candidate criteria were optimized by using a Delphi method with 5 experts in medicine and engineering. Third, 3 clinical experts designed medical data sets to interact with LLMs. Finally, benchmarking experiments were conducted on the data sets. The responses generated by chatbots based on LLMs were recorded for blind evaluations by 5 licensed medical experts. The evaluation criteria that were obtained covered medical professional capabilities, social comprehensive capabilities, contextual capabilities, and computational robustness, with 16 detailed indicators. The medical data sets include 27 medical dialogues and 7 case reports in Chinese. Three chatbots were evaluated: ChatGPT by OpenAI; ERNIE Bot by Baidu, Inc; and Doctor PuJiang (Dr PJ) by Shanghai Artificial Intelligence Laboratory.

Results: Dr PJ outperformed ChatGPT and ERNIE Bot in the multiple-turn medical dialogues and case report scenarios. Dr PJ also outperformed ChatGPT in the semantic consistency rate and complete error rate category, indicating better robustness. However, Dr PJ had slightly lower scores in medical professional capabilities compared with ChatGPT in the multiple-turn dialogue scenario.

Conclusions: MedGPTEval provides comprehensive criteria to evaluate chatbots by LLMs in the medical domain, open-source data sets, and benchmarks assessing 3 LLMs. Experimental results demonstrate that Dr PJ outperforms ChatGPT and ERNIE Bot in social and professional contexts. Therefore, such an assessment system can be easily adopted by researchers in this community to augment an open-source data set.
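
The blind evaluation step mentioned in the Methods can be illustrated with a minimal Python sketch. It is not the authors' protocol: the function names, the opaque labels, the 1-5 scale, and every score below are assumptions made only for illustration. The idea is to hide which chatbot produced each response before raters see it, then map the ratings back and average them across the raters.

```python
import random
from statistics import mean


def blind_responses(responses, seed=0):
    """Shuffle responses and hide which chatbot produced each one.

    Returns (blinded, key): `blinded` maps an opaque label to the response
    text shown to raters; `key` maps the label back to the chatbot name.
    """
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    items = list(responses.items())
    rng.shuffle(items)
    blinded, key = {}, {}
    for i, (chatbot, text) in enumerate(items, start=1):
        label = f"response_{i}"
        blinded[label] = text
        key[label] = chatbot
    return blinded, key


def mean_score_per_chatbot(ratings, key):
    """Unblind the labels and average each response's scores across raters."""
    return {key[label]: mean(scores) for label, scores in ratings.items()}


# Hypothetical inputs: placeholder chatbot names, texts, and scores.
responses = {"chatbot_a": "answer text A", "chatbot_b": "answer text B"}
blinded, key = blind_responses(responses)

# Five raters score each blinded response on an assumed 1-5 scale.
ratings = {label: [4, 5, 4, 3, 4] for label in blinded}
scores = mean_score_per_chatbot(ratings, key)
```

Keeping the label-to-chatbot key separate from the material shown to raters is the design point: raters only ever see `blinded`, and the key is consulted after all scores are in.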
Related Topics
- Type: article
- Language: en
- Landing Page: https://doi.org/10.2196/57674
- OA Status: gold
- Cited By: 14
- References: 30
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4400261110
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4400261110 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.2196/57674 (Digital Object Identifier)
- Title: Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation (work title)
- Type: article (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-06-28 (full publication date if available)
- Authors: Jie Xu, Lu Lu, Xinwei Peng, Jiali Pang, Jinru Ding, Lingrui Yang, Huan Song, Kang Li, Xin Sun, Shaoting Zhang (list of authors in order)
- Landing page: https://doi.org/10.2196/57674 (publisher landing page)
- Open access: Yes (whether a free full text is available)
- OA status: gold (open access status per OpenAlex)
- OA URL: https://doi.org/10.2196/57674 (direct OA link when available)
- Concepts: Computer science, Benchmark (surveying), Set (abstract data type), Data set, Data mining, Natural language processing, Artificial intelligence, Data science, Programming language, Cartography, Geography (top concepts, that is, fields/topics, attached by OpenAlex)
- Cited by: 14 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 9, 2024: 5 (per-year citation counts, last 5 years)
- References (count): 30 (number of works referenced by this work)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
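
As a minimal sketch of how the fields in this listing could be read from a work record in code, the Python snippet below summarizes a small dictionary. The key names follow the dotted paths shown in this record's full payload (`cited_by_count`, `counts_by_year`, `open_access.oa_status`); the function name `summarize_work` and the `citations_outside_window` field are hypothetical, and the record is hand-copied from the values on this page rather than fetched from any API.

```python
def summarize_work(work: dict) -> dict:
    """Collect a few headline fields from an OpenAlex-style work record."""
    # counts_by_year is a list of {"year": ..., "cited_by_count": ...} rows.
    per_year = {
        row["year"]: row["cited_by_count"]
        for row in work.get("counts_by_year", [])
    }
    total = work.get("cited_by_count", 0)
    open_access = work.get("open_access", {})
    return {
        "title": work.get("title"),
        "year": work.get("publication_year"),
        "is_oa": open_access.get("is_oa", False),
        "oa_status": open_access.get("oa_status"),
        "cited_by_count": total,
        "citations_by_year": per_year,
        # Citations not covered by the per-year rows (recent years only).
        "citations_outside_window": total - sum(per_year.values()),
    }


# Values copied by hand from the listing on this page.
record = {
    "title": "Data Set and Benchmark (MedGPTEval) to Evaluate Responses "
             "From Large Language Models in Medicine: "
             "Evaluation Development and Validation",
    "publication_year": 2024,
    "open_access": {"is_oa": True, "oa_status": "gold"},
    "cited_by_count": 14,
    "counts_by_year": [
        {"year": 2025, "cited_by_count": 9},
        {"year": 2024, "cited_by_count": 5},
    ],
}

summary = summarize_work(record)
```

For this record the per-year rows (9 and 5) account for all 14 citations, so the difference is zero. For older works the per-year rows cover only recent years, and the difference would be positive.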
Full payload
| id | https://openalex.org/W4400261110 |
|---|---|
| doi | https://doi.org/10.2196/57674 |
| ids.doi | https://doi.org/10.2196/57674 |
| ids.pmid | https://pubmed.ncbi.nlm.nih.gov/38952020 |
| ids.openalex | https://openalex.org/W4400261110 |
| fwci | 3.40759593 |
| type | article |
| title | Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation |
| biblio.issue | |
| biblio.volume | 12 |
| biblio.last_page | e57674 |
| biblio.first_page | e57674 |
| topics[0].id | https://openalex.org/T11636 |
| topics[0].field.id | https://openalex.org/fields/27 |
| topics[0].field.display_name | Medicine |
| topics[0].score | 0.9991999864578247 |
| topics[0].domain.id | https://openalex.org/domains/4 |
| topics[0].domain.display_name | Health Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/2718 |
| topics[0].subfield.display_name | Health Informatics |
| topics[0].display_name | Artificial Intelligence in Healthcare and Education |
| topics[1].id | https://openalex.org/T12422 |
| topics[1].field.id | https://openalex.org/fields/27 |
| topics[1].field.display_name | Medicine |
| topics[1].score | 0.9948999881744385 |
| topics[1].domain.id | https://openalex.org/domains/4 |
| topics[1].domain.display_name | Health Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/2741 |
| topics[1].subfield.display_name | Radiology, Nuclear Medicine and Imaging |
| topics[1].display_name | Radiomics and Machine Learning in Medical Imaging |
| topics[2].id | https://openalex.org/T13702 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9713000059127808 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Machine Learning in Healthcare |
| is_xpac | False |
| apc_list.value | 2300 |
| apc_list.currency | USD |
| apc_list.value_usd | 2300 |
| apc_paid.value | 2300 |
| apc_paid.currency | USD |
| apc_paid.value_usd | 2300 |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.654850959777832 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C185798385 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5817117691040039 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q1161707 |
| concepts[1].display_name | Benchmark (surveying) |
| concepts[2].id | https://openalex.org/C177264268 |
| concepts[2].level | 2 |
| concepts[2].score | 0.5366764664649963 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[2].display_name | Set (abstract data type) |
| concepts[3].id | https://openalex.org/C58489278 |
| concepts[3].level | 2 |
| concepts[3].score | 0.4728175401687622 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1172284 |
| concepts[3].display_name | Data set |
| concepts[4].id | https://openalex.org/C124101348 |
| concepts[4].level | 1 |
| concepts[4].score | 0.43072766065597534 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[4].display_name | Data mining |
| concepts[5].id | https://openalex.org/C204321447 |
| concepts[5].level | 1 |
| concepts[5].score | 0.42457857728004456 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[5].display_name | Natural language processing |
| concepts[6].id | https://openalex.org/C154945302 |
| concepts[6].level | 1 |
| concepts[6].score | 0.39128193259239197 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[6].display_name | Artificial intelligence |
| concepts[7].id | https://openalex.org/C2522767166 |
| concepts[7].level | 1 |
| concepts[7].score | 0.3298330008983612 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2374463 |
| concepts[7].display_name | Data science |
| concepts[8].id | https://openalex.org/C199360897 |
| concepts[8].level | 1 |
| concepts[8].score | 0.2742627263069153 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[8].display_name | Programming language |
| concepts[9].id | https://openalex.org/C58640448 |
| concepts[9].level | 1 |
| concepts[9].score | 0.10759422183036804 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q42515 |
| concepts[9].display_name | Cartography |
| concepts[10].id | https://openalex.org/C205649164 |
| concepts[10].level | 0 |
| concepts[10].score | 0.0 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q1071 |
| concepts[10].display_name | Geography |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.654850959777832 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/benchmark |
| keywords[1].score | 0.5817117691040039 |
| keywords[1].display_name | Benchmark (surveying) |
| keywords[2].id | https://openalex.org/keywords/set |
| keywords[2].score | 0.5366764664649963 |
| keywords[2].display_name | Set (abstract data type) |
| keywords[3].id | https://openalex.org/keywords/data-set |
| keywords[3].score | 0.4728175401687622 |
| keywords[3].display_name | Data set |
| keywords[4].id | https://openalex.org/keywords/data-mining |
| keywords[4].score | 0.43072766065597534 |
| keywords[4].display_name | Data mining |
| keywords[5].id | https://openalex.org/keywords/natural-language-processing |
| keywords[5].score | 0.42457857728004456 |
| keywords[5].display_name | Natural language processing |
| keywords[6].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[6].score | 0.39128193259239197 |
| keywords[6].display_name | Artificial intelligence |
| keywords[7].id | https://openalex.org/keywords/data-science |
| keywords[7].score | 0.3298330008983612 |
| keywords[7].display_name | Data science |
| keywords[8].id | https://openalex.org/keywords/programming-language |
| keywords[8].score | 0.2742627263069153 |
| keywords[8].display_name | Programming language |
| keywords[9].id | https://openalex.org/keywords/cartography |
| keywords[9].score | 0.10759422183036804 |
| keywords[9].display_name | Cartography |
| language | en |
| locations[0].id | doi:10.2196/57674 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S2764650051 |
| locations[0].source.issn | 2291-9694 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2291-9694 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | True |
| locations[0].source.display_name | JMIR Medical Informatics |
| locations[0].source.host_organization | https://openalex.org/P4310320608 |
| locations[0].source.host_organization_name | JMIR Publications |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310320608 |
| locations[0].source.host_organization_lineage_names | JMIR Publications |
| locations[0].license | cc-by |
| locations[0].pdf_url | |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | JMIR Medical Informatics |
| locations[0].landing_page_url | https://doi.org/10.2196/57674 |
| locations[1].id | pmid:38952020 |
| locations[1].is_oa | False |
| locations[1].source.id | https://openalex.org/S4306525036 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | PubMed |
| locations[1].source.host_organization | https://openalex.org/I1299303238 |
| locations[1].source.host_organization_name | National Institutes of Health |
| locations[1].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | publishedVersion |
| locations[1].raw_type | |
| locations[1].license_id | |
| locations[1].is_accepted | True |
| locations[1].is_published | True |
| locations[1].raw_source_name | JMIR medical informatics |
| locations[1].landing_page_url | https://pubmed.ncbi.nlm.nih.gov/38952020 |
| locations[2].id | pmh:oai:doaj.org/article:cdc2c9161fd64c2e95096d468c659585 |
| locations[2].is_oa | False |
| locations[2].source.id | https://openalex.org/S4306401280 |
| locations[2].source.issn | |
| locations[2].source.type | repository |
| locations[2].source.is_oa | False |
| locations[2].source.issn_l | |
| locations[2].source.is_core | False |
| locations[2].source.is_in_doaj | False |
| locations[2].source.display_name | DOAJ (DOAJ: Directory of Open Access Journals) |
| locations[2].source.host_organization | |
| locations[2].source.host_organization_name | |
| locations[2].license | |
| locations[2].pdf_url | |
| locations[2].version | submittedVersion |
| locations[2].raw_type | article |
| locations[2].license_id | |
| locations[2].is_accepted | False |
| locations[2].is_published | False |
| locations[2].raw_source_name | JMIR Medical Informatics, Vol 12, Pp e57674-e57674 (2024) |
| locations[2].landing_page_url | https://doaj.org/article/cdc2c9161fd64c2e95096d468c659585 |
| locations[3].id | pmh:oai:pubmedcentral.nih.gov:11225096 |
| locations[3].is_oa | True |
| locations[3].source.id | https://openalex.org/S2764455111 |
| locations[3].source.issn | |
| locations[3].source.type | repository |
| locations[3].source.is_oa | False |
| locations[3].source.issn_l | |
| locations[3].source.is_core | False |
| locations[3].source.is_in_doaj | False |
| locations[3].source.display_name | PubMed Central |
| locations[3].source.host_organization | https://openalex.org/I1299303238 |
| locations[3].source.host_organization_name | National Institutes of Health |
| locations[3].source.host_organization_lineage | https://openalex.org/I1299303238 |
| locations[3].license | other-oa |
| locations[3].pdf_url | |
| locations[3].version | submittedVersion |
| locations[3].raw_type | Text |
| locations[3].license_id | https://openalex.org/licenses/other-oa |
| locations[3].is_accepted | False |
| locations[3].is_published | False |
| locations[3].raw_source_name | JMIR Med Inform |
| locations[3].landing_page_url | https://www.ncbi.nlm.nih.gov/pmc/articles/11225096 |
| indexed_in | crossref, doaj, pubmed |
| authorships[0].author.id | https://openalex.org/A5052532450 |
| authorships[0].author.orcid | https://orcid.org/0000-0001-9233-4363 |
| authorships[0].author.display_name | Jie Xu |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I4391012619 |
| authorships[0].affiliations[0].raw_affiliation_string | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[0].institutions[0].id | https://openalex.org/I4391012619 |
| authorships[0].institutions[0].ror | https://ror.org/03wkvpx79 |
| authorships[0].institutions[0].type | facility |
| authorships[0].institutions[0].lineage | https://openalex.org/I4391012619 |
| authorships[0].institutions[0].country_code | |
| authorships[0].institutions[0].display_name | Shanghai Artificial Intelligence Laboratory |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Jie Xu |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[1].author.id | https://openalex.org/A5013657813 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-8834-1947 |
| authorships[1].author.display_name | Lu Lu |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I4391012619 |
| authorships[1].affiliations[0].raw_affiliation_string | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[1].institutions[0].id | https://openalex.org/I4391012619 |
| authorships[1].institutions[0].ror | https://ror.org/03wkvpx79 |
| authorships[1].institutions[0].type | facility |
| authorships[1].institutions[0].lineage | https://openalex.org/I4391012619 |
| authorships[1].institutions[0].country_code | |
| authorships[1].institutions[0].display_name | Shanghai Artificial Intelligence Laboratory |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Lu Lu |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[2].author.id | https://openalex.org/A5080491811 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-9868-0136 |
| authorships[2].author.display_name | Xinwei Peng |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I4391012619 |
| authorships[2].affiliations[0].raw_affiliation_string | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[2].institutions[0].id | https://openalex.org/I4391012619 |
| authorships[2].institutions[0].ror | https://ror.org/03wkvpx79 |
| authorships[2].institutions[0].type | facility |
| authorships[2].institutions[0].lineage | https://openalex.org/I4391012619 |
| authorships[2].institutions[0].country_code | |
| authorships[2].institutions[0].display_name | Shanghai Artificial Intelligence Laboratory |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Xinwei Peng |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[3].author.id | https://openalex.org/A5020869854 |
| authorships[3].author.orcid | https://orcid.org/0000-0001-5757-4804 |
| authorships[3].author.display_name | Jiali Pang |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I4391012619 |
| authorships[3].affiliations[0].raw_affiliation_string | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[3].institutions[0].id | https://openalex.org/I4391012619 |
| authorships[3].institutions[0].ror | https://ror.org/03wkvpx79 |
| authorships[3].institutions[0].type | facility |
| authorships[3].institutions[0].lineage | https://openalex.org/I4391012619 |
| authorships[3].institutions[0].country_code | |
| authorships[3].institutions[0].display_name | Shanghai Artificial Intelligence Laboratory |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Jiali Pang |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[4].author.id | https://openalex.org/A5045197394 |
| authorships[4].author.orcid | https://orcid.org/0009-0005-0399-7656 |
| authorships[4].author.display_name | Jinru Ding |
| authorships[4].affiliations[0].institution_ids | https://openalex.org/I4391012619 |
| authorships[4].affiliations[0].raw_affiliation_string | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[4].institutions[0].id | https://openalex.org/I4391012619 |
| authorships[4].institutions[0].ror | https://ror.org/03wkvpx79 |
| authorships[4].institutions[0].type | facility |
| authorships[4].institutions[0].lineage | https://openalex.org/I4391012619 |
| authorships[4].institutions[0].country_code | |
| authorships[4].institutions[0].display_name | Shanghai Artificial Intelligence Laboratory |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Jinru Ding |
| authorships[4].is_corresponding | False |
| authorships[4].raw_affiliation_strings | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[5].author.id | https://openalex.org/A5050983412 |
| authorships[5].author.orcid | https://orcid.org/0009-0005-4902-1034 |
| authorships[5].author.display_name | Lingrui Yang |
| authorships[5].countries | CN |
| authorships[5].affiliations[0].institution_ids | https://openalex.org/I183067930, https://openalex.org/I2801518579 |
| authorships[5].affiliations[0].raw_affiliation_string | Clinical Research and Innovation Unit, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China |
| authorships[5].institutions[0].id | https://openalex.org/I183067930 |
| authorships[5].institutions[0].ror | https://ror.org/0220qvk04 |
| authorships[5].institutions[0].type | education |
| authorships[5].institutions[0].lineage | https://openalex.org/I183067930 |
| authorships[5].institutions[0].country_code | CN |
| authorships[5].institutions[0].display_name | Shanghai Jiao Tong University |
| authorships[5].institutions[1].id | https://openalex.org/I2801518579 |
| authorships[5].institutions[1].ror | https://ror.org/04dzvks42 |
| authorships[5].institutions[1].type | healthcare |
| authorships[5].institutions[1].lineage | https://openalex.org/I2801518579 |
| authorships[5].institutions[1].country_code | CN |
| authorships[5].institutions[1].display_name | XinHua Hospital |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Lingrui Yang |
| authorships[5].is_corresponding | False |
| authorships[5].raw_affiliation_strings | Clinical Research and Innovation Unit, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China |
| authorships[6].author.id | https://openalex.org/A5043778832 |
| authorships[6].author.orcid | https://orcid.org/0000-0003-3845-8079 |
| authorships[6].author.display_name | Huan Song |
| authorships[6].countries | CN |
| authorships[6].affiliations[0].institution_ids | https://openalex.org/I24185976 |
| authorships[6].affiliations[0].raw_affiliation_string | Med-X Center for Informatics, Sichuan University, Chengdu, China |
| authorships[6].affiliations[1].institution_ids | https://openalex.org/I2800091995 |
| authorships[6].affiliations[1].raw_affiliation_string | West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China |
| authorships[6].institutions[0].id | https://openalex.org/I24185976 |
| authorships[6].institutions[0].ror | https://ror.org/011ashp19 |
| authorships[6].institutions[0].type | education |
| authorships[6].institutions[0].lineage | https://openalex.org/I24185976 |
| authorships[6].institutions[0].country_code | CN |
| authorships[6].institutions[0].display_name | Sichuan University |
| authorships[6].institutions[1].id | https://openalex.org/I2800091995 |
| authorships[6].institutions[1].ror | https://ror.org/040nggs60 |
| authorships[6].institutions[1].type | healthcare |
| authorships[6].institutions[1].lineage | https://openalex.org/I24185976, https://openalex.org/I2800091995 |
| authorships[6].institutions[1].country_code | CN |
| authorships[6].institutions[1].display_name | West China Medical Center of Sichuan University |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Huan Song |
| authorships[6].is_corresponding | False |
| authorships[6].raw_affiliation_strings | Med-X Center for Informatics, Sichuan University, Chengdu, China, West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China |
| authorships[7].author.id | https://openalex.org/A5100456986 |
| authorships[7].author.orcid | https://orcid.org/0000-0002-8136-9816 |
| authorships[7].author.display_name | Kang Li |
| authorships[7].countries | CN |
| authorships[7].affiliations[0].institution_ids | https://openalex.org/I24185976 |
| authorships[7].affiliations[0].raw_affiliation_string | Med-X Center for Informatics, Sichuan University, Chengdu, China |
| authorships[7].affiliations[1].institution_ids | https://openalex.org/I2800091995 |
| authorships[7].affiliations[1].raw_affiliation_string | West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China |
| authorships[7].institutions[0].id | https://openalex.org/I24185976 |
| authorships[7].institutions[0].ror | https://ror.org/011ashp19 |
| authorships[7].institutions[0].type | education |
| authorships[7].institutions[0].lineage | https://openalex.org/I24185976 |
| authorships[7].institutions[0].country_code | CN |
| authorships[7].institutions[0].display_name | Sichuan University |
| authorships[7].institutions[1].id | https://openalex.org/I2800091995 |
| authorships[7].institutions[1].ror | https://ror.org/040nggs60 |
| authorships[7].institutions[1].type | healthcare |
| authorships[7].institutions[1].lineage | https://openalex.org/I24185976, https://openalex.org/I2800091995 |
| authorships[7].institutions[1].country_code | CN |
| authorships[7].institutions[1].display_name | West China Medical Center of Sichuan University |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Kang Li |
| authorships[7].is_corresponding | False |
| authorships[7].raw_affiliation_strings | Med-X Center for Informatics, Sichuan University, Chengdu, China, West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China |
| authorships[8].author.id | https://openalex.org/A5100950605 |
| authorships[8].author.orcid | https://orcid.org/0009-0003-7223-5298 |
| authorships[8].author.display_name | Xin Sun |
| authorships[8].countries | CN |
| authorships[8].affiliations[0].institution_ids | https://openalex.org/I183067930, https://openalex.org/I2801518579 |
| authorships[8].affiliations[0].raw_affiliation_string | Clinical Research and Innovation Unit, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China |
| authorships[8].institutions[0].id | https://openalex.org/I183067930 |
| authorships[8].institutions[0].ror | https://ror.org/0220qvk04 |
| authorships[8].institutions[0].type | education |
| authorships[8].institutions[0].lineage | https://openalex.org/I183067930 |
| authorships[8].institutions[0].country_code | CN |
| authorships[8].institutions[0].display_name | Shanghai Jiao Tong University |
| authorships[8].institutions[1].id | https://openalex.org/I2801518579 |
| authorships[8].institutions[1].ror | https://ror.org/04dzvks42 |
| authorships[8].institutions[1].type | healthcare |
| authorships[8].institutions[1].lineage | https://openalex.org/I2801518579 |
| authorships[8].institutions[1].country_code | CN |
| authorships[8].institutions[1].display_name | XinHua Hospital |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Xin Sun |
| authorships[8].is_corresponding | False |
| authorships[8].raw_affiliation_strings | Clinical Research and Innovation Unit, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China |
| authorships[9].author.id | https://openalex.org/A5066553616 |
| authorships[9].author.orcid | https://orcid.org/0000-0002-8719-448X |
| authorships[9].author.display_name | Shaoting Zhang |
| authorships[9].affiliations[0].institution_ids | https://openalex.org/I4391012619 |
| authorships[9].affiliations[0].raw_affiliation_string | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| authorships[9].institutions[0].id | https://openalex.org/I4391012619 |
| authorships[9].institutions[0].ror | https://ror.org/03wkvpx79 |
| authorships[9].institutions[0].type | facility |
| authorships[9].institutions[0].lineage | https://openalex.org/I4391012619 |
| authorships[9].institutions[0].country_code | |
| authorships[9].institutions[0].display_name | Shanghai Artificial Intelligence Laboratory |
| authorships[9].author_position | last |
| authorships[9].raw_author_name | Shaoting Zhang |
| authorships[9].is_corresponding | True |
| authorships[9].raw_affiliation_strings | Shanghai Artificial Intelligence Laboratory, OpenMedLab, Shanghai, China |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.2196/57674 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T11636 |
| primary_topic.field.id | https://openalex.org/fields/27 |
| primary_topic.field.display_name | Medicine |
| primary_topic.score | 0.9991999864578247 |
| primary_topic.domain.id | https://openalex.org/domains/4 |
| primary_topic.domain.display_name | Health Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/2718 |
| primary_topic.subfield.display_name | Health Informatics |
| primary_topic.display_name | Artificial Intelligence in Healthcare and Education |
| related_works | https://openalex.org/W2378211422, https://openalex.org/W3114272811, https://openalex.org/W2027108423, https://openalex.org/W1855666948, https://openalex.org/W2758561209, https://openalex.org/W1548095260, https://openalex.org/W2594414941, https://openalex.org/W2781711915, https://openalex.org/W2112817590, https://openalex.org/W1555291398 |
| cited_by_count | 14 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 9 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 5 |
| locations_count | 4 |
| best_oa_location.id | doi:10.2196/57674 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S2764650051 |
| best_oa_location.source.issn | 2291-9694 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2291-9694 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | True |
| best_oa_location.source.display_name | JMIR Medical Informatics |
| best_oa_location.source.host_organization | https://openalex.org/P4310320608 |
| best_oa_location.source.host_organization_name | JMIR Publications |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310320608 |
| best_oa_location.source.host_organization_lineage_names | JMIR Publications |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | JMIR Medical Informatics |
| best_oa_location.landing_page_url | https://doi.org/10.2196/57674 |
| primary_location.id | doi:10.2196/57674 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S2764650051 |
| primary_location.source.issn | 2291-9694 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2291-9694 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | True |
| primary_location.source.display_name | JMIR Medical Informatics |
| primary_location.source.host_organization | https://openalex.org/P4310320608 |
| primary_location.source.host_organization_name | JMIR Publications |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310320608 |
| primary_location.source.host_organization_lineage_names | JMIR Publications |
| primary_location.license | cc-by |
| primary_location.pdf_url | |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | JMIR Medical Informatics |
| primary_location.landing_page_url | https://doi.org/10.2196/57674 |
| publication_date | 2024-06-28 |
| publication_year | 2024 |
| referenced_works | https://openalex.org/W4205865577, https://openalex.org/W4318765555, https://openalex.org/W4361289889, https://openalex.org/W4387378202, https://openalex.org/W4321649710, https://openalex.org/W4319662928, https://openalex.org/W4323030608, https://openalex.org/W4322622443, https://openalex.org/W2963270767, https://openalex.org/W3164795780, https://openalex.org/W4384484700, https://openalex.org/W2939803556, https://openalex.org/W4321276803, https://openalex.org/W4327525855, https://openalex.org/W4323066793, https://openalex.org/W2996859150, https://openalex.org/W3097183950, https://openalex.org/W4313361206, https://openalex.org/W4312212221, https://openalex.org/W3182414949, https://openalex.org/W2910866042, https://openalex.org/W4221143046, https://openalex.org/W4362655923, https://openalex.org/W4367310920, https://openalex.org/W3164718925, https://openalex.org/W4386045865, https://openalex.org/W2779051611, https://openalex.org/W3174169056, https://openalex.org/W3185341429, https://openalex.org/W3029078373 |
| referenced_works_count | 30 |
| abstract (reconstructed from abstract_inverted_index) | Abstract Background Large language models (LLMs) have achieved great progress in natural language processing tasks and demonstrated the potential for use in clinical applications. Despite their capabilities, LLMs in the medical domain are prone to generating hallucinations (not fully reliable responses). Hallucinations in LLMs’ responses create substantial risks, potentially threatening patients’ physical safety. Thus, to perceive and prevent this safety risk, it is essential to evaluate LLMs in the medical domain and build a systematic evaluation. Objective We developed a comprehensive evaluation system, MedGPTEval, composed of criteria, medical data sets in Chinese, and publicly available benchmarks. Methods First, a set of evaluation criteria was designed based on a comprehensive literature review. Second, existing candidate criteria were optimized by using a Delphi method with 5 experts in medicine and engineering. Third, 3 clinical experts designed medical data sets to interact with LLMs. Finally, benchmarking experiments were conducted on the data sets. The responses generated by chatbots based on LLMs were recorded for blind evaluations by 5 licensed medical experts. The evaluation criteria that were obtained covered medical professional capabilities, social comprehensive capabilities, contextual capabilities, and computational robustness, with 16 detailed indicators. The medical data sets include 27 medical dialogues and 7 case reports in Chinese. Three chatbots were evaluated: ChatGPT by OpenAI; ERNIE Bot by Baidu, Inc; and Doctor PuJiang (Dr PJ) by Shanghai Artificial Intelligence Laboratory. Results Dr PJ outperformed ChatGPT and ERNIE Bot in the multiple-turn medical dialogues and case report scenarios. Dr PJ also outperformed ChatGPT in the semantic consistency rate and complete error rate category, indicating better robustness. However, Dr PJ had slightly lower scores in medical professional capabilities compared with ChatGPT in the multiple-turn dialogue scenario. Conclusions MedGPTEval provides comprehensive criteria to evaluate chatbots by LLMs in the medical domain, open-source data sets, and benchmarks assessing 3 LLMs. Experimental results demonstrate that Dr PJ outperforms ChatGPT and ERNIE Bot in social and professional contexts. Therefore, such an assessment system can be easily adopted by researchers in this community to augment an open-source data set. |
| cited_by_percentile_year.max | 99 |
| cited_by_percentile_year.min | 98 |
| corresponding_author_ids | https://openalex.org/A5066553616 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 10 |
| corresponding_institution_ids | https://openalex.org/I4391012619 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.75 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile.value | 0.8784043 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
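
OpenAlex exposes a work's abstract as an inverted index (`abstract_inverted_index`), which maps each token to the word positions where it occurs. The plain-text abstract can be recovered by placing every token at its positions and joining them in order. The following is a minimal sketch of that idea, not OpenAlex's own tooling: the function name `rebuild_abstract` and the `sample` dictionary are illustrative assumptions, and the sample copies only a few token positions from this record.

```python
from typing import Dict, List


def rebuild_abstract(inverted_index: Dict[str, List[int]]) -> str:
    """Rebuild plain text from a token -> positions mapping (hypothetical helper)."""
    by_position: Dict[int, str] = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Sort by position so gaps in the numbering do not matter.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# A small slice of this record's index (positions 227-233 of the abstract).
sample = {
    "Dr": [227],
    "PJ": [228],
    "outperformed": [229],
    "ChatGPT": [230],
    "and": [231],
    "ERNIE": [232],
    "Bot": [233],
}

print(rebuild_abstract(sample))  # prints: Dr PJ outperformed ChatGPT and ERNIE Bot
```

Sorting by position rather than indexing into a pre-sized list keeps the sketch tolerant of partial indexes such as the slice above.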