Structured Taxonomy and Framework for Developing Medical Benchmark in Large Language Models Derived from Scoping Review
November 2025 • Junbok Lee, Jaeyong Shin
Abstract
With the rapid advancement of large language model technology, numerous studies have explored its application in the medical field. Robust evaluation is crucial for ensuring reliability and safety, and this need has led to the development of diverse benchmark datasets. In this study, we propose a structured taxonomy that gives researchers practical guidance for benchmark selection. Furthermore, we introduce READY, a development framework built on five principles: Reliable, Ethical, Annotated, Diver…