Diagnosing Hallucination Risk in AI Surgical Decision-Support: A Framework for Sequential Validation
November 2025 • Dong Chen, Yumei Wei, Zhiyang He, Guan‐Ming Kuang, Chao Ye, M. R. An, Hui Peng, Yong Hu, Huiren Tao, Kenneth Cheung
Large language models (LLMs) offer transformative potential for clinical decision support in spine surgery but pose significant risks through hallucinations, which are factually inconsistent or contextually misaligned outputs that may compromise patient safety. This study introduces a clinician-centered framework to quantify hallucination risks by evaluating diagnostic precision, recommendation quality, reasoning robustness, output coherence, and knowledge alignment. We assessed six leading LLMs across 30 expert-v…
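The abstract names five evaluation dimensions (diagnostic precision, recommendation quality, reasoning robustness, output coherence, knowledge alignment), six LLMs, and 30 expert-validated cases, but not the scoring mechanics. The sketch below is a minimal illustration of how clinician ratings across those dimensions might be aggregated per model; the rating scale, field names, equal weighting, and the `hallucination_risk` composite are assumptions for illustration, not the paper's actual protocol.

```python
# Illustrative sketch only: aggregate hypothetical clinician ratings for one
# model across the five dimensions named in the abstract. The 1-5 scale,
# equal-weight mean, and composite risk score are assumptions.
from dataclasses import dataclass
from statistics import mean

DIMENSIONS = [
    "diagnostic_precision",
    "recommendation_quality",
    "reasoning_robustness",
    "output_coherence",
    "knowledge_alignment",
]

@dataclass
class CaseRating:
    """Clinician ratings (assumed 1-5 scale) for one model on one case."""
    model: str
    scores: dict  # dimension name -> rating

def dimension_means(ratings: list, model: str) -> dict:
    """Average each dimension over all rated cases for a given model."""
    per_dim = {d: [] for d in DIMENSIONS}
    for r in ratings:
        if r.model == model:
            for d in DIMENSIONS:
                per_dim[d].append(r.scores[d])
    return {d: mean(vals) for d, vals in per_dim.items() if vals}

def hallucination_risk(dim_means: dict, scale_top: float = 5.0) -> float:
    """Illustrative composite: higher mean rated quality -> lower residual risk."""
    return scale_top - mean(dim_means.values())

# Example with two hypothetical case ratings for one model:
ratings = [
    CaseRating("model_a", {d: 4.0 for d in DIMENSIONS}),
    CaseRating("model_a", {d: 3.5 for d in DIMENSIONS}),
]
print(hallucination_risk(dimension_means(ratings, "model_a")))  # 1.25
```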