arXiv (Cornell University)
Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction
November 2022 • Grace Yang, Ming Cao, Lavender Yao Jiang, Xujin Chris Liu, Alexander T. M. Cheung, Hannah Weiss, David B. Kurland, Kyunghyun Cho, Eric K. Oermann
Traditional evaluation metrics for classification in natural language processing, such as accuracy and area under the curve, fail to differentiate between models whose predictive behaviors differ despite similar scores. We introduce the sensitivity score, a metric that scrutinizes a model's behavior at the vocabulary level to provide insight into disparities in decision-making logic. We assess the sensitivity score on a set of representative words in the test set using two classifiers trained …