Multiple Machine Learning Models-Based Diabetes Prediction and Feature Importance Analysis

Jianbo Ye

2025 · Open Access · DOI: https://doi.org/10.1051/itmconf/20257802006 · OA: W4414071113

The number of diabetic patients has increased in recent years, and traditional diabetes prediction methods are inadequate, which motivates the use of machine learning models for diabetes prediction. This study takes its data from a dataset on Kaggle and analyzes it with four prediction models: logistic regression, k-nearest neighbor, decision tree, and random forest. The optimal model is identified by comparing the prediction accuracy of the four models, and the features that matter most for predicting diabetes are then analyzed on the basis of that model. The findings indicate that the random forest model is the most effective, achieving an accuracy of 79.870%, while the decision tree model performs worst, with an accuracy of 72.727%. With random forest as the optimal model, the study finds that glucose, Body Mass Index (BMI), and age are the top three influencing factors.
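The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the author's code. The data is synthetic (generated with `make_classification`) and stands in for the Kaggle dataset, which is not reproduced here. The feature names, the 80/20 split, the feature scaling step, and all hyperparameters are assumptions, so the accuracies it prints will not match the figures reported in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in data with placeholder feature names.
# The study uses a Kaggle diabetes dataset instead.
FEATURE_NAMES = [f"feature_{i}" for i in range(8)]
X, y = make_classification(
    n_samples=800, n_features=8, n_informative=4, random_state=0
)

# ASSUMPTION: a stratified 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four model families named in the abstract. Hyperparameters and the
# scaling step for the first two models are illustrative choices.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k-nearest neighbor": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and record its accuracy on the held-out test set.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

best_name = max(accuracy, key=accuracy.get)
for name, acc in sorted(accuracy.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} accuracy = {acc:.3%}")
print(f"best model on this synthetic data: {best_name}")

# Rank features by the random forest's impurity-based importances.
# The forest is used here regardless of which model scored best above.
forest = models["random forest"]
ranked = sorted(
    zip(FEATURE_NAMES, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for feature, importance in ranked[:3]:
    print(f"{feature:12s} importance = {importance:.3f}")
```

To try this on real data, `X`, `y`, and `FEATURE_NAMES` would be replaced with the columns of the actual dataset. The impurity-based importances used here are one common way to rank features with a random forest; the abstract does not say which importance measure the study used.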