Analyzing risk factors and handling imbalanced data for predicting stroke risk using machine learning
· 2025
· Open Access
· DOI: https://doi.org/10.26555/ijain.v11i1.1678
Stroke is a serious medical condition caused by disrupted blood flow to the brain; it signals a chronic health problem and requires an immediate response. The principal risk factors that raise the likelihood of stroke include pre-existing conditions such as Diabetes Mellitus (DM), hypertension, and high cholesterol. Effective preventive measures are crucial to minimizing stroke risk. One innovative approach is to apply predictive methods to three years (2019-2021) of clinical examination data, known as the general checkup (GCU) dataset. This study aims to predict an individual's stroke risk for the following year. It also addresses the preprocessing of the GCU dataset: replacing missing values with the statistical mean, label encoding, feature correlation analysis using entropy values, and correcting class imbalance with the Adaptive Synthetic (ADASYN) technique. Several machine learning models are then compared on their predictive performance. The experiments show that the Random Forest model performs best, with 98.7% accuracy and a 63.9% F1-score. This research highlights the importance of preemptive measures against stroke based on predictive techniques applied to clinical data, with the Random Forest model proving most effective in forecasting stroke probability.
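The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy values, the column names (`glucose`, `hypertension`, `stroke`), and the helper functions are hypothetical and are not taken from the GCU dataset or the authors' code. The abstract does not say which entropy-based measure was used, so information gain stands in here as one common choice. ADASYN and Random Forest are omitted; in practice they would typically come from a library such as imbalanced-learn and scikit-learn.

```python
import math
from collections import Counter


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def label_encode(labels):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(labels)))}
    return [mapping[x] for x in labels], mapping


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, target):
    """Reduction in target entropy after splitting on a discrete feature."""
    n = len(target)
    remainder = 0.0
    for value in set(feature):
        subset = [t for f, t in zip(feature, target) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(target) - remainder


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy over all samples and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical toy records (not GCU data).
glucose = impute_mean([90.0, None, 110.0, 100.0])
hypertension, mapping = label_encode(["yes", "no", "yes", "no"])
stroke = [1, 0, 1, 0]
gain = information_gain(hypertension, stroke)

# A classifier that always predicts "no stroke" on a 98:2 class split:
# accuracy is 0.98 while the F1-score for the stroke class is 0.0.
y_true = [0] * 98 + [1] * 2
y_pred = [0] * 100
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

The last example shows why accuracy alone can mislead on imbalanced data: a model that never predicts a stroke still scores high accuracy, so a positive-class metric such as F1 is the more informative figure when the stroke class is rare.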
- Type: article
- Language: en
- Landing Page:
  - https://doi.org/10.26555/ijain.v11i1.1678
  - https://ijain.org/index.php/IJAIN/article/download/1678/ijain_vol11i1_pp39-54
- OA Status: gold
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4408096307