Analyzing risk factors and handling imbalanced data for predicting stroke risk using machine learning

Adiwijaya Adiwijaya, Nur Ghaniaviyanto Ramadhan

2025 · Open Access · DOI: https://doi.org/10.26555/ijain.v11i1.1678

Stroke is a serious medical condition caused by disrupted blood flow to the brain; it is a chronic health issue that requires an immediate response. The principal risk factors that increase the likelihood of stroke include pre-existing conditions such as Diabetes Mellitus (DM), hypertension, and high cholesterol levels. Effective preventive measures are crucial to minimizing stroke risk. One innovative approach is to apply predictive methods to three years (2019-2021) of clinical examination data, known as the general checkup (GCU) dataset. This study aims to predict an individual's stroke risk for the following year. It also addresses the preprocessing of the GCU dataset: missing values are replaced with the statistical mean, categorical features are label-encoded, feature correlation is analyzed using entropy values, and class imbalance is handled with the Adaptive Synthetic (ADASYN) technique. Several machine learning models are compared on predictive performance. In the experiments, the Random Forest model performs best, with 98.7% accuracy and a 63.9% F1-score. This research highlights the importance of preemptive measures against stroke through predictive techniques applied to clinical data, with the Random Forest model proving most effective at forecasting stroke probability.
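The preprocessing steps named in the abstract, and the gap between accuracy and F1-score that imbalanced data can produce, can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation: the function names, the feature names (`glucose`, `hypertension`), and all toy values are hypothetical and do not come from the GCU dataset. Reading "entropy values" as information gain is an assumption, and ADASYN oversampling and Random Forest training are left out.

```python
import math
from collections import Counter

def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def label_encode(values):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy over all samples and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1

# Hypothetical toy records (not taken from the GCU dataset).
glucose = mean_impute([90.0, None, 110.0, 100.0])
hypertension, mapping = label_encode(["no", "yes", "yes", "no"])
stroke = [0, 1, 1, 0]
gain = information_gain(hypertension, stroke)

# Hypothetical imbalanced test set: 20 stroke cases among 1000 people.
# The classifier finds 10 of the 20 cases and raises 5 false alarms.
y_true = [1] * 20 + [0] * 980
y_pred = [1] * 10 + [0] * 10 + [1] * 5 + [0] * 975
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"information gain = {gain:.3f}")
print(f"accuracy = {accuracy:.3f}, F1 = {f1:.3f}")
```

Because the negative class dominates, correct negative predictions keep accuracy high even when half of the positive cases are missed, while F1 depends only on the positive class. This is one way a result such as 98.7% accuracy can sit alongside a much lower F1-score.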

Concepts

Computer science · Artificial intelligence · Machine learning · Stroke risk · Stroke (engine) · Medicine · Ischemic stroke · Internal medicine · Engineering · Mechanical engineering · Ischemia

Metadata

Type: article
Language: en
Landing Page: https://doi.org/10.26555/ijain.v11i1.1678
PDF: https://ijain.org/index.php/IJAIN/article/download/1678/ijain_vol11i1_pp39-54
OA Status: gold
Related Works: 10
OpenAlex ID: https://openalex.org/W4408096307
