Advanced Ensemble Machine Learning Techniques for Optimizing Diabetes Mellitus Prognostication: a Detailed Examination of Hospital Data
Diabetes is a chronic disease that affects millions of people worldwide. Early diagnosis and effective management are crucial for reducing its complications. Diabetes is the fourth-highest cause of mortality due to its association with various comorbidities, including heart disease, nerve damage, blood vessel damage, and blindness. The potential of machine learning algorithms in predicting Diabetes and related conditions is significant, and mining diabetes data is an efficient method for extracting new insights. The primary objective of this study is to develop an enhanced ensemble model to predict Diabetes with improved accuracy by leveraging various machine learning algorithms. This study tested several popular machine learning algorithms commonly used in diabetes prediction, including Naive Bayes (NB), Generalized Linear Model (GLM), Logistic Regression (LR), Fast Large Margin (FLM), Deep Learning (DL), Decision Tree (DT), Random Forest (RF), Gradient Boosted Trees (GBT), and Support Vector Machine (SVM). The performance of these algorithms was compared, and two different ensemble techniques?stacking and voting?were used to build a more accurate predictive model. The top three algorithms based on accuracy were Deep Learning, Naive Bayes, and Gradient Boosted Trees. The machine learning algorithms revealed that individuals with Diabetes are significantly affected by the number of chronic conditions they have, as well as their gender and age. The ensemble models, particularly the stacking method, provided higher accuracy than individual algorithms. The stacking ensemble model achieved a slightly better accuracy of 99,94 % compared to 99,34 % for the voting method. Building an ensemble model significantly increased the accuracy of predicting Diabetes and related conditions. The stacking ensemble model, in particular, demonstrated superior performance, highlighting the importance of combining multiple machine learning approaches to enhance predictive accuracy.
Publishing Year
2024