Early Detection of Diabetes Using Machine Learning with Logistic Regression Algorithm
Diabetes is one of the deadliest diseases in the world, including in Indonesia. It can cause complications in many parts of the body and increases the overall risk of death. Machine learning algorithms offer one way to detect diabetes, and logistic regression is a classification model widely used in clinical analysis. In this paper, a logistic regression model was built in Python to predict, from the initial data provided, whether a person has diabetes. The experiment used the Pima Indians Diabetes Database, which contains 768 patient records with eight independent variables and one dependent variable. Exploratory data analysis, combining descriptive statistics with visualization, was applied to gain insight into the dataset. Some variables contained missing values, which were replaced with the median of the corresponding variable. Class imbalance was handled with the synthetic minority over-sampling technique (SMOTE), which enlarges the minority class by generating synthetic samples. The model was evaluated using the confusion matrix and showed reasonably good performance, with an accuracy of 77%, precision of 75%, recall of 77%, and F1-score of 76%. Grid search was then used for hyperparameter tuning, and the baseline model was compared with the tuned model. The experimental results showed that hyperparameter tuning improved the logistic regression model, yielding an accuracy of 82%, precision of 81%, recall of 79%, and F1-score of 80%.
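The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names (`impute_median`, `smote`, `confusion_metrics`), the use of `None` as the missing-value marker, and the neighbour count `k` are all assumptions made for illustration. The oversampling function is a simplified version of SMOTE, and the logistic regression fit and grid search are omitted; in practice those would typically be delegated to libraries such as scikit-learn and imbalanced-learn.

```python
import random
import statistics


def impute_median(rows, missing=None):
    """Replace missing entries in each column with the median of that
    column's observed values. `missing` is an assumed marker (None here)."""
    columns = list(zip(*rows))
    medians = [
        statistics.median([v for v in col if v is not missing])
        for col in columns
    ]
    return [
        [m if v is missing else v for v, m in zip(row, medians)]
        for row in rows
    ]


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: each synthetic sample is a random point on the
    segment between a minority sample and one of its k nearest minority
    neighbours (squared Euclidean distance). Needs >= 2 minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive),
    computed from the four confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy data only; these values do not come from the Pima dataset.
    print(impute_median([[1.0, None], [3.0, 4.0], [None, 8.0]]))
    print(smote([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=2, k=2))
    print(confusion_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                            [1, 1, 0, 0, 0, 1, 0, 1]))
```

One design point worth noting: oversampling is usually applied to the training split only, so that synthetic samples do not leak into the data used to compute the reported metrics. The abstract does not state how the split was handled.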