Multi-Algorithm-Based Ensemble Voting Classifier and SMOTE Method for Heart Disease Classification
Abstract
The heart is a vital organ responsible for pumping blood throughout the body. Impairments of the heart can disrupt blood circulation, and heart disease is a leading cause of mortality worldwide. The World Health Organization (WHO) reported that, in 2021, mortality attributed to heart disease reached a significant number. In Indonesia, the prevalence of heart disease reached 1.5%. Consequently, it is essential to prevent and detect heart disease at an early stage, and machine learning technologies can support this effort. This study aims to develop a heart disease classification model that combines the naïve Bayes and random forest algorithms through an ensemble voting classifier. The data were obtained from Kaggle and comprise 1,000 records with 14 variables, including one classification target. Class imbalance was handled using the synthetic minority oversampling technique (SMOTE), while feature selection was conducted in consultation with cardiologists to ensure clinical relevance. Three models were trained: naïve Bayes, random forest, and a combination of both through the ensemble voting classifier method. This contrasts with previous studies, which only compared several algorithms to determine the one with the highest accuracy. The test results showed that the ensemble voting classifier yielded the best performance, with an accuracy, precision, recall, and F1 score of 98.28%, 98.41%, 98.41%, and 98.41%, respectively. This study demonstrates that, on this dataset, the ensemble voting classifier provides better accuracy than the individual algorithms. The model falls within the excellent classification category and is expected to contribute to the medical field and support the development of decision-support systems for diagnosing heart disease.
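The two core ideas in the abstract, SMOTE oversampling and ensemble voting, can be illustrated with a minimal standard-library sketch. It is not the study's implementation: the abstract does not state which tools or which voting rule were used, so the function names (`smote`, `soft_vote`, `metrics`), the choice of soft voting (averaging class probabilities), the toy feature vectors, the example probabilities, and the confusion-matrix counts are all assumptions made for illustration.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class (itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def soft_vote(prob_lists):
    """Average the class-probability vectors of several models and
    return (index of the winning class, averaged probabilities)."""
    avg = [sum(col) / len(prob_lists) for col in zip(*prob_lists)]
    return max(range(len(avg)), key=avg.__getitem__), avg

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical minority-class feature vectors (not taken from the dataset).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
synthetic = smote(minority, n_new=4)

# Hypothetical class probabilities from a naive Bayes and a random forest model.
nb_proba = [0.40, 0.60]
rf_proba = [0.70, 0.30]
label, avg = soft_vote([nb_proba, rf_proba])  # class 0 wins after averaging

# Hypothetical confusion-matrix counts.
acc, prec, rec, f1 = metrics(tp=9, fp=1, fn=1, tn=9)
```

Because each synthetic sample lies on a segment between two existing minority samples, oversampling adds no points outside the region the minority class already occupies. The `metrics` helper also shows why the reported precision, recall, and F1 score coincide: F1 is the harmonic mean of precision and recall, so it equals them whenever the two are equal. In practice, library implementations such as scikit-learn's `VotingClassifier` and imbalanced-learn's `SMOTE` would normally be preferred over hand-written code.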
References
L.P.C. Dewi. “Jenis, gejala, dan penyebab penyakit jantung.” Access date: 13-Mar-2024. [Online]. Available: https://rs-soewandhi.surabaya.go.id/jenis-gejala-dan-penyebab-penyakit-jantung/
S.D. Sawu, A.A. Prayitno, and Y.I. Wibowo, “Analisis faktor risiko pada kejadian masuk rumah sakit penyakit jantung koroner di Rumah Sakit Husada Utama Surabaya,” J. Sains Kesehat., vol. 4, no. 1, pp. 10–18, Jul. 2022, doi: 10.25026/jsk.v4i1.856.
M. Ardiana, Buku Ajar Prevensi dan Rehabilitasi Jantung. Surabaya, Indonesia: Airlangga University Press, 2022.
World Health Organization. “Cardiovascular diseases.” Access date: 7-Aug-2024. [Online]. Available: https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1
Tim Riskesdas 2018, Laporan Nasional Riskesdas 2018. Jakarta, Indonesia: Lembaga Penerbit Badan Penelitian dan Pengembangan Kesehatan, 2019.
Y.P. Santosa. “Memahami pentingnya cek kesehatan jantung.” Primaya Hospital. Access date: 8-Jul-2024. [Online]. Available: https://primayahospital.com/jantung/pentingnya-cek-kesehatan-jantung/
Direktorat Jenderal Pencegahan dan Pengendalian Penyakit. “Pemeriksaan, Gejala, dan Diet untuk Jantung.” Access date: 2-Jun-2024. [Online]. Available: https://p2p.kemkes.go.id/pemeriksaan-gejala-dan-diet-untuk-jantung/
A.M.A. Rahim, I.Y.R. Pratiwi, and M.A. Fikri, “Klasifikasi penyakit jantung menggunakan metode synthetic minority over-sampling technique dan random forest clasifier,” Indones. J. Comput. Sci., vol. 12, no. 5, pp. 2995–3011, Oct. 2023, doi: 10.33022/ijcs.v12i5.3413.
J.D. Muthohhar and A. Prihanto, “Analisis perbandingan algoritma klasifikasi untuk penyakit jantung,” J. Inform. Comput. Sci. (JINACS), vol. 4, no. 3, pp. 298–304, Mar. 2023, doi: 10.26740/jinacs.v4n03.p298-304.
A.F.N. Masruriyah et al., “Evaluasi algoritma pembelajaran terbimbing terhadap dataset penyakit jantung yang telah dilakukan oversampling,” MIND (Multimed. Artif. Intell. Netw. Database) J., vol. 8, no. 2, pp. 242–253, Dec. 2023, doi: 10.26760/mindjournal.v8i2.242-253.
D.H. Depari, Y. Widiastiwi, and M.M. Santoni, “Perbandingan model decision tree, naive Bayes dan random forest untuk prediksi klasifikasi penyakit jantung,” Inform., J. Ilmu Komput., vol. 18, no. 3, pp. 239–248, Dec. 2022, doi: 10.52958/iftk.v18i3.4694.
Ratnasari, A.J. Wahidin, A.E. Setiawan, and P. Bintoro, “Machine learning untuk klasifikasi penyakit jantung,” Aisyah J. Inform. Electr. Eng., vol. 6, no. 1, pp. 145–150, Feb. 2024, doi: 10.30604/jti.v6i1.272.
A. Samosir, M.S. Hasibuan, W.E. Justino, and T. Hariyono, “Komparasi algoritma random forest, naïve Bayes dan k-nearest neighbor dalam klasifikasi data penyakit jantung,” in Pros. Semin. Nas. Darmajaya, 2021, pp. 214–222.
B. Asrun and I. Irmayani, “Penerapan konsep non-deterministic finite automata dalam diagnosa penyakit jantung,” Dewantara J. Technol., vol. 3, no. 1, pp. 122–125, May 2022, doi: 10.59563/djtech.v3i1.184.
P.D. Kusuma, Machine Learning Teori, Program, dan Studi Kasus. Yogyakarta, Indonesia: Deepublish, 2020.
P.B.N. Setio, D.R.S. Saputro, and B. Winarno, “Klasifikasi dengan pohon keputusan berbasis algoritme C4.5,” in PRISMA, Pros. Semin. Nas. Mat., 2020, pp. 64–71.
Rayuwati, H. Gemasih, and I. Nizar, “Implementasi algoritma naive Bayes untuk memprediksi tingkat penyebaran Covid,” J. Ris. Rumpun Ilmu Tek., vol. 1, no. 1, pp. 38–46, Apr. 2022, doi: 10.55606/jurritek.v1i1.127.
M. Al-Husaini, P.A. Saputra, M. Renaldi, and R.A. Maulana, Prediksi Tsunami dengan Metode Ensemble Machine Learning. Jambi, Indonesia: PT. Sonpedia Publishing Indonesia, 2024.
Sarwido, G.W.N. Wibowo, and M.A. Manan, “Penerapan algoritma naive Bayes untuk prediksi heregistrasi calon mahasiswa baru,” J. Tek. Inform., vol. 1, no. 1, pp. 1–10, Feb. 2022, doi: 10.02220/jtinfo.v1i1.126.
S. Saadah and H. Salsabila, “Prediksi harga Bitcoin menggunakan metode random forest,” J. Komput. Terap., vol. 7, no. 1, pp. 24–32, Jun. 2021, doi: 10.35143/jkt.v7i1.4618.
M.R. Adrian, M.P. Putra, M.H. Rafialdy, and N.A. Rakhmawati, “Perbandingan metode klasifikasi random forest dan SVM pada analisis sentimen PSBB,” J. Inform. UPGRIS, vol. 7, no. 1, pp. 36–40, 2021, doi: 10.26877/jiu.v7i1.7099.
I. Daqiqil, Machine Learning: Teori, Studi Kasus dan Implementasi Menggunakan Python, 1st ed. Riau, Indonesia: UR PRESS, 2021.
A.J. Mohammed, M.M. Hassan, and D.H. Kadir, “Improving classification performance for a novel imbalanced medical dataset using SMOTE method,” Int. J. Adv. Trends Comput. Sci. Eng., vol. 9, no. 3, pp. 3161–3172, Jun. 2020, doi: 10.30534/ijatcse/2020/104932020.
M. Ardiansyah, “Model ensemble algoritma naive Bayes dan random forest dalam klasifikasi penyakit paru-paru untuk meningkatkan akurasi,” SMARTLOCK, J. Sains dan Teknol., vol. 2, no. 2, pp. 32–38, Dec. 2023, doi: 10.37476/smartlock.v2i2.4407.
S. Bashir et al., “A knowledge-based clinical decision support system utilizing an intelligent ensemble voting scheme for improved cardiovascular disease prediction,” IEEE Access, vol. 9, pp. 130805–130822, Sep. 2021, doi: 10.1109/ACCESS.2021.3110604.
J. Dumlao. “Cardiovascular Disease Dataset.” Kaggle. Access date: 28-Apr-2024. [Online]. Available: https://www.kaggle.com/datasets/jocelyndumlao/cardiovascular-disease-dataset
F. Rahman and Mustikasari, “Optimization of student graduation predictions on time using binning and synthetic minority oversampling technique (SMOTE),” Jagti, vol. 4, no. 1, pp. 30–36, Feb. 2024, doi: 10.24252/jagti.v4i1.77.
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.