Comparison of KNN and SVM Algorithms Performance Using SMOTE to Classify Diabetes
Abstract
Diabetes frequently goes undetected or is diagnosed too late, which can lead to serious complications such as organ damage, stroke, and heart disease. The International Diabetes Federation (IDF) reports that 10.5% of adults aged 20 to 79 have diabetes and that almost half of them are unaware of their condition. The number of people with diabetes has increased fourfold compared to the prior period. Early detection is an essential step in preventing complications, and one way to achieve it is through artificial intelligence (AI) technology, specifically data mining. Knowledge of which algorithms detect diabetes effectively is therefore needed. This study aimed to compare two algorithms, k-nearest neighbor (KNN) and support vector machine (SVM), for diabetes classification using the synthetic minority oversampling technique (SMOTE). The performance of both algorithms was measured following the machine learning life cycle method. The results showed that both algorithms performed well in detecting diabetes, yet there were significant performance differences between the two. The SVM algorithm with a radial basis function (RBF) kernel achieved 81.67% accuracy, 85.91% precision, 79.01% recall, and an 82.32% F1 score. The KNN algorithm with k = 3, selected through cross-validation, achieved 83.33% accuracy, 85.00% precision, 83.95% recall, and an 84.47% F1 score. Based on confusion matrix evaluation, KNN outperformed SVM in accuracy, recall, and F1 score, while SVM had slightly higher precision. These results indicate that KNN is more effective in detecting diabetes in the dataset used in this study.
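As a quick consistency check on the reported metrics, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only and is not the authors' code: the helper name `f1_score` and the `reported` dictionary are assumptions introduced here, and the only inputs are the percentages stated in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Inputs and output share the same unit (here: percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Percentages copied from the abstract; the dictionary layout is hypothetical.
reported = {
    "SVM (RBF kernel)": {"precision": 85.91, "recall": 79.01, "f1": 82.32},
    "KNN (k = 3)": {"precision": 85.00, "recall": 83.95, "f1": 84.47},
}

for name, metrics in reported.items():
    recomputed = f1_score(metrics["precision"], metrics["recall"])
    print(
        f"{name}: reported F1 = {metrics['f1']:.2f}, "
        f"recomputed F1 = {recomputed:.2f}"
    )
```

For both models, the recomputed value agrees with the reported F1 score to two decimal places, so the precision, recall, and F1 figures are internally consistent.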
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.