The Comparison of ReliefF and C4.5 for Feature Selection on Heart Disease Classification Using Backpropagation
Anita Desiani(1*), Yuli Andriani(2), Irmeilyana Irmeilyana(3), Rifkie Primartha(4), Muhammad Arhami(5), Dwi Fitrianti(6), Henny Nur Syafitri(7)
(1) Mathematics, Sriwijaya University
(2) Technical Information, Sriwijaya University
(3) Technical Information, Politeknik Negeri Lhokseumawe
(4) Technical Information, Sriwijaya University
(5) Technical Information, Politeknik Negeri Lhokseumawe
(6) Mathematics, Sriwijaya University
(7) Mathematics, Sriwijaya University
(*) Corresponding Author
Abstract
One of the datasets used to classify heart disease is the UCI dataset. Unfortunately, the dataset contains missing data. Backpropagation is an easy and fast method, but it depends heavily on its input data, so missing data can reduce its performance. One technique for handling missing data is feature selection. This study compares the ReliefF and C4.5 algorithms for feature selection. The purpose of the study is to find a way to overcome missing data through feature selection and thereby improve backpropagation performance in heart disease classification. The features selected by each algorithm are used as input to a backpropagation classifier, and the results are measured by accuracy, precision, and recall. With ReliefF, backpropagation scores above 82% on these measures. With C4.5, backpropagation averages 80.54% across accuracy, recall, and precision. Based on these results, it can be concluded that ReliefF gives better backpropagation performance than C4.5. ReliefF is also better able than C4.5 to handle missing data through feature selection in order to improve backpropagation performance for heart disease classification. Although the C4.5 algorithm does increase backpropagation performance, it is not appropriate as a feature selection method for handling missing data.
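The three measures named in the abstract can be illustrated with a minimal Python sketch. It assumes a binary formulation (disease present = 1, absent = 0) and uses made-up labels; the function name `classification_metrics` and the sample data are hypothetical and are not taken from the study, which may have averaged its metrics differently.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, and recall from binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    return {
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the classifier found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for illustration only (not data from the study).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a comparison like the one described, the same function would be applied to the predictions of the classifier trained on each selected feature subset, so that both feature selection methods are scored on identical terms.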
DOI: https://doi.org/10.22146/ijccs.82948
Copyright (c) 2023 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.