ESSAY ANSWER CLASSIFICATION WITH SMOTE RANDOM FOREST AND ADABOOST IN AUTOMATED ESSAY SCORING
Wilia Satria(1), Mardhani Riasetiawan(2*)
(1) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(2) (Scopus ID : 36139136200); Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author
Abstract
Automated essay scoring (AES) is used to evaluate and assess student essays written in response to given questions. However, automatic assessment is difficult when answers contain typing errors (typos), regional-language terms, or incorrect punctuation, and these errors make the assessment less consistent and accurate. Analysis of the dataset also shows an imbalance between the number of right and wrong answers, so a technique is needed to handle the data imbalance. Based on the literature, the Random Forest and AdaBoost classification algorithms can be used to improve the consistency of classification accuracy, and the SMOTE method can be used to overcome the data imbalance.
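The oversampling idea behind SMOTE can be illustrated with a minimal sketch. It assumes answers have already been converted to numeric feature vectors; the function name `smote`, the toy vectors, and the class sizes are hypothetical and are not taken from the study, which does not state its implementation here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 3 minority-class answers against 8 majority-class ones.
minority = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
n_majority = 8
new_points = smote(minority, n_new=n_majority - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 8
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. In practice the oversampling is usually applied to the training split only, so that synthetic points do not leak into evaluation.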
The Random Forest method with SMOTE achieves an F1 measure of 99%, which indicates that the hybrid method can overcome the problem of the imbalanced, limited datasets found in AES. The AdaBoost model with SMOTE produces a highest F1 measure of 99% across the entire dataset. The structure of the dataset also affects model performance. The best model obtained in this study is the Random Forest model with SMOTE.
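As a brief illustration of why the F1 measure is reported for imbalanced data, the sketch below computes it from scratch on hypothetical labels. The labels, the 8-to-2 split, and the helper name `f1_measure` are assumptions for illustration; the abstract does not say whether the reported F1 is per class or averaged.

```python
def f1_measure(y_true, y_pred, positive):
    """F1 = harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = right answer (majority), 0 = wrong answer (minority).
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 10  # a degenerate classifier that always predicts the majority

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_minority = f1_measure(y_true, y_pred, positive=0)
f1_majority = f1_measure(y_true, y_pred, positive=1)
print(accuracy, f1_minority)  # 0.8 0.0
```

A classifier that ignores the minority class still looks acceptable on accuracy, while its minority-class F1 collapses to zero, which is the failure that balancing with SMOTE is meant to prevent.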
DOI: https://doi.org/10.22146/ijccs.82548
Copyright (c) 2023 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.