Preprocessing Algorithm for K-Means Anomaly Detection on Payment Logs

https://doi.org/10.22146/ijccs.105290

Nur Rokhman(1*)

(1) Department of Computer Sciences and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada
(*) Corresponding Author

Abstract


A payment aggregator system with a single-settlement feature improves transaction efficiency, but it also exposes the system to cyberattacks and system errors, either of which can produce abnormal events (anomalies). The middleware service records transaction activity as logs, and these logs can be analyzed to detect anomalies caused by cyberattacks or system errors.

K-Means clustering is less effective at detecting anomalies when applied to raw transaction logs, because the log data is often unstructured and inconsistent and its features vary in scale.
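The effect of varying feature scales on a distance-based method such as K-Means can be illustrated with a small sketch. The three records, the feature names (`amount`, `response_time_s`), and their values are invented for illustration and are not taken from the study's data.

```python
import numpy as np

# Hypothetical log records: [amount, response_time_s]; values are illustrative only.
records = np.array([
    [100_000.0, 0.2],   # baseline transaction
    [100_500.0, 9.5],   # similar amount, very slow response
    [150_000.0, 0.2],   # different amount, normal response
])


def dist(a, b):
    """Euclidean distance, the measure K-Means relies on."""
    return float(np.linalg.norm(a - b))


# On raw values the large-scale feature (amount) dominates the distance,
# so the slow response barely registers.
raw_slow = dist(records[0], records[1])
raw_amount = dist(records[0], records[2])

# After z-score standardization each feature has zero mean and unit variance,
# so both differences contribute on a comparable scale.
z = (records - records.mean(axis=0)) / records.std(axis=0)
std_slow = dist(z[0], z[1])
std_amount = dist(z[0], z[2])

print(f"raw:          slow={raw_slow:.2f}  amount={raw_amount:.2f}")
print(f"standardized: slow={std_slow:.3f}  amount={std_amount:.3f}")
```

In this toy case the slow-response record looks almost identical to the baseline on raw values, while after standardization it is about as far from the baseline as the record with the different amount.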

This study develops a preprocessing algorithm that improves data quality before clustering. It uses transaction log data from July to December 2023, and the preprocessing stages include normalization, standardization, and Principal Component Analysis (PCA). K-Means is run with K-Means++ initialization, and the number of clusters is determined with the Kneedle algorithm. The results show that standardization improves segmentation and that PCA makes anomaly detection more effective.
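A minimal sketch of such a pipeline, assuming scikit-learn and synthetic data, might look as follows. It is not the study's implementation: the data, the feature scales, the `knee_point` helper (a simplified Kneedle-style elbow finder without the smoothing and thresholding of the full algorithm), and the rule of flagging the five points farthest from their centroid are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def knee_point(ks, inertias):
    """Simplified Kneedle-style elbow (hypothetical helper): return the k whose
    normalized inertia lies farthest below the line joining the end points.
    Assumes inertia decreases as k grows."""
    x = (np.asarray(ks, float) - ks[0]) / (ks[-1] - ks[0])
    y = np.asarray(inertias, float)
    y = (y - y.min()) / (y.max() - y.min())
    return int(ks[int(np.argmax((1.0 - x) - y))])


# Synthetic stand-in for numeric log features: three normal behavior groups
# plus five injected anomalies (rows 300-304). All values are invented.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [10, 10, 0, 0], [0, 0, 10, 10]], float)
normal = np.vstack([c + rng.normal(0.0, 1.0, size=(100, 4)) for c in centers])
injected = np.array([[5, 5, 5, 5], [20, 20, 0, 0], [0, 0, 20, 20],
                     [-8, -8, -8, -8], [10, 10, 10, 10]], float)
scales = np.array([1e4, 0.1, 1.0, 50.0])  # deliberately mismatched feature scales
X = np.vstack([normal, injected]) * scales

# Preprocessing: standardize each feature, then reduce to two components with PCA.
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Fit K-Means with K-Means++ initialization for a range of k and pick the elbow.
ks = list(range(1, 9))
inertias = [
    KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(Z).inertia_
    for k in ks
]
best_k = knee_point(ks, inertias)

# Assumed anomaly rule: the points farthest from their own cluster centroid.
model = KMeans(n_clusters=best_k, init="k-means++", n_init=10, random_state=0).fit(Z)
dist_to_centroid = model.transform(Z).min(axis=1)
n_flag = 5  # hypothetical cut-off
flagged = np.argsort(dist_to_centroid)[-n_flag:]

print("chosen k:", best_k)
print("flagged rows:", sorted(flagged.tolist()))
```

Min-max normalization could be compared by swapping `StandardScaler` for scikit-learn's `MinMaxScaler`. Note that PCA can hide anomalies that lie mainly along the discarded components, so the number of components is itself a choice to validate.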


Keywords


Data Preprocessing, K-Means, Anomaly Detection, Middleware Logs, Kneedle Algorithm











Copyright (c) 2025 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Copyright of:
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal of research results in Computing
and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
email:ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs


