Anomaly Detection in Hospital Claims Using K-Means and Linear Regression

Hendri Kurniawan Prakosa(1*), Nur Rokhman(2)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author


BPJS Kesehatan (Indonesia's national health insurance agency), which has been operating for almost a decade, still runs a deficit in covering its participants. One contributing factor is irregularities in the claims process that work against BPJS Kesehatan, such as upcoding diagnoses to enlarge a claim, submitting duplicate claims, or recording fictitious claims. Under government regulations, such actions constitute fraud. Such fraud can be detected by looking for anomalies in the claim data.

This research aims to identify anomalies in hospital claims submitted to BPJS Kesehatan, using BPJS claim data from 2015-2016. The method combines the K-Means algorithm with linear regression. To improve the clustering result, the density canopy algorithm is used to determine the initial centroids.
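The clustering step above pairs a density-canopy initialization with K-Means. The following is a minimal, stdlib-only Python sketch of that pipeline under a simplified density canopy that is assumed here, not taken from the paper: the canopy radius is the mean pairwise distance, the first centroid is the densest point, and each later centroid maximizes density × distance to already-chosen centroids ÷ mean distance to its canopy neighbours. The function names `density_canopy_centers` and `kmeans` are hypothetical, and the authors' exact weighting, features, and preprocessing are not given in the abstract.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def density_canopy_centers(points: List[Point]) -> List[Point]:
    """Pick initial centroids with a simplified density-canopy heuristic (assumed variant)."""
    n = len(points)
    if n < 2:
        return list(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Assumption: canopy radius = mean pairwise distance.
    mean_dis = sum(d[i][j] for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)
    # Density of point i: number of other points inside its canopy radius.
    neighbours = [[j for j in range(n) if j != i and d[i][j] < mean_dis] for i in range(n)]
    density = [len(nb) for nb in neighbours]
    # Mean distance to canopy neighbours (small value = tight local cluster).
    compact = [sum(d[i][j] for j in nb) / len(nb) if nb else 0.0
               for i, nb in enumerate(neighbours)]

    centers: List[int] = []
    candidates = set(range(n))
    while candidates:
        ordered = sorted(candidates)  # deterministic tie-breaking
        if not centers:
            best = max(ordered, key=lambda i: density[i])  # densest point first
        else:
            # Assumed weight: dense, compact, and far from existing centroids.
            best = max(ordered, key=lambda i: density[i]
                       * min(d[i][c] for c in centers) / max(compact[i], 1e-12))
        centers.append(best)
        # Remove the new centroid and every candidate inside its canopy.
        candidates -= {j for j in candidates if d[best][j] < mean_dis}
        candidates.discard(best)
    return [points[c] for c in centers]


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Lloyd's K-Means starting from the given centroids; returns (labels, centroids)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: centroid = mean of its members (kept unchanged if empty).
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members))
                               if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

In this sketch the number of clusters falls out of the canopy step; the abstract reports that 5 clusters were used, which this toy code does not attempt to reproduce.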

Evaluation with the silhouette index yielded a value of 0.82 with 5 clusters, and simple linear regression models yielded RMSE values of 0.49 for billing cost and 0.97 for length of stay. On this basis, 435 of the 10,000 records (4.35%) were identified as anomalies. Identifying these claims is expected to enable more effective follow-up.
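The evaluation described above can be sketched the same way. Below, `silhouette` computes the mean silhouette coefficient (a = mean distance to the point's own cluster, b = mean distance to the nearest other cluster, s = (b - a) / max(a, b)), and `regression_anomalies` fits y = intercept + slope · x by ordinary least squares, reports the RMSE, and flags points whose absolute residual exceeds k × RMSE. The function names, the k = 2 cutoff, and the choice of predictor (for example, regressing billing cost on length of stay) are hypothetical illustrations; the abstract does not state the authors' anomaly threshold or regression variables.

```python
import math
from typing import List, Sequence, Tuple


def silhouette(points: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points (singleton clusters score 0)."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def fit_simple_linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def regression_anomalies(x: Sequence[float], y: Sequence[float],
                         k: float = 2.0) -> Tuple[List[int], float]:
    """Return (indices with |residual| > k * RMSE, RMSE); the k cutoff is an assumed rule."""
    intercept, slope = fit_simple_linear_regression(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    flagged = [i for i, r in enumerate(residuals) if abs(r) > k * rmse]
    return flagged, rmse
```

The anomaly rate is then `len(flagged) / len(y)`; for the counts reported above, 435 / 10,000 = 4.35%.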


Keywords: Detection; Anomaly; BPJS Kesehatan; K-Means; Linear Regression









Copyright (c) 2021 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
IJCCS is a scientific journal publishing research results in Computing and Cybernetics Systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
