Social-Child-Case Document Clustering based on Topic Modeling using Latent Dirichlet Allocation

Nur Annisa Tresnasari(1*), Teguh Bharata Adji(2), Adhistya Erna Permanasari(3)

(1) Department of Electrical Engineering & Information Technology, UGM, Yogyakarta
(2) Department of Electrical Engineering & Information Technology, UGM, Yogyakarta
(3) Department of Electrical Engineering & Information Technology, UGM, Yogyakarta
(*) Corresponding Author


Children are the future of the nation. The treatment and education they receive affect their future. Today, children face various kinds of social problems. To find the right solution to a problem, social workers usually refer to social-child-case (SCC) documents to find similar past cases and adapt the solutions used in them. However, reading through a large number of documents to find similar cases is tedious and time-consuming. Hence, this work aims to categorize those documents into several groups according to case type. We use topic modeling with the Latent Dirichlet Allocation (LDA) approach to extract topics from the documents and group the documents based on their similarities. The coherence score and the perplexity graph are used to determine the best model. The result is a model with 5 topics that match the targeted case types. This result supports the reuse of knowledge about SCC handling by making documents with similar cases easier to find.
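As a rough illustration of the workflow described in the abstract (fit LDA for several topic counts, compare coherence and perplexity, then group documents by their dominant topic), a minimal sketch follows. It is not the authors' implementation: the collapsed Gibbs sampler, the UMass-style coherence, the training-set perplexity, the rule of picking the highest coherence, the hyperparameters, and the toy tokenized corpus are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs are lists of tokens."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_dk = [[0] * num_topics for _ in docs]        # topic counts per document
    n_kw = [Counter() for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                         # total tokens per topic

    def update(d, w, k, step):
        n_dk[d][k] += step
        n_kw[k][w] += step
        n_k[k] += step

    # Start from a random topic assignment for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token, then resample it
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][w] + beta)
                    / (n_k[k] + beta * len(vocab))
                    for k in range(num_topics)
                ]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)

    # Smoothed document-topic (theta) and topic-word (phi) distributions.
    theta = [[(c + alpha) / (len(doc) + alpha * num_topics) for c in row]
             for row, doc in zip(n_dk, docs)]
    phi = [{w: (n_kw[k][w] + beta) / (n_k[k] + beta * len(vocab)) for w in vocab}
           for k in range(num_topics)]
    return theta, phi


def perplexity(docs, theta, phi):
    """Perplexity on the training documents (lower is better)."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            log_lik += math.log(sum(t * topic[w] for t, topic in zip(theta[d], phi)))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def umass_coherence(docs, phi, top_n=3):
    """Mean UMass-style coherence over topics (higher is better)."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    scores = []
    for topic in phi:
        top = sorted(topic, key=topic.get, reverse=True)[:top_n]
        scores.append(sum(
            math.log((doc_freq(top[i], top[j]) + 1) / doc_freq(top[j]))
            for i in range(1, len(top)) for j in range(i)
        ))
    return sum(scores) / len(scores)


# Hypothetical, already preprocessed case notes (not data from the paper).
docs = [
    ["child", "neglect", "parent", "home", "neglect"],
    ["parent", "neglect", "home", "child", "care"],
    ["home", "care", "neglect", "parent"],
    ["school", "bullying", "teacher", "student", "bullying"],
    ["student", "bullying", "school", "teacher"],
    ["teacher", "school", "student", "bullying", "class"],
]

results = {}
for k in (2, 3, 4):
    theta, phi = fit_lda(docs, num_topics=k)
    results[k] = (umass_coherence(docs, phi), perplexity(docs, theta, phi), theta)
    print(k, round(results[k][0], 3), round(results[k][1], 3))

# Assumed selection rule: keep the topic count with the highest coherence.
best_k = max(results, key=lambda k: results[k][0])
# Each document joins the cluster of its most probable topic.
clusters = [max(range(best_k), key=row.__getitem__) for row in results[best_k][2]]
print("best k:", best_k, "clusters:", clusters)
```

In practice, a library implementation of LDA and of the coherence measure would likely replace the hand-written sampler, and perplexity would usually be computed on held-out documents rather than on the training set.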


Keywords: Text Clustering; Topic Modeling; Latent Dirichlet Allocation; Social Child Case







Copyright (c) 2020 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of:
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal of research results in Computing
and Cybernetics Systems
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
