A Multilevel and Hierarchical Approach for Multilabel Classification Model in SDGs Research
Abstract
The growth of research, reflected in the rising number of publications, makes it increasingly difficult to identify how those publications are put into practice, especially in relation to the Sustainable Development Goals (SDGs). To date, research publications have not been categorized by SDGs level. Pusat Penelitian dan Pengabdian Masyarakat (the Center for Research and Community Service) of Politeknik Statistika STIS needs such a categorization to monitor how lecturers contribute to implementing the SDGs. This study implements and evaluates problem transformation methods and machine learning classification algorithms, using multilevel and hierarchical approaches, to categorize research publications into SDGs levels. The problem transformation methods are Binary Relevance, Label Powerset (LP), and Classifier Chains. The classification algorithms are Logistic Regression and Support Vector Machine (SVM). Three inputs are compared: titles only, abstracts only, and titles combined with abstracts. The best filter model, which classifies publications as SDGs or non-SDGs, uses titles with SVM and reaches an accuracy of 0.8634. The best level model, which assigns publications to SDGs levels, uses titles, the LP method, and SVM with the multilevel approach. It classifies publications into the 4 SDGs pillars and into SDGs goals, targets, and indicators with accuracies of 0.8067, 0.7501, 0.6792, and 0.6194, respectively. Title input yields the best accuracy even though the other inputs carry more comprehensive information. This is attributed to the simultaneous use of English and Indonesian in the texts. Future research could therefore restrict the input to a single language, so that the TF-IDF process does not treat words with the same meaning in different languages as distinct important terms.
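As an illustration of two of the problem transformation methods named above, the following minimal sketch shows how multilabel targets could be re-encoded for Binary Relevance and Label Powerset. It is not the study's implementation: the function names, the toy label sets, and the labels `SDG1` and `SDG4` are assumptions made for this example only, and Classifier Chains is omitted for brevity.

```python
# Minimal sketch of two problem transformation encodings.
# All names and data below are hypothetical and for illustration only.

def binary_relevance_targets(label_sets, labels):
    """Binary Relevance: build one independent 0/1 target vector per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}

def label_powerset_targets(label_sets):
    """Label Powerset: map each distinct label combination to one class id."""
    class_ids = {}
    y = []
    for s in label_sets:
        key = frozenset(s)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids assigned in order of first appearance
        y.append(class_ids[key])
    return y, class_ids

# Toy data: each publication is tagged with one or more (invented) SDG goals.
label_sets = [{"SDG1"}, {"SDG1", "SDG4"}, {"SDG4"}, {"SDG1", "SDG4"}]
labels = ["SDG1", "SDG4"]

br = binary_relevance_targets(label_sets, labels)
lp_y, lp_classes = label_powerset_targets(label_sets)

print(br)    # one binary problem per label
print(lp_y)  # one multiclass problem over label combinations
```

Under Binary Relevance, a separate binary classifier such as SVM would be trained on each target vector. Under Label Powerset, a single multiclass classifier is trained on the combined class ids, which preserves label co-occurrence at the cost of more, sparser classes.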
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.