A Multilevel and Hierarchical Approach for Multilabel Classification Model in SDGs Research
The growing number of research publications makes it increasingly difficult to identify how individual publications relate to the Sustainable Development Goals (SDGs). Research publications have not yet been categorized by SDGs level. Pusat Penelitian dan Pengabdian Masyarakat Politeknik Statistika STIS needs such a categorization to monitor how lecturers implement the SDGs in their research. This study implements and evaluates problem transformation methods and machine learning classification algorithms, using multilevel and hierarchical approaches, to categorize research publications into SDGs levels. The problem transformation methods are Binary Relevance, Label Powerset (LP), and Classifier Chains. The classification algorithms are Logistic Regression and Support Vector Machine (SVM). Three inputs are compared: titles only, abstracts only, and titles combined with abstracts. The best filter model, which classifies publications as SDGs or non-SDGs, uses title input with SVM and reaches an accuracy of 0.8634. The best level model, which assigns publications to SDGs levels, uses title input, the LP method, and SVM with the multilevel approach. It classifies publications into the 4 pillars, goals, targets, and indicators of the SDGs with accuracies of 0.8067, 0.7501, 0.6792, and 0.6194, respectively. Title input yields the best accuracy even though the other inputs carry more comprehensive information. This is attributed to the simultaneous use of English and Indonesian in the data. Future research could therefore restrict the input to a single language, so that the TF-IDF process does not treat words with the same meaning in different languages as distinct important terms.
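To make the difference between the problem transformation methods concrete, the following is a minimal sketch in plain Python of how Label Powerset and Binary Relevance reshape multilabel targets before any classifier is trained. It is an illustration under stated assumptions, not the study's implementation: the function names and the toy label sets are hypothetical, and no TF-IDF step or SVM is included.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def label_powerset_encode(
    label_sets: Iterable[Iterable[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label Powerset: treat every distinct label combination as one class."""
    class_of: Dict[FrozenSet[str], int] = {}
    encoded: List[int] = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_of:
            # Assign the next free class id to a combination seen for the first time.
            class_of[key] = len(class_of)
        encoded.append(class_of[key])
    return encoded, class_of


def label_powerset_decode(
    class_ids: Iterable[int], class_of: Dict[FrozenSet[str], int]
) -> List[FrozenSet[str]]:
    """Map predicted class ids back to the label sets they stand for."""
    inverse = {cid: key for key, cid in class_of.items()}
    return [inverse[cid] for cid in class_ids]


def binary_relevance_encode(
    label_sets: List[Iterable[str]],
) -> Dict[str, List[int]]:
    """Binary Relevance: one independent 0/1 target vector per label."""
    materialized = [set(labels) for labels in label_sets]
    all_labels = sorted({label for labels in materialized for label in labels})
    return {
        label: [int(label in labels) for labels in materialized]
        for label in all_labels
    }


# Hypothetical toy targets (not data from the study): goals per publication.
toy_labels = [{"Goal 3"}, {"Goal 3", "Goal 4"}, {"Goal 4"}, {"Goal 3", "Goal 4"}]

lp_targets, lp_classes = label_powerset_encode(toy_labels)
br_targets = binary_relevance_encode(toy_labels)
```

With Label Powerset a single multiclass classifier is trained on `lp_targets`, so label co-occurrence is preserved but rare combinations become rare classes. With Binary Relevance one binary classifier is trained per entry of `br_targets`, which ignores dependencies between labels. Classifier Chains, not sketched here, extend Binary Relevance by feeding earlier labels' predictions to later classifiers.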
