MULTI-WD: Multilingual Completion Tool for Wikidata Data

  • Mohammad Yani Program Studi Rekayasa Perangkat Lunak, Jurusan Teknik Informatika, Politeknik Negeri Indramayu, Indramayu, Jawa Barat 45252, Indonesia
  • Lilyan Arhatia Agustine Program Studi Rekayasa Perangkat Lunak, Jurusan Teknik Informatika, Politeknik Negeri Indramayu, Indramayu, Jawa Barat 45252, Indonesia
  • Iryanto Program Studi Rekayasa Perangkat Lunak, Jurusan Teknik Informatika, Politeknik Negeri Indramayu, Indramayu, Jawa Barat 45252, Indonesia
Keywords: Wikidata Profiling, Wikidata Multilingualism, Multilingual Profiling, Data Completeness

Abstract

Wikidata, a rapidly expanding knowledge graph (KG), owes its growth to two primary factors. First, it is open for anyone to access and edit. Second, its multilingual design lets entities be accessed in many languages worldwide. However, incomplete information across languages remains a significant challenge. For instance, the entity “bada reuteuk” (ID: Q100606305) is currently described only in Indonesian, as “a traditional food in Indonesia,” and lacks descriptions in other languages. Consequently, this entity is not accessible or recognizable in languages other than Indonesian. To address this gap, this study presents MULTI-WD, a multilingual completion tool for Wikidata with two primary features: language profiling and data translation. Language profiling, implemented using SPARQL queries via the Wikidata API, provides an overview of the multilingual status of Wikidata entities. For data translation, the system uses the Translated Labs library, chosen for its open access, cost-free availability, and high-quality translation output. The translated results are then saved to Wikidata. The system was evaluated by five respondents from the Wikidata community using a black-box testing approach. The results showed that MULTI-WD’s core functionalities (category selection, data statistics display, translation, and data updates) achieved 100% operational success. Furthermore, the tool improved data translation efficiency by up to 300% compared with manual translation performed directly through the Wikidata interface.
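
The language profiling step described in the abstract could look roughly like the following sketch. It is not the authors' implementation: the function names build_description_query and missing_languages are hypothetical, and the sketch assumes the query is sent to the public Wikidata Query Service and that results come back in the standard SPARQL JSON format. It builds a query listing the languages in which an entity has a description, then works out which target languages are still missing.

```python
from typing import Iterable, List, Mapping

# Public Wikidata Query Service endpoint (assumed here; the abstract only
# says "SPARQL queries via the Wikidata API").
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_description_query(qid: str) -> str:
    """Return a SPARQL query listing the language tag of each description of `qid`.

    Hypothetical helper for illustration only.
    """
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX schema: <http://schema.org/>\n"
        "SELECT ?lang WHERE {\n"
        f"  wd:{qid} schema:description ?desc .\n"
        "  BIND(LANG(?desc) AS ?lang)\n"
        "}"
    )


def missing_languages(
    bindings: Iterable[Mapping[str, Mapping[str, str]]],
    targets: Iterable[str],
) -> List[str]:
    """Return the target language codes that have no description yet.

    `bindings` is the `results.bindings` list of a SPARQL JSON response.
    Hypothetical helper for illustration only.
    """
    present = {row["lang"]["value"] for row in bindings}
    return [lang for lang in targets if lang not in present]


if __name__ == "__main__":
    # Entity from the abstract; the bindings below are a hand-written stand-in
    # for a query response, reflecting the Indonesian-only state it describes.
    print(build_description_query("Q100606305"))
    sample_bindings = [{"lang": {"type": "literal", "value": "id"}}]
    print(missing_languages(sample_bindings, ["id", "en", "jv"]))
```

The languages reported as missing would be the candidates for the translation step. Keeping query construction and gap detection as separate pure functions lets the profiling logic be checked without network access.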

Published
2024-11-28
How to Cite
Mohammad Yani, Lilyan Arhatia Agustine, & Iryanto. (2024). MULTI-WD: Multilingual Completion Tool for Wikidata Data. Jurnal Nasional Teknik Elektro Dan Teknologi Informasi, 13(4), 297-304. https://doi.org/10.22146/jnteti.v13i4.13289