MULTI-WD: Multilingual Completion Tool for Wikidata Data
Abstract
Wikidata, a rapidly expanding knowledge graph (KG), owes its growth to two primary factors. First, it is open: anyone can access and edit it. Second, it is multilingual, so data entities can be accessed in many languages worldwide. However, incomplete information across languages remains a significant challenge. For instance, the entity “bada reuteuk” (ID: Q100606305) currently has a description only in Indonesian (“a traditional food in Indonesia”) and none in other languages. Consequently, such entities are not accessible or recognizable in languages other than Indonesian. To address this problem, MULTI-WD incorporates two primary features: language profiling and data translation. Language profiling, implemented using SPARQL queries via the Wikidata API, provides an overview of the multilingual status of Wikidata entities. For data translation, the system uses the Translated Labs library, chosen for its open access, cost-free availability, and high-quality translation outputs. The translated results are subsequently saved to Wikidata. The system was evaluated by five respondents from the Wikidata community using a black-box testing approach. The results showed that MULTI-WD’s core functionalities (category selection, data statistics display, translation, and data updates) achieved 100% operational success. Furthermore, the tool improved data translation efficiency by up to 300% compared with manual translation performed directly through the Wikidata interface.
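The language-profiling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_profile_query`, `missing_languages`, `coverage`) are hypothetical, and the query assumes the standard `wd:`, `wdt:`, and `schema:` prefixes predefined by the Wikidata Query Service, with `wdt:P31` ("instance of") standing in for whatever category selection MULTI-WD actually uses. No network call is made; the sketch only builds the query text and evaluates coverage on data already fetched.

```python
import re
from typing import Dict, Iterable, List


def build_profile_query(class_qid: str, lang: str) -> str:
    """Build a SPARQL query counting the instances of a class and how many
    of them have a description in the given language (illustrative only)."""
    # Reject malformed input so nothing unexpected is spliced into the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError(f"not a language code: {lang!r}")
    return (
        "SELECT (COUNT(?item) AS ?total) (COUNT(?desc) AS ?described) WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        "  OPTIONAL {\n"
        "    ?item schema:description ?desc .\n"
        f'    FILTER(LANG(?desc) = "{lang}")\n'
        "  }\n"
        "}"
    )


def missing_languages(descriptions: Dict[str, str],
                      targets: Iterable[str]) -> List[str]:
    """Return the target languages with no non-empty description."""
    return [lang for lang in targets if not descriptions.get(lang, "").strip()]


def coverage(described: int, total: int) -> float:
    """Percentage of entities that have a description in a language."""
    return 0.0 if total == 0 else 100.0 * described / total


# An entity described only in Indonesian, as in the "bada reuteuk" example.
# The description text is a placeholder, not the actual Wikidata value.
entity_descriptions = {"id": "(Indonesian description)"}
print(missing_languages(entity_descriptions, ["id", "en", "ja"]))
```

The list returned by `missing_languages` marks the languages a translation step would need to fill, while `coverage` turns the two counts returned by the query into the kind of per-language statistic a profiling view could display.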
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.