Entity and Relation Linking for Knowledge Graph Question Answering Using Gradual Searching
Abstract
Knowledge graph question answering (KGQA) systems play an important role in retrieving data from a knowledge graph (KG). With such a system, regular users can access data from a KG without having to construct a formal SPARQL query. A KGQA system receives a natural language question (NLQ) and translates it into a SPARQL query through three main tasks: entity and relation detection, entity and relation linking, and query construction. However, the translation is not trivial because of the lexical gaps and entity ambiguity that may arise during entity or relation linking. To address the lexical gap challenge, this research proposed an approach based on multiclass classification, in which an NLQ whose entity occurrences have been detected is classified into categories based on KG relations. To address the entity ambiguity challenge, this research proposed a three-stage searching procedure that determines the appropriate KG entities for the NLQ entities, given the correspondence between the NLQ and a particular KG relation. The three stages are text-based searching, vector-based searching, and entity and relation pairing. The proposed approach was evaluated on the SimpleQuestions and LC-QuAD 2.0 datasets. The experiments showed that the proposed approach outperformed the state-of-the-art baseline. For relation linking, it reached 89.87% and 74.83% recall on SimpleQuestions and LC-QuAD 2.0, respectively. For entity linking, it achieved 91.74% and 61.96% recall on SimpleQuestions and LC-QuAD 2.0, respectively.
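As a rough illustration of the three-stage idea described in the abstract, the following minimal sketch links an entity mention to a toy KG once a relation has already been predicted for the question. Everything in it is an assumption made for illustration rather than the authors' implementation: the entity IDs, labels, relation names, the `link_entity` helper, the similarity threshold, and in particular the use of character-trigram cosine similarity as a simple stand-in for a learned vector representation. It also assumes that vector-based searching acts as a fallback when text-based searching returns nothing.

```python
from collections import Counter
from math import sqrt

# Toy KG (hypothetical data): entity id -> label, plus (entity, relation) pairs.
LABELS = {
    "Q1": "Paris",
    "Q2": "Paris Hilton",
    "Q3": "Jakarta",
}
FACTS = {("Q1", "capital_of"), ("Q2", "occupation"), ("Q3", "capital_of")}


def trigrams(text):
    """Character trigram counts, a crude stand-in for a learned embedding."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def text_search(mention):
    """Stage 1: candidates whose label contains the mention (case-insensitive)."""
    m = mention.lower()
    return [e for e, label in LABELS.items() if m in label.lower()]


def vector_search(mention, threshold=0.5):
    """Stage 2: similarity-based candidates, best first, above an assumed threshold."""
    query = trigrams(mention)
    scored = [(cosine(query, trigrams(label)), e) for e, label in LABELS.items()]
    return [e for score, e in sorted(scored, reverse=True) if score >= threshold]


def pair_with_relation(candidates, relation):
    """Stage 3: keep candidates that occur with the predicted relation in the KG."""
    return [e for e in candidates if (e, relation) in FACTS]


def link_entity(mention, relation):
    """Run the stages in order; vector search is used only if text search is empty."""
    candidates = text_search(mention) or vector_search(mention)
    return pair_with_relation(candidates, relation)
```

In this toy setting, `link_entity("Paris", "capital_of")` keeps only `Q1`, because the ambiguous candidate `Q2` ("Paris Hilton") never occurs with the `capital_of` relation, while a misspelled mention such as "Jakrta" finds no text match and is recovered by the similarity stage instead.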
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.