Chatbot in Bahasa Indonesia Using NLP to Provide Banking Information

FAQs are commonly provided on company websites to inform customers about services and products. However, an FAQ is usually not very interactive and presents so much information that it becomes impractical. A chatbot can be used as an alternative way of providing FAQ content. In this study, a chatbot was developed for BTPN to provide information about one of its products, Jenius. The chatbot uses natural language processing so that the system can understand user queries written in natural language. The cosine similarity algorithm is used to measure the similarity between a query and the patterns in the knowledge base. The pattern with the highest cosine value is considered the most similar to the user query, so its response is returned to the user. However, this algorithm does not take sentence structure into account, so a sentence structure check based on a parse tree is added to give an additional weight to each pattern. The chatbot application was tested by 10 users, and the suitability of the answers to user input was found to be 84%. The chatbot can therefore be used by BTPN to provide Jenius product information to consumers more interactively and practically.


INTRODUCTION
World Wide Web technology has grown rapidly, enabling a revolution in information exchange. One example is the provision of FAQs on company websites. FAQs usually present information as questions and answers about the company's services and products. However, an FAQ often contains too much information because it covers every product in detail. This makes access to information less interactive and less practical. In addition, users sometimes have to browse all of the pages to find the information they need, which takes time.
One alternative offered by companies is an online question and answer service, such as a chat box. However, this kind of feature requires more employees to answer each user's questions. A problem arises when the number of users seeking information is very large: because the number of employees is limited, many user questions go unanswered. Answering user questions therefore needs to be automated, and chatbots can be used to address this problem.
Chatbots have been widely used for many purposes, including entertainment, education, and tourism. A chatbot can serve as a tool for learning new languages, for accessing information systems, for visualizing corpus content, and for answering questions in a specific domain, and it can be trained in different languages.
In the banking sector, several studies have developed chatbots that provide bank information. One banking chatbot was developed using natural language processing [1] [2] [3]. The dataset used was FAQ data obtained from a banking website in India. The application uses rule-based and pattern-based techniques, with NLP used to process user queries. Chatbots are also used in other fields, for example as tutors for students [4], in counseling services [5], in modular knowledge services [6], and as providers of humor [7].
For a chatbot to respond to user queries more dynamically, natural language processing plays an important role in understanding user queries written in natural language. An algorithm is therefore needed to measure the proximity between a query and the patterns in the database, such as the cosine similarity algorithm. Cosine similarity is widely used to measure the proximity between documents [8] [9]. However, this algorithm does not consider the position of tokens in a sentence, so patterns with different sentence structures can have the same cosine value. Previous studies have examined how to check the structure of a sentence [10] [11]. In this paper, the cosine similarity algorithm is extended with a sentence structure check.
BTPN is one of the companies that provide an FAQ on its website, covering one of its products, Jenius. Jenius is a digital banking application that can be used on smartphones. At present, to access FAQ information, customers must switch from the Jenius application to a browser, which is considered impractical. In addition, users sometimes have to queue when they ask customer service by chat. For this reason, an automatic product information service needs to be provided to users through a chatbot.
With this chatbot, consumers can get the information they need more quickly than by switching to a browser to read the rather long FAQ page, or by visiting the bank to ask customer service. The system displays only the information the user needs.
This paper contributes to the development of chatbots that present information about banking products, previously presented as an FAQ, in a more interactive and practical way. The chatbot built supports Bahasa Indonesia when reading user queries and displays responses that are more in line with user expectations.

1 Knowledge Base Preparation
In this research, the data used as the knowledge base was obtained from the FAQ about the Jenius application on the BTPN website. The data must be processed before it can be used to respond to user queries. Processing it in advance also improves system performance, because the system does not need to preprocess the patterns in the database each time a query arrives. The FAQ data used covers one Jenius feature, Dreamsaver, and consists of 20 question and answer sentences about that feature. The data is stored in files with the .aiml extension and preprocessed using Natural Language Processing (NLP). The preprocessing consists of the following steps.

Tokenisation
The input text, such as a user query, is broken down into tokens. At this stage, punctuation such as full stops, commas, and question marks is removed.
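As a minimal sketch of the tokenisation step, the following Python function splits a query into tokens and drops punctuation. The function name tokenize and the lowercasing of tokens are assumptions made for illustration; the paper does not specify the implementation.

```python
import re


def tokenize(query: str) -> list[str]:
    """Lowercase the query and split it into word tokens, dropping punctuation."""
    # \w+ keeps runs of letters and digits; full stops, commas and
    # question marks are not matched and are therefore discarded.
    return re.findall(r"\w+", query.lower())


print(tokenize("Gimana sih cara bikin akun Jenius?"))
# ['gimana', 'sih', 'cara', 'bikin', 'akun', 'jenius']
```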

Slang Word Checking
Each token is then checked against the slang dictionary to determine whether it is a slang word and, if so, whether it can be replaced with a more standard Indonesian word or ignored. Examples of slang words in Bahasa Indonesia are sih, dong, and deh.
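The slang check can be sketched as a dictionary lookup. The SLANG table below is a hypothetical, hand-written sample, not the dictionary used in this research: a slang word maps either to a standard replacement or to None when it is a filler that can be ignored.

```python
# Hypothetical slang dictionary (illustrative entries only).
# A value of None marks a filler word that is simply dropped.
SLANG = {
    "gimana": "bagaimana",
    "bikin": "buat",
    "sih": None,
    "dong": None,
    "deh": None,
}


def normalize_slang(tokens: list[str]) -> list[str]:
    """Replace slang tokens with standard words and drop filler tokens."""
    result = []
    for token in tokens:
        if token not in SLANG:
            result.append(token)          # not slang: keep unchanged
        elif SLANG[token] is not None:
            result.append(SLANG[token])   # slang with a standard form
        # otherwise the token is a filler and is omitted
    return result


print(normalize_slang(["gimana", "sih", "cara", "bikin", "akun", "jenius"]))
# ['bagaimana', 'cara', 'buat', 'akun', 'jenius']
```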

Morphology Checking
Tokens are analyzed morphologically to obtain their basic words and word types.
The morphological check removes prefixes, suffixes, and repetitions, as shown in Figure 1, to obtain the basic word of each token.
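The sketch below illustrates the idea of affix and repetition removal under strong simplifying assumptions. The ROOTS lexicon, the PREFIXES and SUFFIXES lists, and the function names are hypothetical and cover only a handful of cases; the actual procedure follows Figure 1 and is not reproduced here. It also assumes that a repeated word such as "buku-buku" reaches this step as a single hyphenated token.

```python
# Hypothetical lexicon mapping basic words to word types; a real system
# would load a full Indonesian dictionary instead.
ROOTS = {"buat": "VB", "cara": "NN", "akun": "NN", "buku": "NN", "jenius": "NN"}
# Small illustrative subsets of Indonesian affixes, not a complete list.
PREFIXES = ("mem", "men", "me", "di", "ber", "ter")
SUFFIXES = ("kan", "nya", "an", "i")


def candidates(word):
    """Yield the word, then variants with a suffix and/or a prefix removed."""
    stems = [word] + [word[: -len(s)] for s in SUFFIXES if word.endswith(s)]
    for stem in stems:
        yield stem
        for prefix in PREFIXES:
            if stem.startswith(prefix):
                yield stem[len(prefix):]


def analyze(token):
    """Return (basic word, word type); the type is None for unknown words."""
    first, _, second = token.partition("-")
    if second and first == second:  # repetition such as "buku-buku"
        token = first
    for candidate in candidates(token):
        if candidate in ROOTS:
            return candidate, ROOTS[candidate]
    return token, None


for word in ["membuat", "dibuat", "akunnya", "buku-buku"]:
    print(word, "->", analyze(word))
```

Validating each candidate against a root-word lexicon keeps the sketch from stripping letters that only look like affixes; words missing from the lexicon are returned unchanged with no word type.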

2 User Query Processing
The steps a query passes through before a response is displayed are shown in Figure 2. The query is first preprocessed, and then the pattern most similar to the query is searched for using the cosine similarity algorithm and the parse tree.

2.1 Query Preprocessing
The preprocessing stages for a user query are the same as in the knowledge base preparation stage: tokenisation, slang word checking, and morphology checking. Consider the following query:

user: Gimana sih cara bikin akun Jenius?
Tokenisation breaks it into: gimana, sih, cara, bikin, akun, jenius. These tokens are checked against the slang dictionary, and each slang word is omitted or replaced with a more standard word. The basic word and word type of each token are then determined through morphology checking. The preprocessing result is: bagaimana (WH) cara (NN) buat (VB) akun (NN) jenius (NN).

2.2 Pattern Matching
The results of query preprocessing are then matched against the patterns stored in the knowledge base. If a pattern that exactly matches the preprocessed query is found, the system generates the response for that pattern. If no exact match is found, the process continues with a search for the most similar pattern using the cosine similarity algorithm.

IJCCS ISSN (print): 1978-1520, ISSN (online): 2460-7258. Chatbot in Bahasa Indonesia Using NLP to Provide Banking Information (Abidah Elcholiqi)

2.3 Similar Pattern Search with Cosine Similarity
In this paper, pattern similarity is measured using TF-IDF weighting and cosine similarity. Patterns that contain tokens present in the preprocessing result are included in the TF-IDF weighting process. TF and IDF are calculated for every token in the preprocessing result and in the patterns in the knowledge base, following equations (1) and (2). The calculation is carried out in the same way for every other token (term). After the weight of each term in the patterns has been obtained, the proximity between the query and each pattern is calculated using the cosine similarity algorithm according to equation (3). The calculated lengths of vectors Q and K are shown in Table 2, and the total weights of K with Q are shown in Table 3. Of the 4 patterns compared with query Q, K4 has the highest cosine value. However, this study adds a weighting based on the parse tree, which measures how close the parse tree structure of the query is to that of each pattern in the knowledge base.
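The following Python sketch illustrates TF-IDF weighting followed by cosine similarity. Because equations (1) to (3) are not reproduced in this text, it assumes common textbook definitions: TF is the raw term count, IDF is log10(N / df) with the query counted as one of the N documents, and the cosine is the dot product divided by the product of the vector lengths. The patterns P1 to P3 are hypothetical and are not the patterns K1 to K4 of the worked example.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: tf * idf} dict per tokenised document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    idf = {term: math.log10(n_docs / df) for term, df in doc_freq.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in documents
    ]


def cosine(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


query = ["bagaimana", "cara", "buat", "akun", "jenius"]
patterns = {  # hypothetical preprocessed patterns
    "P1": ["bagaimana", "cara", "buat", "dream", "saver"],
    "P2": ["apa", "itu", "dream", "saver"],
    "P3": ["bagaimana", "cara", "daftar", "akun", "jenius"],
}

vectors = tfidf_vectors([query, *patterns.values()])
scores = {name: cosine(vectors[0], vec) for name, vec in zip(patterns, vectors[1:])}
best = max(scores, key=scores.get)
print(best, scores)
```

With these sample patterns, P3 shares the most discriminative terms with the query and receives the highest score, while P2 shares no term and scores zero.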

2.4 Sentence Structure Checking with Parse Tree
Parsing in NLP is very useful for understanding the meaning of human language by examining its grammar [11]. Parsing produces a parse tree from which the meaning of a sentence can be deduced. In the example above, the grammar rules follow Figure 3. The result of checking the parse tree structure for query Q is shown in Figure 4.

Figure 4 Example of Parse Tree of Query Q
The comparison of the tree structure of Q with that of each K is shown in Table 4. For the example query "Bagaimana cara membuat akun jenius", the system displays the response of pattern K4 because it has the highest similarity value. In this research, the patterns are stored in an .aiml file, as shown in Figure 5. Each pattern is processed, and the preprocessing results are stored in a .json file.
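The sentence structure check described in this section can be illustrated with a toy example. The grammar below is hypothetical (the rules of Figure 3 are not reproduced here), and the similarity measure, the share of grammar productions that two parse trees have in common, is an assumption chosen for illustration rather than the weighting actually used in this research.

```python
from collections import Counter

# Hypothetical toy grammar over word-type tags; rules are tried in order.
GRAMMAR = {
    "S": [["WH", "NP", "VP"], ["WH", "NP"], ["NP", "VP"]],
    "NP": [["NN", "NP"], ["NN"]],
    "VP": [["VB", "NP"], ["VB"]],
}


def parse(symbol, tags, pos=0):
    """Return (tree, next_pos) for the first matching rule, or None."""
    if symbol not in GRAMMAR:  # terminal: a word-type tag
        if pos < len(tags) and tags[pos] == symbol:
            return symbol, pos + 1
        return None
    for rule in GRAMMAR[symbol]:
        children, cur = [], pos
        for part in rule:
            result = parse(part, tags, cur)
            if result is None:
                break
            child, cur = result
            children.append(child)
        else:  # every part of the rule matched
            return (symbol, *children), cur
    return None


def productions(tree):
    """List the (head, child labels) productions used in a parse tree."""
    if isinstance(tree, str):
        return []
    head, *children = tree
    labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(head, labels)]
    for child in children:
        rules.extend(productions(child))
    return rules


def structure_similarity(tags_a, tags_b):
    """Share of productions common to both parse trees, between 0 and 1."""
    rule_counts = []
    for tags in (tags_a, tags_b):
        result = parse("S", tags)
        if result is None or result[1] != len(tags):
            return 0.0  # the toy grammar cannot cover this sentence
        rule_counts.append(Counter(productions(result[0])))
    shared = sum((rule_counts[0] & rule_counts[1]).values())
    total = sum((rule_counts[0] | rule_counts[1]).values())
    return shared / total


query_tags = ["WH", "NN", "VB", "NN", "NN"]  # bagaimana cara buat akun jenius
print(structure_similarity(query_tags, ["WH", "NN", "VB", "NN", "NN"]))
print(structure_similarity(query_tags, ["WH", "NN", "NN"]))
```

A score of this kind could be used to separate patterns that receive the same cosine value. Sentences that the grammar cannot parse receive a score of 0, which mirrors the limitation of grammar coverage.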

Figure 5 Example of Pattern Stored in Knowledge Base
Thus, when a user enters a query, the system can directly use the preprocessed results stored in the knowledge base. The result of the chatbot implementation is shown in Figure 6.

Figure 6 Chatbot Design Implementations
The system was tested by 10 testers, with the results shown in Table 5. The test results show that the system still has an error rate of 16%, with the following types of errors:
a. The user feels that the query entered was specific, but the system responds with question options from which the user must select one.
b. The response is not in line with user expectations.
c. The system returns an incorrect response because the answer to the user's query is not yet available.
Some queries are not answered as users expect because several patterns have the same cosine value and the same parse tree weight, so the system displays question options to the user. Some queries also have a sentence structure that cannot be handled by the grammar rules stored in the database, so the system cannot correctly complete the pattern proximity calculation. In addition, there are user queries whose patterns are not yet available in the knowledge base. This can be caused by words that the system does not understand or by slang words that are not handled properly, which leads to poor cosine similarity results.
In terms of system performance, a test run of 28 queries was completed in 2.23 seconds, with the average query answered in less than 20 milliseconds. However, for the sake of user experience, so that the chatbot does not display responses too quickly, each query in the chatbot application is given an added loading time of 1-3 seconds, depending on the response length. This way, users feel as if they are chatting with another human rather than a computer.
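The added loading time can be sketched as a delay that grows with the response length and is clamped to the 1-3 second range stated above. The function name response_delay and the chars_per_second rate are hypothetical; the exact mapping from response length to delay is not specified in this paper.

```python
def response_delay(response: str,
                   min_delay: float = 1.0,
                   max_delay: float = 3.0,
                   chars_per_second: float = 60.0) -> float:
    """Artificial delay in seconds, clamped to [min_delay, max_delay]."""
    # chars_per_second is an assumed "typing speed" used only for illustration.
    raw_delay = len(response) / chars_per_second
    return max(min_delay, min(max_delay, raw_delay))


print(response_delay("Ya."))        # short answer: minimum delay
print(response_delay("x" * 120))    # medium answer: proportional delay
print(response_delay("x" * 1000))   # long answer: capped at the maximum
```

In an application, the returned value would be passed to a timer or sleep call before the response is shown.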

CONCLUSIONS
Based on the research and test results, it can be concluded that the chatbot developed is able to respond to user queries with an answer suitability rate of 84%. The 16% mismatch is caused by the limited slang vocabulary, by combinations of patterns with different sentence structures that give some queries the same pattern proximity value, and by expected answers that are not yet available. The system processes each user query in less than 20 milliseconds on average and displays the response about 1-3 seconds later, after a waiting time that depends on the response length is added.

FUTURE WORKS
For further research on chatbot applications, the word dictionaries, both the Indonesian dictionary and the slang dictionary, need to be enriched. In addition, the patterns in the knowledge base need to be enriched with combinations of different patterns and sentence structures to increase the likelihood of a better response.