Attention-Based BiLSTM for Negation Handling in Sentiment Analysis

Research on sentiment analysis has grown in recent years. However, negation handling has received little attention, particularly for Indonesian sentences. As a result, the polarity of sentences that contain negation words is often determined incorrectly. The purpose of this research is to analyze the effect of negation words in Indonesian sentiment classification over positive, neutral, and negative classes, using attention-based Long Short-Term Memory models and word2vec feature extraction with the continuous bag-of-words (CBOW) architecture. The dataset consists of tweets from Twitter, and model performance is measured by accuracy. Using word2vec with the CBOW architecture and an added attention layer, the Long Short-Term Memory (LSTM) model obtained an accuracy of 78.16% and the Bidirectional Long Short-Term Memory (BiLSTM) model obtained 79.68%, whereas the FSW algorithm obtained 73.50% and the FWL algorithm 73.79%. It can be concluded that attention-based BiLSTM has the highest accuracy, but that adding an attention layer to the LSTM method does not contribute significantly to negation handling, because the attention layer cannot be told which words to attend to.

Sentiment analysis is a field of study that analyzes opinions, sentiments, evaluations, attitudes, and emotions expressed in written language [1]. Many studies have applied sentiment analysis to reviews. The research in [2] uses a dictionary-based (lexicon-based) method for sentiment analysis, but it cannot handle negation words. The handling of negation words in Indonesian is still an open issue in sentiment analysis. Negation has a large impact on sentiment analysis; if left untreated, it can change the polarity value [3].
Negation handling for Indonesian tweets was studied in [4] using the First Sentiment Word (FSW) and Fixed Window Length (FWL) algorithms. However, that research is dictionary-based: the more complete the sentiment dictionary, the more sentiment words can be detected.
Neural networks can pick out such important words using attention, which focuses on a subset of the information they are given [5].
In [6], attention is used to handle negation words in Electronic Health Record (EHR) data. An attention mechanism combined with the Bidirectional Long Short-Term Memory (BiLSTM) model is called attention-based BiLSTM. That work shows that the attention-based BiLSTM method is appropriate for text classification.
For these reasons, this study discusses negation handling in sentiment analysis, which was previously studied in [4] using the FSW and FWL algorithms. This study uses the BiLSTM method following [6], with the data from [4]. The results of the FSW and FWL algorithms are then compared with those of the attention-based BiLSTM method.

System Architecture
In this research, the system architecture consists of four parts: data collection, preprocessing, feature extraction, and finally classification and evaluation. The process can be seen in Figure 1.

Data Collection
The tweet data used in this research come from the dataset of [4]. The dataset consists of Indonesian-language tweets that have been labelled as positive, neutral, or negative.

Preprocessing
Preprocessing is very important in sentiment analysis because it cleans the data so that word-vector construction and sentiment classification are more accurate [7]. The preprocessing steps are a) case folding, b) filtering, c) tokenizing, d) slang word conversion, and e) stopword removal. An example of the preprocessing process can be seen in Table 1. Each step is explained below:

Casefolding
Text documents are not always consistent in letter case, so this process converts all letter characters in the text to lowercase.

Filtering
In this process, the text is cleaned by removing special characters and other symbols ($, %, *, and so on). This process also eliminates tokens that are not needed for the analysis, for example usernames that start with the symbol "@", hashtags that start with "#", Uniform Resource Locators (URLs), and emoticons. Punctuation marks, symbols, and numbers are omitted because they have little effect on the labelling process.
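The case folding and filtering steps can be sketched with regular expressions. This is a minimal illustration, not the study's actual code: the function names `case_fold` and `filter_text`, the patterns, and the sample tweet are all assumptions made for the example.

```python
import re

def case_fold(text: str) -> str:
    """Convert every letter to lowercase."""
    return text.lower()

def filter_text(text: str) -> str:
    """Remove URLs, @usernames, #hashtags, digits, punctuation and emoticons.

    Assumes the text has already been lowercased by case_fold.
    """
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)                # usernames and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # anything that is not a letter
    return re.sub(r"\s+", " ", text).strip()            # collapse leftover whitespace

# Hypothetical sample tweet, for illustration only.
raw = "@budi Filmnya TIDAK bagus :( 100% kecewa http://t.co/abc #review"
cleaned = filter_text(case_fold(raw))
print(cleaned)  # filmnya tidak bagus kecewa
```

URLs are removed before the general symbol pattern so that fragments of a link are not left behind as stray letters.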

Tokenizing
Tokenizing serves to break the review down into word units. The tokenizing process looks at every space in the review and separates the words at those spaces.

Slangword conversion
Slangword conversion is the process of converting non-standard words into standard words. This stage uses a slang word dictionary that maps each slang word to its standard equivalent. Each word is checked against the dictionary; if a non-standard word is found there, it is replaced with the corresponding standard word.

Stopword Removal
This stage removes words that have no influence on the later classification process (such as which, and, or, to, from).
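The tokenizing, slang word conversion, and stopword removal steps can be sketched as three small functions. The dictionaries `SLANG` and `STOPWORDS` below are tiny hypothetical stand-ins; the actual slang dictionary and stopword list used in the study are not given in the text.

```python
# Hypothetical toy resources, for illustration only.
SLANG = {"gak": "tidak", "bgt": "banget", "yg": "yang"}
STOPWORDS = {"yang", "dan", "atau", "ke", "dari"}

def tokenize(text):
    """Split a cleaned tweet into words at whitespace."""
    return text.split()

def convert_slang(tokens):
    """Replace each slang word found in the dictionary with its standard form."""
    return [SLANG.get(tok, tok) for tok in tokens]

def remove_stopwords(tokens):
    """Drop words that appear in the stopword list."""
    return [tok for tok in tokens if tok not in STOPWORDS]

tokens = remove_stopwords(convert_slang(tokenize("film yg ini gak bagus bgt")))
print(tokens)  # ['film', 'ini', 'tidak', 'bagus', 'banget']
```

The toy stopword list deliberately leaves out the negation word "tidak", since removing it would erase the negation before classification; whether the study's stopword list does the same is not stated.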

Sentence Conversion
The steps in this process are creating a word dictionary, converting sentences into numbers, and padding. The results of this sentence conversion process are used as input to the BiLSTM method. The first step is to build a word dictionary that assigns an index to each word contained in the preprocessed tweet data.
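The three steps of sentence conversion can be sketched as follows. The helper names, the choice of index 0 for both padding and unknown words, and padding at the end of the sequence are assumptions made for this example, not details taken from the study.

```python
def build_vocab(tokenized_tweets):
    """Assign a positive integer index to each distinct word (0 is kept for padding)."""
    vocab = {}
    for tokens in tokenized_tweets:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab

def to_indices(tokens, vocab, oov_index=0):
    """Convert a list of words into a list of integer indices."""
    return [vocab.get(tok, oov_index) for tok in tokens]

def pad(seq, max_len, pad_index=0):
    """Truncate or right-pad a sequence so that it has exactly max_len items."""
    seq = seq[:max_len]
    return seq + [pad_index] * (max_len - len(seq))

tweets = [["film", "tidak", "bagus"], ["film", "bagus", "banget", "sekali"]]
vocab = build_vocab(tweets)
encoded = [pad(to_indices(t, vocab), max_len=4) for t in tweets]
print(encoded)  # [[1, 2, 3, 0], [1, 3, 4, 5]]
```

Padding gives every tweet the same length, which is what allows a batch of tweets to be passed to the network as one fixed-size array.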

Word2vec
Word2Vec, developed by Tomas Mikolov, is an implementation of artificial neural networks that can process words from very large datasets in a relatively short time. The tool takes a corpus of text as input and produces a vector representation of each word in the corpus as output [8]. Two Word2Vec architectures can be used to represent word vectors: Continuous Bag-of-Words (CBOW) and Skip-gram [9]. This study used the CBOW and Skip-gram architectures with a vector size of 200 dimensions, following [6].
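The difference between the two architectures lies in how training examples are formed: CBOW predicts a target word from its surrounding context words, while Skip-gram predicts the context words from the target. The sketch below only builds CBOW training pairs and does not train any vectors; the function name `cbow_pairs` and the window size are assumptions, since the window used in the study is not stated.

```python
def cbow_pairs(tokens, window=2):
    """Build (context words, target word) pairs as used to train CBOW."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before the target
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after the target
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs

pairs = cbow_pairs(["film", "ini", "tidak", "bagus"], window=1)
for context, target in pairs:
    print(context, "->", target)
```

Swapping the roles in each pair, so that the target is the input and each context word is an output, would give the corresponding Skip-gram examples.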

Attention based bidirectional LSTM
First, a BiLSTM model is built that takes word embeddings as input. The embedding layer converts each positive integer index in the input into a fixed-size vector, whose dimensions are given by the word2vec model built over the word dictionary. In the LSTM, the first step decides whether information from the previous step may pass into the cell state. This decision is made by a sigmoid layer called the "forget gate": an output of 1 means "let it pass" and 0 means "forget the information" [10]. The forget gate value is calculated with equation (1).
The next step is to determine the new information to be stored in the cell state. First, a sigmoid layer called the input gate decides which values to update. Then, a tanh layer creates a vector of new candidate values to be added to the cell state. The two are then combined to update the state. The input gate value is calculated with equation (2). The final step is the output gate. First, a sigmoid layer decides which parts of the cell state will be output; the cell state is then passed through tanh and multiplied by the output of the sigmoid gate, so that only the selected parts are output. The output gate is calculated with equations (5) and (6).
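The equations referenced in this description are not reproduced in the text. For reference, the standard LSTM formulation described in [10] is sketched below. The assignment of numbers (1) to (6) assumes the paper follows that formulation, and the symbols (weights $W$, biases $b$, input $x_t$, previous hidden state $h_{t-1}$, cell state $C_t$) are the conventional ones rather than taken from the source.

```latex
\begin{align}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1} \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2} \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4} \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5} \\
h_t &= o_t \odot \tanh\left(C_t\right) \tag{6}
\end{align}
```

Here $\sigma$ is the sigmoid function and $\odot$ is element-wise multiplication. In a bidirectional model, one LSTM reads the sentence left to right and another reads it right to left, and the two hidden states at each position are typically concatenated.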
All hidden states are fed into a subsequent attention layer [11]. We added the attention layer because not all words contribute equally to negation detection. The normalized word weight $\alpha_t$ is obtained through a softmax function, equation (8). The sentence vector $v$, which aggregates all information in the sentence, is the weighted sum of the hidden states $h_t$ with $\alpha_t$ as the corresponding weights:

$v = \sum_{t} \alpha_t h_t \qquad (9)$
This vector v is then fed to a fully connected layer with softmax activation to perform the final classification. The prediction is a vector y ∈ R^3 with the probabilities of the tweet being positive, neutral, or negative. The model architecture is shown in Figure 3.
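The attention pooling step, softmax-normalized weights followed by a weighted sum of hidden states, can be sketched in plain Python. The scoring function used here (a dot product between each hidden state and a weight vector `w`) is an assumption made for illustration; the text does not show how the scores that enter the softmax are computed.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)                                # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, w):
    """Return the attention weights and the weighted sum of the hidden states.

    hidden_states: list of T vectors, one per word (each a list of floats).
    w: hypothetical scoring vector with the same dimension as each hidden state.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    v = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return alphas, v

# Three toy hidden states of dimension 2.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas, v = attention_pool(hidden_states, w=[0.5, -0.5])
print(alphas, v)
```

In a trained model the scoring parameters are learned together with the rest of the network, so the words that receive high weight are determined by training rather than specified by hand.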

RESULTS AND DISCUSSION
This section discusses the results of the sentiment classification tests on the models that were built. The tests measured accuracy, precision, recall, and F1 score.
A total of 612 tweets covering positive, neutral, and negative sentiment were used. 80% of the data were used as training data and the remaining 20% as test data.
The classification test measured accuracy, precision, recall, and F1 score by comparing the label of each tweet with the prediction produced by the system. Two attention-based architectures were tested, attention-based LSTM and attention-based BiLSTM, and they were compared with the LSTM and BiLSTM models without attention and with the First Sentiment Word and Fixed Window Length algorithms.
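The four metrics can be computed from the true and predicted labels as sketched below. The function name and the toy labels are assumptions for illustration; the text does not state whether the reported precision, recall, and F1 values are per class or averaged, so the sketch reports them per class.

```python
def classification_metrics(y_true, y_pred, labels=("positive", "neutral", "negative")):
    """Return overall accuracy and per-class (precision, recall, F1)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # label missed by the model
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = (precision, recall, f1)
    return accuracy, per_class

# Toy labels, not results from the study.
y_true = ["positive", "positive", "neutral", "negative", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative", "positive", "negative"]
accuracy, per_class = classification_metrics(y_true, y_pred)
print(round(accuracy, 4), per_class)
```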

Attention based LSTM Classification Test
The results of sentiment classification using the attention-based LSTM method are shown in Table 2.

Attention based BiLSTM Classification Test
Overall, the test results for sentiment classification using the attention-based BiLSTM method show better accuracy than the attention-based LSTM, with an overall accuracy of 79.68%. This means that using the bidirectional model can increase sentiment classification accuracy. The results of sentiment classification using the attention-based BiLSTM method are shown in Table 3.

LSTM Classification Test
To see the performance gain from attention, the attention-based LSTM model is compared with an LSTM model without attention. The results can be seen in Table 4. Table 4 shows that the best overall accuracy of the LSTM method was 75.60%, with the best test parameters being the CBOW architecture, 200 neurons in each layer, 200 epochs, L2 regularization of 0.00001, and the softmax activation function.

BiLSTM Classification Test
The BiLSTM classification test uses the same parameters as the LSTM classification test; the results can be seen in Table 5.

FSW and FWL algorithm Test
The FSW and FWL algorithms use the same data as the attention-based BiLSTM and LSTM methods. The data went through the same preprocessing, while the feature extraction used was term frequency (TF). The results of the FSW and FWL algorithms can be seen in Table 6.
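The text does not describe the two algorithms in detail, so the sketch below rests on a reading of their names: FWL inverts the polarity of every word inside a fixed-length window after a negation word, while FSW inverts only the first sentiment word that follows the negation. The lexicon, the negation list, and the window size are hypothetical, and the scoring is simplified to a lexicon sum rather than the TF features used in the study.

```python
# Hypothetical toy resources, for illustration only.
NEGATIONS = {"tidak", "bukan", "belum"}
LEXICON = {"bagus": 1, "senang": 1, "buruk": -1, "kecewa": -1}

def score_fwl(tokens, window=2):
    """Fixed Window Length: flip the polarity of the `window` words after each negation."""
    scores = [LEXICON.get(tok, 0) for tok in tokens]
    for i, tok in enumerate(tokens):
        if tok in NEGATIONS:
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                scores[j] = -scores[j]
    return sum(scores)

def score_fsw(tokens):
    """First Sentiment Word: flip only the first lexicon word after each negation."""
    scores = [LEXICON.get(tok, 0) for tok in tokens]
    for i, tok in enumerate(tokens):
        if tok in NEGATIONS:
            for j in range(i + 1, len(tokens)):
                if tokens[j] in LEXICON:
                    scores[j] = -scores[j]
                    break
    return sum(scores)

tokens = ["tidak", "terlalu", "sangat", "bagus"]
print(score_fwl(tokens, window=2), score_fsw(tokens))  # 1 -1
```

In this toy sentence the sentiment word lies outside a window of two, so the FWL sketch misses the negation while the FSW sketch catches it. Both depend on the sentiment word being present in the dictionary.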

Comparison of Accuracy Results
A comparison of the accuracy of sentiment classification using attention-based LSTM, attention-based BiLSTM, and the FSW and FWL algorithms is shown in Table 7. The attention-based BiLSTM method had better accuracy than the attention-based LSTM, BiLSTM, and LSTM models and the FSW and FWL algorithms. This is because attention focuses on capturing important words, which are then reweighted to obtain the best result. This study focuses on the use of attention to handle negation, but it can be concluded that adding the attention layer is not very significant for dealing with negation words, because the words that attention treats as important cannot be specified in advance.

Result of Prediction
This part shows the true and false predictions. For details, see Figure 5 and Figure 6. For a true prediction of a positive tweet, the predicted label of the tweet is in accordance with its actual value. This also applies to neutral and negative tweets.

CONCLUSIONS
Based on this study, it can be concluded that:
1. Adding an attention layer to the Long Short-Term Memory method is not significant for handling negation words, because the attention layer cannot be told which words to attend to. Instead, the words that receive attention are determined during the training process.
2. The attention-based BiLSTM method produces the highest accuracy, namely 79.68%, compared to attention-based LSTM 78.15%, BiLSTM 76.87%, LSTM 75.60%, FSW 67.32%, and FWL 68.79%.

SUGGESTION
There are still some weaknesses in this study that can be improved. Some suggestions for further research are as follows:
1. Try another method for handling negation words, because adding only an attention layer to the neural network does not handle negation words.
2. Use other feature extraction methods, such as FastText and GloVe.
3. For research on negation handling, it is better to use a dataset that contains many negation sentences, in order to see whether the method can handle them.