Comparison of SVM and LIWC for Sentiment Analysis of SARA

SARA is a sensitive issue based on sentiments about self-identity regarding ancestry, religion, nationality or ethnicity. SARA issues cause conflict between groups that leads to hatred and division, and they spread widely through social media, especially Twitter. Overcoming the problem of SARA requires an effective method to filter negative SARA content. This study analyzes Indonesian-language tweets and determines whether each tweet contains positive SARA, negative SARA, or no SARA (neutral). A machine learning method (SVM) and a lexicon-based method (LIWC) were compared on 450 tweets to determine the best approach for each sentiment class (positive, negative, and neutral). The best negative SARA classification was obtained using SVM with λ = 3 and γ = 0.1, with Precision = 0.9, Recall = 0.6, and F1-Score = 0.72. The best positive SARA classification was obtained with the LIWC method, with Precision = 0.6, Recall = 0.8, and F1-Score = 0.69. The best neutral classification was obtained using SVM with λ = 3 and γ = 0.1, with Precision = 0.52, Recall = 0.87, and F1-Score = 0.65.


INTRODUCTION
SARA refers to sensitive issues based on sentiments about self-identity regarding descent, religion, nationality or ethnicity, and class. SARA issues cause conflict between groups that leads to hatred and division, and they spread widely through social media, especially Twitter. Overcoming SARA problems therefore requires an effective method to filter out negative SARA content.
SARA can be divided into three categories: 1) individual SARA, consisting of actions that offend, harass, discriminate against, or insult other groups; 2) institutional SARA, consisting of rules or policies that discriminate against a group; and 3) cultural SARA, consisting of the spread of discriminatory traditions or ideas between groups.
This research focused on tweet data about individual SARA. Several approaches have been used to analyze SARA issues in Indonesian tweets [1][2][3]. [1] used LIWC to analyze three classes of SARA sentiment (negative, neutral, and positive), obtaining average Precision, Recall, and F1-Score values of 69.62%, 70%, and 69.81%, respectively. [2] analyzed two classes of SARA sentiment (negative and positive) using the Improved-KNN method and obtained good results, with Precision, Recall, and F1-Score of 0.976422, 1, and 0.987944444, respectively. [3] carried out SARA sentiment analysis using a combination of k-means and SVM, where SVM was used to classify two classes of SARA sentiment. The k-means step produced 118 positive SARA tweets and 83 negative SARA tweets, and the classification achieved Precision and Recall values of 64.18% and 63.68%, respectively.
Many studies have applied the Support Vector Machine (SVM) method to sentiment analysis [3][4][5][6]. [4] compared SVM and Naive Bayes for analyzing three classes of Covid-19 sentiment (negative, neutral, and positive) in Indonesian tweets and found that SVM performed better than Naive Bayes, with average F1-Scores of 93% and 92%, respectively. [5] analyzed two classes of sentiment about radical content (positive and negative) in Indonesian tweets using SVM with a degree-2 polynomial kernel; the best accuracy was 70%, obtained with λ = 0.1 and γ = 0.1. [6] analyzed the performance of SVM for sentiment analysis on two tweet datasets, one about self-driving cars and one about Apple products. The average Precision, Recall, and F1-Score were 55.8%, 59.9%, and 57.2%, respectively, for the first dataset, and 70.2%, 71.2%, and 69.9%, respectively, for the second dataset.
Very few studies have used Linguistic Inquiry and Word Count (LIWC) for sentiment analysis [1,7,8]. [7] applied LIWC as a booster feature to improve machine learning performance in analyzing the sentiment of hate speech in English and Spanish. The results showed an improvement over the baseline machine learning method without LIWC features, with average accuracies of 0.7 and 0.8 for the English and Spanish corpora, respectively.
[8] compared LIWC with several machine learning methods on different social media datasets of positive and negative sentiment reviews. The results showed that LIWC performed poorly compared with the machine learning methods.
However, to the best of our knowledge, comparative studies between machine learning and lexicon-based methods for sentiment analysis of SARA (i.e., ancestry, religion, nationality or ethnicity) are not yet available. Most comparative sentiment analysis studies outside the SARA domain compare several machine learning-based methods [9,10,11]. Moreover, most sentiment analysis studies on SARA classify sentiment into only two classes, namely positive and negative [2,3].
In this research, we compared a machine learning method (SVM) and a lexicon-based method (LIWC) for analyzing three classes of SARA sentiment (negative, neutral, and positive) in Indonesian tweets. Several experiments were conducted to select the best SVM model. In addition, new words were added to the lexicon to improve the performance of the LIWC method.

Collecting and Labeling Data
Collecting Data
The data were collected from Indonesian-language tweets. The tweets are primary data obtained through the Twitter API using the Python library tweepy. Tweets were retrieved using specific keywords related to ethnicity and religion. The collected tweets include tweets containing SARA issues with negative or positive sentiment and tweets that do not contain SARA issues. Examples of the collected tweets are shown in Table 1, Table 2, and Table 3.
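The keyword-based collection step can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes tweepy 4.x with a Twitter API v2 bearer token and the recent-search endpoint (`tweepy.Client.search_recent_tweets`), and the keywords `suku` and `agama` are hypothetical placeholders because the exact keywords are not listed. Recent search only covers roughly the last seven days, so a four-week collection would have to be run repeatedly.

```python
# Sketch of keyword-based tweet collection.
# Assumptions: tweepy 4.x, a Twitter API v2 bearer token, hypothetical keywords.


def build_query(keywords, lang="id"):
    """Combine keywords with OR, keep one language, and exclude retweets."""
    terms = " OR ".join(f'"{k}"' if " " in k else k for k in keywords)
    return f"({terms}) lang:{lang} -is:retweet"


def fetch_tweets(bearer_token, query, max_results=100):
    """Return the text of recent tweets matching `query` (requires network access)."""
    import tweepy  # imported here so build_query works even without tweepy installed

    client = tweepy.Client(bearer_token=bearer_token)
    # max_results must be between 10 and 100 for the recent-search endpoint.
    response = client.search_recent_tweets(query=query, max_results=max_results)
    return [tweet.text for tweet in (response.data or [])]


if __name__ == "__main__":
    # Hypothetical keywords; the paper does not list the exact keywords it used.
    print(build_query(["suku", "agama"]))
```

Multi-word keywords are wrapped in quotes so that the search matches the exact phrase rather than either word.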

Labeling Data
The collected data consist of 450 tweets, 150 for each class. Data were collected over four weeks, from 25-02-2021 to 25-03-2021. The selected data were stored in a .csv file with two columns: the first column stores the sentiment class (positive, neutral, or negative) and the second column stores the tweet text. The tweets were labeled into the three classes by two labelers. To validate the labels and determine their consistency, Cohen's Kappa test was performed [12]. The two labelers obtained a Kappa value of 0.8, which indicates a "strong" level of agreement in the data labeling.
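Reading the two-column file and computing the labelers' agreement can be sketched as below. Cohen's Kappa is computed with its standard definition, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each labeler's class proportions. The function names and the example rows are hypothetical.

```python
import csv
import io
from collections import Counter


def read_labeled_rows(stream):
    """Read (label, text) rows from a two-column CSV stream: sentiment class, tweet text."""
    return [(label, text) for label, text in csv.reader(stream)]


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two labelers who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both labelers must label the same non-empty list of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # p_o
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # p_e: for each class, the product of the two labelers' proportions, summed.
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    rows = read_labeled_rows(io.StringIO('positive,"contoh tweet"\nnegative,"tweet lain"\n'))
    print(rows)
```

The kappa formula is undefined when p_e = 1 (both labelers assign a single identical class to every item), in which case the sketch raises `ZeroDivisionError`.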
Ninety percent of the data (405 tweets) was used to select the best SVM model with 5-fold cross-validation, and the remaining 10% (45 tweets) was used to evaluate the best SVM model and the LIWC method.
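The 90/10 split and the 5 folds can be sketched as below. The paper does not say whether the split was stratified; this sketch assumes a stratified hold-out (the same 10% from each class, giving 15 test tweets per class) and a round-robin fold assignment. Both choices and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_holdout(rows, test_fraction=0.1, seed=42):
    """Split (label, text) rows into train/test, taking the same fraction from every class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[0]].append(row)
    train, test = [], []
    for label in sorted(by_label):  # output rows stay grouped by label
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def k_folds(items, k=5):
    """Yield (train, validation) pairs; item i goes to validation fold i % k."""
    for fold in range(k):
        validation = items[fold::k]
        train = [x for i, x in enumerate(items) if i % k != fold]
        yield train, validation
```

Because `stratified_holdout` returns the training rows grouped by class, the round-robin assignment in `k_folds` also spreads each class evenly: with 135 training tweets per class, every validation fold of 81 tweets contains 27 tweets of each class.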

Feature extraction
This study uses the bag-of-words model, with vocabulary terms taken from the tweet dataset. LIWC uses only the words themselves as features, whereas SVM uses TF-IDF weights as features.

Preprocessing
The preprocessing stages in this study were tokenization, case folding, data cleaning, stopword removal, and finally stemming [13].
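The five preprocessing stages can be sketched in the stated order. This is a simplified illustration under assumptions: whitespace tokenization, a tiny hypothetical stopword list, cleaning that drops URLs and mentions and keeps only letters, and an identity function as a placeholder stemmer. The paper does not name its stemmer; an Indonesian stemmer such as Sastrawi's `stemmer.stem` could be passed as `stem`.

```python
import re

# Tiny illustrative stopword list (hypothetical); a real system would use a full Indonesian list.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "dengan", "untuk"}


def preprocess(text, stem=lambda word: word):
    """Tokenize, case-fold, clean, remove stopwords, then stem (identity by default)."""
    tokens = text.split()                          # 1. tokenization on whitespace
    tokens = [t.lower() for t in tokens]           # 2. case folding
    cleaned = []
    for t in tokens:                               # 3. data cleaning
        if t.startswith(("http://", "https://", "@")):
            continue                               # drop URLs and user mentions
        t = re.sub(r"[^a-z]", "", t)               # keep letters only ('#', digits, punctuation removed)
        if t:
            cleaned.append(t)
    tokens = [t for t in cleaned if t not in STOPWORDS]  # 4. stopword removal
    return [stem(t) for t in tokens]               # 5. stemming


if __name__ == "__main__":
    print(preprocess("@user Ini contoh tweet yang BAGUS!! https://t.co/x #damai"))
```

Keeping letters only also merges reduplicated words such as "orang-orang" into one token, which is a simplification a real pipeline might handle differently.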

TF-IDF weighting
Based on [13], term weighting ($W_{t,d}$) begins by calculating the term frequency (tf). To reduce the weight of terms that appear in many documents, the inverse document frequency (idf) is calculated, where N is the total number of documents. The term weight is then obtained by computing the TF-IDF. The tf, idf, and $W_{t,d}$ are calculated as follows [13],
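Because the tf, idf, and $W_{t,d}$ formulas are not reproduced here, the sketch below assumes the common form: $tf_{t,d}$ is the raw count of term t in document d, $idf_t = \log_{10}(N / df_t)$ with $df_t$ the number of documents containing t, and $W_{t,d} = tf_{t,d} \times idf_t$. The log base and the use of raw counts are assumptions; [13] may use a variant such as log-scaled tf. The example tokens are hypothetical.

```python
import math
from collections import Counter


def fit_idf(documents):
    """documents: list of token lists. Returns idf_t = log10(N / df_t) for every term seen."""
    n_docs = len(documents)
    df = Counter(term for doc in documents for term in set(doc))  # documents containing each term
    return {term: math.log10(n_docs / count) for term, count in df.items()}


def tfidf_vector(doc, idf, vocabulary):
    """W_{t,d} = tf_{t,d} * idf_t for each vocabulary term; other terms are ignored."""
    tf = Counter(doc)  # raw term frequency in this document
    return [tf[term] * idf[term] for term in vocabulary]


if __name__ == "__main__":
    train_docs = [["damai", "indah", "damai"], ["indah", "benci"]]
    idf = fit_idf(train_docs)
    vocabulary = sorted(idf)  # ['benci', 'damai', 'indah']
    print([tfidf_vector(d, idf, vocabulary) for d in train_docs])
```

Fitting idf on the training tweets only and reusing it for the test tweets keeps the test set from influencing the weights. A term that occurs in every document gets idf = 0, which is exactly the down-weighting of common terms described above.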

SVM
SVM is fundamentally a linear classification method, although kernel functions allow it to separate classes that are not linearly separable in the original feature space. The main principle of SVM is to find a separator in the search space, commonly called a hyperplane, that separates the different classes. One advantage of SVM is that it handles high-dimensional data well, because it determines the optimal direction of discrimination in the feature space by checking the right combination of features [14].

Training
In the training process, the model parameters C (the penalty on the slack variables, which bounds each α_i), d (polynomial degree), γ (learning rate), and λ (error control) are set manually and tuned via 5-fold cross-validation to obtain the optimal values (the best SVM model). The SVM kernel used in this study is a polynomial kernel, given by the following equation [15].
Here, $K(x_i, x_j)$ is the kernel function, $x_i$ is the i-th data point, and $x_j$ is the j-th data point. The steps of the SVM method are based on equations (5) to (10), as follows [15].
a. Initialize the parameters.
b. Calculate the Hessian matrix, where $D_{ij}$ is the Hessian matrix value, $y_i$ is the class of the i-th data point, and $y_j$ is the class of the j-th data point.
c. Iterate the calculations over the data, from the 1st to the n-th data point, where ε is the error value and $\alpha_i$ is the Lagrange multiplier of the i-th data point (data points with $\alpha_i > 0$ are the support vectors).
d. Find the largest value of $\alpha_i$ from the previous calculation and use it to calculate the bias b.
e. Finally, define the classification model, where f(x) is the predicted label.
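Equations (4) to (10) are not reproduced above, but the parameter set (C, d, γ as a learning rate, λ, ε) matches the sequential training SVM of Vijayakumar and Wu. The sketch below assumes that formulation, so it may differ in detail from [15]: polynomial kernel $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$ (the constant 1 is an assumption), Hessian $D_{ij} = y_i y_j (K(x_i, x_j) + \lambda^2)$, error $E_i = \sum_j \alpha_j D_{ij}$, update $\delta\alpha_i = \min(\max(\gamma(1 - E_i), -\alpha_i), C - \alpha_i)$, a bias computed from the largest-α point of each class, and $f(x) = \mathrm{sign}(\sum_i \alpha_i y_i K(x_i, x) + b)$. The paper does not report C, ε, the iteration limit, or how the three classes were combined; C = 1, ε = 1e-4, 100 iterations, and one-vs-rest are placeholders.

```python
def poly_kernel(x, z, degree=1, c=1.0):
    """Polynomial kernel K(x, z) = (x . z + c)^degree; c = 1 is an assumption."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree


def train_binary(X, y, C=1.0, degree=1, gamma=0.1, lam=3.0, eps=1e-4, max_iter=100):
    """Sequential SVM training for labels y in {+1, -1}; returns the model as a dict."""
    n = len(X)
    K = [[poly_kernel(X[i], X[j], degree) for j in range(n)] for i in range(n)]
    # Hessian: D_ij = y_i * y_j * (K(x_i, x_j) + lambda^2)
    D = [[y[i] * y[j] * (K[i][j] + lam ** 2) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            error = sum(alpha[j] * D[i][j] for j in range(n))  # E_i
            # delta alpha_i = min(max(gamma * (1 - E_i), -alpha_i), C - alpha_i)
            delta = min(max(gamma * (1.0 - error), -alpha[i]), C - alpha[i])
            alpha[i] += delta
            largest_change = max(largest_change, abs(delta))
        if largest_change < eps:  # stop once every update is smaller than epsilon
            break
    # Bias from the largest-alpha point of each class: b = -(w.x+ + w.x-) / 2
    i_pos = max((i for i in range(n) if y[i] > 0), key=lambda i: alpha[i])
    i_neg = max((i for i in range(n) if y[i] < 0), key=lambda i: alpha[i])
    w_pos = sum(alpha[j] * y[j] * K[j][i_pos] for j in range(n))
    w_neg = sum(alpha[j] * y[j] * K[j][i_neg] for j in range(n))
    return {"X": X, "y": y, "alpha": alpha, "b": -0.5 * (w_pos + w_neg), "degree": degree}


def decision(model, x):
    """Decision value sum_i alpha_i * y_i * K(x_i, x) + b; its sign is f(x)."""
    return sum(a * yi * poly_kernel(xi, x, model["degree"])
               for a, yi, xi in zip(model["alpha"], model["y"], model["X"])) + model["b"]


def train_one_vs_rest(X, labels, **params):
    """One binary model per class (this class = +1, others = -1); an assumed multi-class scheme."""
    return {c: train_binary(X, [1 if label == c else -1 for label in labels], **params)
            for c in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose one-vs-rest model gives the largest decision value."""
    return max(models, key=lambda c: decision(models[c], x))
```

In this formulation, the λ² term added to the kernel plays the role of a bias inside the dual problem, and the learning rate γ must be small enough (below 2 / D_ii) for the sequential updates to converge.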

Testing
The testing process validates the classification model. In this study, validation was carried out with 5-fold cross-validation on the 405 tweets. The actual labels were compared with the labels predicted by equation (10). The best SVM model was the combination of d (polynomial degree), γ (learning rate), and λ (error control) with the highest F1-Score.
The F1-Measure is calculated from the confusion matrix [16]. Table 4 shows the confusion matrix for the three-class classification. Precision, Recall, and F1-Measure are calculated from this matrix using equations (11), (12), and (13).
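Per-class Precision, Recall, and F1-Measure can be computed from a three-class confusion matrix as sketched below, using the standard definitions Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 = 2PR / (P + R). The orientation (rows are actual classes, columns are predicted classes) and the example counts are assumptions for illustration, not the paper's results.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def per_class_metrics(confusion, labels):
    """confusion[i][j] = tweets whose actual class is labels[i] and predicted class is labels[j]."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted_k = sum(row[k] for row in confusion)  # column sum: all tweets predicted as k
        actual_k = sum(confusion[k])                    # row sum: all tweets actually in k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        metrics[label] = (precision, recall, f1_score(precision, recall))
    return metrics


if __name__ == "__main__":
    labels = ["negative", "neutral", "positive"]
    confusion = [[8, 2, 0], [1, 7, 2], [0, 3, 7]]  # hypothetical counts, not the paper's results
    for label, (p, r, f) in per_class_metrics(confusion, labels).items():
        print(f"{label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```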

LIWC
LIWC is a text analysis application developed to analyze the emotional, cognitive, and structural components of a text [17]. LIWC works by looking up each word in the text and matching it to a word in the lexicon. The lexicon assigns words to categories that reflect their linguistic, psychological, and social functions, such as pronouns, positive emotions, and social processes. Each time a word from a category is found, LIWC increases that category's percentage value [17]. Once all the words in the document have been categorized, the results are displayed as a table of category percentage values for the document [8]. The steps taken to perform sentiment analysis using LIWC were as follows [17].
1. Read all terms in the document.
2. Count the frequency of the terms that belong to each class label defined in the dictionary (positive and negative).
3. Calculate the class ratio with equation (14):
$$R_k = \frac{\sum_{t} n_{t,k}}{N_d} \qquad (14)$$
where $R_k$ is the ratio of class k (positive or negative), $n_{t,k}$ is the frequency in the document of term t, where t is a member of class k in the dictionary, and $N_d$ is the total number of terms in the document.
4. Determine the class label: if the positive class ratio is greater than the negative class ratio, the class is 1 (positive); if the negative class ratio is greater than the positive class ratio, the class is -1 (negative); and if the two ratios are equal, the class is 0 (neutral).
In this research, the LIWC lexicon was obtained from the translation of Liu's opinion word list by Wahyu & Azhari [18,19], which contains 1182 positive words and 2402 negative words. The dictionary was then extended with new words, so that it contained 1256 positive words and 2463 negative words.
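The four LIWC steps can be sketched as follows, assuming the document has already been tokenized. The word lists are tiny hypothetical examples; the study's translated lexicon of 1256 positive and 2463 negative words is not reproduced here.

```python
# Tiny hypothetical word lists standing in for the translated opinion lexicon.
LEXICON = {
    "positive": {"damai", "baik", "hormat", "rukun"},
    "negative": {"benci", "jahat", "bodoh", "usir"},
}


def class_ratios(tokens, lexicon):
    """R_k = (number of tokens belonging to class k in the lexicon) / (total tokens N_d)."""
    n_d = len(tokens)
    if n_d == 0:
        return {k: 0.0 for k in lexicon}
    return {k: sum(1 for t in tokens if t in words) / n_d for k, words in lexicon.items()}


def liwc_label(tokens, lexicon=LEXICON):
    """1 = positive, -1 = negative, 0 = neutral (equal ratios, including no lexicon hits)."""
    r = class_ratios(tokens, lexicon)
    if r["positive"] > r["negative"]:
        return 1
    if r["negative"] > r["positive"]:
        return -1
    return 0


if __name__ == "__main__":
    print(liwc_label(["saya", "benci", "jahat", "damai"]))
```

Because both ratios share the denominator $N_d$, the label is the same as comparing raw counts of positive and negative words; the ratio matters only when the values are reported as percentages. The lexicon modification corresponds to adding words to these sets, which changes the result only for tweets containing the new words.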

Evaluation
Several experiments were conducted to evaluate the performance of the SVM and LIWC methods using three evaluation metrics, Precision, Recall, and F1-Measure, calculated with equations (11), (12), and (13), respectively.

Model Selection of the SVM Method
Model selection was done with 5-fold cross-validation. The process started with d (polynomial degree) = 1 and γ (learning rate) = 0.1. Table 5 presents the F1-Score for each fold and the average value for different values of λ (error control). The highest average F1-Score (0.6477) was obtained at λ = 3. The process then continued with d = 1 and λ = 3 to examine the effect of changing γ. Table 6 shows the F1-Score for each fold and the average value. The highest average F1-Score (0.6477) was obtained with γ = 0.1. In contrast, the SVM with a degree-2 polynomial kernel (d = 2) performed poorly: the best degree-2 model, obtained with λ = 3 and γ = 0.1, achieved an F1-Score of 0.5381.
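The staged procedure above (tune λ with γ fixed, then tune γ with the chosen λ fixed) can be sketched as a one-parameter-at-a-time search. The `cv_f1` scorer is a hypothetical callback that would train on four folds, score the held-out fold, and return the mean F1-Score over the five folds; the grids in the usage note are placeholders, not the values in Tables 5 and 6.

```python
def staged_search(cv_f1, lambdas, gammas, degree=1, initial_gamma=0.1):
    """Tune lambda with gamma fixed, then gamma with the chosen lambda fixed.

    cv_f1(degree, lam, gamma) must return the mean F1-Score over the folds.
    Returns (best_lambda, best_gamma, best_mean_f1).
    """
    # Stage 1: vary lambda, keep degree and gamma fixed (first maximum wins ties).
    lambda_scores = {lam: cv_f1(degree, lam, initial_gamma) for lam in lambdas}
    best_lambda = max(lambda_scores, key=lambda_scores.get)
    # Stage 2: vary gamma, keep degree and the chosen lambda fixed.
    gamma_scores = {g: cv_f1(degree, best_lambda, g) for g in gammas}
    best_gamma = max(gamma_scores, key=gamma_scores.get)
    return best_lambda, best_gamma, gamma_scores[best_gamma]


if __name__ == "__main__":
    # Hypothetical scorer standing in for real 5-fold cross-validation.
    def fake_cv_f1(degree, lam, gamma):
        return 0.65 - 0.01 * (lam - 3) ** 2 - (gamma - 0.1) ** 2

    print(staged_search(fake_cv_f1, lambdas=[0.5, 1, 2, 3, 4], gammas=[0.01, 0.1, 0.5]))
```

This needs only len(lambdas) + len(gammas) cross-validation runs instead of the full grid's product, at the cost of possibly missing λ and γ combinations that only work well together. Repeating the search with degree=2 gives the degree-2 comparison.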

The SVM Evaluation Results
The best SVM model (d = 1, γ = 0.1, and λ = 3) was evaluated on the 45 test tweets. Table 7 presents the Precision, Recall, and F1-Score for each class label. The negative class had the highest Precision (0.9), meaning that 90% of the tweets classified as negative SARA were correct. However, the negative class Recall was low (0.6), meaning that only 60% of the negative SARA tweets were retrieved. The neutral class had a high Recall (0.87), meaning that about 87% of the non-SARA tweets were retrieved, but a low Precision (0.52), meaning that only 52% of the tweets classified as non-SARA were correct. The Precision of the positive class was also high (0.8), meaning that 80% of the tweets classified as positive SARA were correct, but only 53% of the positive SARA tweets were retrieved (Recall = 0.53).
Overall, SVM did not perform well in terms of F1-Score in the sentiment analysis of Indonesian-language tweets containing SARA issues. The F1-Scores for the positive, neutral, and negative classes were 0.64, 0.65, and 0.72, respectively. The poor SVM performance is likely due to the features used: in this study, text data were represented only by the document terms in the TF-IDF weight matrix. A semantic representation approach might perform better, for example by incorporating the LIWC lexicon as a feature.
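As a consistency check, the per-class F1-Scores of the best SVM model can be recomputed from the reported Precision and Recall values with F1 = 2PR / (P + R). The unweighted (macro) average is computed here and is not stated in the text; it comes to about 0.67.

```python
# Per-class (precision, recall) of the best SVM model, as reported for the 45 test tweets.
reported = {
    "positive": (0.8, 0.53),
    "neutral": (0.52, 0.87),
    "negative": (0.9, 0.6),
}

# F1 = 2PR / (P + R) for each class, then the unweighted (macro) mean over classes.
f1 = {label: 2 * p * r / (p + r) for label, (p, r) in reported.items()}
macro_f1 = sum(f1.values()) / len(f1)

for label, value in f1.items():
    print(label, round(value, 2))  # positive 0.64, neutral 0.65, negative 0.72
print("macro", round(macro_f1, 2))  # macro 0.67
```

The recomputed values match the reported 0.64, 0.65, and 0.72, and the macro average of about 0.67 is higher than the 0.64 average reported for LIWC.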

Lexicon Modification of the LIWC Method
To determine the effect of the lexicon on LIWC classification, the classification results of LIWC with the original lexicon were compared with those obtained with the modified lexicon. Table 8 and Table 9 show the evaluation results before and after the modification, respectively. The F1-Score improved for every class label. The largest increase occurred in the positive class, where the F1-Score increased from 0.5830 to 0.69, a relative increase of 18.35%. The F1-Score increased by 17.65% for the neutral class and by 13.17% for the negative class.
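The reported 18.35% for the positive class is a relative increase, not a gain in percentage points, as the short check below shows. The before and after F1-Scores of the neutral and negative classes are not given in the text, so only the positive class can be checked here.

```python
def relative_increase(before, after):
    """Percentage change of `after` relative to `before`."""
    return (after - before) / before * 100


# Positive-class F1-Score of LIWC before and after the lexicon modification.
print(round(relative_increase(0.5830, 0.69), 2))  # 18.35
print(round(0.69 - 0.5830, 3))                    # 0.107 absolute gain
```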
As with SVM, the overall performance of LIWC was poor, with an average F1-Score of 0.64, lower than that of SVM.
Adding new words to the lexicon improved the performance of the LIWC method, which suggests that lexicon-based sentiment analysis can be improved further by progressively adding relevant words for each sentiment class.
In this study, the comparison between a machine learning-based method (SVM) and a lexicon-based method (LIWC) for analyzing the sentiment of Indonesian-language tweets showed that machine learning performed better, although both approaches performed poorly overall. The best negative SARA classification was obtained with SVM using a degree-1 polynomial kernel, λ = 3, and γ = 0.1, with Precision = 0.9, Recall = 0.6, and F1-Score = 0.72. The best positive SARA classification was obtained with the LIWC method, with Precision = 0.6, Recall = 0.8, and F1-Score = 0.69. The best neutral classification was obtained with SVM using a degree-1 polynomial kernel, λ = 3, and γ = 0.1, with Precision = 0.52, Recall = 0.87, and F1-Score = 0.65.
The poor SVM performance was likely due to the features used, since text data were represented only by the document terms in the TF-IDF weight matrix. A semantic representation approach, for example incorporating the LIWC lexicon as a feature, might perform better.
For LIWC, on the other hand, progressively adding relevant new words to the lexicon for each sentiment class is a possible way to improve sentiment analysis performance.