A Support Vector Machine-Firefly Algorithm for Movie Opinion Data Classification

Analysis


INTRODUCTION
Classification is the process of grouping data into predetermined classes according to the similarity of the characteristics and patterns the data contain [1]. Classification is applied in many case studies, including tone recognition, text categorization, image classification, protein structure prediction, and data/document classification [2]. Classification methods include Naive Bayes, Support Vector Machine (SVM), Maximum Entropy, and J48 [3]. In this study, classification is used to assign each data item to either the positive class or the negative class. The study in [4] compared three classification methods, namely Naive Bayes (NB), Maximum Entropy, and Support Vector Machine (SVM), and SVM gave the best result.
SVM is a machine learning technique that relies on the concept of statistical learning. SVM generalizes well in cases with small samples [5]. On the other hand, SVM does not select appropriate parameter values by itself, so the parameters it is run with are often not optimal. Using appropriate parameters is expected to increase the accuracy of SVM [3] [5]. The importance of appropriate parameter values is also explained in [6], which states that the success of an SVM model depends on the soft-margin coefficient C as well as on the parameters of the kernel function. Choosing optimal parameters is therefore an important step when using SVM as a classification method. In response to the SVM shortcoming described in [5], various hybrid SVM methods have been developed, including Support Vector Machine and Particle Swarm Optimization (SVM-PSO) [5] [3], Firefly Algorithm and Support Vector Machine (FA-SVM), and Accelerated Particle Swarm Optimization and Support Vector Machine (APSO-SVM) [7]. The study in [7] applied the PSO-SVM, APSO-SVM, and FA-SVM algorithms to heart, diabetes, liver, iris, and cancer datasets and obtained the highest accuracy with FA-SVM. Based on [7], the authors classify movie opinion data using the FA-SVM method and compare the resulting accuracy with that of the SVM method without parameter optimization.
FA-SVM is an SVM classification method combined with the Firefly Algorithm (FA). FA is an optimization method based on the flashing patterns and behavior of fireflies [8], whereas SVM is a machine learning technique that relies on the concept of statistical learning [5].

Research Design
The research design for classifying movie opinion data with the SVM and FA-SVM methods consists of four stages: data preprocessing, weighting, SVM classification, and FA-SVM classification. Preprocessing produces a collection of words that are ready to be used as input to the later stages. Weighting assigns a weight value to each term, using TF-IDF. SVM classification assigns Indonesian-language movie opinion data to one of two classes, negative or positive. FA-SVM classification uses the same SVM model, but before classification a search is carried out for the combination of SVM parameter values that gives the highest accuracy. The parameters to be optimized are C and σ. Parameter C controls how strongly SVM penalizes classification errors [9], while σ is the SVM kernel parameter whose optimal value must be found for each dataset [10]. The search over SVM parameters is performed with the Firefly Algorithm (FA).
Classifier performance is evaluated to find out whether FA-SVM can classify movie opinion data in a shorter time. The evaluation methods used in this study are K-Fold Cross Validation and the Confusion Matrix. Cross validation divides the data into two segments: one segment is used to train the model and the other is used to validate it. The Confusion Matrix records the actual classes together with the predictions made by the classification system, and system performance is generally evaluated from the data in this matrix. Figure 1 shows the stages of classification of movie opinion data.

Figure 1 Stages of classification
The first stage of the system architecture is data collection. The data were obtained from the scraping process carried out by previous researchers [11]. The previous researchers preprocessed the data using case folding and feature normalization. This study adds tokenization, slang word conversion, and stopword removal, because the preprocessed data from the previous study still contain non-standard spellings such as "gak jelas" and many words that carry no sentiment, such as "yaitu". After preprocessing, the words are mapped into a vector model using Bag-of-Words. Once the data are in vector form, they are weighted using the Term Frequency-Inverse Document Frequency (TF-IDF) method.
Word weighting is carried out so that the weights of the words in the movie opinion data can serve as the basis for classifying the data. After weighting, the data are divided into training data and testing data, and the training data are further divided into training data and validation data. The split into training and testing data uses the Splitting method, while the split into training and validation data uses K-Fold Cross Validation. The training data are used to build the SVM and FA-SVM classification models, and the validation data are used to find the best parameters. The model is then tested on the test data to measure how well the classifier performs the classification. Table 1 shows examples of training data, validation data, and testing data: nine training items, three validation items, and three test items. Classification on the training data produces the classifier model that is then used to classify the test data.
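The two-level split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `holdout_split` and `k_fold` are hypothetical, the data are represented by index numbers, and k = 4 is chosen here purely so that 15 items reproduce the 9/3/3 proportions of Table 1 (the study itself uses k = 10).

```python
import random

def holdout_split(n_items, n_test, seed=0):
    """Shuffle item indices and hold out n_test of them as test data."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return idx[n_test:], idx[:n_test]  # (training pool, test data)

def k_fold(indices, k):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

# 15 items: 3 held out for testing, 12 left for 4-fold cross validation.
train_pool, test_idx = holdout_split(15, 3)
splits = list(k_fold(train_pool, 4))
for training, validation in splits:
    print(len(training), len(validation), len(test_idx))
```

Each fold serves as validation data exactly once, so every item in the training pool is validated on without ever touching the held-out test data.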

Preprocessing
Preprocessing is one of the essential steps in sentiment analysis. The data obtained from previous research went through only two preprocessing steps, case folding and feature normalization, so they still contain abbreviations, typos, and words that have no sentiment value, for example "gak", "membosakan", and "yaitu". The purpose of preprocessing is to obtain clean data so that building word vectors and classifying sentiment become more accurate. The preprocessing steps added in this study are tokenization, stopword removal, and slang word conversion. Tokenization breaks a comment into word units by splitting it at each space. After tokenization, stopword removal is performed, which deletes the words that appear in the stopword list. Stopwords are common words that occur in large numbers and have a grammatical function but carry no meaning, for example "yang" and "yaitu". The stopword dictionary in this study was taken from the dictionary made by Tala on Stopwords ID. The last preprocessing step is slang word conversion, which changes non-standard words into standard words with the help of a slang word dictionary that lists each slang word and its standard equivalent. Each word is checked against the dictionary; if a non-standard word is found there, it is replaced by the standard word given in the dictionary. The author built the slang word dictionary for this study based on the research in [12].
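The three added steps can be sketched as follows. The two dictionaries are tiny hypothetical stand-ins (the study uses Tala's stopword list and a slang dictionary based on [12]), the slang mappings shown are assumptions made for illustration, and the input is assumed to be already case-folded.

```python
# Hypothetical miniature dictionaries, for illustration only.
STOPWORDS = {"yang", "yaitu"}
SLANG_TO_STANDARD = {"gak": "tidak", "membosakan": "membosankan"}

def preprocess(comment):
    """Tokenize, remove stopwords, then convert slang words, in that order."""
    tokens = comment.split()                               # tokenization at spaces
    tokens = [t for t in tokens if t not in STOPWORDS]     # stopword removal
    return [SLANG_TO_STANDARD.get(t, t) for t in tokens]   # slang word conversion

print(preprocess("filmnya gak jelas yaitu membosakan"))
# ['filmnya', 'tidak', 'jelas', 'membosankan']
```

Words that appear in neither dictionary pass through unchanged, which matches the dictionary-lookup behavior described above.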

Weighting
Weighting is the process of assigning a weight value to each term in each document [13]. The weighting method used in this study is TF-IDF, a combination of Term Frequency (TF) and Inverse Document Frequency (IDF) that is used to calculate the weight of each word (term) in each document. The calculation scheme of TF-IDF is shown in Equation 1.
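As a rough illustration of TF-IDF weighting, the sketch below uses one common variant: raw term count multiplied by log10(N / df), where N is the number of documents and df is the number of documents containing the term. Whether the study uses exactly this variant is an assumption, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({t: tf[t] * math.log10(n_docs / df[t]) for t in tf})
    return weights

docs = [["film", "bagus"], ["film", "jelek"], ["film", "bagus", "bagus"]]
weights = tf_idf(docs)
for w in weights:
    print(w)
```

Under this variant a term that occurs in every document, such as "film" here, receives weight 0, while a term concentrated in few documents receives a higher weight.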

Classification with the SVM method
Classification is the process of grouping data into predetermined classes according to the similarity of the characteristics and patterns the data contain. In general, the classification process begins with a set of data that serves as the reference for building classification rules; this set is known as the training set. A model is built from the training set and is then used to classify data whose classes are unknown, known as the test set [1]. The classification method used in this study is SVM. Support Vector Machine (SVM) is a supervised machine learning algorithm with outstanding performance [9]. SVM is a linear classifier based on the principle of maximizing the margin: it uses an optimal hyperplane to separate the data into two groups in a higher-dimensional space [13]. The margin is the distance between the hyperplane and the closest data points of each class, and these closest points are called support vectors [9]. The hyperplane is the best separator between two predetermined classes [9]. SVM is in principle a linear classifier, and it was later extended to non-linear problems by incorporating the kernel trick, which works in a high-dimensional space [14]. The SVM kernel used in this study is the RBF kernel, which transforms the input space into the feature space.
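To make the role of σ concrete, the sketch below evaluates an RBF kernel in its common Gaussian form, K(x, z) = exp(-||x - z||² / (2σ²)). The text does not state which form of the RBF kernel is used, so this form and the function name `rbf_kernel` are assumptions; some libraries parameterize the same kernel with gamma = 1 / (2σ²) instead.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

same = rbf_kernel([0.2, 0.0, 0.5], [0.2, 0.0, 0.5], sigma=0.5)
narrow = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=0.5)
wide = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=2.0)
print(same, narrow, wide)
```

A small σ makes the similarity fall off quickly with distance, while a large σ treats distant points as still similar, which is why the value of σ has to be tuned per dataset.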
The central concept of SVM is finding the best hyperplane to separate two predetermined classes [9]. The best hyperplane is obtained by maximizing the margin between the support vectors. The margin is maximized by minimizing the Lagrangian with respect to w and b, subject to conditions 1 and 2.
Because the value of α is unknown, the values of w and b cannot be determined directly. The value of α is found by maximizing the Lagrangian with respect to the multipliers, using the Karush-Kuhn-Tucker (KKT) conditions for optimality of the dual. Under the KKT conditions, the number of Lagrange multipliers (α) equals the number of training data. Maximizing the Lagrangian in this form still leaves many possible values of w, b, and α, so the problem is transformed into the dual Lagrangian in Equation 5, subject to constraints 1 and 2.
$$\max_{\alpha} L_d = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$
After the values of w, b, and α are obtained, the label is determined using the SVM model. If f(x) > 0, the data is classified into the positive class (+1); if f(x) < 0, the data is classified into the negative class (-1).
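For orientation, the standard textbook form of the derivation outlined above is given below. It is a sketch under the assumption that the study follows the usual formulation; the explicit conditions, the box constraint with C, and the kernel function K are not spelled out in the text and are supplied here from the standard SVM literature.

```latex
% Primal Lagrangian (hard-margin form), minimized with respect to w and b
L_p = \frac{1}{2}\lVert w \rVert^2 - \sum_{i} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]

% Stationarity conditions obtained by differentiating with respect to w and b
\frac{\partial L_p}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i} \alpha_i y_i x_i,
\qquad
\frac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i} \alpha_i y_i = 0

% Constraints under which the dual objective L_d is maximized
% (soft margin; the hard-margin case has only \alpha_i \ge 0)
0 \le \alpha_i \le C, \qquad \sum_{i} \alpha_i y_i = 0

% Decision function; K(x_i, x) = x_i \cdot x in the linear case
f(x) = \sum_{i} \alpha_i y_i K(x_i, x) + b
```

Substituting the two stationarity conditions back into L_p removes w and b and leaves a problem in α only, which is the dual form that is then maximized.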

Classification with the FA-SVM Method
Data classification with the FA-SVM method begins by initializing the parameters needed for the firefly search: the firefly population size (number_of_fireflies), the number of generations (maximun_generation), the initial attractiveness coefficient (β0), the light absorption coefficient (γ), and the random parameter coefficient (α). After initialization, the parameters C and σ are optimized using the Firefly Algorithm. The optimization proceeds in several steps [15]. The distance between two fireflies i and j at coordinates x_i and x_j is the Cartesian distance, formulated as
$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2}$$
In two dimensions (d = 2), the equation above becomes [16]
$$r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \qquad (10)$$

Calculate Attractiveness
The attractiveness of a firefly is proportional to the light intensity seen by other fireflies. Attractiveness is formulated as
$$\beta(r) = \beta_0 e^{-\gamma r^2}$$
where:
β(r) = attractiveness of a firefly at distance r
β0 = attractiveness at distance 0
γ = light absorption coefficient
r = distance between the source firefly and another firefly
6. Calculate the movements of fireflies
The movement of a firefly i that is attracted to a firefly j (which is brighter, that is, has higher attractiveness) is formulated as
$$x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \, \epsilon_i$$
where ε_i is a random vector. The parameter values generally used are β0 = 1 and α ∈ [0,1]. The randomization can be done using a normal distribution N(0, 1) or another distribution.
After each firefly yields a pair of C and σ values, the pair is used to train an SVM model on the training data, and the accuracy obtained with each pair is calculated. The pairs are then ranked to determine the best values of C and σ. These values are used to build the SVM classifier, which is then tested on the test data to obtain the test accuracy.
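The search loop can be sketched as below. This is a minimal illustration of the standard Firefly Algorithm, not the authors' implementation: `firefly_search` and `toy_accuracy` are hypothetical names, the default parameter values are arbitrary, positions are clipped to the search range (an assumption), and `toy_accuracy` is a stand-in objective with a known peak that takes the place of the cross-validated SVM accuracy used in the study.

```python
import math
import random

def firefly_search(fitness, bounds, number_of_fireflies=10,
                   maximum_generation=20, beta0=1.0, gamma=1.0,
                   alpha=0.2, seed=0):
    """Maximize fitness inside a box; return (best_position, best_fitness)."""
    rng = random.Random(seed)

    def clip(value, low, high):
        return max(low, min(high, value))

    # Random initial positions; brightness is the objective value.
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(number_of_fireflies)]
    light = [fitness(p) for p in swarm]
    start = max(range(number_of_fireflies), key=light.__getitem__)
    best_pos, best_fit = list(swarm[start]), light[start]

    for _ in range(maximum_generation):
        for i in range(number_of_fireflies):
            for j in range(number_of_fireflies):
                if light[j] > light[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    swarm[i] = [
                        clip(xi + beta * (xj - xi)
                             + alpha * rng.gauss(0.0, 1.0), lo, hi)
                        for xi, xj, (lo, hi) in zip(swarm[i], swarm[j], bounds)
                    ]
                    light[i] = fitness(swarm[i])
                    if light[i] > best_fit:
                        best_pos, best_fit = list(swarm[i]), light[i]
    return best_pos, best_fit

def toy_accuracy(position):
    """Hypothetical stand-in for SVM accuracy, peaking at C=2.0, sigma=0.5."""
    c, sigma = position
    return 1.0 / (1.0 + (c - 2.0) ** 2 + (sigma - 0.5) ** 2)

best, score = firefly_search(toy_accuracy, bounds=[(1.0, 3.0), (0.1, 1.0)])
print(best, score)
```

In the FA-SVM setting, `fitness` would train an SVM with the candidate (C, σ) and return its validation accuracy, so the number of SVM trainings is governed by the population size and the number of generations rather than by the size of a parameter grid.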

Testing
Classification testing is carried out using the SVM and FA-SVM methods on the movie opinion dataset. The purpose of the test is to analyze whether classification with the FA-SVM method produces good accuracy with a faster processing time than the SVM method without parameter optimization. Tests are carried out with several combinations of SVM parameters and several combinations of population size and number of generations.

Testing Scenarios
The first test uses the SVM method without parameter optimization. It is carried out with SVM parameter values in the ranges C = 1.0-3.0 and σ = 0.1-1.0, and then with the ranges C = 1.0-3.0 and σ = 1.0-2.0, using a step size of 0.01. These combinations are tried in order to find the best combination based on the highest accuracy. The second test uses the FA-SVM method with several combinations of population size and number of generations, with the same goal as the SVM test without optimization. The last test compares the processing time required by the SVM and FA-SVM methods, to find out whether FA-SVM can classify movie opinion data with good accuracy and a faster processing time than SVM without parameter optimization.

Testing the SVM method without parameter optimization
The SVM method is tested using the k-fold cross validation approach with 10 folds (k = 10): the data are randomly divided into 10 subsets of equal size, each containing different data. Of the 2179 data, 436 are used as test data and 1743 as training data. The process is repeated 10 times, once per fold, with a different data distribution each time. This test uses combinations of SVM parameters in the ranges C = 1.0-3.0 and σ = 0.1-1.0. The test produced 18000 combinations of C and σ, with a highest accuracy of 87.84%. The processing time needed is 5928 seconds. The SVM method does not determine C and σ by itself, so obtaining the best values of C and σ based on the highest accuracy requires a trial-and-error process.
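The reported number of combinations can be checked with a short enumeration. The sketch assumes that each range includes its lower bound and excludes its upper bound; this assumption is not stated in the text, but it reproduces the 18000 combinations reported here and the 20000 reported for the second range. The function name `parameter_grid` is hypothetical.

```python
def parameter_grid(c_start, c_stop, s_start, s_stop, step=0.01):
    """Enumerate (C, sigma) pairs over half-open ranges with a fixed step."""
    n_c = round((c_stop - c_start) / step)
    n_s = round((s_stop - s_start) / step)
    return [(round(c_start + i * step, 2), round(s_start + j * step, 2))
            for i in range(n_c) for j in range(n_s)]

first = parameter_grid(1.0, 3.0, 0.1, 1.0)
second = parameter_grid(1.0, 3.0, 1.0, 2.0)
print(len(first), len(second))  # 18000 20000
```

Every one of these pairs requires its own SVM training and evaluation, which is what makes the exhaustive search slow.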

Testing the FA-SVM method
The performance of the FA-SVM method is evaluated using the k-fold cross-validation approach with 10 folds (k = 10): the data are randomly divided into 10 subsets of equal size, each containing different data. Of the 2179 data, 436 are used as test data and 1743 as training data. The process is repeated 10 times, once per fold, with a different data distribution each time. Before the FA-SVM method is evaluated, the SVM parameters are searched for using the Firefly Algorithm. The FA-SVM method is tested by finding the best C and σ values based on the highest accuracy obtained by each firefly. The Firefly Algorithm search produces 50 combinations of C and σ. This test requires an execution time of 2330 seconds. Based on Table 2, three of the 50 combinations share the highest accuracy, 87.84%. This accuracy was obtained from the evaluation using the Confusion Matrix, which gave 266 true positives, 117 true negatives, 39 false positives, and 14 false negatives.
The next test still uses the 436 test data but a different parameter range from the previous test: C = 1.0-3.0 and σ = 1.0-2.0. Figure 5 shows the results of the FA-SVM test with this range. The second test also produces 50 combinations of C and σ, among which the highest accuracy is 87.15%, obtained with C = 1.63 and σ = 1.08. This accuracy was obtained from the evaluation using the Confusion Matrix, which gave 265 true positives, 115 true negatives, 41 false positives, and 15 false negatives. This test requires an execution time of 2388 seconds.
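The accuracy figure follows directly from the Confusion Matrix counts, as the short check below shows. The function name `accuracy` is illustrative; the counts are the ones reported above.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correctly classified items among all classified items."""
    return (tp + tn) / (tp + tn + fp + fn)

acc = accuracy(tp=266, tn=117, fp=39, fn=14)
print(f"{acc:.2%}")  # 87.84%
```

The four counts sum to 436, the number of test data, and 383 of them are classified correctly.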

Comparison of SVM and FA-SVM methods
On the 436 test data with the ranges C = 1.0-3.0 and σ = 0.1-1.0, both the SVM and FA-SVM methods give a highest accuracy of 87.84%. Table 3 shows the evaluation results of the two methods. Evaluating the SVM method takes 5928 seconds, while evaluating the FA-SVM method takes 2330 seconds. The SVM method takes longer because it tries every combination of C and σ in the range, 18000 combinations in total. The FA-SVM method requires less time because it does not try every combination in the range; instead, the search for the best combination uses the objective function, accuracy, as the reference for moving toward a better point.
The next evaluation uses the same amount of data but a different range of values, C = 1.0-3.0 and σ = 1.0-2.0. Table 4 shows the evaluation results of SVM and FA-SVM with this range: both methods give a highest accuracy of 87.15%. The SVM method requires an execution time of 7205 seconds, while the FA-SVM method requires 2388 seconds. SVM takes longer because it tries all combinations of C and σ in the range with a step size of 0.01, 20000 combinations in total, whereas the FA-SVM method does not try every combination but uses the objective function as the reference for finding the best combination.

CONCLUSION
The results of this study indicate that the FA can help SVM obtain an appropriate combination of parameters based on accuracy, so that trial and error is no longer needed to find the parameter values. This conclusion is supported by the evaluation results. With the ranges C = 1.0-3.0 and σ = 0.1-1.0, the SVM method gives a highest accuracy of 87.84% with an execution time of 5928 seconds, while the FA-SVM method gives the same accuracy, 87.84%, with a shorter execution time of 2330 seconds, a difference of 3598 seconds. With the ranges C = 1.0-3.0 and σ = 1.0-2.0, both methods give a highest accuracy of 87.15%; SVM takes 7205 seconds, while FA-SVM takes 2388 seconds, a difference of 4817 seconds.

FUTURE WORKS
This research still has shortcomings that can be addressed in future studies. This study tests only the FA as an optimization method for SVM; future research can test other metaheuristic algorithms such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and the Gravitational Search Algorithm (GSA).