Sentiment Analysis of Stakeholder Satisfaction Measurement

Abstract
Measuring stakeholder satisfaction is very important for obtaining feedback and input for developing and implementing improvement strategies. ITB STIKOM Bali routinely measures student stakeholder satisfaction every semester. This study aims to analyze stakeholder comments to produce a sentiment analysis of stakeholder satisfaction. The data used are comments from the stakeholder (student) satisfaction measurement for the Odd Semester of 2020/2021, collected through a questionnaire. The algorithm used in this research is the Naïve Bayes Classifier (NBC). The research method consists of several stages: problem identification and literature study, collection of stakeholder (student) satisfaction data, data preprocessing, and feature extraction to facilitate classification using the NBC algorithm. The training data comprise 200 comments, while the test data comprise 2133 comments. The results of this study can provide recommendations to ITB STIKOM Bali based on the student comments as a whole, in which 58% of the comments express positive sentiment and 42% express negative sentiment.


INTRODUCTION
Measuring stakeholder satisfaction is very important for obtaining feedback and input for developing and implementing strategies to increase stakeholder satisfaction. ITB STIKOM Bali has standard operating procedures for measuring stakeholder satisfaction, as outlined in the Manual Procedure for Stakeholder Satisfaction Measurement. The results of the measurement are obtained from a questionnaire and consist of various respondents' comments. This study aims to analyze those comments by processing them with the Naïve Bayes algorithm to produce a sentiment analysis of stakeholder satisfaction.
Sentiment analysis using the Naïve Bayes algorithm has been studied many times before. Research conducted by Samsir et al., analyzing public opinion sentiment about online learning during the COVID-19 pandemic, found 30% positive sentiment, 69% negative sentiment, and 1% neutral sentiment [1]. Research by Safitri Juanita on public perceptions of the 2019 elections on Twitter concluded that negative perceptions (52%) were far more common than positive perceptions (18%), and that neutral perceptions (31%) also exceeded positive ones [2]. Sentiment analysis is a branch of text mining, natural language processing, and artificial intelligence. It aims to understand, extract, and process text data automatically so that the data become useful information [3]. In addition, sentiment analysis is a field that analyzes opinions, attitudes, evaluations, and assessments of an event, topic, organization, or individual [4].
The Naïve Bayes method is a classification method in text mining that is used in sentiment analysis. It has the potential to perform well in terms of both precision and computation. Naïve Bayes is widely used for classification, especially of Twitter data, in variants such as Unigram Naïve Bayes and Multinomial Naïve Bayes, and it is often used alongside Maximum Entropy classification. Its defining feature is a strong (naive) assumption of independence between conditions or events.

Clustering
Clustering is the process of grouping data or observations into classes of similar objects (Junaedi, 2011). In contrast to classification, clustering has no target variable to predict. Clustering is often performed as the first step in the data mining process. Many clustering algorithms have been used by previous researchers, such as K-Means, Improved K-Means, K-Medoids (PAM), Fuzzy C-Means, DBSCAN, CLARANS, and Fuzzy Subtractive.
Clustering is widely used, and its importance is growing rapidly because the amount of data is increasing exponentially along with computer processing speed [5]. A clustering algorithm groups data according to their characteristics and measures the similarity distance between the data within a group, although each clustering algorithm has its own advantages and disadvantages.

Naïve Bayes Classifier (NBC) Algorithm
The Naïve Bayes method is a classification method in text mining that is used in sentiment analysis. It has the potential to perform well in terms of both precision and computation. Naïve Bayes is widely used for classification, especially of Twitter data, in variants such as Unigram Naïve Bayes and Multinomial Naïve Bayes, and it is often used alongside Maximum Entropy classification. Its defining feature is a strong (naive) assumption of independence between conditions or events.
Naïve Bayes is a simple probabilistic prediction technique based on the application of Bayes' theorem (or Bayes' rule) with a strong (naive) independence assumption. Its predictions follow from Bayes' theorem, whose general formula is [6]:

\[ P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)} \]

where:
X : sample data with an unknown class (label).
H : the hypothesis that X is data with class (label) C.
P(H|X) : the probability of hypothesis H given the observed sample data X.
P(H) : the probability of hypothesis H.
P(X) : the probability of the observed sample data.
P(X|H) : the probability of sample data X, assuming that hypothesis H is true (valid).
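As a minimal sketch of how Bayes' rule is applied, the snippet below computes a posterior for a single observation. The prior and likelihood values are made-up numbers chosen for illustration only; they are not figures from this study, and the function name `posterior` is hypothetical.

```python
def posterior(prior_h, likelihood_x_given_h, evidence_x):
    """Bayes' rule: P(H|X) = P(X|H) * P(H) / P(X)."""
    if evidence_x <= 0:
        raise ValueError("P(X) must be positive")
    return likelihood_x_given_h * prior_h / evidence_x


# Illustrative (hypothetical) numbers:
# H = "the comment is Positive", X = "the comment contains a given word".
p_h = 0.6              # P(H)
p_x_given_h = 0.5      # P(X|H)
p_x_given_not_h = 0.1  # P(X|not H)

# The law of total probability gives the evidence P(X).
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)

p_h_given_x = posterior(p_h, p_x_given_h, p_x)
print(round(p_h_given_x, 4))
```

With these assumed inputs, observing X raises the probability of H above its prior, which is the behavior a classifier relies on when it compares posteriors across classes.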

Research Stages
The research method in this study consists of five stages: (1) problem identification and literature study, (2) collection of stakeholder (student) satisfaction data, (3) data preprocessing, (4) feature extraction to facilitate classification using the Naïve Bayes Classifier (NBC) algorithm, and (5) testing and evaluation of the model. Figure 1 shows the stages of research in the analysis of stakeholder satisfaction sentiment using the NBC algorithm.

a. Problem Identification and Literature Study
At this stage, problems regarding stakeholder (student) satisfaction were identified. The results of the stakeholder satisfaction questionnaire were obtained directly from the Directorate of Quality Assurance and Internal Supervision of ITB STIKOM Bali. Stakeholder satisfaction is measured once every semester, according to the rules in the Procedure Manual. Theories related to text mining and sentiment analysis were then reviewed, together with the relevant literature on similar research problems, to support this research.

b. Research Data Collection
At this stage, the research data were collected. The data were obtained from the Directorate of Quality Assurance and Internal Control, in the form of comments from the stakeholder (student) satisfaction measurement for the Odd Semester of 2020/2021, which were collected through questionnaires. The number of active students in the 2020/2021 Odd Semester was 4338, while the number of respondents in that semester was 3856 (88.89%).

c. Preprocessing Data
Data preprocessing is the stage of selecting data and transforming them into a structured form. It consists of several processes. The first is cleansing: the sentences obtained usually still contain noise, namely random errors or variance in the measured variable, so the noise must be removed. The next is parsing, which breaks a document into words by separating the words and determining the syntactic structure of each one. Sentence normalization then converts slang (street language) into standard words so that they can be recognized as language that conforms to the KBBI. Tokenization identifies words and breaks sentences into terms based on spaces and punctuation marks. The last process is stemming, which reduces affixed words to their base forms [7].
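The text-level steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the study: the `SLANG` dictionary is a tiny hypothetical stand-in for a real normalization lexicon, tokenization is done before normalization for simplicity, and stemming is omitted because it requires a language-specific stemmer.

```python
import re

# Hypothetical slang-to-standard mapping; a real lexicon would be far larger.
SLANG = {"gak": "tidak", "bgt": "sangat", "yg": "yang"}


def cleanse(text):
    """Lowercase the text and replace anything that is not a letter,
    digit, or whitespace with a space, then collapse repeated spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split cleansed text into terms on whitespace."""
    return text.split()


def normalize(tokens):
    """Replace known slang tokens with their standard form."""
    return [SLANG.get(token, token) for token in tokens]


def preprocess(comment):
    return normalize(tokenize(cleanse(comment)))


tokens = preprocess("Pelayanan yg lambat, gak memuaskan!!!")
print(tokens)
```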

d. Feature Extraction
The next stage is feature extraction, which makes classification with Naïve Bayes easier. This stage produces the features from which the model is built and its classification accuracy is assessed. At this stage, the terms are weighted using the TF-IDF algorithm, which weights each word by how often it occurs in a document, scaled down for words that occur in many documents.
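A minimal sketch of TF-IDF weighting is shown below. It assumes one common formulation (term frequency normalized by document length, and idf = log(N / df)); the exact normalization applied by the tool used in the study may differ, and the sample documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    document_frequency = Counter()
    for doc in docs:
        document_frequency.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / document_frequency[term])
            for term, count in counts.items()
        })
    return weights


# Invented example documents (already tokenized).
docs = [
    ["service", "good", "lecturer", "good"],
    ["service", "slow"],
    ["lecturer", "friendly"],
]
weights = tf_idf(docs)
print(weights[0])
```

In this example "good" receives a higher weight in the first document than "service", because it occurs twice there and in no other document.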

Preprocessing Data
The data from the stakeholder (student) satisfaction measurement consist of the Student ID, the expected and actual values, and the comments, stored as an XLS file. A total of 4338 records were obtained. Data preprocessing is carried out to remove noise, clarify features, and clean the data by removing unnecessary attributes. This process converts text into data that the system can readily accept when the main process is carried out [8].
The preprocessing stage is needed before the algorithm is applied. Data preprocessing uses RapidMiner and consists of the following steps:
a. Import Data: retrieves the data to be processed from storage so that they can be read in the local RapidMiner installation. This step shows the overall data and the missing data that need preprocessing. Based on the statistical summary, there are 153 missing values. The number of records processed at this step is 4338.
b. Replace Data: this operator removes certain characters from the comments, such as the period (.), exclamation mark (!), question mark (?), and other symbols. After processing with the Replace operator, 3855 comments remain.
c. Filter Examples: this operator removes empty data rows, which are marked with a question mark (?) as a missing attribute. After filtering, 3688 comments remain.
d. Remove Duplicates: this operator deletes identical comments to simplify the sentiment analysis process and reduce labeling time. After processing with this operator, 2333 comments remain. This step produces data that are ready to be processed.
e. Write CSV: this operator saves the final data in *.CSV format. These data form the data set that will be manually labeled with sentiment.
The results of preprocessing the data using RapidMiner can be seen in Figure 2.
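The same sequence of steps (replace characters, filter empty rows, remove duplicates, write CSV) can be sketched outside RapidMiner as follows. This is an illustrative approximation using only the Python standard library; the function names, the column names, and the sample comments are hypothetical, and the sketch does not reproduce the counts reported above.

```python
import csv
import io
import re


def clean_comments(rows):
    """rows: iterable of raw comment strings (None or '' means missing)."""
    # Replace: strip punctuation and other symbols from each comment.
    replaced = [re.sub(r"[^\w\s]", "", row).strip() if row else "" for row in rows]
    # Filter Examples: drop rows that are empty after cleaning.
    filtered = [row for row in replaced if row]
    # Remove Duplicates: keep the first occurrence, preserving order.
    return list(dict.fromkeys(filtered))


def write_csv(comments, handle):
    """Write comments with an empty sentiment column for manual labeling."""
    writer = csv.writer(handle)
    writer.writerow(["comment", "sentiment"])
    for comment in comments:
        writer.writerow([comment, ""])


# Invented sample data.
raw = ["Good service!", "", None, "Good service", "Wifi is slow...", "??"]
cleaned = clean_comments(raw)

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_csv(cleaned, buffer)
print(cleaned)
```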

Building Models using Naïve Bayes Classifier (NBC)
The 2333 comments produced by the Preprocessing Data process are divided into training data and test data: 200 comments for training and 2133 comments for testing. The 200 training comments are manually labeled with one of two categories, "Positive" or "Negative". Building the classification model with the Naïve Bayes Classifier (NBC) in RapidMiner involves the following stages:
a. Filter Data: the Filter Examples operator is used so that only data that have a label are used, namely the 200 labeled comments. The remaining unlabeled data are used at a later stage.
b. Data Processing: this stage processes the data/documents filtered in the previous stage. It uses the Process Documents operator, which contains several text processing steps: Tokenize, Transform Cases, Filter Stopwords, and Filter Tokens. Feature extraction is carried out on the comment data so that the 200 comments form attributes, that is, columns containing the words found in each comment. There are 371 attributes (the number of important words extracted from the comments).
c. Data Modeling: this stage learns a model of the data using the Naïve Bayes algorithm. The algorithm learns the patterns in the data set from the previous stage and produces a classification model to be applied to the comments that do not yet have a label. The Naïve Bayes operator is used at this stage. After the model has been learned, the Naïve Bayes classification model and the training data are each saved with a Store operator.
The resulting process in RapidMiner can be seen in Figure 4.
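To illustrate what the modeling stage does conceptually, the sketch below trains a small multinomial Naïve Bayes classifier with add-one (Laplace) smoothing on word counts. It is an assumption-laden illustration, not the internal implementation of the RapidMiner operator; the stopword list, function names, and training sentences are all invented.

```python
import math
from collections import Counter, defaultdict

STOPWORDS = {"the", "is", "a", "and"}  # hypothetical, deliberately tiny


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def train_nb(labeled):
    """labeled: list of (comment, label) pairs. Returns a model dict."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return {"class_counts": class_counts, "word_counts": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokenize(text):
            if token not in model["vocab"]:
                continue  # words never seen in training are ignored
            # Add-one smoothing avoids zero probabilities.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training examples.
training = [
    ("the service is good", "Positive"),
    ("lecturers are friendly and helpful", "Positive"),
    ("the wifi is slow", "Negative"),
    ("registration is slow and confusing", "Negative"),
]
model = train_nb(training)
print(predict(model, "good and friendly service"))
print(predict(model, "slow wifi"))
```

Because every word contributes independently to the score, a comment whose words mostly appeared under one label in training is pushed toward that label, which is also why a small training set can lead to mislabeled comments.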

Classification Model Implementation
The model built in the previous stage is applied to the comment data that do not yet have a label. Of the 2333 comments, 200 have been labeled manually, so the remaining comments are labeled in this process. Implementing the model involves the following stages:
a. Reading the data set: the data resulting from the previous sentiment analysis are loaded using the Read operator.
b. Filter Data: the Filter Examples operator is used so that only data that do not yet have a label are used in this process. The filter yields 2133 comments without a label in the sentiment column, as shown in Figure 5.
c. Combining training data attributes with test data: the training data that were processed and stored previously are retrieved so that their attributes can be combined with the attributes of the test data. When the two sets of attributes are combined, some attributes do not match, and RapidMiner reads the corresponding values as missing. The Union operator is therefore used to combine the training data attributes with the test data attributes, followed by filtering of the missing data using the Filter Examples operator. The results of this process can be seen in Figure 6.
d. Replacing missing values: missing attribute values (shown as "?") are changed to 0 using the Replace Missing Values operator.
The data are then ready for the classification model formed previously: the stored model (model_analysis_sentiment) is retrieved and applied with the Apply Model operator, with the results shown in Figure 7. The labels produced by applying the model in RapidMiner can be seen in the "prediction" column.
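The attribute-alignment idea (take the union of training and test attributes, then replace missing values with 0) can be sketched as follows. This is a simplified illustration with invented word-count rows; the helper names `union_attributes` and `replace_missing` are hypothetical and only mirror the roles of the operators named above.

```python
def union_attributes(*tables):
    """Collect every attribute name that appears in any table,
    preserving first-seen order."""
    seen = {}
    for table in tables:
        for row in table:
            for attribute in row:
                seen.setdefault(attribute, None)
    return list(seen)


def replace_missing(rows, attributes, fill=0):
    """Give every row all attributes; absent ones get the fill value."""
    return [{a: row.get(a, fill) for a in attributes} for row in rows]


# Invented word-count rows: training and test comments use different words.
train_rows = [{"good": 1, "service": 1}, {"slow": 1, "wifi": 1}]
test_rows = [{"slow": 1, "canteen": 1}]

attributes = union_attributes(train_rows, test_rows)
aligned_test = replace_missing(test_rows, attributes)
print(attributes)
print(aligned_test)
```

After alignment, every test row has a value for each attribute the model was trained on, so the model can be applied without encountering missing columns.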

Figure 7. Prediction Result from the Implementation Model
The training data consist of 200 manually labeled comments, of which 124 (62%) are labeled "Positive" and 76 (38%) are labeled "Negative". The test data consist of 2133 comments labeled by the classification model, of which 1229 (58%) are labeled "Positive" and 904 (42%) are labeled "Negative". Combining both sets, 1353 of the 2333 preprocessed comments (58%) are "Positive" and 980 (42%) are "Negative", as shown in Table 1.
An analysis of the test data labels produced by the classification model shows that several predictions are incorrect. For example, in Figure 8 the comment "disappointing the service, please improve" is predicted as "Positive" even though it is a "Negative" comment.

CONCLUSION
a. The results of this study can provide recommendations to ITB STIKOM Bali based on the student comments as a whole, in which 58% of the comments express positive sentiment and 42% express negative sentiment.
b. Several label predictions are incorrect. This is due to the small amount of training data used to form the classification model with the Naïve Bayes algorithm.