Indonesian Music Classification on Folk and Dangdut Genre Based on Rolloff Spectral Feature Using Support Vector Machine (SVM) Algorithm

Music genre classification is one of the interesting topics in digital music processing. A genre is a category of artistic work, here music, used to characterize and organize music that is now available in many forms and from many sources. One application is the classification of folk songs and dangdut songs. The main problem in music genre classification is finding a combination of features and classifier that gives the best result when assigning music files to genres, so methods and algorithms that classify genres accurately are needed. This problem can be solved by using the Support Vector Machine (SVM). The genre classification process begins by selecting the song files to be classified, followed by preprocessing, feature collection through feature extraction, and finally SVM classification, which produces the genre of each selected song file. The final result of this research is a classification of the Indonesian folk and dangdut music genres with an accuracy of 83.3%, which indicates how relevant the system's output is to the true genres, together with genre labels for music files that facilitate their management and search.
Keywords: Classification, Music Genre, Support Vector Machine (SVM).
IJCCS Vol. 15, No. 1, January 2021: 11-20. ISSN (print): 1978-1520, ISSN (online): 2460-7258


INTRODUCTION
Music consists of various genres and types according to its content. A genre is a characteristic or style of the music itself. The development of music production has given rise to many music genres, including the folk and dangdut genres. These conditions open up research on automatic sub-genre classification, which can perform better than manual classification.
Listeners can usually recognize a music genre easily, but genres remain difficult to distinguish precisely. This limitation encourages the development of genre classification for digital music, which can facilitate the determination and study of genre variations and improve the accuracy of song search. The research therefore needs to be developed with various methods and better algorithms. One of the methods that can be used is Deep Learning, which is able to identify hidden patterns in data dynamics through an independent learning process [1]. The algorithm used in this research is the Support Vector Machine (SVM).
Music classification has been studied by several previous researchers. Rene et al. (2017), in "Jazz Music Sub-Genre Classification Using Deep Learning", classified sub-genres of jazz music using the MLP, SVM and KNN methods, with results that fell short of the authors' expectations. The low performance was attributed to the models being unable to learn the features that distinguish each sub-genre. In that study, the highest results were obtained with the SVM algorithm [2]. Elbir et al. (2018), in "Music Genre Classification and Recommendation by Using Machine Learning Techniques", also discussed music genre classification and recommendation using the Support Vector Machine (SVM) method. Their study aimed to classify and recommend songs using acoustic features extracted by digital signal processing methods and convolutional neural networks (CNN). Feature extraction was first done through digital signal processing methods, and a CNN was then trained as an alternative feature extractor. The acoustic features of the songs were then used to determine the best classification algorithm, and the best classification and recommendation results were obtained with SVM [3].
From the description above, no previous work has classified Indonesian music in the folk song and dangdut genres, so this research builds an Indonesian music classification system for the folk song and dangdut genres using the Support Vector Machine (SVM) algorithm. This classification is expected to make genre-based song search more accurate, especially for folk and dangdut songs.

1 Music
Music is sound that comes from musical instruments or from sources other than musical instruments. In terms of the presence or absence of lyrics, music is divided into two types: music with lyrics, called songs, and music without lyrics, commonly known as instrumental. Digital music is one of the important kinds of data distributed via the internet, but it is still difficult to classify music by type.
Music is composed of three elements, namely melody, rhythm and harmony. Melody is a series of notes, or an arrangement of the pitches of the notes, that forms a song. Playing a melody is the same as playing the notation of a song without its lyrics (called instrumental). Rhythm is the foundation of music, the pace of the accompaniment in a song, which forms the various rhythm patterns that give rise to different musical genres. Harmony brings melody and rhythm together, or arranges the tones with added ornaments and dynamics so that the melodies in a song can be played loudly, softly, surging or vibrating [5].

2 Genre
A genre is a grouping of music according to similarity, or the characteristics of a piece of music formed by the type of instruments used, the regional culture and the geographical conditions [5]. The word "genre" comes from the Latin "genus", which means type or class. Each genre has a unique pattern, for example sounds that are typical of guitar, bass, drums or electronic music instruments.

2.1 Folk Music
Folk music, or folk songs, are songs created by authors in a particular area using the local language of that area; they are a form of art and a part of the culture known by the community [6] [7]. Folk songs therefore reflect the behavior and the life of the local people in general. Their rhythm patterns are very simple, so they are easy for anyone to perform, both members of the local community and people from other regions. Furthermore, folk songs serve as accompaniment for traditional ceremonies, as accompaniment for traditional performances or games, and as a medium of communication in a show [8].

2.2 Dangdut Music
Dangdut is one of the musical genres that developed in Indonesia. This form of music has its roots in the Malay music of the 1940s. Changes in Indonesian politics in the late 1960s opened the way for a strong influence of Western music, including the use of electric guitars and new forms of marketing. Since the 1970s, dangdut can be said to have matured into its contemporary form. As popular music, dangdut is very open to the influence of other forms of music, ranging from keroncong, langgam, degung, gambus, rock and pop to house music [9].

3 Spectrogram
A spectrogram is a representation of the spectrum (the color of sound) as it varies with time, indicating the level of spectral density. In other words, it visualizes each frequency value together with its energy level as both vary with time [10]. A spectrogram can also be rendered as a grayscale image that represents the frequency content of the signal at a given time [11]. Figure 1 is a visualization of AbangPulang.wav from the folk song genre, in which the regions around 20-25 seconds, 60 seconds and 80-120 seconds show the highest amplitude intensity. The color in the spectrogram shows the intensity of the amplitude: red indicates the highest amplitude values and blue indicates the lowest. For example, red indicates an amplitude intensity of -20 dB to -40 dB, and blue indicates an amplitude intensity of -10 dB to -30 dB.

4 Rolloff Spectral
Spectral Rolloff is a feature that gives the frequency bin index at which the cumulative amount of energy reaches 85% of the total signal energy in the frame. Rolloff is used as a measure of spectral shape. In general, Spectral Rolloff is defined by Equation (1).

$$\sum_{n=1}^{R_t} M_t[n] = 0.85 \sum_{n=1}^{N} M_t[n] \qquad (1)$$

$R_t$ is the bin index at which the cumulative amount of signal energy reaches 85% of the total amount of energy in the frame. $M_t[n]$ is the amount of signal energy resulting from the Fourier transform of frame $t$ at index $n$, and $N$ is the total number of bins in frame $t$. The Spectral Rolloff feature value is obtained by finding the value of $R_t$ in Equation (1) for the frame $t$ that is processed [12].
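As an illustration of Equation (1), the following is a minimal NumPy sketch, not the implementation used in this research. The function name `spectral_rolloff`, the test signal and the use of the magnitude spectrum as the per-bin energy are assumptions made for the example; bins are counted from zero, as NumPy does.

```python
import numpy as np

def spectral_rolloff(frame, sample_rate, roll_percent=0.85):
    """Return (bin index, frequency in Hz) where cumulative energy first reaches roll_percent."""
    # Magnitude spectrum of the frame (assumed stand-in for the per-bin energy)
    magnitudes = np.abs(np.fft.rfft(frame))
    cumulative = np.cumsum(magnitudes)
    threshold = roll_percent * cumulative[-1]
    # First bin whose cumulative sum is at or above the threshold
    bin_index = int(np.searchsorted(cumulative, threshold))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return bin_index, float(freqs[bin_index])

# Hypothetical frame: two tones placed exactly on FFT bins 40 and 300
sr = 22050
n_fft = 2048
t = np.arange(n_fft) / sr
frame = (np.sin(2 * np.pi * (40 * sr / n_fft) * t)
         + 0.5 * np.sin(2 * np.pi * (300 * sr / n_fft) * t))

idx, hz = spectral_rolloff(frame, sr)
print(idx, round(hz, 1))
```

The lower tone carries about two thirds of the total magnitude, so the 85% threshold is only crossed once the higher tone at bin 300 is included.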

5 Support Vector Machine
SVM was introduced by Vapnik as a kernel-based machine learning model for classification and regression tasks. Its strong generalization capability, together with its optimal solution and discriminative power, has attracted the attention of the data mining, pattern recognition and machine learning communities in recent years. SVM has been used as a powerful tool for solving practical binary classification problems, and SVMs have been reported to outperform other supervised learning methods [13].

5.1 SVM on Linearly Separable Data
Linearly separable data [8] are data that can be separated by a linear boundary. Let $\{x_1, \ldots, x_n\}$ be the dataset and $y_i \in \{+1, -1\}$ the class label of data $x_i$, for $i = 1, 2, 3, \ldots, n$, where $n$ is the number of data points, $w$ is the weight vector and $b$ is the bias. Figure 2 shows that two different classes can have several alternative separators (hyperplanes); the best dividing plane is the one with the largest margin. It is assumed that the two classes $-1$ and $+1$ can be completely separated by a hyperplane in $d$ dimensions, which is defined as follows [14]:

$$f(x) = w^{T} x + b \qquad (2)$$

This problem can be solved by converting the Lagrange primal problem into its dual form, so the equation becomes [14]:

$$L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^{T} x_j, \quad \text{subject to } \alpha_i \ge 0, \; \sum_{i=1}^{n} \alpha_i y_i = 0 \qquad (3)$$

where $\alpha_i$ is the Lagrange coefficient and $\alpha_i \ge 0$. The optimal value of the equation is obtained by maximizing it with respect to $\alpha_i$. Every training data point has a value $\alpha_i$, and the training data with $\alpha_i > 0$ are the support vectors that influence the decision function. After the solution of this quadratic programming problem is found (the values of $\alpha_i$), the class of a data point $x_d$ can be determined from the value of the decision function [14]:

$$f(x_d) = \sum_{i=1}^{ns} \alpha_i y_i \, x_i^{T} x_d + b \qquad (4)$$

where $x_i$ is a support vector, $ns$ is the number of support vectors, and $x_d$ is the data to be classified.
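The decision function in Equation (4) can be sketched in a few lines of NumPy. This is an illustrative toy case, not data from this research: the two support vectors, their Lagrange coefficients (0.25 each) and the bias (-2) are assumed values, worked out by hand for a hard-margin separator between the points (1,1) and (3,3).

```python
import numpy as np

def decision_function(support_vectors, alphas, labels, bias, x):
    """Evaluate sum_i alpha_i * y_i * (x_i . x) + b for one sample x."""
    return float(np.sum(alphas * labels * (support_vectors @ x)) + bias)

# Hypothetical toy problem with one support vector per class
support_vectors = np.array([[1.0, 1.0], [3.0, 3.0]])
labels = np.array([-1.0, 1.0])
alphas = np.array([0.25, 0.25])   # assumed solution of the dual problem
bias = -2.0

score_far = decision_function(support_vectors, alphas, labels, bias, np.array([4.0, 4.0]))
score_origin = decision_function(support_vectors, alphas, labels, bias, np.array([0.0, 0.0]))

# The sign of the score gives the predicted class
print(score_far, int(np.sign(score_far)))        # 2.0 1
print(score_origin, int(np.sign(score_origin)))  # -2.0 -1
```

Evaluating the function at the support vectors themselves gives -1 and +1, which is the margin condition for this toy case.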

5.2 SVM on Nonlinearly Separable Data
SVM on nonlinearly separable data [14] is an approach for data that cannot be separated linearly: the data are transformed into a feature space in which they can be separated linearly. To classify data that cannot be separated linearly, the SVM formulation must be modified, because otherwise no solution will be found. One modification is to change the two boundary planes so that they are more flexible. Another method is to transform the data into a feature space so that they can be separated linearly there. The data are mapped by a mapping (transformation) function into the feature space, where a separating plane divides the data according to class, as in Figure 3.
For example, consider a dataset with two attributes and two classes, namely a positive and a negative class. The data in the positive class are {(2,2),(2,−2),(−2,2),(−2,−2)}, and the data in the negative class are {(1,1),(1,−1),(−1,1),(−1,−1)}. When the data are plotted in two-dimensional space, it can be seen that they cannot be separated linearly. Therefore, the transformation function in Equation (5) is used [14]. After transformation, the data are {(6,2), (6,6), (2,6), (2,2)} for the positive class and {(1,1), (1,−1), (−1,1), (−1,−1)} for the negative class. The search for the best separating plane is then performed on the transformed data. Figure 4 shows that the data cannot be separated linearly because of the class differences.
The classification of folk songs and dangdut songs was implemented in Python. To extract features from the .wav music files, the researcher used the Librosa library and Keras. Figure 5 shows the process flow of this research. The first step is data collection in the form of .wav music files. The second step is preprocessing, which applies Spectral Rolloff feature extraction to the data. The third step is scaling the feature columns, which rescales the feature values. The fourth step is encoding the labels, which converts the genre labels into boolean values that represent each genre. The fifth step is data separation, which divides the data into training and testing sets. The sixth step is training and testing, in which the SVM algorithm is applied to produce the accuracy of the music genre classification. The last step is the evaluation of this research.

1 Data Collection
The test used a dataset of 90 songs from two music genres, folk songs and dangdut songs. Each genre consists of 45 songs in .wav audio format with a sampling rate of 22050 Hz and a mono channel. The research used three different train/test ratios. The data are separated by genre into folders, and each song is visualized as a spectrogram, which gives a visual representation of the sound frequency spectrum of the song.
Each audio file is visualized as a spectrogram by loading it by its track title as a mono signal with a duration of 90 seconds. The spectrogram is plotted with an FFT size of 2048; the x axis represents time and the y axis represents frequency.

2 Preprocessing
In this preprocessing step, the researcher applied Rolloff feature extraction to the collected data. The researcher specified the data format (audio data in this case) and the location of the data folder, and then extracted the Rolloff feature from each spectrogram. The purpose of Rolloff feature extraction is to determine the frequency below which a set share of the spectrum (usually 90%) is concentrated. Next, the researcher created a file named data.csv to hold the extracted values. The file contains three columns. The first column is filename, which contains the file title of each input song. The second column is rolloff, which contains the spectral rolloff value obtained for each song. The third column is label, which contains the genre of each file and takes two values, folk and dangdut. After the Rolloff values were obtained for the whole dataset, they were written to the csv file. The next step was cleaning the data by dropping the filename column, because only the Rolloff feature values and the labels are used in the SVM algorithm.

3 Encoding the Labels
This step converts the label text into numerical data that the model can use, by means of the LabelEncoder class. To encode the label column, the researcher imported the LabelEncoder class from the sklearn library, fitted and transformed the column, and then replaced the existing text data with the encoded data. The folk and dangdut labels were changed into boolean values (1 and 0) in an array that is saved as the y values. Table 1 shows that the value 1 represents folk and the value 0 represents dangdut.
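A minimal sketch of this encoding step, assuming scikit-learn is installed; the five labels are made up for illustration. LabelEncoder numbers the classes in sorted order, which is why dangdut becomes 0 and folk becomes 1.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical genre labels, one per song, as they would appear in the label column
labels = ["folk", "dangdut", "dangdut", "folk", "folk"]

encoder = LabelEncoder()
y = encoder.fit_transform(labels)  # classes are sorted before being numbered

class_names = encoder.classes_.tolist()
encoded = y.tolist()
print(class_names)  # ['dangdut', 'folk']
print(encoded)      # [1, 0, 0, 1, 1]
```

The original text labels can be recovered from the encoded values with `encoder.inverse_transform(y)`.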

4 Scaling the Feature Columns
Audio features provide a quantitative description of the most important information in an audio file. The process of extracting the relevant characteristics contained in the input data is called feature extraction. This process converts an audio signal into a sequence of feature vectors, reduces the redundant information in the signal and provides a compact representation. Audio features can be divided into two levels, top-level and low-level, according to the perspective of music understanding. Top-level labels provide information on how listeners interpret and understand music in terms of genres, moods, instruments, etc. Low-level audio features can be further categorized into short-term and long-term features on the basis of their time scale [15].
Feature scaling brings the numerical data in a dataset into the same range of values (scale). In this step, the researcher used the standard scaler, which assumes that the data are normally distributed within each feature and rescales them so that the distribution is centered around 0 with a standard deviation of 1. The purpose of scaling is to improve the accuracy and the calculation speed of the machine learning model.
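The scaling step can be sketched with scikit-learn's StandardScaler. The rolloff values below are hypothetical and are not taken from the dataset of this research.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical spectral rolloff values in Hz, one row per song
rolloff = np.array([[2100.0], [3400.0], [2800.0], [4100.0], [3600.0]])

scaler = StandardScaler()
X = scaler.fit_transform(rolloff)  # subtract the column mean, divide by the column standard deviation

print(X.mean(axis=0))  # close to 0
print(X.std(axis=0))   # close to 1
```

In a train/test setting the scaler is usually fitted on the training data only and then applied to the test data with `scaler.transform`, so that no information from the test set leaks into training.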

5 Training and Testing Data Set
In machine learning, there are several ways of partitioning data for experimentation. The most popular are typically referred to as training/test partitioning and cross-validation. Training/test partitioning divides the data into a training set and a test set in a specific ratio, e.g., 70% of the data are used as the training set and 30% of the data are used as the test set. This partitioning can be done randomly or in a fixed way [16].
During the training and testing process, the researcher used test size values of 0.1, 0.2 and 0.3. A test size of 0.1 means that 90% of the data are used for training and the remaining 10% for testing (90:10 ratio), and so on. The best result was obtained with the 80:20 ratio. With the 90 songs in the dataset, this ratio gives 72 songs for training and 18 songs for testing. The training and testing sets are each represented by two variables, X and Y: X contains the feature vector of each song and Y contains the label of each song.
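The three ratios can be reproduced with scikit-learn's train_test_split. This is a sketch on placeholder data; the feature values and the `random_state` seed are assumptions, and only the set sizes are meaningful.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 90 songs with one feature each, 45 per genre
X = np.arange(90, dtype=float).reshape(-1, 1)
y = np.array([0] * 45 + [1] * 45)

sizes = {}
for test_size in (0.1, 0.2, 0.3):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0  # seed chosen only for repeatability
    )
    sizes[test_size] = (len(X_train), len(X_test))

print(sizes)  # {0.1: (81, 9), 0.2: (72, 18), 0.3: (63, 27)}
```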

5.1 Training Data Set
The training set is the part of the dataset used to train the machine learning algorithm so that it can make predictions. The researcher provides examples to the algorithm, so that the trained model can find correlations or learn patterns from the given data by itself.

5.2 Testing Data Set
In the testing stage, classification was evaluated with the accuracy_score function from sklearn.metrics. This function computes subset accuracy, returning either the fraction of correctly classified samples (the default) or their count (normalize=False). The set of labels predicted for a sample must exactly match the corresponding set of labels that were tested. The inputs to the SVM algorithm were the values from Rolloff feature extraction, and the SVM was trained using the prepared training data.

6 Algorithm Modelling
In this research, the algorithm used was the Support Vector Machine algorithm for classification, commonly called SVC (Support Vector Classification). In this classification model, the researcher used a kernel to compute the dot product of two vectors x and y in a high-dimensional feature space, sometimes called a "generalized dot product". The kernel used was the Radial Basis Function (RBF). After the algorithm was applied, the X data were used to predict whether each song was folk or dangdut, and the y data were used to test the predictions.
Table 2 shows four predicted and tested data for the classification of genres. Data with value 0 were dangdut genre data and data with value 1 were folk genre data. All four data had a random filename as input. The first data had value 0 for the test and value 0 for the prediction, showing the same value, which means that the data was dangdut. The second data had value 1 for the test and value 1 for the prediction, showing the same value, which means that the data was folk. The third data had value 1 for the test and value 0 for the prediction, so the prediction did not match the test label and was counted as incorrect. The last data had value 0 for the test and value 0 for the prediction, showing the same value, which means that the data was dangdut.
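A minimal sketch of this modelling step with scikit-learn's SVC and an RBF kernel. The training values are synthetic, drawn from two normal distributions as a stand-in for scaled rolloff features, so the predictions are illustrative only; all other SVC settings are left at their defaults, which is an assumption.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic scaled rolloff values: class 0 (dangdut) around -1, class 1 (folk) around +1
X_train = np.concatenate([rng.normal(-1.0, 0.5, 36), rng.normal(1.0, 0.5, 36)]).reshape(-1, 1)
y_train = np.array([0] * 36 + [1] * 36)

# Four hypothetical test songs
X_test = np.array([[-1.2], [0.9], [1.4], [-0.7]])

model = SVC(kernel="rbf")  # Radial Basis Function kernel
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(pred)  # one 0/1 genre code per test song
```

Comparing `pred` element by element with the true test labels gives the kind of prediction-versus-test listing described for Table 2.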
Table 3 shows the prediction and testing results for the three ratios that were used. For the 90:10 ratio, there were 7 correct predictions and 2 incorrect predictions out of the 9 test songs. For the 80:20 ratio, there were 15 correct predictions and 3 incorrect predictions. For the 70:30 ratio, there were 18 correct predictions and 9 incorrect predictions. After obtaining the predictions (pred) and the test labels (y_test), the researcher imported accuracy_score to compute the exact result. accuracy_score is a function that computes subset accuracy, in which the set of labels predicted for a sample must exactly match the corresponding set of labels in y_test. The third line of the code multiplies the accuracy value by 100, because the researcher needed it as a percentage.
Table 4 shows the accuracy for the three ratios that were used. The highest result was obtained with the 80:20 ratio, with 15 correct and 3 incorrect predictions. This was calculated with the accuracy formula in Equation (6) and gave an accuracy of 83.3%. The lowest result was obtained with the 70:30 ratio, with 18 correct and 9 incorrect predictions.
Based on the implementation, testing and analysis that were carried out, it can be concluded that the classification of folk songs and dangdut songs using a single algorithm, the Support Vector Machine (SVM), was applied successfully on the training and test data, reaching an accuracy of 83.3% in determining the genre of the music samples used as test material.
The accuracy obtained when classifying 2 music genres with 90 songs using the Support Vector Machine (SVM) algorithm differs significantly across the three ratios. The number of algorithms and the number of songs with different tone frequencies are important factors in obtaining a higher level of accuracy. Using more algorithms, more data samples and music of good frequency quality can make it easier to determine the accuracy of genre-based song search, especially for folk songs and dangdut songs.
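The reported percentages can be checked with a few lines of arithmetic. The sketch below assumes that accuracy is the share of correct predictions among all test songs, which is what accuracy_score returns for single-label data; the helper name `accuracy_percent` is made up for the example.

```python
def accuracy_percent(correct, incorrect):
    """Accuracy as a percentage: correct predictions over all test samples."""
    total = correct + incorrect
    return 100.0 * correct / total

acc_80_20 = round(accuracy_percent(15, 3), 1)   # 80:20 ratio, 18 test songs
acc_70_30 = round(accuracy_percent(18, 9), 1)   # 70:30 ratio, 27 test songs

print(acc_80_20)  # 83.3
print(acc_70_30)  # 66.7
```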