Adaptive Unified Differential Evolution for Clustering

Various clustering methods for extracting optimal information continue to evolve, and one such development is the Evolutionary Algorithm (EA). Adaptive Unified Differential Evolution (AuDE) is an extension of Differential Evolution (DE), one of the EA techniques. AuDE has self-adaptive control parameters for the scale factor (F) and the crossover rate (Cr). It also has a single mutation strategy that represents the standard mutation strategies most commonly used in previous studies. The AuDE clustering method was tested on 4 datasets, with the Silhouette Index and CS Measure used as fitness functions to measure the quality of the clustering results. The quality of the AuDE clustering results was then compared with the quality of the clustering results obtained with DE. The results show that the AuDE mutation strategy broadens the search for cluster centers compared with DE, so that better clustering quality can be obtained. The AuDE:DE quality ratio is 1:0.816 using the Silhouette Index and 0.565:1 using CS Measure. AuDE also shows better execution time, but the results are not significant, with AuDE:DE execution-time ratios of 0.99:1 using the Silhouette Index and 0.184:1 using CS Measure.
Keywords: AuDE, DE, Clustering
ISSN (print): 1978-1520, ISSN (online): 2460-7258 IJCCS Vol. 12, No. 1, January 2018 : 53 – 62


INTRODUCTION
Clustering is one of the popular pattern recognition techniques and has been used in various fields, such as web mining, machine learning, image segmentation, biometric recognition, electrical engineering, mechanical engineering, remote sensing, and genetics [1].
Clustering methods for extracting optimal information continue to develop alongside the rapid progress of science. One such development is the Evolutionary Algorithm (EA). EA is part of Evolutionary Computation in Artificial Intelligence and mimics the biological evolution of living things. Algorithms in the EA family include the Genetic Algorithm (GA), Genetic Programming (GP), Evolution Strategies (ES), Differential Evolution (DE), Evolutionary Programming (EP), and Grammatical Evolution (GE). Data clustering studies such as [2] and [3] use DE as their main method.
In 1995, Storn and Price introduced DE as an Evolutionary Computation technique. DE encodes its population with real numbers and has been widely applied to optimization problems, e.g. data clustering, digital filter design, linear function optimization, and multi-objective optimization [4].
DE has parameters that strongly affect its performance. Choosing the right combination of parameter values in DE is not easy: it requires careful observation and depends on the problem to be solved. Researchers such as [5], [6], and [7] therefore developed automatic parameter-tuning models to overcome the weaknesses of manual tuning. Besides its control parameters, DE performance also depends heavily on the mutation strategy [8]. When DE is used to solve an optimization problem, the mutation strategy must be chosen first, and the control parameters are then determined by trial and error. Selecting an appropriate mutation strategy and parameter values by trial and error often takes a great deal of time, and this problem motivated the development of new DE variants with self-adaptive parameters.
In 2016, Qiang and Mitchell introduced Adaptive Unified Differential Evolution (AuDE). AuDE has self-adaptive scale factor (F) and crossover rate (Cr) control parameters. It also uses a single mutation strategy that combines several mutation expressions representative of the standard mutation strategies commonly used in previous studies [9].
In this study, AuDE is used for clustering. Applying AuDE to clustering begins with initializing chromosomes that contain the centroids. Each chromosome also contains activation thresholds that determine whether each centroid is active. Initialization produces a population of chromosomes, which is updated through mutation, crossover, and selection to obtain the population used in the next generation. In AuDE, new values of the scale factor (F) and crossover rate (Cr) are generated self-adaptively during the mutation and crossover processes, so users do not need to select appropriate control parameters.

METHODS
In this study, AuDE was applied to overcome the weakness of DE's static parameters. Population quality was evaluated using the Silhouette Index and CS Measure as the fitness functions of AuDE clustering. The AuDE evaluation results were then compared with the clustering evaluation results produced by DE. In addition to the quality of the clustering results, the execution times of both methods were also compared.

Testing Data
Table 1 breaks down the number of instances, attributes, and classes of each dataset used to test the clustering results. The datasets used to compare the clustering results of AuDE and DE are Iris, Wine, Glass, and Ecoli from the UCI Machine Learning Repository (URL: http://archive.ics.uci.edu/ml/) [10].

Adaptive Unified Differential Evolution (AuDE)
AuDE is an extension of Differential Evolution (DE) proposed by Qiang and Mitchell in 2016. DE is a population-based metaheuristic belonging to the Evolutionary Algorithm (EA) group. The basic idea of DE is to exploit the differences between individuals in the population to search for solutions.
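As an illustration of how DE exploits differences between individuals, the sketch below builds a mutant vector with the classic DE/rand/1 strategy, v = x_r1 + F·(x_r2 − x_r3), using a fixed scale factor. It is a minimal Python sketch; the function name and the default F = 0.5 are illustrative assumptions and do not describe the DE configuration used in this study.

```python
import random


def de_rand_1_mutant(population, i, F=0.5, rng=random):
    """Classic DE/rand/1 mutation with a static scale factor F.

    Picks three distinct individuals r1, r2, r3 (all different from the
    target index i) and returns x_r1 + F * (x_r2 - x_r3).
    """
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]


# Usage with a small random population of 10 four-dimensional vectors.
rng = random.Random(0)
pop = [[rng.random() for _ in range(4)] for _ in range(10)]
mutant = de_rand_1_mutant(pop, i=0, F=0.5, rng=rng)
print(mutant)
```

Because F is fixed here, the mutation step can only change through the spread of the population itself; AuDE instead adapts its scale factors during the run.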
AuDE extends DE in the control parameters used during mutation and crossover, as well as in its mutation strategy.

Population Initialization
The initialization stage determines the initial control parameters and the initial population of generation G. The initial population is created by setting the number of members in the population (NP). Each chromosome in the population is a target vector. The scale factor F is a control parameter with a positive real value in [0, 1] that is used during mutation to control the evolution of the population. The crossover rate Cr is a control parameter with a real value in [0, 1]; it is compared with a random number generated during crossover to determine which genes the trial vector inherits from the target vector and which from the mutant vector. In other words, Cr controls the crossover process.
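A minimal sketch of the initialization step, assuming a chromosome layout in which k_max activation thresholds are followed by k_max centroids (one value per attribute), and assuming each individual carries its own F_1 to F_4 and Cr drawn from [0, 1]. The layout, the names `init_population` and `k_max`, and the uniform sampling within each attribute's range are assumptions for illustration, not details confirmed by this section.

```python
import random


def init_population(data, NP, k_max, rng=random):
    """Create NP target vectors for clustering (hypothetical layout).

    x = [k_max activation thresholds in [0, 1)] +
        [k_max centroids, each with one value per attribute, drawn
         uniformly within that attribute's range in the data].
    Each individual also stores its own F_1..F_4 and Cr in [0, 1).
    """
    dim = len(data[0])
    lows = [min(row[d] for row in data) for d in range(dim)]
    highs = [max(row[d] for row in data) for d in range(dim)]
    population = []
    for _ in range(NP):
        thresholds = [rng.random() for _ in range(k_max)]
        centroids = [rng.uniform(lows[d], highs[d])
                     for _ in range(k_max) for d in range(dim)]
        population.append({
            "x": thresholds + centroids,
            "F": [rng.random() for _ in range(4)],
            "Cr": rng.random(),
        })
    return population


# Usage on a toy two-attribute dataset.
toy = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]]
pop = init_population(toy, NP=10, k_max=4, rng=random.Random(0))
```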

Mutation Process
Mutation is the process of forming mutant vectors. Before the mutant vector is formed, the four new scale factor control parameters $F_j$, $j = 1, 2, 3, 4$, for generation G + 1 are computed using Equation 1. The mutant vector is then formed using Equation 2. In Equation 2, the symbols denote the i-th mutant vector, the i-th target vector, the best vector in the population of NP, and five random vectors from the population of NP.
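Equations 1 and 2 are not reproduced in this section. The sketch below shows one plausible form under two labelled assumptions: the F update follows a jDE-style rule (each F_j is resampled with a small probability tau), and the mutation is the unified form that combines the target vector, the best vector, and five random vectors, v = x_i + F_1(x_best − x_i) + F_2(x_r1 − x_i) + F_3(x_r2 − x_r3) + F_4(x_r4 − x_r5). The constants tau, F_low, and F_up and the function names are hypothetical.

```python
import random


def adapt_F(F_old, rng=random, tau=0.1, F_low=0.1, F_up=0.9):
    """Assumed jDE-style self-adaptation for F_1..F_4 (hypothetical constants).

    With probability tau, each F_j is resampled as F_low + rand * F_up;
    otherwise it is inherited unchanged from generation G.
    """
    return [F_low + rng.random() * F_up if rng.random() < tau else f
            for f in F_old]


def unified_mutation(xs, i, best, F, rng=random):
    """Assumed unified mutation for target index i.

    v = x_i + F1*(x_best - x_i) + F2*(x_r1 - x_i)
            + F3*(x_r2 - x_r3) + F4*(x_r4 - x_r5),
    where r1..r5 are distinct random indices different from i.
    """
    others = [k for k in range(len(xs)) if k != i]
    r1, r2, r3, r4, r5 = rng.sample(others, 5)
    xi, xb = xs[i], xs[best]
    F1, F2, F3, F4 = F
    return [xi[d]
            + F1 * (xb[d] - xi[d])
            + F2 * (xs[r1][d] - xi[d])
            + F3 * (xs[r2][d] - xs[r3][d])
            + F4 * (xs[r4][d] - xs[r5][d])
            for d in range(len(xi))]
```

The population must hold at least six vectors so that five random indices different from i exist.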

Crossover Process
Crossover between the mutant vector and the target vector increases the diversity of potential new solutions. The candidate new solution formed from these two vectors is called the trial vector. Each component of the trial vector is taken from either the mutant vector or the target vector, depending on the crossover rate control parameter (Cr). Before crossover, the new crossover rate Cr for generation G + 1 is computed using Equation 3. Equation 4 is the crossover scheme used in AuDE. In Equation 4, the symbols denote the i-th trial vector, a random value in [0, 1], and a randomly selected element index of the target vector, which guarantees that at least one component of the trial vector comes from the mutant vector.
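Equations 3 and 4 are likewise not shown here. A minimal sketch, assuming a jDE-style Cr update (Cr is resampled uniformly with probability tau, a hypothetical constant) and the standard binomial crossover, in which component j comes from the mutant vector when rand_j ≤ Cr or j equals a randomly chosen index j_rand. The function names are illustrative.

```python
import random


def adapt_Cr(cr_old, rng=random, tau=0.1):
    """Assumed jDE-style rule: resample Cr uniformly with probability tau."""
    return rng.random() if rng.random() < tau else cr_old


def binomial_crossover(target, mutant, Cr, rng=random):
    """Trial vector: u_j = v_j if rand_j <= Cr or j == j_rand, else x_j."""
    D = len(target)
    j_rand = rng.randrange(D)  # forces at least one mutant component
    return [mutant[j] if (rng.random() <= Cr or j == j_rand) else target[j]
            for j in range(D)]


# Usage: a high Cr copies most components from the mutant vector.
rng = random.Random(3)
trial = binomial_crossover([0.0] * 5, [1.0] * 5, adapt_Cr(0.9, rng), rng)
```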

Selection Process
The selection process produces the target vectors of the next generation by comparing the fitness value of the trial vector with that of the target vector: if the trial vector's fitness value is better, the trial vector becomes the target vector in the next generation; otherwise, the current target vector is retained. In Equation 5, the symbols denote the i-th target vector for the next generation, the fitness value of the i-th trial vector, and the fitness value of the i-th target vector.
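Because the Silhouette Index improves as it increases while CS Measure improves as it decreases, "better" in the selection step depends on the fitness function. A minimal sketch of greedy one-to-one selection with a direction flag; the function name and the `maximize` parameter are illustrative, and keeping the current target vector on ties is an assumption.

```python
def select(target, f_target, trial, f_trial, maximize=True):
    """Greedy DE selection between a target vector and its trial vector.

    maximize=True suits the Silhouette Index (higher is better);
    maximize=False suits CS Measure (lower is better).
    On a tie, the current target vector is kept.
    """
    better = f_trial > f_target if maximize else f_trial < f_target
    return (trial, f_trial) if better else (target, f_target)
```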

Silhouette Index
In Equation 6, $S_i$ is the Silhouette Index (silhouette width) of the i-th data point; the other two terms are the average distance between the i-th data point and the other data in the same cluster, and the average distance between the i-th data point and the data in a different cluster [11]. These two averages are calculated using Equation 7 and Equation 8, which use the distance between the i-th and j-th data and the number of data in the i-th cluster and in the k-th cluster.
In Equation 11, $X_k$ and $Y_k$ are the values of x and y in the k-th attribute. Cluster membership is determined by assigning each data point to the cluster whose center is closest to it [13].
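The Euclidean distance, nearest-centroid assignment, and Silhouette Index can be sketched as follows. This assumes the standard silhouette definition, in which the between-cluster term is the smallest average distance from a point to any other cluster, that points in singleton clusters contribute 0, and that the overall index is the mean silhouette width over all data. These conventions and the function names are assumptions, not details given in this section.

```python
import math


def euclidean(x, y):
    """Euclidean distance summed over all attributes k."""
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


def assign(data, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
            for p in data]


def silhouette_index(data, labels):
    """Mean silhouette width: average of (b_i - a_i) / max(a_i, b_i)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the Silhouette Index needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: s_i = 0 by convention
            continue
        # a_i: mean distance to the other members of the same cluster
        a = sum(euclidean(data[i], data[j]) for j in own) / len(own)
        # b_i: smallest mean distance to the members of another cluster
        b = min(sum(euclidean(data[i], data[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)
```

Values close to 1 indicate compact, well-separated clusters; negative values indicate that points sit closer to another cluster than to their own.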

Clustering Flow Using AuDE
The clustering flow using AuDE is shown in Figure 2.

Comparison of Number of Final Clusters
Table 2 shows the average number of final clusters over 25 trials using the Silhouette Index. In the AuDE tests with the Silhouette Index, the Ecoli dataset has the best average number of final clusters and standard deviation among the four datasets, at 7.88 and 0.6, respectively, while the Wine dataset has the worst, at 3.85 and 1.3.
In the DE tests with the Silhouette Index, the Iris dataset has the best average number of final clusters and standard deviation among the four datasets, at 4.28 and 0.93, while the Ecoli dataset has the worst, at 5.76 and 1.36. A standard deviation can be large because the number of final clusters varied widely across the 25 trials.
Table 3 shows the average number of final clusters over 25 trials using CS Measure. In the AuDE tests with CS Measure, the Glass dataset has the best average number of final clusters and standard deviation among the four datasets, at 6 and 1, while the Wine dataset has the worst, at 5.1 and 2.66. For DE with CS Measure, Table 3 shows that the Iris dataset has the best average number of final clusters and standard deviation among the four datasets, at 3.36 and 0.91, while the Ecoli dataset has the worst, at 4.96 and 1.53.

Comparison of Clustering Results Quality
A comparison of cluster quality, measured with the Silhouette Index, between the clustering results of AuDE and DE is shown in Table 4.
Table 4 Comparison of cluster quality using Silhouette Index
A comparison of the CS Measure values of the clustering results of AuDE and DE is shown in Table 5. With CS Measure, a clustering result is better the closer its validity value is to 0, and worse the larger the value. The CS Measure results show that AuDE has better validity than DE on all tested datasets, although AuDE still has relatively poor validity on the Iris, Wine, and Ecoli datasets, whose CS Measure values are fairly large at 0.78, 0.86, and 1.48, respectively.
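The CS Measure formula is not reproduced in this section. Assuming the commonly used definition (the sum over clusters of the average within-cluster maximum distance, divided by the sum over clusters of the distance from each centroid to its nearest other centroid), a minimal sketch follows; compact, well-separated clusters give values close to 0, matching the reading used above. The function names are illustrative.

```python
import math


def euclidean(x, y):
    """Euclidean distance summed over all attributes."""
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


def cs_measure(data, labels, centroids):
    """Assumed CS Measure: within-cluster spread / centroid separation.

    labels[i] is the index of the centroid that data[i] belongs to.
    The 1/K factors of the usual definition cancel and are omitted.
    """
    active = sorted(set(labels))
    if len(active) < 2:
        raise ValueError("CS Measure needs at least two clusters")
    spread = 0.0
    for c in active:
        members = [p for p, lab in zip(data, labels) if lab == c]
        # average, over the cluster's members, of the farthest same-cluster point
        spread += sum(max(euclidean(p, q) for q in members)
                      for p in members) / len(members)
    separation = sum(min(euclidean(centroids[c], centroids[o])
                         for o in active if o != c)
                     for c in active)
    return spread / separation
```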

Comparison of Execution Time
The execution times required by AuDE and DE using the Silhouette Index are reported in Table 6. The execution times of AuDE and DE using CS Measure also differ only slightly. According to Table 7, execution with CS Measure validity takes more time in the DE tests, with the largest time difference occurring on the Wine dataset.

CONCLUSIONS
Based on the data used in this study, it can be concluded that, measured with the Silhouette Index, the clustering results produced by AuDE are better than those of classical DE, with a ratio of 1:0.816. Measured with CS Measure, the clustering results produced by AuDE are also better than those of DE, with a ratio of 0.565:1.
With the Silhouette Index, the execution time of AuDE is almost equal to that of DE, with an AuDE:DE execution-time ratio of 0.99:1. With CS Measure, the AuDE:DE execution-time ratio is 0.82:1.
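The ratios reported above appear to scale the larger of the two values to 1 (e.g. 1:0.816 when AuDE's value is larger, 0.565:1 when DE's value is larger). Assuming that reading, the conversion is a single division; the helper below is hypothetical, and the inputs in its usage lines are made-up numbers, not results from this study.

```python
def normalized_ratio(a, b, digits=3):
    """Express a:b with the larger of the two values scaled to 1."""
    largest = max(a, b)
    return round(a / largest, digits), round(b / largest, digits)


# Made-up inputs for illustration only.
print(normalized_ratio(2.0, 1.0))   # (1.0, 0.5)
print(normalized_ratio(1.0, 4.0))   # (0.25, 1.0)
```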

Figure 1. Clustering testing architecture
Figure 1 shows the clustering testing architecture for AuDE and DE. Clustering quality and execution time were obtained from 25 experiments for each dataset, with each method and fitness function.
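The procedure of 25 experiments per dataset, method, and fitness function can be sketched as a small harness that records the final fitness, the final number of clusters, and the wall-clock execution time of each run. `run_once` is a hypothetical callable standing in for one AuDE or DE clustering run, and the use of the sample standard deviation is an assumption.

```python
import statistics
import time


def run_trials(run_once, n_trials=25):
    """Repeat a clustering run and summarize its results.

    run_once(trial) is a hypothetical callable returning
    (final_fitness, final_number_of_clusters); the trial index can be
    used as a random seed.
    """
    fitness, clusters, seconds = [], [], []
    for trial in range(n_trials):
        start = time.perf_counter()
        f, k = run_once(trial)
        seconds.append(time.perf_counter() - start)
        fitness.append(f)
        clusters.append(k)
    return {
        "mean_fitness": statistics.mean(fitness),
        "mean_clusters": statistics.mean(clusters),
        "std_clusters": statistics.stdev(clusters),  # sample std (assumed)
        "mean_seconds": statistics.mean(seconds),
    }
```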

Figure 2. Clustering flow using AuDE
The active centroids of each i-th target vector are determined using the rule in Figure 3.
IF activation threshold > 0.5 THEN centroid ACTIVE, ELSE centroid INACTIVE
Figure 3. The centroid activation rule
Figure 3 shows the centroid activation rule: a centroid is declared active when its activation threshold is ≥ 0.5 and inactive when its activation threshold is < 0.5.
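Decoding a target vector with this rule can be sketched as follows, assuming a layout of k_max activation thresholds followed by k_max centroids of `dim` attributes each. The layout and the names `active_centroids`, `k_max`, and `dim` are assumptions; the comparison uses the strict > 0.5 of the IF-THEN rule.

```python
def active_centroids(x, k_max, dim, threshold=0.5):
    """Return the centroids whose activation threshold exceeds `threshold`.

    Assumed layout: x[:k_max] are activation thresholds, and the rest of x
    holds k_max centroids stored one after another, dim values each.
    """
    thresholds = x[:k_max]
    centroids = [x[k_max + c * dim: k_max + (c + 1) * dim]
                 for c in range(k_max)]
    # IF activation threshold > 0.5 THEN centroid ACTIVE, ELSE INACTIVE
    return [cent for t, cent in zip(thresholds, centroids) if t > threshold]


# Usage: thresholds 0.9, 0.2, 0.7 keep the first and third centroids.
x = [0.9, 0.2, 0.7, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(active_centroids(x, k_max=3, dim=2))
```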

Figure 4
Figure 4 compares how the cluster quality (fitness) value changes over the generations in AuDE and DE when the Silhouette Index is used as the cluster quality measure. Figure 4 shows that, with the Silhouette Index, the fitness value in AuDE increases in later generations. This increase comes from the adaptive search for control parameters and from a mutation strategy that can search more widely. With the Silhouette Index, DE produces, in most of the 25 trials on each dataset, fitness values that do not change significantly from generation to generation. This is because DE uses static control parameters and a simple mutation strategy, so the chromosome updates in the mutation process are less varied.

Table 2
Average number of final clusters using Silhouette Index

Table 3
Average number of final clusters using CS Measure
With the Silhouette Index, the resulting clusters are better the closer the validity value is to 1, and worse the closer it is to -1. The clustering results in Table 4 using the Silhouette Index show that AuDE has better validity values than DE on all tested datasets, although DE also produces good validity on the Iris and Wine datasets.
Table 5 Comparison of cluster validity using CS Measure

Table 6 shows that the differences are not very large, but of the four datasets, Wine and Ecoli require a longer execution time in AuDE than in DE. This is because AuDE spends time adaptively searching for new scale factor parameters in the mutation process and new crossover rate parameters in the crossover process.
Table 6 Execution time using Silhouette Index
Besides finding new control parameters, AuDE also spends time selecting five random vectors and applying a longer mutation strategy. The execution time in the clustering tests of AuDE and DE is strongly influenced by the population size and the number of generations.
Table 7 Execution time using CS Measure