Detection of Cataract Based on Image Features Using Convolutional Neural Networks

Cataracts are the leading cause of blindness: in 2010, 32.4 million people worldwide were blind and as many as 191 million had a visual disability. Moreover, the longer a patient suffers from cataracts or the later treatment begins, the more severe the damage to the patient's vision. Traditional cataract identification algorithms based on feature representation depend heavily on the classification carried out by an eye specialist, so the method is prone to misclassifying whether a person has a cataract. Deep learning, and in particular the convolutional neural network (CNN), is now used for pattern recognition and can help automate image classification. This research was conducted to increase accuracy and minimize loss in cataract identification through an experiment in which the number of epochs was varied. The results indicate that increasing the number of epochs affects the accuracy and loss of the CNN. Comparing a variety of epoch values shows that the higher the number of epochs, the higher the accuracy of the model. In this study, 50 epochs gave the highest accuracy, 95%. The resulting model also successfully assigned images to the specified classes. In testing on 10 images, the model achieved an average accuracy of 88%.


INTRODUCTION
A cataract is defined as a clouding of the lens of the eye that causes visual defects. Cataracts are the leading cause of blindness [1], [2]. According to [1], in 2010 there were 32.4 million blind people and as many as 191 million people with visual disabilities worldwide, with cataracts causing 33.4% of all cases of blindness and 18.4% of all cases of visual defects. In addition, the number of people worldwide who have lost their sight due to cataracts is likely to reach 40 million by 2025. Moreover, the longer a patient suffers from cataracts or the later treatment begins, the more severe the damage to the patient's vision. An ophthalmologist can diagnose cataracts by examining the degree of brightness in a fundus photo. Traditional cataract identification algorithms based on feature representation depend heavily on the classification carried out by an eye specialist, so the method is prone to misclassifying whether a person has a cataract.
However, there is now a newer method known as deep learning, in particular the convolutional neural network (CNN), which is used for pattern recognition (including images) and can help automate image classification, in this case of retinal fundus images [3], [4]. Among machine learning methods, the CNN is very popular because of its ability to solve problems in computer vision, including segmentation, detection, classification, and other computer vision and video analysis applications [5]. It is an improved version of the multilayer perceptron and other networks [6], [7]. A CNN consists of an input layer and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, ReLU layers (the activation function), pooling layers, fully connected layers, and normalization layers [8]. In addition, evolutionary computing is increasingly being used to solve problems with an optimization approach [9], [10]. One example is the use of evolutionary computing to determine the optimal parameters of a function [11]. For a CNN, determining the architecture is closely related to selecting several parameters, including the number and size of the kernels, the number and type of layers, and the type of activation function [7].
This research proposes to determine an optimal CNN, with a set number of epochs, for cataract identification. Classification and identification of cataracts have been studied by many previous researchers [12], [13], [14]. Karamihan [12] detected cataract eye images and their characteristics using a deep CNN through GoogleNet transfer learning in MATLAB, to show that the resulting system is accurate and reliable. Sahana [13] used the Inception V3 architecture, pretrained on a deep learning image network, with data divided into mature and immature cataracts, and obtained an accuracy of 87.5% using transfer learning and TensorFlow. Zhang [14] used a Deep Convolutional Neural Network (DCNN) to detect and grade cataracts automatically, and visualized multiple feature maps at the pool5 layer with high-order empirical semantic meaning to explain the feature representation extracted by the DCNN. As the number of available samples increases, the classification accuracy of the DCNN increases and the fluctuation range of the accuracy becomes more stable.

Cataract
A cataract is a clouding of the lens, that is, a loss of transparency in a lens that is normally clear. The transparency of the lens is maintained by the uniformity of its fibers and by the distribution and composition of the crystalline protein in the lens. The transparency of the lens can decrease when the bond structure of the lens proteins and the lens nucleus changes, which increases the turbidity of the lens nucleus [15]. Cataracts can develop without symptoms, or they can be discovered incidentally during an eye examination. Cataracts rarely cause pain but can cause loss of central vision and even lead to blindness [16].
One of the first complaints patients report is glare, or intolerance of bright light such as direct sunlight or the headlights of motor vehicles. Vision at both far and near distances then begins to be disturbed. Other complaints that can arise include foggy vision, unclear color vision, and double vision [17].

Digital Image Processing
The digital image is a multimedia component that plays an important role in presenting visual information. Image processing aims to improve image quality according to user needs [18]. Image processing techniques can produce other images or extract features from the input image. Figure 1 shows the steps in digital image processing. The first step is image acquisition, which is the process of capturing the required image using imaging sensors such as cameras and scanners.

Research Design
A system design that uses image processing requires several processes to form an output decision from the detection system. In this system, the processes are interconnected: the output of one process becomes the input of the next, until the final output of the system is produced from a model that has been trained on a dataset. The data used are eye images, both normal and diagnosed with cataracts. The flowchart of the developed system is shown in Figure 2.

Convolutional Neural Network (CNN)
The Convolutional Neural Network (CNN) was developed from the multilayer perceptron (MLP) and is designed for processing two-dimensional data in the form of images. The CNN is a variant of the deep neural network because of its high network depth, and it is widely applied to complex image data [18]. Image classification can in principle be done with an MLP alone, but the MLP is not well suited to the task because it cannot retain spatial information from the image data and treats each pixel as an independent feature, which can lead to poor results [19].

Figure 3 CNN architecture
Like other neural networks, the CNN is trained using the backpropagation algorithm. The CNN is designed to recognize visual patterns directly from image pixels with minimal preprocessing. It can recognize a wide variety of patterns and is robust to distortion and simple geometric transformations. The CNN architecture is divided into two major parts, namely the Feature Extraction Layer and the Fully-Connected Layer [20]. Each stage consists of three layers, namely the convolutional layer, the activation function layer, and the pooling layer, as shown in the CNN network architecture in Figure 3.

4.1 Convolutional layer
The convolutional layer is one of the stages in the CNN architecture. This stage carries out a convolution operation on the output of the previous layer [21]. This layer is the main process underlying the CNN network architecture. The operation produces its output as a feature map of the input image. The convolution operation can be written as equation 1.
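To make the feature-map idea concrete, the following is a minimal sketch of a single-channel convolution with no padding and a stride of 1. It is not the authors' implementation: the function name `conv2d_valid`, the toy 4 x 4 image, and the 2 x 2 kernel are illustrative assumptions, and the kernel is applied without flipping (cross-correlation), as is common in deep learning libraries.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = sum(
                image[i + m][j + n] * kernel[m][n]
                for m in range(kh)
                for n in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy input (assumption): a 4 x 4 "image" and a 2 x 2 diagonal-difference kernel.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_valid(image, kernel)
```

A 4 x 4 input with a 2 x 2 kernel yields a 3 x 3 feature map, which illustrates how each convolution without padding slightly shrinks the spatial size.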

4.2 Activation Functions
Activation functions are mathematical operations applied to the output signal. The activation function determines whether a neuron is active or not, based on the weighted sum of its inputs. Activation functions often used in convolutional neural networks include tanh, ReLU (Rectified Linear Unit), sigmoid, and softmax [22], [23]. This research uses the ReLU and softmax activation functions.

ReLU.
The ReLU function sets the output value of a neuron to 0 if the input value is negative. If the input value is positive, the output of the neuron is the input value itself. This function is given in equation (2) [22].
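The two activation functions named in this section can be sketched in a few lines. This is an illustrative sketch using the standard definitions, not code from the study; the function names and sample inputs are assumptions, and the softmax subtracts the maximum score only for numerical stability.

```python
import math


def relu(x):
    """Return 0 for non-positive inputs, otherwise the input itself."""
    return x if x > 0 else 0.0


def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max does not change the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Sample values (assumption): three pre-activations and two class scores,
# the latter standing in for the "cataract" and "normal" outputs.
activations = [relu(v) for v in [-2.0, 0.0, 3.5]]
probs = softmax([2.0, 1.0])
```

In a two-class setting such as normal versus cataract, the softmax output can be read as the model's probability for each class.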

4.3 Pooling Operation
After the activation function is applied, a pooling operation reduces the size of the matrix using max-pooling or average-pooling. The output of the pooling operation is a matrix with smaller dimensions than the initial image [22]. The convolution and pooling processes are carried out to obtain the desired feature map, which is then input into a fully connected layer [24]. Figure 4 illustrates pooling by max-pooling.
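As a small illustration of max-pooling, the sketch below reduces a 4 x 4 matrix to 2 x 2 by keeping the largest value in each non-overlapping 2 x 2 window. The function name `max_pool2d`, the window size, and the sample matrix are assumptions made for the example.

```python
def max_pool2d(matrix, size=2):
    """Non-overlapping max-pooling with a size x size window (stride = size)."""
    pooled = []
    for i in range(0, len(matrix) - size + 1, size):
        row = []
        for j in range(0, len(matrix[0]) - size + 1, size):
            # Collect the values inside the current window and keep the largest.
            window = [matrix[i + m][j + n] for m in range(size) for n in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Sample feature map (assumption).
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
pooled = max_pool2d(feature_map)
```

Replacing `max(window)` with `sum(window) / len(window)` would give average-pooling instead.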

4.4 Dropout Regularization
Dropout is a neural network regularization technique that randomly selects several neurons and leaves them unused during training; in other words, these neurons are randomly discarded [25]. The contribution of the discarded neurons to the network is temporarily stopped, and weight updates are not applied to those neurons during backpropagation. The dropout process is shown in Figure 5.

Figure 5 Dropout regularization
In an ordinary artificial neural network, let $y^{(l)}$ be the output value of layer $l$ and $z^{(l)}$ the input value at layer $l$, where $W^{(l)}$ and $b^{(l)}$ are the weight and bias of layer $l$, with units $i$. The feedforward computation using the activation function is then given by Equations 4 and 5 [25]. Meanwhile, in a network that implements the dropout technique, the variable $r^{(l)}$ represents a vector that stores values drawn from the Bernoulli distribution. The feedforward process is then carried out as in Equations 6, 7, and 8.
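The feedforward step with dropout can be sketched as follows. This is a minimal illustration, assuming a keep probability of 0.5, a single ReLU unit in the next layer, and made-up weights; it masks the layer output with Bernoulli draws during training and omits the rescaling that is normally applied at test time.

```python
import random


def dropout_forward(y, keep_prob, rng):
    """Multiply each unit's output by an independent Bernoulli(keep_prob) draw."""
    mask = [1 if rng.random() < keep_prob else 0 for _ in y]
    thinned = [m * v for m, v in zip(mask, y)]
    return mask, thinned


def dense_unit(inputs, weights, bias):
    """Weighted sum of the (thinned) inputs plus bias, followed by ReLU."""
    z = sum(w * v for w, v in zip(weights, inputs)) + bias
    return max(z, 0.0)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
y = [0.5, 1.0, 2.0, 4.0]        # outputs of the previous layer (assumption)
mask, y_thinned = dropout_forward(y, keep_prob=0.5, rng=rng)
out = dense_unit(y_thinned, weights=[0.1, 0.2, 0.3, 0.4], bias=0.0)
```

Units whose mask entry is 0 contribute nothing to `out`, which is the sense in which they are temporarily discarded.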

4.5 Optimizer
The optimizer is one of the parameters needed to build the model, and it plays an important role in increasing the accuracy of a model. This study uses the Adam method as the optimizer. Adam (adaptive moment estimation) computes an adaptive learning rate for each parameter [10]. The recommended parameter values are β1 = 0.9, β2 = 0.999, and ε = 10^-8, where β1 and β2 are the exponential decay rates and ε is the epsilon value used in the parameter update.
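The Adam update can be sketched for a single scalar parameter as below. The decay rates and epsilon are the values recommended in the text; the learning rate, the toy objective (theta - 3)^2, and the function name `adam_step` are assumptions made for illustration and are not taken from the study.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective (assumption): minimise f(theta) = (theta - 3)^2.
theta, m, v = 0.0, 0.0, 0.0
first_theta = None
for t in range(1, 2001):
    grad = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
    if t == 1:
        first_theta = theta  # the first step moves by roughly lr
```

Because the update divides by the root of the second-moment estimate, the very first step has a size close to the learning rate regardless of the gradient's magnitude.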

4.6 Cross-Entropy Loss Function
As a common loss function in the training of classification tasks, the cross-entropy loss plays an important role in neural network training by measuring whether the current model is good enough. The loss calculated with this criterion is used to update the model parameters through its gradient, and in this way the output loss is minimized. Current improved classification loss functions, such as L-Softmax and AM-Softmax, are usually extensions of the standard cross-entropy loss. The cross-entropy loss function is optimized to make the features extracted by the neural network more representative [26]. For instance, the calculation formula of AM-Softmax is given in Equation (9).

RESULTS AND DISCUSSION

Preprocessing
The data used in this research are fundus images from the dataset of Kambang Eye Hospital. A total of 380 fundus images were used, consisting of 240 images of cataracts and 140 images of normal eyes. In the preprocessing stage, the first step is to define the input parameters used to standardize the image dimensions. In this study, the dimensions are 150 x 150. The batch_size is then determined; batch_size is the number of images used in a single training iteration.
This preprocessing stage also determines how many epochs (iterations) are used for training. In addition, an augmentation process is applied to the fundus images. Augmentation is the process of modifying an image in such a way that the computer treats the modified image as a different image, while humans can still tell that it is the same image. Augmentation can increase the accuracy of the trained CNN model because the model receives additional data, which helps it generalize better. In the augmentation process, all data, both normal eye images and cataract eye images, were rescaled by a factor of 1/255, with shear_range = 0.2, zoom_range = 0.2, and a horizontal flip. The results of the augmentation process are shown in Figure 6.

Figure 6 Image Augmentation
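Two of the augmentation steps described above, rescaling by 1/255 and horizontal flipping, can be sketched without any imaging library. This is an illustration only: the tiny 2 x 3 grayscale "image" and the function names are assumptions, and shear and zoom are omitted because they require interpolation. The parameter names in the text resemble those of Keras's `ImageDataGenerator`, although the source does not name the library used.

```python
def rescale(image, factor=1.0 / 255):
    """Map 0..255 pixel values into the 0..1 range."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing every row."""
    return [list(reversed(row)) for row in image]


# Tiny grayscale example (assumption); a real fundus image would be 150 x 150 x 3.
image = [
    [0, 51, 255],
    [102, 153, 204],
]
scaled = rescale(image)
flipped = horizontal_flip(scaled)
```

The flipped copy has the same content as the original from a human point of view, but its pixel layout differs, which is what lets it act as an additional training sample.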

Partition and Image label
After the augmentation process, the data are divided into three parts using the Train Test Split technique provided by the Sklearn library. The data are first divided into 80% training data and 20% test data, and the test data are then divided again into 10% test data and 10% validation data. In addition, the images are labeled with two classes, namely the normal class and the cataract class. This data splitting and labeling process is shown in Figure 7.
This class labeling and data splitting was carried out as the first step in building a CNN model that classifies fundus images as cataract eyes or normal eyes. Based on Figure 18, there are 181 images used as training data, 100 images as validation data, and 99 images as testing data. Each of these sets is divided into two classes, namely normal and cataract.
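The two-stage 80/10/10 split described above can be sketched with the standard library alone, as a stand-in for the scikit-learn `train_test_split` call named in the text. The file names, the seed, and the function name `split_80_10_10` are hypothetical, and the resulting counts follow only from the stated ratios applied to 380 images.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle, keep 80% for training, then halve the rest into test and validation."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    held_out = items[n_train:]
    n_test = len(held_out) // 2
    return items[:n_train], held_out[:n_test], held_out[n_test:]


# Hypothetical file names paired with their class labels.
samples = [(f"cataract_{i}.png", "cataract") for i in range(240)]
samples += [(f"normal_{i}.png", "normal") for i in range(140)]

train, test, val = split_80_10_10(samples)
```

A stratified split, which keeps the cataract to normal ratio equal across the three sets, would be a reasonable refinement but is not described in the source.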

Learning Rate and Optimization of epoch for fitness values calculation
In this research, the experiment was carried out by varying the number of iterations, or epochs. For this, the optimal number of epochs on the training data must first be determined. The expectation is that modeling with various numbers of epochs will reveal the minimum number of epochs needed to reach maximum accuracy.
Learning rate is one of the hyperparameters that most strongly affects the performance of a CNN model. To search for a suitable learning rate, a method called cyclic learning rates is used. In this method, training is carried out several times, starting from a small learning rate that is enlarged at each iteration. The loss is observed at each iteration, and if it increases drastically the search is stopped. Based on the recorded losses, the learning rate is chosen at a point before the loss reaches its lowest value. This research uses a learning rate with values of 5, 10, 20, and 50. Figures 8, 9, 10, 11, and 12 show the experiments with varying numbers of epochs, in which increasing the number of epochs strongly affects the accuracy obtained.
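The learning rate search described above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the quadratic loss, the starting rate, the doubling factor, the `blowup` stop threshold, and the function name `lr_range_test`. The sketch only mirrors the described procedure of enlarging the rate each iteration, stopping when the loss rises drastically, and choosing a rate from before the lowest loss.

```python
def lr_range_test(loss_fn, grad_fn, theta0, lr_start=1e-4, factor=2.0,
                  max_steps=30, blowup=4.0):
    """Take one gradient step per learning rate, enlarging the rate each time."""
    theta, lr = theta0, lr_start
    best = loss_fn(theta)
    history = []  # (learning rate, loss after the step)
    for _ in range(max_steps):
        theta = theta - lr * grad_fn(theta)
        loss = loss_fn(theta)
        history.append((lr, loss))
        if loss > blowup * best:
            break  # the loss increased drastically, so stop the search
        best = min(best, loss)
        lr *= factor
    return history


# Toy problem (assumption): loss(theta) = theta^2 with gradient 2 * theta.
history = lr_range_test(lambda th: th * th, lambda th: 2 * th, theta0=1.0)

losses = [loss for _, loss in history]
best_idx = losses.index(min(losses))
# Choose the rate used one step before the lowest loss was reached.
chosen_lr = history[max(best_idx - 1, 0)][0]
```

On a real CNN the loss curve is noisier than in this toy case, so the recorded losses are usually smoothed before a rate is picked.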

Result of Cataract identification
After the CNN modeling process is carried out and the best accuracy value is obtained, the image testing process is carried out by inputting new images. In testing, the system was able to detect the cataract eye images that were input. Testing was carried out using test data of 8 different images, divided into two classes: normal eyes and cataract eyes. The purpose of this manual testing is to determine the effectiveness and accuracy of the system before it is operated by users. The results of image testing can be seen in Table 2. The table shows that the more epochs used, the better the accuracy in image testing. The number of epochs is of course not the only determinant of the accuracy of the image test results; other factors include the size of the dataset and the image dimensions, but this study focuses on the learning rate and the number of epochs. The predictions of the model on the tested images were very good, with an average accuracy of 88%, as shown in Table 2.

CONCLUSIONS
This research succeeded in building a CNN model for testing fundus images. Comparing a variety of epoch values shows that the higher the number of epochs, the higher the accuracy of the model. In this study, 50 epochs gave the highest accuracy, 95%. Not only did the model achieve high accuracy, but the testing process also showed accurate results using the CNN model trained for 50 epochs. The resulting model also successfully assigned images to the specified classes. In testing on 10 images, the model achieved an average accuracy of 88%.