Estimation of Average Car Speed Using the Haar-Like Feature and Correlation Tracker Method

The speed of a car traveling on the road is generally estimated with a speed gun. To ease the burden on road operators, closed-circuit television (CCTV) could also serve as a tool for estimating car speed. This study estimates the average speed of cars in video: the Haar-like Feature method detects each car, and a Correlation Tracker then follows the detected car and measures the distance it moves, from which the speed of the car is estimated. The estimated average speeds were compared with speed-gun measurements, giving a mean absolute error (MAE) of 5.55 km/hour and a standard deviation of 4.61 km/hour. It is therefore concluded that the system is valid and can be used by road operators to monitor the average speed of cars.
Keywords: Haar-like Feature, Car detection, Tracking, Average Estimation of Speed
ISSN (print): 1978-1520, ISSN (online): 2460-7258. IJCCS Vol. 14, No. 4, October 2020: 353 – 364


INTRODUCTION
Discipline is important in driving, including keeping speed in check so as not to endanger other road users. The maximum permitted speed is shown on the speed limit signs along the highway. Lack of supervision by the officers who enforce speed limits leads motorists to disregard them. Speed measuring devices such as speed guns are useful for measuring the speed of passing vehicles; however, their use is still limited to monitoring vehicles on the freeway. Under Law of the Republic of Indonesia Number 22 Year 2009 concerning Traffic and Road Transportation, the speed limit on the freeway ranges from 60 km/hour to 100 km/hour, while vehicle speed on urban roads must not exceed 50 km/hour.
Vehicle detection is one of the key problems in estimating vehicle speed, and one of the most frequently used methods is the Haar-like Feature. Used to detect car plates [1], the Haar-like Feature method reached an accuracy of 94% on a test set of 3,000 car plate images. Used to detect vehicle types (small, medium and large) [2], it reached an accuracy of 77.8% when the road was quiet, dropping to 28.8% when the road was busy. In [3], the Haar-like Feature was used to detect vehicles in frames from toll roads, with an average accuracy of 92.3%. A tracking step is therefore needed after a vehicle has been detected, to reduce the error rate.
The studies in [4] and [5] both use a Haar-like cascade to detect vehicles but differ in how the detections are used. In [4], the detections are used to count vehicles as they cross imaginary lines in the frame, with an accuracy of 78%. In [5], the detections are combined with stereo sub-pixel matching to track vehicle movement, which is then used to calculate vehicle speed, with an MAE of 2.46. Machine learning is used to estimate vehicle speed in [6] and [7], with different methods including combinations of SVM, CNN and deep learning. The study in [6] classifies vehicles and estimates their speed using a car's front camera. The classifier is trained by resizing images to 64 × 64 pixels and extracting Histogram of Gradients features for classification. The classifiers compared are Linear SVM, Quadratic SVM, Cubic SVM, Fine Gaussian SVM, Medium Gaussian SVM and Coarse Gaussian SVM; the best result was obtained with Cubic SVM, at an accuracy of 94.29%. In contrast, [7] uses a deep neural network for vehicle detection and tracking, with an architecture of 9 convolutional layers, 4 inception modules, one SPP layer and 2 fully connected layers. This deep neural network can detect and track vehicles in real time.

Traffic
Traffic is defined as the movement of vehicles and people in the road traffic space, where the road traffic space is the infrastructure intended for the movement of vehicles, people and/or goods, in the form of roads and supporting facilities. The government aims to realize traffic and road transportation that is safe, fast, smooth, orderly, convenient and efficient through traffic management and traffic engineering. According to the Law of the Republic of Indonesia

Haar-Like Feature
In vehicle detection using the Haar-like Feature [2], the Haar-like Feature is a classifier trained with sample images of an object, using the AdaBoost algorithm. Here the samples are images of vehicles seen from the side (left and right), front and rear. All training images must have the same size (for example 20 × 20); these form the positive samples. Negative samples are images of other objects at the same size. Training on this collection of images produces a set of object features called a cascade. The classifier outputs "1" if the input image contains a recognized object and "0" if no object is recognized. Figure 1 shows some of the Haar features used to detect vehicles.

Figure 1 Haar Like Feature
The value of each feature on a particular image region is the sum of the pixel values inside the black rectangle of the feature minus the sum of the pixel values inside the white rectangle. Each rectangle is defined by its top-left coordinates x, y, its width w and its height h. The sum of the pixels in the rectangular area r_i is denoted RecSum(r_i), and the feature value is given by Equation (1).
$$\mathrm{feature} = \sum_{i=1}^{N} w_i \cdot \mathrm{RecSum}(r_i) \qquad (1)$$
In Equation (1), the values N, w_i and r_i are chosen arbitrarily, depending on the object identified. RecSum is the sum of the intensities within each upright or rotated Haar feature rectangle inside the detection window.
When recognizing an object, the cascade discards image regions that do not contain an object meeting the criteria. This greatly improves the performance of the classifier: the likelihood of detection errors decreases and detection accuracy increases.
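As a rough illustration of Equation (1), the sketch below evaluates a two-rectangle Haar-like feature with an integral image (summed-area table), a common way to make RecSum cheap to compute. The function names (`integral_image`, `rec_sum`, `haar_feature`), the weights and the toy image are illustrative assumptions, not the configuration used in this study.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rec_sum(ii, x, y, w, h):
    """Sum of pixel values in the upright rectangle with top-left (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_feature(ii, weighted_rects):
    """Weighted sum of rectangle sums: sum_i w_i * RecSum(r_i)."""
    return sum(wt * rec_sum(ii, *rect) for wt, rect in weighted_rects)


# Toy 4x4 image (assumed data): dark left half (0), bright right half (9).
image = [[0, 0, 9, 9] for _ in range(4)]
ii = integral_image(image)

# Two-rectangle edge feature: weight +1 on the right half,
# weight -1 on the left half. Rectangles are (x, y, w, h).
edge_feature = [(+1, (2, 0, 2, 4)), (-1, (0, 0, 2, 4))]
edge_value = haar_feature(ii, edge_feature)
```

With the integral image, each rectangle sum costs four lookups regardless of the rectangle size, which is why many features can be evaluated per detection window.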

Discriminative Correlation Filters for Multidimensional Features
The discriminative correlation filter method is commonly used for tracking. Multi-dimensional correlation filters take several forms: a 1-dimensional filter to estimate scale, a 2-dimensional filter for translation and a 3-dimensional filter for scale-space localization of the target [8].
Consider a d-dimensional feature map representation of the image, and let f be a square patch of the target extracted from this feature map. Feature dimension number $l \in \{1, \ldots, d\}$ of f is denoted $f^l$. The aim is to find the optimal correlation filter h, consisting of one filter $h^l$ per feature dimension. The optimum is achieved by minimizing the cost function shown in Equation (2).
$$\varepsilon = \Big\| \sum_{l=1}^{d} h^l \star f^l - g \Big\|^2 + \lambda \sum_{l=1}^{d} \big\| h^l \big\|^2 \qquad (2)$$
Here g is the desired correlation output associated with the training sample f, and $\star$ denotes circular correlation. The parameter $\lambda \ge 0$ controls the influence of the regularization term. Note that Equation (2) takes into account only one training sample, that is, the case t = 1. The solution of Equation (2) is shown in Equation (3), where capital letters denote discrete Fourier transforms and the bar denotes complex conjugation.
$$H^l = \frac{\bar{G} F^l}{\sum_{k=1}^{d} \bar{F^k} F^k + \lambda} \qquad (3)$$
The regularization parameter alleviates the problem of zero-frequency components in the spectrum of f, which would cause division by zero. An optimal filter can be obtained by minimizing the output error over all training patches; however, this requires solving a d × d system of linear equations per pixel. To obtain a robust approximation, the numerator $A_t^l$ and denominator $B_t$ of the correlation filter $H_t^l$ are updated separately, as in Equation (4).
$$A_t^l = (1 - \eta) A_{t-1}^l + \eta\, \bar{G_t} F_t^l, \qquad B_t = (1 - \eta) B_{t-1} + \eta \sum_{k=1}^{d} \bar{F_t^k} F_t^k \qquad (4)$$
Here $\eta$ is the learning rate parameter. The correlation score y over a region z of the feature map is calculated using Equation (5), and the new target state is found by maximizing the score y.
$$y = \mathcal{F}^{-1}\left\{ \frac{\sum_{l=1}^{d} \bar{A^l} Z^l}{B + \lambda} \right\} \qquad (5)$$
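To make Equations (2) to (5) more concrete, the following is a minimal sketch of a discriminative correlation filter on a 1-D signal with a single feature channel (d = 1), using a naive DFT so that it needs only the standard library. It is a toy under stated assumptions, not the tracker implementation used in this study: the function names, the Gaussian width, the learning rate `eta` and the regularization `lam` are all illustrative choices.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gaussian_target(n, center, sigma=1.0):
    """Desired correlation output g: a circular Gaussian peaked at `center`."""
    g = []
    for i in range(n):
        d = min(abs(i - center), n - abs(i - center))
        g.append(math.exp(-0.5 * (d / sigma) ** 2))
    return g


def train(f, g):
    """Numerator A = conj(G) * F and denominator B = conj(F) * F for d = 1."""
    F, G = dft(f), dft(g)
    A = [Gk.conjugate() * Fk for Gk, Fk in zip(G, F)]
    B = [(Fk.conjugate() * Fk).real for Fk in F]
    return A, B


def update(A_prev, B_prev, f, g, eta=0.025):
    """Running-average update of A and B with learning rate eta."""
    A_new, B_new = train(f, g)
    A = [(1 - eta) * a0 + eta * a1 for a0, a1 in zip(A_prev, A_new)]
    B = [(1 - eta) * b0 + eta * b1 for b0, b1 in zip(B_prev, B_new)]
    return A, B


def respond(A, B, z, lam=0.01):
    """Correlation scores y = IDFT(conj(A) * Z / (B + lambda))."""
    Z = dft(z)
    Y = [Ak.conjugate() * Zk / (Bk + lam) for Ak, Zk, Bk in zip(A, Z, B)]
    return [v.real for v in dft(Y, inverse=True)]


# Toy 1-D "frame" (assumed data): a bright blob centered at index 6.
n = 16
f = [0.0] * n
f[5], f[6], f[7] = 1.0, 2.0, 1.0
g = gaussian_target(n, center=6)
A, B = train(f, g)

# Next frame: the same blob moved 3 samples to the right (circularly).
shift = 3
z = [f[(i - shift) % n] for i in range(n)]
scores = respond(A, B, z)
new_position = max(range(n), key=lambda i: scores[i])
```

In this toy case the score peak moves by the same offset as the blob, which is the mechanism a tracker uses to follow the target from frame to frame; a practical tracker works on 2-D image patches with several feature channels and a fast Fourier transform.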

Estimated Average Speed
The speed of a moving object is the total distance traveled from the start to the end of the movement divided by the time taken to make the move [9], as in Equation (6).
$$v = \frac{s}{t} \qquad (6)$$
Where: v (velocity) = the speed achieved during the displacement, s = the distance traveled, t = the time taken to make the move.
To calculate vehicle speed from a video recording based on Equation (6), the distance is the predetermined reference distance between the start and the end of the movement, and the time is obtained from the number of frames the object needs to cover that distance together with the fps value of the video, as in Equation (7).
$$v = \frac{s}{n / \mathit{fps}} = \frac{s \cdot \mathit{fps}}{n} \qquad (7)$$
Where: s = the distance specified in the study, n = the number of frames the object needs to cover that distance, fps = the number of frames per second of the video recording.
The process of finding moving objects in a sequence of frames is known as tracking. Tracking can be done by extracting object features and detecting the moving objects in each frame. From the position of an object in each frame, its displacement and speed can be calculated. The distance traveled by the object is measured from the center of its bounding box using the Euclidean distance. In two dimensions, the Euclidean distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ along the X-axis and Y-axis is given by Equation (8).
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \qquad (8)$$
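The relations in Equations (6) to (8) can be sketched in a few lines. This is a minimal example under stated assumptions: the helper names and the sample bounding boxes are hypothetical, and the conversion from pixel displacement to meters (which depends on camera calibration) is deliberately left out.

```python
import math


def bbox_center(x, y, w, h):
    """Center of a bounding box given its top-left corner and size."""
    return (x + w / 2.0, y + h / 2.0)


def euclidean_distance(p1, p2):
    """Euclidean distance between two 2-D points, as in Equation (8)."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def path_length(centers):
    """Total distance along a sequence of per-frame centers (in pixels)."""
    return sum(euclidean_distance(a, b) for a, b in zip(centers, centers[1:]))


def average_speed_kmh(distance_m, n_frames, fps):
    """Speed from a known distance and the frames needed to cover it."""
    elapsed_s = n_frames / fps          # time from frame count and fps
    return distance_m / elapsed_s * 3.6  # m/s converted to km/hour


# Hypothetical bounding boxes (x, y, w, h) of one car in three frames.
centers = [bbox_center(10, 100, 20, 10),
           bbox_center(13, 96, 20, 10),
           bbox_center(16, 92, 20, 10)]
pixels_moved = path_length(centers)  # two steps of 5 pixels each

# A 13 m reference distance covered in 25 frames of a 25 fps video.
speed = average_speed_kmh(13.0, 25, 25.0)
```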

Root Mean Square Error (RMSE)
The mean squared error (MSE) measures the average of the squared errors or deviations, and its square root is the RMSE, also called the root mean squared deviation (RMSD). The individual differences are also called residuals, and the RMSE aggregates them into a single measure of predictive power [10], as in Equation (9).
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \qquad (9)$$
Where: N = number of samples, $y_i$ = actual value i, $\hat{y}_i$ = estimated value i.

Mean Absolute Error (MAE)
MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average of the absolute differences between the prediction and the actual observation, where every individual difference has equal weight [10], as in Equation (10).
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| \qquad (10)$$
Where: N = number of samples, $y_i$ = actual value i, $\hat{y}_i$ = estimated value i.
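The two error measures can be computed directly from paired lists of actual and estimated values. The sketch below is illustrative only: the function names are assumptions and the speed values are made up for the example, not taken from the measurements in this study.

```python
import math


def rmse(actual, estimated):
    """Root mean square error between paired actual and estimated values."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, estimated):
    """Mean absolute error between paired actual and estimated values."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(a - e) for a, e in zip(actual, estimated)]
    return sum(absolute) / len(absolute)


# Hypothetical speeds in km/hour (speed gun vs. system estimate).
actual = [40.0, 50.0, 60.0]
estimated = [43.0, 46.0, 60.0]

error_rmse = rmse(actual, estimated)  # square root of 25/3
error_mae = mae(actual, estimated)    # 7/3
```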

Standard Deviation
Standard deviation (SD) is a measure of variability that best meets the requirements because it does not discard extreme values, is calculated from deviations from the average, and accounts for both positive and negative deviations. Standard deviation can basically be described as the square root of the sum of squared deviations divided by the number of data points in the distribution [9].
This section discusses the results of the system and tests the error between the vehicle speed estimated by the system and the vehicle speed measured with a speed gun.

Effect of Video Size
Video size has a considerable influence on vehicle detection. The original size of the CCTV video was 1280 × 720; the effect of changing the size in this experiment is shown in Table 2.
Table 2 Effect of video size on processing speed
Size          Processing time per frame   FPS
1280 × 720    0.494 seconds               2 fps
854 × 480     0.383 seconds               3 fps
320 × 180     0.087 seconds               11 fps
Figure 2 Detection results with frame size 320 × 180
The video used in this study runs at 25 fps, so keeping up with it would allow 0.04 seconds of processing per frame. The size that comes closest is 320 × 180, as in Figure 2. However, at such a small size vehicles are difficult to detect, while 1280 × 720 takes too long to process, so the size used in this study is 854 × 480.
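The relation between per-frame processing time and achievable frame rate in Table 2 is simple arithmetic, sketched below. The timings are the ones reported in Table 2; the function and variable names are assumptions made for the example.

```python
def frame_budget_s(video_fps):
    """Time available per frame to keep up with the video frame rate."""
    return 1.0 / video_fps


def achieved_fps(seconds_per_frame):
    """Frames per second that a given per-frame processing time allows."""
    return 1.0 / seconds_per_frame


# Processing time per frame (seconds) for each frame size, from Table 2.
timings = {(1280, 720): 0.494, (854, 480): 0.383, (320, 180): 0.087}

budget = frame_budget_s(25)  # 25 fps video
rates = {size: round(achieved_fps(t)) for size, t in timings.items()}
closest = min(timings, key=lambda size: abs(timings[size] - budget))
```

None of the three sizes reaches the 25 fps budget, so the choice of 854 × 480 trades processing speed for detectability rather than meeting a real-time target.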

Effect of ROI
A region of interest (ROI) is a selected area of an image, used to simplify analysis and reduce the image storage size. The input video in this study shows a two-way road with four lanes, of which only one lane on the left is analyzed. The ROI is obtained by covering part of the video with white, as can be seen in Figure 4. The vehicle detection results in Figure 3 show a false positive (false detection) at the far left. The ROI is also used to keep only the parts that are needed. As mentioned in the problem limitation, this study uses only one lane to estimate velocity, as in Figure 7.

Figure 4 Video results after ROI is determined
The results of applying ROI to the video are shown in Table 3. The ROI in this study aims to reduce detection errors, as in Figure 4; before the ROI was applied there were errors in detection.
Table 3 The results of the influence of ROI
Video input    Processing time per frame
Without ROI    0.280 seconds
ROI            0.131 seconds
ROI also reduces the processing time per frame from 0.280 seconds without ROI to 0.131 seconds, a performance improvement of 53.21%.
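A minimal sketch of the masking idea and of the improvement figure follows. It assumes a grayscale frame stored as nested lists and a simple vertical band as the ROI; the actual ROI shape used in this study is not specified here, and the function names are hypothetical.

```python
def apply_roi_mask(frame, x0, x1, fill=255):
    """Copy a grayscale frame, painting columns outside [x0, x1) white."""
    return [[px if x0 <= x < x1 else fill for x, px in enumerate(row)]
            for row in frame]


def time_reduction_percent(before_s, after_s):
    """Relative reduction in processing time, as a percentage."""
    return (before_s - after_s) / before_s * 100.0


# Tiny 2x4 frame (assumed data); keep only the two left columns.
frame = [[10, 20, 30, 40],
         [50, 60, 70, 80]]
masked = apply_roi_mask(frame, 0, 2)

# Per-frame timings from Table 3.
reduction = round(time_reduction_percent(0.280, 0.131), 2)
```

Painting the excluded area a uniform color leaves the detector nothing to match there, which is consistent with both the drop in false detections and the shorter processing time reported above.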

Tracking Testing
Tracking testing is carried out with two scenarios for estimating vehicle speed: without tracking and with tracking. In the experiment without tracking, two reference lines are used to detect a passing vehicle and record the time it takes to travel from the first line to the second, as in Figure 5. When a vehicle crosses the first (bottom) line, the crossing time is recorded as the start time; when it crosses the second (top) line, the time is recorded as the end time. The recorded times are then used to calculate the estimated speed of the vehicle. The reference lines are an actual distance of 13 meters apart. If, for example, the time needed to cross is 1.25 seconds, the estimated speed is calculated as follows:
$$v = \frac{13\ \text{m}}{1.25\ \text{s}} = 10.4\ \text{m/s}$$
The result, 10.4 meters/second, is multiplied by 3.6 to convert it to km/hour, giving about 37 km/hour. However, the experiment without tracking has some problems, such as vehicle ids being swapped, as can be seen in Figure 6, which strongly affects the estimated speed of a passing vehicle.
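The two-line timing scheme described above can be sketched as follows. This is a simplified illustration, not the code used in the experiment: it assumes one vehicle moving up the frame (image y decreasing), per-frame center positions that are already available, and hypothetical function names and line positions.

```python
def first_crossing(track_y, line_y):
    """Index of the first frame whose center is at or above the line.

    Image y grows downward, so a vehicle moving up the frame has
    decreasing y values.
    """
    for frame_idx, y in enumerate(track_y):
        if y <= line_y:
            return frame_idx
    return None


def crossing_speed_kmh(track_y, start_line_y, end_line_y, fps, distance_m):
    """Speed from the frames at which the two reference lines are crossed."""
    start = first_crossing(track_y, start_line_y)
    end = first_crossing(track_y, end_line_y)
    if start is None or end is None or end <= start:
        return 0.0  # a missed line gives no usable timing
    elapsed_s = (end - start) / fps
    return distance_m / elapsed_s * 3.6


# Hypothetical track: the center moves up by 10 pixels per frame.
track_y = [400 - 10 * i for i in range(40)]
speed = crossing_speed_kmh(track_y, start_line_y=350, end_line_y=100,
                           fps=25.0, distance_m=13.0)

# Worked example from the text: 13 m in 1.25 s, converted to km/hour.
worked_example = 13.0 / 1.25 * 3.6
```

The `0.0` return value mirrors the behavior reported for this scenario, where a detection failure at either reference line leaves the vehicle with an estimated speed of 0.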

Figure 6 Swapped vehicle IDs
Swapping of vehicle ids can cause the wrong crossing time to be recorded. In Figure 6, the black car with initial id 0 changes to id 1, and the red car with initial id 1 changes to id 0. A car with id 1 then appears to cross the reference lines faster than it should: for example, a car with id 1 that should take 1 second to cross the two reference lines can be recorded as taking 0.15 seconds, because its id is exchanged with a car that is closer to the final reference line. Taking the 13-meter reference distance, the estimated speed can be illustrated as follows.
The speed should be:
$$v = \frac{13\ \text{m}}{1\ \text{s}} = 13\ \text{m/s} = 46.8\ \text{km/hour}$$
The speed when the id is swapped:
$$v = \frac{13\ \text{m}}{0.15\ \text{s}} \approx 86.67\ \text{m/s} = 312\ \text{km/hour}$$
Besides swapped ids, another problem in trials without tracking is detection failure while the vehicle is passing: if detection fails when the vehicle crosses the first or second reference line, the estimated speed of the vehicle is 0, as in Figure 7.
Based on Table 4, the median of |A-C| is 4, whereas that of |B-C| is 34. The actual speeds were measured with a speed gun with the assistance of a Sukoharjo DISHUB officer. The speed gun has a measurement tolerance of approximately 2 km/hour, and speeds were taken only for vehicles driving at a constant speed, that is, not braking.

Speed Evaluation
This stage evaluates how well the system performs, using RMSE, MAE and standard deviation. MAE and RMSE can be used together to identify the variation of errors in a set of predicted average vehicle speeds against the speeds recorded with a speed gun. RMSE is always greater than or equal to MAE; the greater the difference between RMSE and MAE, the greater the variance of the individual errors in the sample. If RMSE = MAE, then all errors have the same magnitude. Based on Table 4, the results of the performance calculations can be seen in Table 5. The test results differ considerably between the use of tracking and no tracking. According to [11], the standard deviation error tolerance on urban roads with 2 lanes is 7.7 km/hour.
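The relationship between RMSE and MAE described above can be checked on small, made-up error sets. In the sketch below the error values are hypothetical (not the data of Table 4), the function names are assumptions, and the standard deviation is the population form, dividing by the number of data points.

```python
import math
import statistics


def mae(errors):
    """Mean absolute error of a list of signed errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean square error of a list of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


TOLERANCE_KMH = 7.7  # standard deviation tolerance cited from [11]

# Two hypothetical error sets (km/hour) with the same MAE of 5.
uniform_errors = [5.0, -5.0, 5.0, -5.0]   # all errors equal in magnitude
mixed_errors = [1.0, -1.0, 1.0, -17.0]    # one large outlier

gap_uniform = rmse(uniform_errors) - mae(uniform_errors)
gap_mixed = rmse(mixed_errors) - mae(mixed_errors)

sd_mixed = statistics.pstdev(mixed_errors)  # population standard deviation
within_tolerance = sd_mixed <= TOLERANCE_KMH
```

Both sets have the same MAE, but only the set with an outlier shows a gap between RMSE and MAE, which is why reporting the two measures together says something about how unevenly the errors are spread.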

CONCLUSIONS
The error tolerance limit for measuring vehicle speed is 7.7 km/hour according to [11], and the standard deviation obtained with tracking when estimating the average speed is 4.61 km/hour, so the system still meets the error tolerance limit.
Tracking performs well in estimating average vehicle speed: it compensates for detection failures and for car ids that may be swapped when more than one car crosses the video. It can therefore be concluded that tracking helps overcome detection failures on a moving car.
Research conducted by [Hua et al.] on estimating vehicle speeds using deep learning resulted in an RMSE of 12.109, compared with an RMSE of 7.14 obtained in this research.