Dataset Splitting Techniques Comparison For Face Classification on CCTV Images

Ade Nurhopipah(1*), Uswatun Hasanah(2)

(1) Department of Informatics, Universitas Amikom Purwokerto
(2) Department of Information Technology, Universitas Amikom Purwokerto
(*) Corresponding Author


The performance of classification models in machine learning is influenced by many factors, one of which is the dataset splitting method. To avoid overfitting, it is important to apply a suitable dataset splitting strategy. This study presents a comparison of four dataset splitting techniques, namely Random Sub-sampling Validation (RSV), k-Fold Cross Validation (k-FCV), Bootstrap Validation (BV), and Morais-Lima-Martin Validation (MLMV). The comparison is carried out on face classification of CCTV images using a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM), and it is applied to two image datasets. The results are assessed using model accuracy on the training, validation, and test sets, together with the bias and variance of the model. The experiments show that the k-FCV technique performs more stably and provides high accuracy on the training set as well as good generalization on the validation and test sets. In contrast, data splitting with the MLMV technique performs worse than the other three techniques: it yields lower accuracy, shows higher bias and variance, and produces overfitted models, especially when evaluated on the validation set.


Random Sub-sampling; Bootstrap; Morais-Lima-Martin; k-Fold Cross Validation
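
To make the differences between three of the splitting techniques named in the abstract concrete, the following is a minimal sketch of how RSV, k-FCV, and BV can generate training and validation indices. It is an illustration only, not the authors' implementation: the function names, the default fractions, the number of repeats, and the use of out-of-bag samples as the bootstrap validation set are all assumptions. MLMV is left out because this page does not give enough detail to reproduce it faithfully.

```python
import random


def random_subsampling(n, val_fraction=0.2, repeats=3, seed=0):
    """RSV sketch: repeat an independent random hold-out split several times."""
    rng = random.Random(seed)
    n_val = max(1, int(round(n * val_fraction)))  # assumed hold-out size
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # First n_val shuffled indices validate, the rest train.
        yield sorted(idx[n_val:]), sorted(idx[:n_val])


def k_fold(n, k=5, seed=0):
    """k-FCV sketch: every sample is used for validation exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, val


def bootstrap(n, repeats=3, seed=0):
    """BV sketch: train on n draws with replacement, validate on the rest."""
    rng = random.Random(seed)
    for _ in range(repeats):
        train = [rng.randrange(n) for _ in range(n)]  # duplicates are expected
        drawn = set(train)
        val = [i for i in range(n) if i not in drawn]  # out-of-bag samples
        yield train, val


if __name__ == "__main__":
    # Hypothetical dataset of 10 samples, identified only by index.
    for name, splitter in [("RSV", random_subsampling(10)),
                           ("k-FCV", k_fold(10)),
                           ("BV", bootstrap(10))]:
        for train, val in splitter:
            print(name, "train:", train, "val:", val)
```

In this sketch, RSV validation sets may overlap between repeats, k-FCV validation folds partition the data, and BV training sets contain repeated samples, which is one way to see why the techniques can give different accuracy and variance estimates.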







Copyright (c) 2020 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of:
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
A scientific journal of Computing and Cybernetics Systems, published by IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
