Sound Classification for Forest Monitoring Based on Convolutional Neural Network


Rizqi Fathin Fadhillah(1*), Raden Sumiharto(2)

(1) Electronics and Instrumentation Study Program, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author


Forests play an important role on Earth. Monitoring them for illegal activities and for the animal species that live in them is necessary to keep them in good condition. However, the vast area of forests and limited resources restrict direct monitoring by human officers. Sound, combined with digital signal processing, can therefore serve as a tool for forest monitoring. In this study, a system was implemented on the Raspberry Pi 3B+ to classify sounds using mel-spectrograms. The classified sounds are chainsaws, gunshots, and the sounds of 8 bird species. The study also compared pretrained VGG-16 and MobileNetV3 as feature extractors, along with several classification methods, namely Random Forest, SVM, KNN, and MLP. To vary and enlarge the training data, several types of data augmentation were used, namely added noise, time stretch, time shift, and pitch shift. The results show that the MobileNetV3-Small + MLP model, trained on combined data from time stretch and time shift augmentation, provides the best performance for this system, with an inference time of 0.8 seconds, 93.96% accuracy, and 94.1% precision.
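As a rough illustration of two of the augmentation types named in the abstract, the sketch below adds Gaussian noise to a waveform and shifts it in time. It is not the authors' implementation: the function names, the `noise_std` default, and the choice of a circular (wrap-around) shift are assumptions made for this example, and the waveform is a plain list of float samples so that only the standard library is needed. Time stretch and pitch shift are omitted because they usually rely on a dedicated audio library.

```python
import random


def add_noise(samples, noise_std=0.005, rng=None):
    """Return a copy of `samples` with zero-mean Gaussian noise added.

    `noise_std` is an assumed default, not a value taken from the paper.
    """
    rng = rng or random.Random()
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def time_shift(samples, shift):
    """Circularly shift the waveform by `shift` samples.

    A positive `shift` moves the audio later in time; samples pushed past
    the end wrap around to the start (an assumption for this sketch).
    """
    if not samples:
        return []
    k = shift % len(samples)
    if k == 0:
        return list(samples)
    return samples[-k:] + samples[:-k]


def augment(samples, shift, noise_std=0.005, rng=None):
    """Return the original clip plus one noisy and one shifted variant."""
    return [
        list(samples),
        add_noise(samples, noise_std=noise_std, rng=rng),
        time_shift(samples, shift),
    ]
```

Calling `augment(clip, shift=1600)` on a 16 kHz clip would, under these assumptions, yield the original plus a noisy copy and a copy delayed by 0.1 seconds, each of which could then be converted to a mel-spectrogram for training.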


Sound classification; Mel-spectrogram; CNN; MLP

DOI: https://doi.org/10.22146/ijeis.79536


Copyright (c) 2023 IJEIS (Indonesian Journal of Electronics and Instrumentation Systems)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of:
IJEIS (Indonesian Journal of Electronics and Instrumentation Systems)
ISSN 2088-3714 (print); ISSN 2460-7681 (online)
is a scientific journal publishing research results in electronics
and instrumentation systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
email:ijeis.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijeis
