Real-Time Indonesian Language Speech Recognition with MFCC Algorithms and Python-Based SVM

Wening Mustikarini; Risanuri Hidayat; Agus Bejo

doi:10.22146/ijitee.49426

Wening Mustikarini (1), Risanuri Hidayat (2*), Agus Bejo (3)

(1) Universitas Gadjah Mada
(2) Universitas Gadjah Mada
(3) Universitas Gadjah Mada
(*) Corresponding Author

Abstract

Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize human speech. One way to increase the recognition rate is to use a model of the language to be recognized. This paper introduces a speech recognition application that recognizes the Indonesian words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). The research used 400 speech samples: 75 samples per word for training and 25 samples per word for testing. The system uses 13 Mel Frequency Cepstral Coefficients (MFCC) as features and a Support Vector Machine (SVM) as the classifier. It was tested with linear and RBF kernels, various cost values, and three sample sizes (n = 25, 50, 75). The best average accuracy was obtained from an SVM with a linear kernel, a cost value of 100, and a data set consisting of 75 samples from each class. During the training phase, the system achieved an F1-score (the harmonic mean of precision and recall) of 80% for "atas", 86% for "bawah", 81% for "kanan", and 100% for "kiri". In the testing phase, using 25 new samples per class, the F1-score was 76% for "atas", 54% for "bawah", 44% for "kanan", and 100% for "kiri".
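The per-class scores reported above combine precision and recall into a single number. As a minimal sketch of how such scores can be computed for the four word classes, the following uses only the Python standard library. The function name per_class_f1 and the example predictions are illustrative assumptions; they are not the authors' code or data.

```python
from collections import Counter

# The four word classes named in the abstract.
LABELS = ["atas", "bawah", "kanan", "kiri"]


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} from two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted but is wrong
            fn[t] += 1          # class t was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Made-up predictions for illustration only; NOT results from the paper.
y_true = ["atas"] * 4 + ["bawah"] * 4 + ["kanan"] * 4 + ["kiri"] * 4
y_pred = (["atas", "atas", "atas", "bawah"]
          + ["bawah", "bawah", "kanan", "kanan"]
          + ["kanan", "kanan", "kanan", "atas"]
          + ["kiri"] * 4)

scores = per_class_f1(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because the score is a harmonic mean, a class scores high only when both precision and recall are high, which is why a class that is often confused with others can score far lower than one that is always recognized.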

Keywords

Automatic Speech Recognition; Indonesian Language; MFCC; SVM
