Bidirectional Long Short Term Memory Method and Word2vec Extraction Approach for Hate Speech Detection

https://doi.org/10.22146/ijccs.51743

Auliya Rahman Isnain(1*), Agus Sihabuddin(2), Yohanes Suyanto(3)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author

Abstract


Hate speech is currently a heated topic in Indonesia, particularly on social media. Hate speech is communication that disparages a person or group based on characteristics such as race, ethnicity, gender, citizenship, religion, or organization. Twitter is a social media platform where people express their feelings and opinions through tweets, including tweets that contain expressions of hatred, and it has a significant influence on the success or destruction of a person's image.

This study aims to classify Indonesian-language tweets as hate speech or not hate speech using the Bidirectional Long Short Term Memory (BiLSTM) method with word2vec feature extraction based on the Continuous Bag-of-Words (CBOW) architecture. The BiLSTM model is evaluated by its accuracy, precision, recall, and F-measure.
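As a rough illustration of how the CBOW architecture frames its training data, the sketch below builds (context, target) pairs from a tokenized tweet: CBOW learns to predict the center word from the words around it. The function name `cbow_pairs`, the window size, and the sample sentence are assumptions made for this example; they are not taken from the study, whose preprocessing and window settings are not stated here.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context_words, target_word) pairs in the CBOW style.

    For each position, the target is the word at that position and the
    context is up to `window` words on each side of it.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip single-word inputs, which have no context
            pairs.append((context, target))
    return pairs


# Hypothetical sample sentence, used only to show the shape of the pairs.
tweet = "contoh kalimat pendek untuk ilustrasi".split()
pairs = cbow_pairs(tweet, window=1)
for context, target in pairs:
    print(context, "->", target)
```

In a full word2vec model, the context word vectors would be averaged and fed to a classifier over the vocabulary; this sketch covers only the pairing step.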

Using word2vec with the CBOW architecture and the BiLSTM method, with 10 epochs, a learning rate of 0.001, and 200 neurons in the hidden layer, the model achieves an accuracy of 94.66%, with a precision of 99.08%, a recall of 93.74%, and an F-measure of 96.29%. In contrast, BiLSTM with three layers achieves an accuracy of 96.93%. Adding one layer to BiLSTM thus increases accuracy by 2.27 percentage points.
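The four evaluation measures named above can be computed from a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name `classification_metrics` and the confusion counts are hypothetical and are not the study's data. The last lines only recompute the accuracy gain from the two accuracy figures reported in the abstract.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F-measure for one positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not from the study).
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Accuracy gain between the two reported configurations, in percentage points.
gain = round(96.93 - 94.66, 2)
print("accuracy gain (percentage points):", gain)
```

The study's own figures may have been computed with a different averaging scheme across classes, so this sketch is not expected to reproduce them exactly.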

Keywords


Hate Speech; LSTM; BiLSTM; Word2vec; CBOW; Skipgram; Twitter



Copyright (c) 2020 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Copyright of :
IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
ISSN 1978-1520 (print); ISSN 2460-7258 (online)
is a scientific journal publishing research results in computing
and cybernetics systems.
A publication of IndoCEISS.
Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281
Fax: +62274 555133
email:ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs


