Hate Speech Detection in Indonesian Twitter using Contextual Embedding Approach


Guntur Budi Herwanto(1*), Annisa Maulida Ningtyas(2), I Gede Mujiyatna(3), Kurniawan Eka Nugraha(4), I Nyoman Prayana Trisna(5)

(1) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(2) Department of Health Information and Services, Universitas Gadjah Mada, Yogyakarta, Indonesia
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(4) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(5) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author


Hate speech has grown alongside the rapid development of social media. It is often posted because the public is not aware of the difference between criticism and statements that may constitute this crime. It is therefore important to detect such sentences early, before they are posted and lead to a criminal act committed out of ignorance. In this paper, we use advances in deep neural networks to predict whether a sentence contains hate speech and an abusive tone. We demonstrate the robustness of different word and contextual embeddings in representing the semantics of hate speech words. In addition, we use a document embedding representation, built with a recurrent neural network with gated recurrent units as the main architecture, to provide a richer representation. Compared with the syntactic representation of the previous approach, the contextual embedding in our model gives a significant boost in performance.


hate speech; natural language processing; deep neural network; contextual embedding; recurrent neural network

