Detecting Cyberhate on Twitter Using Text Classification and Crowdsourced Labeling
Abstract
During the 2019 presidential election campaign in Indonesia, the public expressed support for the candidates in many forms, from distributing posters to posting content on social media. On Twitter, for example, many hashtags of support circulated during the election, such as #2019gantipresiden and #2019tetapjokowi, along with other hashtags related to the Indonesian presidential election. However, many tweets carrying these hashtags contain hate speech. Hate speech on the internet (cyberhate) can cause disputes between the supporters of the two presidential candidates, which may escalate into conflicts such as riots and other actions that harm the country. This study uses the Support Vector Machine (SVM) algorithm to detect cyberhate, achieving a best accuracy of 97%. The study also applies crowdsourced labeling to the dataset, which results in 98% valid data.
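As an illustration of the crowdsourced labeling step, the sketch below aggregates several annotators' votes per tweet by simple majority and marks a tweet as valid only when a strict majority agrees. The abstract does not state how validity was determined, so the majority rule, the `aggregate_labels` function, the `min_agreement` threshold, and the sample votes are all assumptions made for illustration, not the study's procedure.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.5):
    """Return (label, is_valid) for one tweet's crowd votes.

    Assumption: a label is "valid" when strictly more than
    `min_agreement` of the annotators chose it.
    """
    if not votes:
        return None, False
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    agreement = top / len(votes)
    return label, agreement > min_agreement


# Hypothetical votes, invented for this example only.
crowd_votes = {
    "tweet_1": ["hate", "hate", "not_hate"],
    "tweet_2": ["not_hate", "not_hate", "not_hate"],
    "tweet_3": ["hate", "not_hate"],  # tie, so no strict majority
}

results = {tid: aggregate_labels(v) for tid, v in crowd_votes.items()}

# Share of tweets whose label reached a strict majority.
valid_share = sum(ok for _, ok in results.values()) / len(results)
```

Under this rule, tied tweets such as `tweet_3` would be excluded from the training set or sent back for additional annotation; a stricter threshold could be set through `min_agreement`.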
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.