Detecting Cyberhate on Twitter Using Text Classification and Crowdsourced Labeling

Dana Sulistyo Kusumo; Hadi Kurniawan Sidiq; Indra Lukmana Sardi

Dana Sulistyo Kusumo, Universitas Telkom
Hadi Kurniawan Sidiq, Universitas Telkom
Indra Lukmana Sardi, Universitas Telkom

Keywords: Crowdsourced Labeling, Cyberhate Tweets, Hate Speech Detection, Text Classification

Abstract

During the 2019 presidential election campaign in Indonesia, the public expressed support for the candidates in many forms, from distributing posters to posting content on social media. On Twitter, for example, many hashtags were used to show support during the election, such as #2019gantipresiden, #2019tetapjokowi, and other hashtags related to the Indonesian presidential election. However, many tweets carrying these hashtags contain hate speech. Hate speech on the internet (cyberhate) could provoke disputes between the supporters of the two presidential candidates, leading to conflicts such as riots and other actions that harm the country. This study uses the SVM algorithm to detect cyberhate, achieving a best accuracy of 97%. It also applies crowdsourced labeling to the dataset, which yields 98% valid data.
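
The abstract does not say how the crowdsourced labels were combined or what counts as "valid" data. As a minimal sketch, assuming majority voting and assuming a tweet is valid only when one label wins a strict majority of its votes, the aggregation step might look like the following. The function name `aggregate_labels`, the label strings, and the sample votes are all hypothetical and are not taken from the study.

```python
from collections import Counter


def aggregate_labels(votes_per_tweet):
    """Majority-vote the crowd labels collected for each tweet.

    Returns (labels, valid_ratio). A tweet counts as valid here only if
    one label wins a strict majority of its votes; this criterion is an
    assumption, not the study's stated definition.
    """
    labels = {}
    valid = 0
    for tweet_id, votes in votes_per_tweet.items():
        if not votes:
            labels[tweet_id] = None  # no annotations collected
            continue
        label, count = Counter(votes).most_common(1)[0]
        if count * 2 > len(votes):
            labels[tweet_id] = label  # strict majority reached
            valid += 1
        else:
            labels[tweet_id] = None  # tie or no majority: treat as invalid
    ratio = valid / len(votes_per_tweet) if votes_per_tweet else 0.0
    return labels, ratio


# Hypothetical annotator votes for three tweets.
votes = {
    "t1": ["hate", "hate", "not_hate"],
    "t2": ["not_hate", "not_hate", "not_hate"],
    "t3": ["hate", "not_hate"],
}
labels, valid_ratio = aggregate_labels(votes)
```

Under these assumptions, tweets without a clear majority are set aside rather than given a forced label, so the valid ratio reports how much of the crowd-labeled data could be used for training a classifier.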
