Indonesian Dataset for Sentiment Analysis

Ridi Ferdiana; Fahim Jatmiko; Desi Dwi Purwanti; Artmita Sekar Tri Ayu; Wiliam Fajar Dicka

Ridi Ferdiana, Universitas Gadjah Mada
Fahim Jatmiko, Microsoft Innovation Center
Desi Dwi Purwanti, Universitas Gadjah Mada
Artmita Sekar Tri Ayu, Universitas Gadjah Mada
Wiliam Fajar Dicka, Universitas Gadjah Mada

Keywords: Dataset, Text Analysis, Sentiment Analysis, Natural Language Processing

Abstract

This paper presents a text dataset for use in text analysis, especially sentiment analysis. The dataset comprises two parts: the primary data, 10,806 lines of Indonesian text collected from the social media platform Twitter and labeled as positive, negative, or neutral; and the raw data, 454,559 lines of unprocessed text. The labeled data has also been cleaned by removing several kinds of noise, such as symbols and URLs. To check that the dataset is suitable for text analysis, we test it with sentiment analysis models: we measure the accuracy of a model trained on this dataset and compare it with the accuracy of a model trained on a previously published dataset. Across several algorithms, including SVM, KNN, and SGD, the accuracy obtained with our data and with the comparison data is broadly comparable, differing by roughly 4% to 12%, which indicates that the dataset presented in this paper is feasible for use in sentiment analysis. The dataset can be downloaded from the link given in the conclusion section.
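As an illustration of the kind of cleaning described above (removing URLs and symbols from tweets), the following is a minimal sketch only. The abstract does not state the exact cleaning rules, so the regular expressions and the function name `clean_tweet` are assumptions made for this example, not the authors' procedure.

```python
import re

# Hypothetical patterns: the abstract does not specify the exact cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
SYMBOL_RE = re.compile(r"[^\w\s]")              # punctuation and other symbols
SPACE_RE = re.compile(r"\s+")                   # runs of whitespace


def clean_tweet(text: str) -> str:
    """Remove URLs and symbols from a tweet, then normalize whitespace."""
    text = URL_RE.sub(" ", text)      # drop links first, before their punctuation is touched
    text = SYMBOL_RE.sub(" ", text)   # drop symbols such as '!', '#', '@'
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("Bagus banget!!! https://t.co/abc123 #mantap"))
```

URLs are removed before symbols so that a link is deleted as a whole rather than being broken into leftover word fragments.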
