The Effect of Phrase Detection with a POS-Tagger on Sentiment Classification Accuracy Using SVM

Hermawan Arief Putranto; Onny Setyawati; Wijono

Hermawan Arief Putranto Universitas Brawijaya
Onny Setyawati Universitas Brawijaya
Wijono Universitas Brawijaya

Keywords: sentiment analysis, phrase detection, HMM POS-Tagger, ROC, Support Vector Machine, tokenization

Abstract

Sentiment analysis, or opinion mining, is an application of Natural Language Processing (NLP), a field that aims to let humans communicate with a computer in their everyday language. To simplify the processing of human language, a computer must carry out three important stages: tokenizing, stemming, and filtering. Tokenizing, which breaks a sentence down into single words, makes the computer treat all words (tokens) the same. If a phrase contains an unimportant word that happens to be in the stoplist, the phrase is lost during filtering. The proposed solution is tokenizing based on phrase detection with a Hidden Markov Model (HMM) POS-Tagger, with the aim of improving classification performance with a Support Vector Machine (SVM).
With this approach, the computer can distinguish a phrase from other tokens and store it as a single entity. Classification with phrase detection increases accuracy by approximately 6% on Dataset I and 3% on Dataset II, owing to a reduction in the features that are usually lost in the filtering process. In addition, the phrase-detection approach produces the best classification model, as seen from its ROC value, which reaches 0.897.
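
The feature loss described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stoplist, the POS tags, the tagged sentence, and the single chunk rule (a negation followed by an adjective forms a phrase) are all assumptions made for illustration, and the tags are written by hand rather than produced by an HMM POS-Tagger.

```python
# Hypothetical stoplist, for illustration only.
STOPLIST = {"tidak", "yang", "ini"}

# Hypothetical chunk rule: a negation followed by an adjective forms one phrase.
PHRASE_PATTERNS = {("NEG", "JJ")}


def word_tokens(tagged):
    """Plain tokenization: every word becomes its own token."""
    return [word for word, _tag in tagged]


def phrase_tokens(tagged):
    """Merge adjacent words whose tag pair matches a phrase pattern."""
    tokens = []
    i = 0
    while i < len(tagged):
        pair = (tagged[i][1], tagged[i + 1][1]) if i + 1 < len(tagged) else None
        if pair in PHRASE_PATTERNS:
            # Store the phrase as a single entity.
            tokens.append(tagged[i][0] + "_" + tagged[i + 1][0])
            i += 2
        else:
            tokens.append(tagged[i][0])
            i += 1
    return tokens


def remove_stopwords(tokens):
    """Filtering stage: drop every token found in the stoplist."""
    return [t for t in tokens if t not in STOPLIST]


# Hand-tagged example sentence ("this film is not good"); tags are assumed.
tagged_sentence = [("film", "NN"), ("ini", "PR"), ("tidak", "NEG"), ("bagus", "JJ")]

print(remove_stopwords(word_tokens(tagged_sentence)))    # ['film', 'bagus']
print(remove_stopwords(phrase_tokens(tagged_sentence)))  # ['film', 'tidak_bagus']
```

With word-level tokens the negation is filtered out and the sentence appears positive; with phrase-level tokens the merged feature survives filtering and can be passed to the classifier.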

