Classification of Emotions in English Texts Using the Ensemble Bagging Approach
Abstract
This study addresses emotion classification in English text, particularly in social media interactions, where the data are often unstructured. Emotions play a crucial role in communication, and a better understanding of them can aid the analysis of user behavior. The main objective of this research is to improve accuracy, recall, precision, and F1-score in emotion classification by applying an ensemble bagging approach, combining the naïve Bayes, logistic regression, and k-nearest neighbor (KNN) algorithms. The methodology comprised data collection from various sources, followed by data cleaning and analysis using text mining and machine learning techniques. The collected data were analyzed to detect seven emotions: anger, happiness, sadness, surprise, shame, disgust, and fear. Performance was evaluated by comparing the results of the ensemble bagging method with those of the individual algorithms. The findings reveal that logistic regression achieved the highest accuracy, 98.76%, followed by naïve Bayes and KNN. The ensemble method overcame the limitations of the individual algorithms, improving overall classification stability and reliability. These findings provide insight into text-based emotion analysis techniques and demonstrate the potential of ensemble methods to improve classification accuracy. Future research can explore additional ensemble techniques and optimize model complexity to improve emotion analysis on broader datasets.
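The bagging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy sentences, the class and function names (`NaiveBayesText`, `bagging_fit`, `bagging_predict`), and the choice of 15 estimators are all assumptions made for illustration. For brevity it uses only a small naïve Bayes base learner; the study also uses logistic regression and KNN. Each base model is trained on a bootstrap sample (drawn with replacement), and the ensemble predicts by majority vote.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayesText:
    """Tiny multinomial naive Bayes over whitespace tokens (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_one(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def bagging_fit(texts, labels, n_estimators=15, seed=0):
    """Train one base model per bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(texts)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sample_texts = [texts[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        models.append(NaiveBayesText().fit(sample_texts, sample_labels))
    return models


def bagging_predict(models, text):
    """Aggregate the base models' predictions by majority vote."""
    votes = Counter(model.predict_one(text) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, invented for this sketch (not the study's dataset).
texts = [
    "i feel happy and joyful",
    "so glad and happy today",
    "i am angry and furious",
    "this makes me so angry",
    "i feel sad and lonely",
    "so sad and tearful today",
]
labels = ["happiness", "happiness", "anger", "anger", "sadness", "sadness"]

models = bagging_fit(texts, labels)
print(bagging_predict(models, "happy joyful"))
```

With so few training sentences, individual bootstrap samples can miss a class entirely, so single base models may disagree; the majority vote is what smooths out that variance. On a realistic corpus, each base model would typically be trained on TF-IDF features and evaluated with accuracy, precision, recall, and F1-score, as in the study.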
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.