Synonym Recognition Influence in Text Similarity Detection Using Winnowing and Cosine Similarity

  • Santi Purwaningrum, Politeknik Negeri Cilacap
  • Agus Susanto, Politeknik Negeri Cilacap
  • Ari Kristiningsih, Politeknik Negeri Cilacap
Keywords: Synonym Recognition, Winnowing, Cosine Similarity, Plagiarism

Abstract

Plagiarism is the act of imitating, quoting, or copying the work of others and claiming it as one’s own. A final project is a mandatory requirement for students to complete their studies at college, and it must be written by the students based on their own ideas. However, plagiarism is widespread because it is easy to commit: text expressing other people’s ideas can simply be copied, pasted into a document, and presented as one’s own. Replacing some words in another author’s sentences with one’s own wording, without properly acknowledging the original source, is also plagiarism. Manual checking is a further burden for the final project coordinator, since examining a final project document for plagiarism demands great care and a relatively long time. Plagiarism detection mechanisms are therefore needed to curb the growth of plagiarism cases. In response, this study aims to design a system that identifies textual similarity, with a focus on sentences containing synonymous words. One of the algorithms used is synonym recognition, which detects words with synonymous meanings by comparing each term against the entries in a dictionary. Synonym recognition is combined with the winnowing method, which serves as a fingerprint-based text weighting scheme. Once the weight of each document is obtained, the similarity between documents is calculated with the cosine similarity algorithm. Combining synonym recognition with winnowing weighting raised the average similarity score for title and abstract detection by 3.11% compared with detection without synonym recognition. Evaluation with accuracy testing and root mean squared error (RMSE) indicates that the algorithms used are accurate.
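
The pipeline described in the abstract (synonym recognition, then winnowing fingerprints, then cosine similarity) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the toy `SYNONYMS` dictionary, the k-gram length `k`, the window size `w`, the use of CRC32 as the hash, and the function names are all hypothetical.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical toy dictionary: maps a word to one canonical synonym.
SYNONYMS = {"find": "detect", "resemblance": "similarity"}

def normalize(text, synonyms=SYNONYMS):
    """Lowercase, keep alphanumeric tokens, map synonyms, drop spaces."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return "".join(synonyms.get(tok, tok) for tok in tokens)

def winnow(text, k=5, w=4):
    """Return fingerprint counts: the minimum k-gram hash of each window."""
    # CRC32 is an assumed stand-in for whatever hash a real system uses.
    hashes = [zlib.crc32(text[i:i + k].encode("utf-8"))
              for i in range(len(text) - k + 1)]
    selected = {}  # position -> hash, so each chosen position counts once
    for start in range(max(len(hashes) - w + 1, 1)):
        window = hashes[start:start + w]
        if not window:
            break
        smallest = min(window)
        # On ties, keep the rightmost occurrence of the minimum.
        offset = len(window) - 1 - window[::-1].index(smallest)
        selected[start + offset] = smallest
    return Counter(selected.values())

def cosine_similarity(a, b):
    """Cosine of the angle between two fingerprint-count vectors."""
    dot = sum(a[h] * b[h] for h in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def similarity(doc_a, doc_b, synonyms=SYNONYMS):
    """Full pipeline: synonym mapping, winnowing, cosine similarity."""
    return cosine_similarity(winnow(normalize(doc_a, synonyms)),
                             winnow(normalize(doc_b, synonyms)))

if __name__ == "__main__":
    a = "students find resemblance in documents"
    b = "students detect similarity in documents"
    print("with synonyms:   ", similarity(a, b, SYNONYMS))
    print("without synonyms:", similarity(a, b, {}))
```

With the toy dictionary, the two example sentences normalize to the same string, so their score is 1.0 up to floating-point rounding. Passing an empty dictionary disables synonym mapping and yields a lower score, which mirrors the direction of the effect reported in the abstract, though not its magnitude.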

Published
2023-08-31
How to Cite
Santi Purwaningrum, Agus Susanto, & Ari Kristiningsih. (2023). Synonym Recognition Influence in Text Similarity Detection Using Winnowing and Cosine Similarity. Jurnal Nasional Teknik Elektro Dan Teknologi Informasi, 12(3), 219-226. https://doi.org/10.22146/jnteti.v12i3.6375