Linear Regression for Reducing the Bias of a Short Essay Scoring System

  • Silmi Fauziati, Universitas Gadjah Mada
  • Adhistya Erna Permanasari, Universitas Gadjah Mada
  • Indriana Hidayah, Universitas Gadjah Mada
  • Eko Wahyu Nugroho, Universitas Gadjah Mada
  • Bobby Rian Dewangga, Universitas Gadjah Mada


This study aims to improve the performance of a short essay scoring system. The improvement is achieved by applying a simple linear regression to the output of a scoring method that combines cosine similarity and a term-matching mechanism. The cosine similarity uses term frequencies weighted by the Term Frequency–Inverse Document Frequency (TF-IDF) method. The linear regression takes the short essay score produced by this combined method as the regressor variable. To demonstrate the effectiveness of the proposed scoring system, its performance is measured against manual scoring by a lecturer. The results show a problem before linear regression: the scoring system tends to give higher scores than the manual scores, which means the scores are biased. Adding linear regression solves this problem. The scoring bias is significantly reduced, and the system shows no tendency to give higher or lower scores than the manual score. A simple approach, linear regression, reduces the scoring bias significantly. This result is therefore expected to help accelerate the adoption of automated essay scoring systems in online learning technologies such as e-learning.
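The combined score that feeds the regression can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- Tokenization is plain lowercase whitespace splitting. Any preprocessing the study applies, such as stop-word removal or stemming, is not reproduced.
- IDF uses the smoothed formula ln((1 + N) / (1 + df)) + 1, the variant scikit-learn's TfidfVectorizer uses by default. It is computed over the reference answer plus all student answers.
- Term matching is taken to be the fraction of distinct reference terms that appear in the student answer.
- The two components are combined by a hypothetical weighted average (`weight`) scaled to `max_score`.

All function and parameter names are hypothetical.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation.
    Assumption: no stop-word removal or stemming is applied here."""
    tokens = (t.strip(PUNCTUATION) for t in text.lower().split())
    return [t for t in tokens if t]


def smoothed_idf(docs: list[list[str]]) -> dict[str, float]:
    """IDF = ln((1 + N) / (1 + df)) + 1, so terms found in every document keep weight 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term frequency weighted by IDF, stored as a sparse dict."""
    return {term: count * idf.get(term, 0.0) for term, count in Counter(doc).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty answer is treated as having no similarity
    return dot / (norm_u * norm_v)


def term_match(reference: list[str], answer: list[str]) -> float:
    """Fraction of distinct reference terms that also occur in the answer."""
    ref_terms = set(reference)
    if not ref_terms:
        return 0.0
    return len(ref_terms & set(answer)) / len(ref_terms)


def score_answers(reference_text: str, answer_texts: list[str],
                  weight: float = 0.5, max_score: float = 100.0) -> list[float]:
    """Hypothetical combination: weighted average of cosine and term match, scaled."""
    reference = tokenize(reference_text)
    answers = [tokenize(a) for a in answer_texts]
    idf = smoothed_idf([reference] + answers)
    ref_vec = tfidf(reference, idf)
    scores = []
    for ans in answers:
        cos = cosine(ref_vec, tfidf(ans, idf))
        match = term_match(reference, ans)
        scores.append(max_score * (weight * cos + (1.0 - weight) * match))
    return scores


if __name__ == "__main__":
    reference = "Cosine similarity measures the angle between two term vectors."
    answers = [
        "Cosine similarity measures the angle between two term vectors.",
        "It compares vectors by the angle between them.",
        "Regression fits a line.",
    ]
    for answer, score in zip(answers, score_answers(reference, answers)):
        print(f"{score:6.1f}  {answer}")
```

The smoothed IDF is a deliberate choice for this toy setting. With the unsmoothed ln(N / df) and very few documents, every term shared by all answers would receive zero weight.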
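The calibration step treats the system score as the regressor x and the lecturer's manual score as the response y. The calibrated score is then y_hat = a + b·x, with slope b = Sxy / Sxx and intercept a = ȳ − b·x̄.

The abstract does not name the estimator or the bias test, so this sketch makes two assumptions:

- The line is fitted by ordinary least squares (OLS).
- Bias is measured as the mean signed difference between predicted and manual scores.

The function names are hypothetical, and the sample scores are invented for illustration. They are not data from the study.

```python
from statistics import mean


def fit_simple_linear_regression(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (intercept a, slope b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all system scores are equal; the slope is undefined")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def mean_bias(predicted: list[float], manual: list[float]) -> float:
    """Mean signed difference; a positive value means the system over-scores."""
    return mean(p - m for p, m in zip(predicted, manual))


if __name__ == "__main__":
    # Illustrative numbers only; these are NOT data from the study.
    system = [80.0, 72.0, 90.0, 65.0, 85.0]  # regressor: automated scores
    manual = [70.0, 60.0, 85.0, 55.0, 75.0]  # response: lecturer's scores
    a, b = fit_simple_linear_regression(system, manual)
    calibrated = [a + b * s for s in system]
    print(f"fitted: manual ~ {a:.2f} + {b:.3f} * system")
    print(f"mean bias before calibration: {mean_bias(system, manual):.2f}")
    print(f"mean bias after calibration:  {mean_bias(calibrated, manual):.2f}")
```

An OLS fit with an intercept makes the residuals sum to zero. The mean bias on the scores used for fitting is therefore zero by construction, so any reduction in bias should be judged on answers that were not used to fit the line. On Python 3.10 and later, `statistics.linear_regression(x, y)` returns the same slope and intercept.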

How to Cite
Fauziati, S., Permanasari, A. E., Hidayah, I., Nugroho, E. W., & Dewangga, B. R. (2021). Linear Regression for Reducing the Bias of a Short Essay Scoring System. Jurnal Nasional Teknik Elektro Dan Teknologi Informasi, 10(3), 221–228.