Automatic Text Summarization Based on Semantic Networks and Corpus Statistics

Winda Yulita; Sigit Priyanta; Azhari SN

doi:10.22146/ijccs.38261

Winda Yulita (1*), Sigit Priyanta (2), Azhari SN (3)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author

Abstract

One simple automatic text summarization method that can minimize redundancy in a summary is the Maximum Marginal Relevance (MMR) method. A disadvantage of MMR is that the resulting summary can contain parts that are separated from each other and not semantically connected. This study therefore compares summaries produced by a semantic-based MMR method with those produced by a non-semantic MMR method. The semantic-based method uses WordNet Bahasa and a corpus when processing text summaries, whereas the non-semantic method is based on TF-IDF. The study also applies summary compression rates of 30%, 20%, and 10%. The research data consists of 50 online news texts, and the resulting summaries are evaluated with the ROUGE toolkit. The best f-score for the semantic-based MMR method is 0.561, while the best f-score for the non-semantic MMR method is 0.598. These best values are obtained by adding stemming as a preprocessing step and compressing the summary to 30%. The difference between the two scores is attributed to the incompleteness of WordNet Bahasa and to several words in the news titles that do not conform to EYD (KBBI).
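The non-semantic variant described above can be illustrated with a minimal sketch. This is not the authors' implementation: whitespace tokenization, the smoothed IDF formula, the use of the news title as the query, the trade-off weight `lam`, and all function names are assumptions made for illustration. A semantic variant would replace `cosine` with a similarity measure built on WordNet Bahasa and corpus statistics, which is not shown here.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per tokenized text."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumption) so that very common terms keep a nonzero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_summarize(sentences, query, compression=0.3, lam=0.7):
    """Greedy MMR selection; returns chosen sentence indices in document order.

    compression is the fraction of sentences to keep (e.g. 0.3 for 30%).
    lam weights relevance to the query against redundancy with the summary so far.
    """
    tokens = [s.lower().split() for s in sentences + [query]]
    vectors = tfidf_vectors(tokens)
    sent_vecs, query_vec = vectors[:-1], vectors[-1]
    k = max(1, round(compression * len(sentences)))

    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(sent_vecs[i], query_vec)
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

Calling `mmr_summarize(sentences, title, compression=0.3)` keeps roughly 30% of the sentences. Lowering `lam` penalizes sentences that repeat what is already in the summary more strongly, which is the redundancy-minimizing behavior that MMR is used for.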

Keywords

automatic text summarization; MMR method; semantic; non-semantic

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright of: IJCCS (Indonesian Journal of Computing and Cybernetics Systems), ISSN 1978-1520 (print); ISSN 2460-7258 (online), is a scientific journal of research results in Computing and Cybernetics Systems.
A publication of IndoCEISS. Gedung S1 Ruang 416 FMIPA UGM, Sekip Utara, Yogyakarta 55281. Fax: +62274 555133. Email: ijccs.mipa@ugm.ac.id | http://jurnal.ugm.ac.id/ijccs