Automatic Text Summarization Based on Semantic Networks and Corpus Statistics

Winda Yulita(1*), Sigit Priyanta(2), Azhari SN(3)

(1) Master Program of Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(3) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author


One simple automatic text summarization method that can minimize redundancy in a summary is the Maximum Marginal Relevance (MMR) method. Its disadvantage is that the resulting summary can contain parts that are separated from each other and not semantically connected. This study therefore compares the summaries produced by semantic-based MMR with those produced by non-semantic MMR. The semantic-based MMR method uses WordNet Bahasa and a corpus to process the text, whereas the non-semantic MMR method is based on TF-IDF. Summaries were produced at compression rates of 30%, 20%, and 10%. The research data consist of 50 online news texts, and the resulting summaries were evaluated with the ROUGE toolkit. The best f-score obtained with the semantic-based MMR method is 0.561, while the best f-score obtained with the non-semantic MMR method is 0.598. These best values were obtained when stemming was added as a preprocessing step and the summary was compressed to 30%. The difference between the two values is attributed to the incompleteness of WordNet Bahasa and to several words in the news titles that do not conform to EYD (KBBI).
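
As an illustration of the non-semantic variant described above, the following is a minimal sketch of MMR sentence selection over TF-IDF vectors. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, the use of the news title as the query, and the parameters lam (relevance/redundancy trade-off) and k (number of sentences kept) are all assumptions made for this example. Stemming and the semantic similarity based on WordNet Bahasa are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists.

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
    """
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(sentences, query, k=2, lam=0.5):
    """Return the indices of k sentences chosen greedily by MMR.

    Each step picks the sentence maximizing
        lam * sim(sentence, query) - (1 - lam) * max sim(sentence, selected).
    Indices are returned in original document order.
    """
    # Hypothetical preprocessing: lowercase and split on whitespace only.
    token_lists = [s.lower().split() for s in sentences] + [query.lower().split()]
    vectors = tfidf_vectors(token_lists)
    sent_vecs, query_vec = vectors[:-1], vectors[-1]

    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(sent_vecs[i], query_vec)
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

With lam = 1.0 the selection reduces to ranking by relevance to the query alone, so near-duplicate sentences can both be kept; lowering lam penalizes a candidate for resembling sentences already selected. A compression rate such as 30% could be mapped to k by rounding 0.3 times the number of sentences, which is again an assumption of this sketch.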


automatic text summarization; MMR method; semantic; non-semantic

