Development of an HTML Table Integration Engine for Web Pages (Pengembangan Engine Integrasi Tabel HTML pada Halaman Web)

  • Memen Akbar, Institut Teknologi Bandung
  • Fazat Nur Azizah, Institut Teknologi Bandung
  • G. A. Putri Saptawati, Institut Teknologi Bandung
Keywords: data integration, HTML table, ontology, table integration, web page

Abstract

Two problems arise when integrating tables from multiple web pages: structural conflicts and semantic conflicts. To address them, this study combines existing methods that have already been shown to work in integration tasks. The proposed integration process for HTML tables consists of four phases: (1) locating the tables in the web pages, (2) separating attributes from data values, (3) integrating the table schemas, and (4) migrating the data values into the integrated schema. A heuristic approach determines the location of each table in a web page and also separates the table's attributes from its data values. Semantic conflicts that surface while integrating the table schemas are handled with a domain-specific ontology. The resulting data values are then migrated into the integrated schema, with duplicate records checked using a vector space model. The result of the integration is presented as a single HTML table. The approach is implemented as an engine written in Python. Experimental results show that the proposed approach can integrate multiple HTML tables from multiple web pages into a single integrated table.
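
The four phases can be illustrated with a minimal Python sketch. It is not the authors' engine. It assumes one table per page, treats any row containing `<th>` cells as the attribute row (a simple stand-in for the paper's heuristic), uses a hypothetical synonym dictionary `ONTOLOGY` in place of a domain-specific ontology, and flags duplicates by cosine similarity over term-frequency vectors with an assumed threshold of 0.9. All names (`TableRows`, `extract`, `integrate`, `to_html`) and the sample pages are illustrative.

```python
import math
from collections import Counter
from html import escape
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Phases 1-2: collect table rows as (has_th_cell, [cell text, ...])."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell, self._header = [], None, None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._header = [], False
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []
            self._header = self._header or tag == "th"

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append((self._header, self._row))
            self._row, self._cell = None, None


# Hypothetical stand-in for a domain-specific ontology: label -> concept.
ONTOLOGY = {"name": "name", "full name": "name", "price": "price", "cost": "price"}


def extract(html):
    """Return the page's data rows as dicts keyed by ontology concepts."""
    parser = TableRows()
    parser.feed(html)
    header = next(cells for is_header, cells in parser.rows if is_header)
    attrs = [ONTOLOGY.get(a.lower(), a.lower()) for a in header]
    return [dict(zip(attrs, cells)) for is_header, cells in parser.rows if not is_header]


def cosine(a, b):
    """Cosine similarity of two strings as term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def integrate(pages, threshold=0.9):
    """Phases 3-4: merge schemas, then migrate rows while skipping near-duplicates."""
    schema, merged = [], []
    for html in pages:
        for record in extract(html):
            for attr in record:
                if attr not in schema:
                    schema.append(attr)
            text = " ".join(record.values())
            if all(cosine(text, " ".join(r.values())) < threshold for r in merged):
                merged.append(record)
    return schema, merged


def to_html(schema, records):
    """Render the integrated result as a single HTML table."""
    head = "".join(f"<th>{escape(a)}</th>" for a in schema)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(r.get(a, ''))}</td>" for a in schema) + "</tr>"
        for r in records
    )
    return f"<table><tr>{head}</tr>{body}</table>"


page_a = "<table><tr><th>Name</th><th>Price</th></tr><tr><td>Blue Pen</td><td>2</td></tr></table>"
page_b = ("<table><tr><th>Full Name</th><th>Cost</th></tr>"
          "<tr><td>Blue Pen</td><td>2</td></tr><tr><td>Red Ink</td><td>5</td></tr></table>")
schema, merged = integrate([page_a, page_b])
print(to_html(schema, merged))
```

In this sketch the two sample pages label the same columns differently ("Name" versus "Full Name", "Price" versus "Cost"); the synonym lookup resolves that semantic conflict, and the repeated "Blue Pen" row is dropped by the similarity check. A real ontology and a weighted vector space model (for example TF-IDF) would replace the dictionary and the raw term counts.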

How to Cite
Memen Akbar, Fazat Nur Azizah, & G. A. Putri Saptawati. (1). Pengembangan Engine Integrasi Tabel HTML pada Halaman Web. Jurnal Nasional Teknik Elektro Dan Teknologi Informasi, 5(3), 177-183. Retrieved from https://jurnal.ugm.ac.id/v3/JNTETI/article/view/2931