Data Benchmark for Google BigQuery and Elasticsearch
Abstract
Nowadays, the cloud is not only a data storage medium but can also be used to manage and analyze data. Google offers Google BigQuery as a platform capable of managing and analyzing data, while Elasticsearch is a search and analysis engine whose data can be analyzed using Kibana. In this study, a dataset of tweets containing the hashtags #COVID19 and #coronavirus, crawled through http://netlytic.org/, is analyzed and used to compare the performance of the two platforms through benchmarking. Benchmarking is a process for measuring and comparing performance on an activity so that a desired level of performance can be achieved. A data benchmark is performed on both platforms to generate or determine the workload of each platform. The results show that Google BigQuery performs better: its upload capacity accommodates larger datasets than that of Elasticsearch, and it outperforms Elasticsearch under both query testing models. The query management time on Google BigQuery is also shorter than on Elasticsearch. Meanwhile, the visualization results from the two platforms show the same percentages.
© Jurnal Nasional Teknik Elektro dan Teknologi Informasi, under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.