Application Performance Monitoring System Design Using OpenTelemetry and Grafana Stack
The growing reliance on digital technology requires application architectures designed for high availability and reliability, because an application that cannot be accessed can cause significant losses for the organization. Development and operations teams must therefore be able to detect when their system is not working well, which calls for a system that monitors application performance. In this research, a system is developed to collect telemetry data, namely metrics and traces, from a REST API-based backend application for online donations. The telemetry is produced with OpenTelemetry, an open-source instrumentation tool, and collected by the OpenTelemetry Collector, which forwards each signal to its own storage backend: metrics are sent to Prometheus and traces are sent to Jaeger. The metrics collected are throughput, request latency, and error rate, and they are visualized on a Grafana dashboard. The test results show that the monitoring system can collect metrics data in real time with an average delay of 13.8 seconds. The system can also detect anomalies in the application and send notifications via Slack. In addition, the collected trace data can simplify debugging when an error occurs in the application. However, instrumenting a REST API-based backend application with OpenTelemetry to monitor metrics and traces has a negative impact on the performance of the application itself: request throughput decreases by 23.32% on average and request latency increases by 22.80% on average.
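As an illustration of the three metrics named above, the following minimal Python sketch computes throughput, mean request latency, and error rate from a list of request records, and the relative change used to express instrumentation overhead. It is not the implementation used in this research: the `RequestRecord` shape, the `summarize` and `percent_change` helpers, the choice to count only 5xx responses as errors, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestRecord:
    """One handled HTTP request (hypothetical record shape)."""
    duration_s: float   # time taken to serve the request, in seconds
    status_code: int    # HTTP status code returned to the client


def summarize(records, window_s):
    """Return throughput (req/s), mean latency (s), and error rate (0..1).

    Assumption: only 5xx responses count as errors.
    """
    if not records or window_s <= 0:
        return {"throughput_rps": 0.0, "mean_latency_s": 0.0, "error_rate": 0.0}
    errors = sum(1 for r in records if r.status_code >= 500)
    return {
        "throughput_rps": len(records) / window_s,
        "mean_latency_s": mean(r.duration_s for r in records),
        "error_rate": errors / len(records),
    }


def percent_change(baseline, measured):
    """Relative change of `measured` against `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# Made-up sample values, for illustration only (not measured data).
sample = [
    RequestRecord(0.10, 200),
    RequestRecord(0.20, 200),
    RequestRecord(0.30, 500),
    RequestRecord(0.20, 201),
]
summary = summarize(sample, window_s=2.0)
```

Under these assumptions, `percent_change` applied to the throughput of the uninstrumented and instrumented application would yield a negative value for a decrease, and applied to latency a positive value for an increase.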