Optimization of ARIMA Forecasting Model Using Firefly Algorithm

Time series prediction aims to control or recognize the behavior of a system based on data from a certain period of time. One of the most widely used methods in time series prediction is the Autoregressive Integrated Moving Average (ARIMA). However, ARIMA has a weakness in determining the optimal model. This research uses the firefly algorithm to optimize the ARIMA (p, d, q) model by finding the smallest Akaike Information Criterion (AIC) value, which determines the best ARIMA model. The data used in the study are daily stock data of the Indonesia Composite Index (IHSG) for the period of January 2013 to August 2016 and foreign tourist visit data for Indonesia for the period of January 1988 to November 2017. For the IHSG data, prediction with the ARIMA Box-Jenkins model produced a Root Mean Square Error (RMSE) of 49.72, whereas prediction with the ARIMA Optimization model produced an RMSE of 49.48. For the foreign tourist visit data, prediction with the ARIMA Box-Jenkins model produced an RMSE of 46088.9, whereas prediction with the ARIMA Optimization model produced an RMSE of 44678.4. From these results it can be concluded that optimizing the ARIMA model with the firefly algorithm produces a better forecasting model than the ARIMA model without optimization.
Keywords: Optimization, Forecasting, ARIMA, Firefly Algorithm, AIC, RMSE.
ISSN (print): 1978-1520, ISSN (online): 2460-7258 IJCCS Vol. 13, No. 2, April 2019 : 127 – 136


INTRODUCTION
Time series analysis is an important tool for predicting the future based on past history. Forecasting is a powerful aid to decision making and planning for the effective management of modern organizations. It is an essential part of econometric analysis [1], for some people perhaps the most important, used to estimate economic variables such as gross domestic product, inflation, stock prices, exchange rates and unemployment rates. Time series forecasting is a growing field of interest that plays an important role in many practical fields such as economics, finance, marketing, planning, meteorology and telecommunications.
One of the most widely used methods in time series prediction is the Autoregressive Integrated Moving Average (ARIMA). In comparison to Autoregressive Conditional Heteroskedasticity (ARCH) and Generalized Autoregressive Conditional Heteroskedasticity (GARCH), ARIMA produced the smallest error in predicting the exchange rate of IDR against USD for the period of January 3rd, 2000 to July 7th, 2014 [2]. Forecasting with the ARIMA method on ISO 14001 certification data in the Americas and its 13 countries between 1996 and 2015 showed prediction results that were close to the realized values [3].
Nevertheless, although ARIMA is better at forecasting than the ARCH and GARCH models, the method has a weakness in determining the optimal model [2]. Therefore, a supporting algorithm is needed to optimize the ARIMA model. A combination of linear and nonlinear methods could improve the accuracy of time series forecasting [4]. Application of the Modified Firefly Algorithm (MFA) obtained Support Vector Regression (SVR) parameters accurately and effectively on electrical load demand data in Fars, Iran [5]. A multilayer perceptron (MLP) hybrid model integrated with the Firefly Optimizer Algorithm (MLP-FFA) to predict wind speed in northwestern Iran resulted in a lower RMSE score than the classic MLP [6]. The firefly algorithm developed in [7] is reported to be the best metaheuristic algorithm for optimization. This research combines two methods, ARIMA [8] and the Firefly Algorithm [7], to analyze two time series: the Indonesia Composite Index (IHSG) [9] and foreign tourist visits to Indonesia [10]. From tests on these two data sets, it is presumed that the ARIMA model with the firefly algorithm can find the most optimal model. The ARIMA Optimization process uses the firefly algorithm to search for the smallest AIC value, which is the criterion for finding the best ARIMA (p,d,q) model [11], and determines the parameters of the firefly algorithm used to perform the optimization [8]. The ARIMA Optimization model obtained is then used for forecasting. Forecasting results are assessed for accuracy using the Root Mean Square Error (RMSE) and for model quality using the Akaike Information Criterion (AIC) [12]. The results are compared with those of the ARIMA Box-Jenkins model search [13]. The ARIMA Optimization model is expected to improve the accuracy of forecasting.

METHODS
The research method in this study involves literature study, data collection and selection, development of the ARIMA Optimization model platform, testing of the data on the optimization model and its comparison model, and analysis of the results. The IHSG data were taken from Yahoo Finance for the period of January 2013 to August 2016, i.e. 888 days [9], while the foreign tourist visit data were acquired from the Central Bureau of Statistics for the period of January 1988 to November 2017, i.e. 359 months [10]. The ARIMA Optimization model was developed using R-Studio.

ARIMA Model Box-Jenkins
Autoregressive Integrated Moving Average (ARIMA), popularly known as the Box-Jenkins methodology, searches for the ARIMA model using an iterative approach that identifies the most appropriate model among various candidates. The tentative model that has been selected is tested again with the observed data to see whether it is adequate. A model is considered adequate if the residuals, i.e. the differences between the predicted and observed data, are small, randomly distributed and independent of each other. The stages of designing the ARIMA Box-Jenkins model are described in Figure 1. The first stage is preprocessing and identification of the stationary data model.
The stages also include identification of correlation among residuals. The Ljung-Box test is used under the time series assumption that the residuals follow a white noise process, which means they must be independent (uncorrelated) and normally distributed with a mean close to 0. If no lag falls outside the significance line, it can be said that there is no autocorrelation.
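The residual check described above can be illustrated with a minimal sketch of the Ljung-Box statistic, Q = n(n+2) Σ ρ_k² / (n−k), summed over lags k = 1..h. This is the standard textbook form of the statistic, not code from the study; the function names, the simulated series and the choice of 10 lags are assumptions made for illustration only.

```python
import random

def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag (standard definition)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )

# Simulated data (assumption): white noise versus noise with a linear trend.
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(500)]
trending = [0.05 * t + e for t, e in enumerate(white_noise)]

q_noise = ljung_box_q(white_noise, 10)
q_trend = ljung_box_q(trending, 10)
print("Q for white noise:", round(q_noise, 2))
print("Q for trending series:", round(q_trend, 2))
```

In practice Q is compared with a chi-square critical value whose degrees of freedom depend on the number of lags and fitted parameters; a large Q, as for the trending series here, indicates that the residuals are still autocorrelated.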

Estimation of ARIMA Model using Firefly Algorithm
The search for the ARIMA Optimization model using the firefly algorithm is based on the best parameters of the firefly algorithm. The process of designing the ARIMA Optimization model using the firefly algorithm is portrayed in Figure 2 as follows:
1. Selecting the time series data used for searching the ARIMA Optimization model.
2. Initializing the population and parameters of the firefly algorithm. Initialization is done by determining the population size and the number of iterations to be performed; the values of β0 (base beta), γ (gamma) and α (alpha), which are used to calculate the attractiveness, distance and movement between fireflies; and Max (p,d,q) and Min (p,d,q) for the determination of the ARIMA model. The population size is the number of candidate solutions fixed when the candidate solution combinations are determined.
3. Determining the firefly dimensions and generating random numbers as the initial firefly positions (p,d,q). Each position is a combination of previously generated candidate solutions and has two dimensions, i.e. dimension i and dimension j. The number of i and j dimensions is determined by the size of the firefly population, that is, the number of desired candidate solutions.
4. Calculating the light intensity, i.e. the AIC value, of each firefly using equation (3). To get the light intensity value, the result of evaluating the objective function at (p,d,q) is required. Therefore, the fitness value obtained previously is used as the light intensity value of each firefly. Because the goal is to find the minimum AIC value, the smaller the value of the function, the higher the intensity. Here L is the maximized likelihood of the fitted model and k is the number of estimated parameters.
   $\mathrm{AIC} = -2\ln(L) + 2k$ (3)
5. Determining the Global best (Gbest) value, which is the brightest light intensity of all fireflies, i.e. the smallest AIC value. Once its intensity has been obtained, each firefly is compared with the others to find the firefly with the smallest AIC value using equation (4). The position of the firefly with the highest light intensity is used as the updated Gbest for the AIC value.
   $x_i' = x_i + \beta_0 e^{-\gamma r^2}(x_j - x_i) + \alpha\,(\mathrm{rand} - \tfrac{1}{2})$ (4)
6. Calculating the distance and attraction between fireflies with equation (5). Using the base beta, gamma and distance constants, the distance from each firefly to the firefly with the highest light intensity, i.e. Gbest, is calculated using the Euclidean method.
7. Performing movement of the fireflies. Each firefly moves toward the firefly with the brightest light intensity, producing a new position as in equation (4), except for Gbest, which stays in place because it does not move.
8. Ranking the fireflies based on their new light intensity by reducing the scrambler parameter value [14]. The fitness value generated by the new candidate solution is used as the new firefly light intensity value.
9. Checking the convergence of the candidate solutions. The value is said to be convergent if the firefly position reaches the goal position or the best position. However, if the new position exceeds the minimum or maximum limit in any dimension, or the maximum iteration has not been reached, the position is returned to the allowed range in that dimension and the search for a new Gbest firefly is repeated until the Max Generation iteration is complete.
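The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study (which relied on R packages). It follows the standard firefly update, in which β0 scales the attraction and α scales the random step. The function `toy_aic` is a hypothetical stand-in for fitting an ARIMA(p,d,q) model and reading off its AIC; positions are kept continuous and rounded to integers only when evaluated, which is also an assumption. The default parameter values mirror those reported for the first test.

```python
import math
import random

LOWER, UPPER = 0, 7  # search limits for p, d and q, as stated in the text

def toy_aic(order):
    """Hypothetical objective: stands in for the AIC of a fitted ARIMA(p,d,q)."""
    p, d, q = order
    return (p - 0) ** 2 + (d - 1) ** 2 + (q - 7) ** 2

def to_order(position):
    """Round a continuous firefly position to an integer (p, d, q) triple."""
    return tuple(int(round(v)) for v in position)

def firefly_search(objective, population=100, generations=10,
                   beta0=0.1, gamma=1.0, alpha=0.1, seed=1):
    rng = random.Random(seed)
    # Steps 2-3: random initial positions inside the allowed range.
    flies = [[rng.uniform(LOWER, UPPER) for _ in range(3)]
             for _ in range(population)]
    # Step 4: light intensity is the objective value (lower is brighter).
    scores = [objective(to_order(f)) for f in flies]
    # Step 5: Gbest is the firefly with the smallest objective value.
    best_i = min(range(population), key=scores.__getitem__)
    best_order, best_score = to_order(flies[best_i]), scores[best_i]
    for _ in range(generations):
        for i in range(population):
            for j in range(population):
                if scores[j] < scores[i]:  # firefly j is brighter than i
                    # Step 6: squared Euclidean distance and attraction.
                    r2 = sum((a - b) ** 2 for a, b in zip(flies[i], flies[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Steps 7 and 9: move i toward j, then clamp to the range.
                    flies[i] = [
                        min(UPPER, max(LOWER,
                            xi + beta * (xj - xi) + alpha * (rng.random() - 0.5)))
                        for xi, xj in zip(flies[i], flies[j])
                    ]
                    # Step 8: re-evaluate the moved firefly.
                    scores[i] = objective(to_order(flies[i]))
                    if scores[i] < best_score:
                        best_order, best_score = to_order(flies[i]), scores[i]
    return best_order, best_score

order, score = firefly_search(toy_aic)
print("best (p,d,q):", order, "objective:", score)
```

Because the brightest firefly never finds a brighter neighbor, it does not move, which matches the statement that Gbest remains in place. Replacing `toy_aic` with a function that fits a real ARIMA model would turn the sketch into an actual model search.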


Determination of Firefly Algorithm Parameters
The parameters proposed for optimizing the ARIMA model, chosen as the parameters that most significantly influence the movement of the model search, are presented in Table 1. The parameters with significant influence are base beta and alpha, because base beta is used to determine the random starting point while alpha is used to calculate the distance between fireflies in finding the brightest light intensity. The search for the ARIMA Optimization model (p,d,q) is limited to values from (0,0,0) to (7,7,7), so that smaller AIC values are evaluated with a higher probability of obtaining an optimal model.
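To make the size of this search space and the selection criterion concrete, the following sketch enumerates every (p,d,q) candidate between (0,0,0) and (7,7,7) and evaluates the standard AIC definition, AIC = -2 ln(L) + 2k. The log-likelihood values and parameter counts in the comparison are hypothetical numbers chosen for illustration, not results from the study.

```python
from itertools import product

def aic(log_likelihood, n_params):
    """Standard Akaike Information Criterion: -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

# All candidate orders from (0,0,0) to (7,7,7).
candidates = list(product(range(8), repeat=3))
print("number of candidate models:", len(candidates))

# Hypothetical comparison: a small gain in log-likelihood does not
# justify two extra parameters, so the simpler model has the lower AIC.
simple_model = aic(log_likelihood=-100.0, n_params=3)
complex_model = aic(log_likelihood=-99.5, n_params=5)
print("simple:", simple_model, "complex:", complex_model)
```

With 8 values per dimension there are 512 candidate orders, which is small enough for exhaustive evaluation to serve as a check on the metaheuristic search.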
Table 1 Testing of Firefly Algorithm Parameters

ARIMA Model Optimization on IHSG Data
The five tests conducted on the IHSG data with the firefly algorithm parameters, as shown in Table 2, all found the same model, (0,1,7), with an AIC of -5535.6. Therefore, the parameters used are those of the first test: Generation 10, Population 100, Alpha 0.1, Gamma 1.0 and Base Beta 0.1. The model optimization process took 3 minutes and 48 seconds.

ARIMA Model Optimization on Foreign Tourist Visit Data
The five tests conducted on the foreign tourist visit data with the firefly algorithm parameters, as shown in Table 3, all found the same model, (0,2,7), with an AIC of -675.3268. Therefore, the parameters used are those of the first test: Generation 10, Population 100, Alpha 0.1, Gamma 1.0 and Base Beta 0.1. The model optimization process took 5 minutes and 58 seconds. For the foreign tourist visit data, the ARIMA Optimization model, (0,2,7), produced an RMSE value of 44678.49. Figure 3 shows the forecasting results for the 359 months of foreign tourist visit data from January 1988 to November 2017, with the blue line representing the actual data and the red line representing the forecasting results. The ARIMA (p,d,q) models are compared based on the smallest RMSE and AIC values, in order to evaluate whether the model optimization process found an ARIMA (p,d,q) model that produces better forecasts than the ARIMA Box-Jenkins model.

The software used was R-Studio 3.4.1, MetaheuristicOpt 1.0.0, Tseries 0.10-44 and Forecast 8.3. The final stage of the research is testing both data sets on the ARIMA Optimization model and on the ARIMA Box-Jenkins model as its comparison model. The results are then analyzed to find the best forecasting model, i.e. the one with the smallest RMSE and AIC values.
Optimization of ARIMA Forecasting Model Using Firefly ... (Ilham Unggara)

Figure 1 Model of ARIMA Box-Jenkins

Figure 2 Design of ARIMA-Firefly Algorithm Model Optimization

Forecasting Results Using ARIMA Optimization Model on IHSG and Foreign Tourist Visit Data
The models obtained from the optimization of the ARIMA model are used for forecasting. For the IHSG and foreign tourist visit data used in searching for the optimization model, the forecasting accuracy is calculated based on the RMSE value. The ARIMA Optimization model obtained from the IHSG data, i.e. (0,1,7), generated an RMSE value of 49.48. Figure 2 depicts the forecasting results for the 888 days of IHSG data from January 2013 to August 2016, with the blue line representing the actual data and the red line representing the forecasting results.
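As a reminder of how the accuracy measure is obtained, the sketch below computes the RMSE as the square root of the mean squared difference between actual and forecast values. The four data points are made-up numbers for illustration and are unrelated to the IHSG or tourist series.

```python
import math

def rmse(actual, forecast):
    """Root Mean Square Error between two equally long sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Illustrative values only: errors are 2, -1, -3 and 0.
actual = [100.0, 102.0, 101.0, 105.0]
forecast = [98.0, 103.0, 104.0, 105.0]
example_rmse = rmse(actual, forecast)
print("RMSE:", example_rmse)  # square root of 14 / 4 = 3.5
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result is expressed in the same unit as the data.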

Figure 5 AIC Search Movement on IHSG data

Figure 6 AIC Search Movement on Foreign Tourist Data

1. Identification of the time series data model is done by testing the stationarity of the time series data with the Augmented Dickey-Fuller (ADF) test. Information about the trend and stationarity of the data can be obtained from the generated ACF and PACF plots.
2. Estimation of parameters in the model. Estimation includes the coefficients of the ARIMA model (φ and θ) and the variance of the residuals. The statistical t test is used to test the significance of the estimated coefficients and the null hypothesis for the autoregressive (AR) model parameters.
3. Selection of the best model. The parsimony, or simplicity, principle is a criterion for best model selection that chooses the smallest autoregressive (AR) (p) and moving average (MA) (q) parameter values. For example, between AR(1) and AR(2), the best model according to the parsimony principle is the AR(1) model.

Table 2 ARIMA Optimization Model Search Trial on IHSG Data

Table 3 ARIMA Optimization Model Search Trial on Foreign Tourist Visit Data