An Optimal Stock Market Portfolio Proportion Model Using Genetic Algorithm

To reduce the loss caused by investment risk, an investor or stockbroker usually forms an optimal stock portfolio. This technique aims to obtain the maximum return on the shares to be purchased. However, forming a stock portfolio requires fairly complex calculations and certain skills. This work aims to provide an alternative solution to the problem of forming an optimal and efficient stock portfolio composition by designing a system that supports the decision making of investors or stockbrokers in preparing a stock portfolio in accordance with their investment policy and risk. In this work, the optimal stock portfolio composition is determined using a Genetic Algorithm. The data used are 4 selected stocks listed on the LQ45 index in 2017. The profit and loss rates are calculated using single index model theory. The efficiency of the algorithm has been examined against the population size and the crossover and mutation probabilities. The experimental results show that the proposed algorithm can be used as one of the solutions for selecting the optimal stock portfolio.


INTRODUCTION
Investment is a commitment of funds or other resources made now in order to obtain profits in the future. The purpose of investors is to get the maximum return on the stocks they buy. However, various problems arise because of the large number of stock investment instruments circulating in the capital market. Each of these instruments carries risks that the investor must consider, while the analytical capability of investors is still relatively limited. This limitation strongly influences stock investment decisions.
To reduce investment risk, investors should have more stock options to invest in. One strategy for avoiding risk is to diversify by forming a portfolio of stocks, called a stock portfolio. An optimal stock portfolio produces optimal returns with a moderate risk that can be accounted for. However, investors face uncertainty when choosing the stocks to be formed into their preferred portfolio. With thousands of stocks in the market, deciding which portfolio to choose is quite difficult. Combining multiple stocks with different qualities makes it difficult for investors to determine the proportion of funds to invest in each. On the other hand, creating a mathematical model is more difficult and very complex. Therefore, an artificial intelligence approach is needed to solve the problem.
To solve the major problems in forming an optimal stock portfolio, a recommendation system [1] that utilizes an appropriate algorithm is required. This algorithm should be able to select the investment portfolio composition in accordance with the investment policy of an investor [2]. Several methods can be used for this problem, such as linear programming, mathematical modeling, data mining, and optimization using a genetic algorithm (GA). Among these, the GA is preferred because it is capable of finding optimal, high-quality solutions to both constrained and unconstrained optimization problems within a large solution space [3]. Previous studies have implemented genetic algorithms [2,4,5,6] as well as data mining [7] in stock portfolio recommendation systems. Unfortunately, the methods proposed in those studies only decide whether or not to take a particular stock. In contrast, the method proposed here provides a recommendation in the form of the proportion of funds to be invested in each stock. In this work, the Genetic Algorithm is applied to find the best chromosome, which encodes the proportion of each stock, so as to obtain a portfolio composition with an optimum level of profit and a level of loss that can be accounted for. The data used in this project are 4 selected stocks listed on the LQ45 index in 2017. LQ45 is a stock market index on the Indonesia Stock Exchange (BEI) consisting of 45 companies that meet certain criteria. The profit and loss rates are calculated using single index model theory.
The rest of this paper is organized as follows. Section 2 gives a detailed explanation of the stock portfolio and of how the risk and return rates are computed. Section 3 explains the proposed genetic algorithm framework. The results of the proposed method are presented in Section 4. Lastly, Section 5 presents the conclusion of this paper.

Stock Portfolio
The selection of stock data is based on the companies listed in the LQ45 index. In this case, PT Telekomunikasi Indonesia Tbk (TLKM), PT Jasa Marga (Persero) Tbk (JSMR), PT Perusahaan Gas Negara (Persero) Tbk (PGAS), and PT Semen Indonesia (Persero) Tbk (SMGR) were selected. The data consist of the closing prices and stock dividends between January 1, 2017 and April 30, 2017, which gives 82 data points for each company.
A portfolio is a grouping of financial assets such as stocks, bonds, commodities, currencies and cash equivalents, as well as their fund counterparts, including mutual, exchange-traded and closed-end funds. A stock portfolio is an investment consisting of stocks of different companies, held in the hope that if one stock falls while another rises, the investment will not incur a loss [4]. In addition, the correlation of returns between one stock and another also affects the variance of the portfolio return, which diversification aims to reduce.
Investors should construct an investment portfolio in accordance with their risk tolerance and their investment objectives. The investment objective is to maximize return (profit) without neglecting the risk of the investment. Return is one of the factors that motivates investors to invest, and it is the reward for the investor's willingness to bear the investment risk.
To determine the return and risk of the stock portfolio, the following systematic steps are considered [8]:

1. Determining the actual return (profit rate) of each company for each period. The actual return is calculated with the following formula:

$$R_{i,t} = \frac{P_t - P_{t-1} + D_t}{P_{t-1}}$$

where
$R_{i,t}$ = the profit level of individual stock $i$ for an investment period
$P_t$ = the closing price of the stock in period $t$
$P_{t-1}$ = the closing price of the stock in the previous period
$D_t$ = the share dividend distributed

2. Calculating the expected profit value (expected return) of each company:

$$E(R_i) = \frac{1}{n}\sum_{t=1}^{n} R_{i,t}$$

where
$E(R_i)$ = the expected rate of return
$R_{i,t}$ = the profit level of each individual stock
$n$ = the number of investment periods
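The two steps above can be sketched in a few lines of Python. This is only an illustration: it assumes the actual return is (P_t - P_{t-1} + D_t) / P_{t-1} and the expected return is the arithmetic mean of the per-period returns. The function names and the sample prices are hypothetical and are not taken from the paper's data.

```python
def actual_returns(close_prices, dividends=None):
    """Per-period actual return: (P_t - P_{t-1} + D_t) / P_{t-1}.

    dividends[t] is assumed to be the dividend paid in period t
    (zero when no dividend is distributed).
    """
    if dividends is None:
        dividends = [0.0] * len(close_prices)
    returns = []
    for t in range(1, len(close_prices)):
        prev = close_prices[t - 1]
        returns.append((close_prices[t] - prev + dividends[t]) / prev)
    return returns


def expected_return(returns):
    """Expected return as the arithmetic mean of the per-period returns."""
    return sum(returns) / len(returns)


# Made-up closing prices, for illustration only.
prices = [100.0, 110.0, 99.0]
r = actual_returns(prices)
mean_r = expected_return(r)
```

A series of m closing prices yields m - 1 returns, because the first price has no previous period to compare against.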

Risk Calculation
After calculating the actual and expected returns, the next step is to determine the value of risk by calculating the variance and the standard deviation. The risk is expressed by the standard deviation, which can be calculated by the following equation:
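A standard form of these two quantities is given below as a sketch. It assumes the population convention (division by n); the sample convention divides by n - 1 instead, and which of the two the paper uses cannot be determined from the text. Here $R_{i,t}$ is the actual return of stock $i$ in period $t$, $E(R_i)$ its expected return, and $n$ the number of periods.

```latex
\sigma_i^2 = \frac{1}{n}\sum_{t=1}^{n}\bigl(R_{i,t} - E(R_i)\bigr)^2,
\qquad
\sigma_i = \sqrt{\sigma_i^2}
```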

Stock portfolio combination from stock index LQ45.
In this stage, the companies' stocks are combined to form portfolios, and the proportion of funds to be invested in each company is determined. The proportions are generated randomly and must sum to one. They are then used to calculate the expected return (profit) of each stock portfolio combination. The remaining steps are:

5. Determining the expected return of the portfolio.

6. Determining the correlation coefficient between the stocks:

$$r_{XY} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

where
$r_{XY}$ = correlation coefficient
$n$ = the number of stock periods
$X$ = the profit level of the first stock
$Y$ = the profit level of the second stock

7. Determining the variance and standard deviation of the stocks.

8. Calculating the coefficient of variation (CV).
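As an illustration of steps 5 to 8, the Python sketch below uses the textbook definitions: the portfolio expected return as the weighted sum of the stock expected returns, the Pearson correlation coefficient, the portfolio standard deviation built from pairwise correlations, and the coefficient of variation as risk divided by expected return. These forms are assumptions, since the paper's own equations are not available here, and all function names are hypothetical.

```python
import math


def portfolio_expected_return(weights, expected_returns):
    """Weighted sum of the individual expected returns."""
    return sum(w * r for w, r in zip(weights, expected_returns))


def correlation(x, y):
    """Pearson correlation coefficient between two return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def portfolio_std(weights, std_devs, corr):
    """Portfolio standard deviation; corr[i][j] is the correlation of i and j."""
    variance = 0.0
    for i, wi in enumerate(weights):
        for j, wj in enumerate(weights):
            variance += wi * wj * std_devs[i] * std_devs[j] * corr[i][j]
    # Guard against a tiny negative value caused by rounding.
    return math.sqrt(max(variance, 0.0))


def coefficient_of_variation(std_dev, expected_return):
    """Risk per unit of expected return; lower is better."""
    return std_dev / expected_return
```

With two equally weighted stocks of equal risk and a correlation of -1, `portfolio_std` returns zero, which is the diversification effect described in this section.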

Genetic Algorithm
The genetic algorithm is a computational intelligence technique: an optimization technique based on natural genetics. In a genetic algorithm, the search is carried out among a number of alternative candidate points, guided by a probabilistic function, to produce an optimal solution [3]. Compared to other optimization algorithms, genetic algorithms in many cases provide fairly high accuracy, as in scheduling [9], financing [10], and even robot control systems [11]. The use of a genetic algorithm for stock portfolio selection is therefore expected to provide optimal results with high accuracy.

The framework of the GA used in this work for selecting the optimal portfolio proportion is shown in Algorithm 1. The selection of an optimal portfolio relies on the investment policy of the investor. The first stage is to generate n initial chromosomes, where each chromosome consists of l genes. Each gene represents the proportion of funds to be invested in the i-th stock. For each chromosome, the fitness value is calculated, so that the chromosomes can be sorted by fitness. A fixed number of iterations is used as the termination criterion. In each iteration, k offspring (children) are generated by the crossover and mutation processes, which are explained further in Sections 2.4 and 2.5. These new offspring are added to the population and sorted together with the previous n chromosomes in descending order of fitness. To maintain the quality of the population, the k worst chromosomes are removed. Hence, the population size does not change from one iteration (i.e. generation) to the next.
Reproduction is carried out with the crossover and mutation operators according to their probability values. The individuals of the current population and the new offspring produced by crossover and mutation are then combined for the selection process. Selection is done by calculating the fitness of each individual. The best individual is the one with the best fitness after a fixed number of iterations. The factors affecting fitness are the expected return (profit) and the risk of the portfolio.

2.1 Parameters Initialization
At the initialization stage, five parameters must be set: the population size (n), the number of generations, the probabilities of crossover and mutation (pc and pm), and the chromosome length (l). The population size is the number of chromosomes in a population. The number of generations (iterations) determines how many generations are run. The probabilities of crossover and mutation determine the number of offspring (chromosomes) produced. The length of the chromosome is equal to the number of stocks to be included in the portfolio. In this paper, the chromosome length is set to four.

2.2 Chromosome Representation
In this work, a chromosome is represented by real values, each of which is the proportion of funds allocated to a certain stock. The length of the chromosome is the number of stocks in the portfolio. A chromosome is formed by generating random values within the interval [0,1]. The gene values must sum to 1; otherwise, the chromosome is classified as invalid. Table 1 shows an example of the chromosome representation used in this work.
Table 1 Chromosome Representation Sample

TLKM    JSMR    PGAS    SMGR
0.35    0.15    0.27    0.23

As can be seen in Table 1, there are 4 stocks to be formed into a stock portfolio, so the chromosome has a length of 4 genes. The value of each gene, from the 1st to the 4th index, indicates the proportion of funds allocated to the corresponding stock.
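A minimal Python sketch of this representation is given below. It assumes that a valid chromosome is obtained by drawing each gene from [0, 1] and dividing by the total; the paper only states that the genes must sum to 1, so the normalization step, the tolerance, and the function names are assumptions made for illustration.

```python
import random


def random_chromosome(length, rng=random):
    """Draw `length` genes in [0, 1] and rescale them so they sum to 1."""
    genes = [rng.random() for _ in range(length)]
    total = sum(genes)
    return [g / total for g in genes]


def is_valid(chromosome, tol=1e-9):
    """A chromosome is valid if every gene is in [0, 1] and the genes sum to 1."""
    in_range = all(0.0 <= g <= 1.0 for g in chromosome)
    return in_range and abs(sum(chromosome) - 1.0) <= tol


# The sample chromosome of Table 1 (TLKM, JSMR, PGAS, SMGR).
sample = [0.35, 0.15, 0.27, 0.23]
sample_ok = is_valid(sample)
```

A small tolerance is used in the validity check because floating-point sums rarely equal 1 exactly.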

2.3 Fitness Function Calculation
In this case, the calculation of the profit and loss rate is based on single index model theory [8]. Thus, the profit rate of the stock portfolio can be calculated by the following equation:

The purpose of forming a stock portfolio is to obtain an optimal level of profit with a low level of risk. Therefore, the formula in equation (10) is used as the fitness function, where a higher fitness value is better.
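For reference, the textbook single index model can be written as below. This is the standard formulation and is not claimed to be the paper's equation or its fitness function (10). The symbols are assumptions introduced here: $w_i$ is the proportion of funds in stock $i$, $R_M$ the market index return, $\alpha_i$ and $\beta_i$ the intercept and market sensitivity of stock $i$, $e_i$ its residual, and $l$ the number of stocks.

```latex
R_i = \alpha_i + \beta_i R_M + e_i

E(R_p) = \alpha_p + \beta_p\,E(R_M),
\qquad
\alpha_p = \sum_{i=1}^{l} w_i \alpha_i,
\qquad
\beta_p = \sum_{i=1}^{l} w_i \beta_i

\sigma_p^2 = \beta_p^2\,\sigma_M^2 + \sum_{i=1}^{l} w_i^2\,\sigma_{e_i}^2
```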

2.4 Crossover
Crossover is used to generate new individuals whose genes differ from those of the previous individuals, i.e. the parents. In this work, extended intermediate crossover is used, which produces offspring from the combination of two parent chromosomes [3]. The number of offspring generated in the crossover process is pc × n. Suppose that $P_1$ and $P_2$ are two parent chromosomes; then offspring $C_1$ and $C_2$ can be generated as follows:

$$C_1 = P_1 + \alpha\,(P_2 - P_1), \qquad C_2 = P_2 + \alpha\,(P_1 - P_2) \tag{11}$$

The value of $\alpha$ is chosen randomly within the predetermined interval [0, 1]. In some cases, the total of the gene values can exceed one. To overcome this problem, the obtained offspring must be normalized using equation (12), where $x_i$ is the i-th gene of the offspring and $l$ is the chromosome length:

$$x_i' = \frac{x_i}{\sum_{j=1}^{l} x_j} \tag{12}$$

Table 2 illustrates how the crossover process is performed.
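The crossover step can be sketched in Python as follows. The sketch assumes the common form C1 = P1 + α(P2 - P1) and C2 = P2 + α(P1 - P2) with a single α per parent pair, followed by division of each gene by the gene total. Whether the paper draws one α per pair or one per gene is not stated, and the function names are hypothetical.

```python
import random


def normalize(genes):
    """Rescale the genes so that they sum to 1."""
    total = sum(genes)
    return [g / total for g in genes]


def extended_intermediate_crossover(p1, p2, alpha=None, rng=random):
    """Return two normalized children blended from parents p1 and p2."""
    if alpha is None:
        alpha = rng.uniform(0.0, 1.0)
    c1 = [a + alpha * (b - a) for a, b in zip(p1, p2)]
    c2 = [b + alpha * (a - b) for a, b in zip(p1, p2)]
    return normalize(c1), normalize(c2)


p1 = [0.35, 0.15, 0.27, 0.23]
p2 = [0.25, 0.25, 0.25, 0.25]
c1, c2 = extended_intermediate_crossover(p1, p2, alpha=0.25)
# c1 is approximately [0.325, 0.175, 0.265, 0.235]
# c2 is approximately [0.275, 0.225, 0.255, 0.245]
```

With a single α in [0, 1] and parents that already sum to 1, the children also sum to 1 and normalization changes nothing. It becomes necessary when α is drawn separately for each gene or from a wider interval, which may be the case the text refers to.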

2.5 Mutation
The mutation operator of the genetic algorithm modifies an offspring (a child chromosome) after the crossover operation. Mutation is performed according to its probability. In this work, reciprocal exchange mutation is used: it selects two positions (exchange points, XP) at random on the parent chromosome and then exchanges the values at those positions [3]. For example, let the chromosome (0.34, 0.121, 0.277, 0.262) be selected for mutation. If genes 0.34 and 0.121 are swapped in the first exchange and genes 0.277 and 0.262 are swapped in the second, the mutated chromosome becomes (0.121, 0.34, 0.262, 0.277).
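The example above can be reproduced with a short Python sketch. The function name and the optional `positions` argument are hypothetical; when no positions are given, two distinct exchange points are drawn at random.

```python
import random


def reciprocal_exchange_mutation(chromosome, positions=None, rng=random):
    """Return a copy of the chromosome with the genes at two positions swapped."""
    child = list(chromosome)
    if positions is None:
        positions = rng.sample(range(len(child)), 2)
    i, j = positions
    child[i], child[j] = child[j], child[i]
    return child


parent = [0.34, 0.121, 0.277, 0.262]
step1 = reciprocal_exchange_mutation(parent, positions=(0, 1))
mutated = reciprocal_exchange_mutation(step1, positions=(2, 3))
# mutated == [0.121, 0.34, 0.262, 0.277]
```

Because the operator only reorders genes, the mutated chromosome keeps the same gene total and needs no normalization.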

2.6 Selection
The selection process obtains the best chromosomes in the population for the next generation. Selection is performed by elitism: the chromosomes are sorted from the highest to the lowest fitness value, the n best chromosomes are kept as the new population for the next generation, and the remaining chromosomes are removed. After the selection process, the chromosome with the highest fitness value is taken as the optimal solution for the stock portfolio proportion in each generation.

Experiment Setting
The proposed method was implemented in the Java programming language on a PC running the Windows operating system. The experiments evaluated the effect of the population size as well as the effect of the crossover and mutation probabilities.

Stock Market Data Acquisition
The selection of stock data is based on the companies listed in the LQ45 index. In this case, PT Telekomunikasi Indonesia Tbk (TLKM), PT Jasa Marga (Persero) Tbk (JSMR), PT Perusahaan Gas Negara (Persero) Tbk (PGAS), and PT Semen Indonesia (Persero) Tbk (SMGR) were selected. The closing prices and stock dividends between 1 January 2017 and 30 April 2017 are used as the data, which gives 82 data points for each company.

Population Size Effect
The first experiment was conducted with a crossover probability of 0.5 and a mutation probability of 0.2 to find the optimal population size. The population size was varied from 20 to 140 chromosomes in increments of 20. The genetic algorithm is a stochastic method that produces different results each time it is run. Therefore, each population size was run 5 times, and the average of the best fitness values over the runs was computed. The experimental results for the different population sizes are shown in Table 3 and Figure 1. The best average fitness is 3.81, obtained when the population size is 100. This population size is therefore taken as the optimal population size for determining the optimal stock portfolio proportion, and it is used in the next experiment.
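The protocol of this experiment (several population sizes, 5 runs each, average of the best fitness) can be sketched in Python as follows. The `fake_run` function is a stand-in that returns a random score, so the numbers it produces have no relation to Table 3; in a real experiment it would be replaced by one run of the genetic algorithm that returns the best fitness found. All names are hypothetical.

```python
import random
import statistics


def average_best_fitness(run_once, population_sizes, repeats=5):
    """Average the best fitness over `repeats` runs for each population size."""
    results = {}
    for n in population_sizes:
        scores = [run_once(n, seed) for seed in range(repeats)]
        results[n] = statistics.mean(scores)
    return results


def fake_run(n, seed):
    """Stand-in for one GA run: best of n random scores in [0, 1)."""
    rng = random.Random(seed * 1000 + n)
    return max(rng.random() for _ in range(n))


sizes = range(20, 141, 20)            # 20, 40, ..., 140
averages = average_best_fitness(fake_run, sizes)
best_size = max(averages, key=averages.get)
```

Using a different seed for each repeat keeps the runs independent while still making the whole experiment reproducible.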

Crossover and Mutation Probability Effect
In the second experiment, the crossover probability (pc) and the mutation probability (pm) were examined by combining the values of both operators. Both probabilities were varied from 0.1 to 1 in increments of 0.1 and tested in combination. This gives 10 experiments.
To evaluate the effect of the crossover and mutation probabilities, the population size is set to 100, the best population size obtained previously. Each combination of pc and pm is tested 5 times, and the average of the highest fitness values is taken. From Table 4 and Figure 2, it can be seen that the highest fitness value is 3.9, obtained with a crossover probability of 0.3 and a mutation probability of 0.7, or with both probabilities equal to 0.5.
As the experimental results in Figure 3 show, the individual with the highest fitness value has the chromosome {0; 1; 0; 0}, so the portfolio formed from this chromosome is invested 100% in JSMR stock, while the proportion of each of the other three stocks is 0 (zero). It can be concluded that taking the TLKM, PGAS, and SMGR stocks is not recommended.

Figure 2 Results on combination of crossover and mutation probabilities

Table 4 Results on combination of crossover and mutation probabilities