Identification of Water Quality Significant Parameter with Two Transformation / Standardization Methods on Principal Component Analysis and Scilab Software

Water quality monitoring is prone to errors in measurement and recording. River water quality monitoring aims not only to characterize water quality dynamics but also to provide data for evaluating river management and water pollution policy, so that human health and sanitation requirements and biodiversity are sustained. Evaluation of water quality monitoring should start by identifying the important water quality parameters. This research aimed to identify the significant parameters by applying two transformation/standardization methods to water quality data: the river Water Quality Index, WQI (Indeks Kualitas Air Sungai, IKAs) transformation/standardization method and the transformation/standardization method with mean 0 and variance 1, so that the variability of the water quality parameters could be aggregated. Both methods were applied to water quality monitoring data whose validity and reliability had been tested. Principal Component Analysis, PCA (Analisa Komponen Utama, AKU), carried out with Scilab software, was used to process secondary data on the water quality parameters of the Gadjah Wong River for 2004-2013. The Scilab results were cross-checked against the results of the Excel-based Biplot Add In software. The results showed that only 18 of the 35 water quality parameters had acceptable data quality. The two transformation/standardization methods gave different types and numbers of significant parameters. With the mean 0 variance 1 transformation/standardization, the significant parameters describe the dynamics of water quality relative to the mean concentration of each parameter; these were TDS, SO4, EC, TSS, NO3N, COD, BOD5, Oil and Grease, and NH3N. With the river WQI transformation/standardization, the significant parameters indicate the pollution level of the Gadjah Wong River; these were EC, DO, BOD5, COD, NH3N, Fecal Coliform, and Total Coliform. These seven parameters are the minimum set of water quality parameters that must be measured consistently at predetermined times and locations, and they also serve as indicators of human health and environmental health. The results of the Scilab multivariate analysis did not differ from those of the Biplot Add In multivariate analysis, whose significant water quality parameters had been verified with biomonitoring.


INTRODUCTION
Water quality monitoring aims to characterize water quality dynamics in order to sustain human health and sanitation requirements and to preserve biodiversity (Karr, 1991). The Gadjah Wong River and its tributaries in the Special Region of Yogyakarta have been monitored by the regional Environmental Agency since the Prokasih Program (Clean River Monitoring Program) was declared in 1995, with at least 35 water quality parameters measured 3 or 4 times a year at 13 locations (BLH, 2013). Water quality data are highly dynamic and prone to measurement and recording errors (Berthouex and Brown, 2002). Evaluation is therefore needed, starting with the identification of the water quality parameters that significantly affect human health and the health of the river environment. The significant parameters are the parameters that need to be measured consistently, periodically and continuously, at every monitoring location when cost and laboratory facilities are limiting. The quality of water quality monitoring data needs to be maintained before, during, and after sampling (Resh and McElravy, 1993).
Water quality parameters have different units and measurement procedures, which makes them difficult to evaluate together; a data transformation/standardization method is therefore needed (Dillon and Goldstein, 1984). This research used two transformation/standardization methods: the method with mean 0 and variance 1 (McBridge, 2005), and the river WQI transformation/standardization method developed earlier by Saraswati (2015), which had previously been cross-checked against another transformation/standardization method (Cao et al., 1999). The transformation/standardization method used strongly affects the result of the multivariate analysis of water quality, and it therefore has ecological relevance.
Multivariate analysis of water quality data has been used previously by Zang et al. (2009), Zhou et al. (2006), and Fataei (2011). Principal Component Analysis, one form of multivariate analysis, reduces a large number of complex data variables to a few values that represent the entire variable set while preserving the character of the data. River pollution problems require many variables to be monitored, including water quality variables. These variables are interdependent and usually correlated with each other, whereas statistical analysis requires all variables to be random and independent of one another. For this reason, the evaluation and monitoring of water quality parameters used multivariate statistical analysis (Putranda, 2015).
The amount of water quality data to be processed was very large, so this research required computer software, namely Scilab (Baudin, 2010). PCA was applied with the two transformation/standardization methods to the secondary water quality data for 2004-2013. The results were then cross-checked, with the aid of Biplot Add In (Lipkovich and Smith, 2002), against the multivariate analysis of the 1997-2012 water quality data, which had been verified by ex-situ and in-situ biomonitoring (Saraswati, 2015).

Data Set
This research used secondary water quality data for the Gadjah Wong River, obtained from monitoring by the Environmental Agency of Yogyakarta in 2004-2013 (BLH, 2010). Over this 10-year period, a total of 35 water quality parameters were measured at 9 monitoring locations on the main river: Tanen Bridge, Pelang Bridge, IAIN Bridge, Muja-Muju Bridge, Rejowinangun Bridge, Tegalgendu Bridge, Tritunggal Bridge, Wirokerten Bridge, and Wonokromo Bridge (see Figure 1).
The dry season starts in May, while the rainy season begins in November (Jovan, 2015).

Data Transformation/standardization
Two types of transformation/standardization were used in the water quality data analysis.

Standardization mean 0 variance 1
This standardization is the most commonly used in statistical data processing. It transforms every water quality variable so that it has mean 0 and variance 1:

$$y_i = \frac{x_i - \bar{x}_i}{s_i} \qquad (1)$$

where $\bar{x}_i$ and $s_i$ are the mean and standard deviation of water quality variable i.

The river WQI standardization proposed by Saraswati (2015) transforms every water quality variable to a score between -1 and +1. A negative score indicates that the water quality variable is polluted, and a positive score indicates that the variable is in good condition, or not polluted. A score of 0 is obtained when the concentration equals the water quality standard. The river WQI standardization distinguishes the water quality variables according to how they respond to pollution.
Two cases are distinguished: water quality variables whose value declines as pollution increases (for example, DO), and water quality variables whose value rises as pollution increases (for example, BOD5, COD, and others). In the corresponding equations (2)-(5), y_i is the transformed/standardized value of variable i, x_i is the concentration of water quality variable i, Stan_i is the quality standard of water quality variable i, Stan_i-mean is (maximum concentration of the water quality standard + minimum concentration of the water quality standard)/2, Stan_i-max is the maximum concentration of the water quality standard range, and Stan_i-min is the minimum concentration of the water quality standard range. The result y_i lies in the interval from -1 to +1; a negative value means that pollution has occurred, a positive value means that the water quality is good, and 0 means that the concentration equals the water quality standard.
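The mean 0 variance 1 standardization described above can be illustrated with a minimal Python sketch. This is not the authors' Scilab code. The function name `standardize`, the sample concentrations, and the use of the population standard deviation (rather than the sample standard deviation) are assumptions made only for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a series so that it has mean 0 and variance 1 (z-scores)."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(v - m) / s for v in values]


# Hypothetical concentrations, for illustration only (not monitoring data)
tds_mg_per_l = [120.0, 150.0, 180.0, 210.0, 240.0]
z = standardize(tds_mg_per_l)
```

After this step every parameter is expressed in the same dimensionless units, which is what allows parameters with different units to be analyzed together.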

Standard Water Quality of River
A healthy river can be described by the quality of its water, which is neither polluted nor toxic to its biota. Healthy water is the background condition of river water (Lumb et al., 2006). The benchmark for water quality conservation was set according to references, local river water conditions, and existing bioassay results (Saraswati, 2015). Table 1 shows the benchmark for water quality conservation together with the water quality standard for class I water bodies (raw water for drinking) according to Government Regulation No. 82 Year 2001 on Water Quality Management and Water Pollution Control.

Reliability Test and Validity Test of the Water Quality
The quality of water quality monitoring data must be assured before the data are processed further, because poor data quality may lead to conclusions that do not reflect the real condition in the field. The reliability test and data smoothing were carried out on the raw monitoring data to address missing values, censored data, and outliers, both within a parameter and between water quality parameters (Saraswati et al., 2013).

Principal Component Analysis
A significant variable in the PCA is a water quality parameter that has a dominant influence on the dynamics of the water quality condition data. The parameters were determined by examining the eigenvector values of component 1 and component 2 of the Principal Component Analysis (Smith, 2002). Parameters were selected if their component loading value was greater than 0.5 (Hair et al., 2009), which indicates that the parameter is represented by the component at a significant level. The eigenvector is a matrix of the multiplier coefficients that convert the original variables into a PCA score on a given principal component. The eigenvalue (explained variance) is the amount of the total variance explained by each component (Legendre and Legendre, 1998). The eigenvalues were used to determine the number of principal components, which serve as the new water quality variables.
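The selection rule above (component loading greater than 0.5 on component 1 or component 2) can be sketched in Python as follows. The function name, the parameter names `param_A` to `param_C`, and their loadings are hypothetical and are not taken from the study's tables. Comparing the absolute value of the loading with the threshold is also an assumption, made here because negative loadings are reported among the significant parameters.

```python
THRESHOLD = 0.5  # loading cut-off cited in the text (Hair et al., 2009)


def significant_parameters(loadings, threshold=THRESHOLD):
    """Map each parameter to the components (1-based) where |loading| > threshold."""
    result = {}
    for name, values in loadings.items():
        components = [k + 1 for k, v in enumerate(values) if abs(v) > threshold]
        if components:
            result[name] = components
    return result


# Hypothetical loadings on components 1 and 2 (illustrative only)
loadings = {
    "param_A": (0.81, 0.10),
    "param_B": (0.20, -0.66),
    "param_C": (0.35, 0.42),
}
selected = significant_parameters(loadings)
```

In this sketch `param_C` is dropped because neither of its loadings exceeds the threshold in absolute value.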

Scilab Software
Scilab is a numerical computation package that has been developed since 1990 by researchers from INRIA and ENPC. Scilab is functionally similar to MATLAB, yet it can be downloaded without a license fee (open source). As open-source software, Scilab can be used on various operating systems (OS), its source code is easy to inspect, modify, and distribute, and the software can be used for various purposes (Annigeri, 2004).

The 9 locations chosen for this research lie on the main river of the Gadjah Wong and were selected because measurement data were consistently available. The condition of the data for the 19 water quality parameters chosen for further processing is shown in Table 2. For the Gadjah Wong River, the KMO value was 0.671 and the significance of Bartlett's test was 0. With a KMO value above 0.5 and a significance below 0.05, the data meet the requirements for further analysis. For the Gadjah Wong River, the PO43- parameter was excluded from further data processing because its MSA value was below 0.5. The characteristics of the water quality data after smoothing are shown in Table 3.
Water Quality Significant Parameter
Those results were used to determine the water quality parameters considered important, or significant, in the PCA of the data produced by the mean 0 variance 1 transformation/standardization method and by the river WQI transformation/standardization method.

Transformation/standardization Method
Cross-checking the two methods showed that the eigenvalues and component loadings obtained with the mean 0 variance 1 transformation/standardization method differed from those obtained with the river WQI transformation/standardization method. Each method produced a different number and type of significant water quality parameters.
In the PCA with the mean 0 variance 1 transformation/standardization method, the sum of the eigenvalues for the Gadjah Wong River was equal to the number of variables, whereas with the river WQI transformation/standardization method the sum of the eigenvalues was equal to the sum of the variances of the variables.
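The relation between the sum of the eigenvalues and the variances can be checked on a two-variable example. The Python sketch below is illustrative only: the correlation of 0.6 and the variances 4.0 and 2.25 are hypothetical, and treating the river WQI case as a PCA of the covariance matrix is an assumption that is consistent with the behavior reported above.

```python
import math


def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    centre = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return centre + radius, centre - radius


# Correlation matrix of two standardized variables (hypothetical r = 0.6)
corr_eigs = eigenvalues_sym2(1.0, 0.6, 1.0)

# Covariance matrix of two variables with hypothetical variances 4.0 and 2.25
# and covariance 0.6 * 2.0 * 1.5 = 1.8
cov_eigs = eigenvalues_sym2(4.0, 1.8, 2.25)

sum_corr = sum(corr_eigs)  # equals the number of variables (2)
sum_cov = sum(cov_eigs)    # equals the sum of the variances (4.0 + 2.25)
```

In both cases the eigenvalues sum to the trace of the matrix, which is the number of variables for a correlation matrix and the total variance for a covariance matrix.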
Based on the requirement of a component loading value greater than 0.5, the mean 0 variance 1 transformation/standardization method gave 9 significant parameters for the Gadjah Wong River: on Component 1, TDS (0.872), EC (0.773), BOD5 (0.536), COD (0.619), NH3N (0.507), and SO4 (0.862); and on Component 2, TSS (0.740), NO3N (0.688), and Oil and Grease (0.578). The river WQI transformation/standardization method gave 7 significant parameters: on Component 1, EC (2.907), DO (2.505), BOD5 (0.559), COD (1.676), NH3N (1.433), Fecal Coliform (1.559), and Total Coliform (1.675); and on Component 2, EC (-2.393), NH3N (-0.783), Fecal Coliform (2.471), and Total Coliform (2.495). The differences between the two transformation/standardization methods in terms of communality are shown in Table 5. The communality is the sum of the squared component loadings. As seen in Table 5, with the mean 0 variance 1 transformation/standardization method the communality of each variable was equal to 1, while with the river WQI transformation/standardization method the communality of each variable was equal to the variance of that variable.
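The communality statement above can be verified on a small case. The following Python sketch uses two standardized variables with a hypothetical correlation of 0.6. It relies on the known eigen-decomposition of the 2x2 correlation matrix and on the common convention that a loading is an eigenvector element multiplied by the square root of the eigenvalue. The names and values are illustrative assumptions, not the study's data.

```python
import math


def communality(loadings):
    """Sum of the squared component loadings of one variable."""
    return sum(value * value for value in loadings)


r = 0.6  # hypothetical correlation between two standardized variables

# For the correlation matrix [[1, r], [r, 1]] the eigenvalues are 1 + r and 1 - r,
# with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
# Loading = eigenvector element * sqrt(eigenvalue).
loadings_var1 = (math.sqrt((1 + r) / 2), math.sqrt((1 - r) / 2))
loadings_var2 = (math.sqrt((1 + r) / 2), -math.sqrt((1 - r) / 2))

h1 = communality(loadings_var1)  # all components kept: equals the variance, 1
h2 = communality(loadings_var2)
h1_first_only = communality(loadings_var1[:1])  # component 1 only: (1 + r) / 2
```

When every component is retained the communality recovers the full variance of the variable; retaining fewer components gives a smaller value.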
Figure 2 (a) shows the scree plot from the PCA multivariate analysis carried out with Scilab on the 2004-2013 secondary water quality data using the mean 0 variance 1 transformation/standardization method; Figure 2 (b) shows the result of the Biplot Add In analysis with the same transformation/standardization method and the same data. The center point of the graph is the mean concentration of each water quality parameter after transformation/standardization, as shown in equation (1).
According to the PCA scree plot, 9 water quality parameters deviate significantly from the center point. Ordered by their weight on components 1 and 2, they are TDS, SO4, EC, TSS, NO3N, COD, BOD5, Oil and Grease, and NH3N. The deviation is measured as the difference between the measured data and the benchmark, which here is the mean concentration of each water quality parameter. A large deviation does not necessarily mean that the water is "polluted", because water is "polluted" only when the measured concentration deviates from the benchmark for water health quality. The order of significant weight is TDS, TSS, EC, BOD5, COD, NH3N, SO4, NO3N, and Oil and Grease, as shown by the direction and length of each vector moving away from the center point.
The scree plot from the PCA with the river WQI transformation/standardization method on the 2004-2013 secondary data is shown in Figure 3 (a), while Figure 3 (b) shows the result of the Biplot Add In analysis of the same secondary data. The scree plot from the Scilab PCA of the 2004-2013 data gave the same result as the scree plot from the Biplot Add In analysis of the 1997-2012 secondary water quality data for the Gadjah Wong River (Saraswati, 2015).
The water quality status of the Gadjah Wong River, measured with the water quality index using the significant parameters obtained from the river WQI transformation/standardization, was confirmed by the water quality status obtained from biomonitoring in 2012 by Saraswati (2015). It can therefore be concluded that the 7 significant water quality parameters, EC, DO, BOD5, COD, NH3N, Fecal Coliform, and Total Coliform, are the parameters that most influence the dynamics of the "pollution" level of the Gadjah Wong River water. These parameters deviated significantly from the benchmark (Table 1) concentrations of the physical, chemical, and bacteriological parameters of river water health. According to equations (2)-(5), the center point of the scree plots in Figure 3a and Figure 3b is the water quality benchmark value, with water health criteria that give strong weight to the impact on river biota.

These seven parameters are the minimum number and type of water quality parameters that must be measured consistently across the monitoring area in order to monitor water pollution caused by natural hydro-climatological change and by domestic and industrial activities. They can detect the impact on human health through the indicators Fecal Coliform and Total Coliform, and the impact on the health of the water environment through the indicators EC (DHL), DO, BOD5, COD, and NH3N. According to the scree plot, the bacteriological indicators show that the sanitation condition of the river has become increasingly worse compared with 2004. According to the indicators COD, DO, EC, and NH3N, water quality becomes increasingly worse further downstream compared with upstream; yet the middle segment of the river (locations 3 and 4) in Yogya City is the most polluted, owing to organic and inorganic wastes. There is an indication that the pollution is moving upstream, with increasing settlement and domestic activity at those locations.
In the scree plots in Figure 3a and 3b, which use the water health benchmark, only 7 parameters deviate significantly. In order of significant weight they are EC, DO, COD, Total Coliform, Fecal Coliform, NH3N, and BOD5 as the smallest. This is shown by the length of each parameter's vector and its direction away from the center point of the graph.

The research results are as follows:
a) Only 18 of the 35 water quality parameters monitored in 2004-2013 were considered reliable and valid.
b) TDS, SO4, EC, TSS, NO3N, COD, BOD5, Oil and Grease, and NH3N were the 9 significant parameters able to explain the dynamics of water quality concentration relative to the mean concentration of each water quality parameter. These nine parameters did not explain the dynamics of the river water pollution level.
c) EC, DO, BOD5, COD, NH3N, Fecal Coliform, and Total Coliform were the 7 significant parameters that influence the dynamics of the Gadjah Wong River water pollution level. Fecal Coliform and Total Coliform detect the human health/sanitation condition, while EC (DHL), DO, BOD5, COD, and NH3N monitor the health of the water environment, which is affected by changes in hydro-climatological conditions, organic/inorganic waste from domestic activity, industrial activity, and others. These water quality parameters need to be monitored consistently across the entire monitoring area.
d) The PCA study with two transformation/standardization methods on the 2004-2013 data yielded the same significant water quality parameters as the PCA study on the secondary data from the 1997-2012 monitoring.
e) According to the bacteriological indicators, the scree plot for the Gadjah Wong River showed a trend of increasing water pollution each year. According to the indicators COD, DO, EC, and NH3N, water quality worsens further downstream, yet the middle segment of the river in Yogya City is the most polluted. The pollution tends to move upstream as a result of increasing settlement development and domestic activity at those locations.
f) Scilab proved quite effective as a statistical data processing tool, because the software provides functions for PCA calculation and its programming language is easy to apply.
g) The result of the multivariate analysis using Scilab did not differ from the result of the multivariate analysis using Biplot Add In.

Figure 1. Monitoring location of water quality in Gadjah Wong River (indicated by black dots)
Figure 2. (a) Scree plot of PCA with river WQI transformation/standardization, Gadjah Wong River, Scilab; (b) Biplot Add In

Table 3. Characteristics of data on Gadjah Wong River water quality, observations of 2004-2013 (continued)