Estimation of monthly silt load using bootstrap-based artificial neural networks (BANNs) for Upper Damodar Valley catchments

S. K. Sharma, K. N. Tiwari
Department of Agricultural and Food Engineering
Indian Institute of Technology, West Bengal, 721 302

Abstract

Estimation of silt load is a prerequisite for many applications involving conservation and management of water resources. This study was undertaken in the Upper Damodar Valley Catchment (UDVC), which has a drainage area of 17513.08 km², to predict monthly silt load. Thirty-one microwatersheds and fifteen sub-watersheds were selected from the catchment, which comprises a total of 716 microwatersheds. The feasibility of using soil attributes (particle size distribution, organic matter content, bulk density), topographic attributes (primary, secondary and compound), geomorphologic attributes (basin, relief and network indices) and a vegetation attribute (NDVI) for prediction of monthly silt load was explored.

Principal Component Analysis (PCA) was applied to minimize redundancy among the input parameters. Ten significant input parameters were selected: watershed length (km), elongation ratio, bifurcation ratio, area ratio, coarse sand (%), fine sand (%), elevation (m), slope (°), profile curvature (rad/m) and NDVI. These parameters were added hierarchically to monthly rainfall (mm) as inputs for prediction of monthly silt load using bootstrap-based artificial neural networks (BANNs). The performance of the models was tested using Spearman’s correlation coefficient (r), coefficient of efficiency (COE), root mean square error (RMSE) and mean absolute error (MAE). Increasing the number of input parameters did not necessarily improve the performance of the BANN models. Selection of relevant inputs and their combinations was found to be the key element in determining the performance of the BANN models. An annual silt load map was generated for all the microwatersheds using the weights of the best-performing BANN model. This study reveals that specific combinations of soil, topography, geomorphology and vegetation inputs can be used for better prediction of monthly silt load.

INTRODUCTION
Estimation, conservation and management of available water play a vital role in achieving higher productivity to sustain the increasing requirements. Since the 1930s, numerous linear and non-linear hydrological models have been developed to simulate rainfall-silt load relationships. The flexibility of artificial neural networks (ANNs) in including several parameters and their success in capturing the non-linearity of dynamic systems have made them an attractive tool for modelling hydrological processes (Hsu et al., 1995).

Extensive reviews of ANN applications in hydrologic simulation and forecasting have been reported by ASCE (2000a, b), Dawson and Wilby (2001) and Maier and Dandy (2000). These studies recommended combining ANNs with relevant statistical principles to achieve greater confidence in, and reliability of, their results. Determination of adequate model inputs and development of a suitable network architecture have been identified as key aspects requiring further attention (Maier and Dandy, 2000).

This study attempts to address some of these issues in the context of estimating monthly silt load from catchments subjected to precipitation inputs.

Nagy et al. (2002) reported that a three-layer back-propagation ANN model provided better results than other formulas used for estimation of sediment and riverbed information. Zhang and Govindaraju (2003) developed a geomorphology-based ANN (GANN) for prediction of watershed runoff and concluded that GANNs have potential for estimating direct runoff. Sarangi and Bhattacharya (2005) extended the application of GANNs to the prediction of silt load for the Banha watershed in the Upper Damodar Valley Catchment.

The indirect influence of topography on soil properties and processes has been documented (Moore et al., 1993; Carter and Ciolkosz, 1991). Topographic attributes have become widely available with the advent of digital elevation models (DEMs) and digital terrain analysis techniques. The fine resolution of DEMs could help in better capturing flow patterns over catchments. Sharma et al. (2006) reported that certain combinations of topographic (DEM) and vegetation (NDVI) attributes performed better than basic soil properties alone as inputs for prediction of soil hydraulic properties. These studies indicate the potential of using topographic and vegetation attributes as predictors for estimation of monthly runoff from catchments.

Increasing the number of input variables complicates data handling during model development. With a large number of input variables, strong mutual correlations among the inputs can cause parameter identification to fail and lead to numerical instability. This issue is addressed in this study by applying principal component analysis (PCA) to reduce the dimensionality of a dataset with a large number of interrelated variables while retaining most of the variability of the original dataset. Specifically, the objectives of this study are to: a) explore the inclusion of soil, topographic, geomorphologic and vegetation (STGV) attributes for estimation of monthly silt load; b) minimize the number and intercorrelations of STGV attributes using PCA; and c) determine the significant combinations of STGV attributes by hierarchical bootstrap-based ANN (BANN) modelling for prediction of monthly silt load for the Upper Damodar Valley Catchment.

THEORETICAL CONSIDERATIONS

Neural Network Analysis
In this study, neural network analysis was performed using the Neuropack software (Minasny and McBratney, 2002). Mathematically, neural networks can be represented by a set of simple functions linked together by weights. An input vector with elements $x_l$ ($l = 1, \ldots, N_i$) is transmitted through connections that multiply each element by a weight $w_{jl}$ to give the hidden units $z_j$ ($j = 1, \ldots, N_h$):

$$z_j = \sum_{l=1}^{N_i} w_{jl}\, x_l + w_0 \qquad (1)$$

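where $N_h$ is the number of hidden units and $N_i$ is the number of input units. The hidden units consist of the weighted inputs ($w_{jl}$) and the bias ($w_0$). The outputs from the hidden layer, written here as $f(z_j)$ with $f$ the hidden-layer activation function, pass through another layer of filters with weights ($u_{kj}$) and bias ($u_0$) and are fed into another activation function $F$ to produce the output $y_k$ ($k = 1, \ldots, N_o$):

$$y_k = F\left(\sum_{j=1}^{N_h} u_{kj}\, f(z_j) + u_0\right) \qquad (2)$$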
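The two-layer computation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Neuropack implementation: the function name `forward`, the choice of tanh for the hidden activation, the identity output activation, the per-unit bias lists and all weight values are assumptions made for the example.

```python
import math

def forward(x, W, w0, U, u0, f=math.tanh, F=lambda v: v):
    """One-hidden-layer feed-forward pass.

    x  : list of N_i inputs
    W  : N_h x N_i hidden-layer weights w_jl
    w0 : list of N_h hidden biases
    U  : N_o x N_h output-layer weights u_kj
    u0 : list of N_o output biases
    f  : hidden activation (tanh is an assumption)
    F  : output activation (identity is an assumption)
    """
    # Hidden units: weighted sum of the inputs plus a bias
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, w0)]
    # Hidden-layer outputs after the activation function
    h = [f(zj) for zj in z]
    # Outputs: weighted sum of hidden outputs plus a bias, then F
    return [F(sum(u * hj for u, hj in zip(row, h)) + b) for row, b in zip(U, u0)]

# Hypothetical weights for a network with 2 inputs, 2 hidden units and 1 output
y = forward([0.5, 0.2],
            W=[[1.0, -1.0], [0.5, 0.5]], w0=[0.0, 0.1],
            U=[[1.0, 2.0]], u0=[0.0])
```

Training then amounts to adjusting `W`, `w0`, `U` and `u0` so that the outputs match the measured values as closely as possible.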

The weights are adjustable parameters of the network and are determined from a set of data through the process of training. The NL2SOL adaptive nonlinear least squares algorithm (Dennis et al., 1981) implemented in the Neuropack software was used for training the networks. The objective of training was to minimize the sum of squares of the residuals between the measured and predicted outputs.

DESCRIPTION OF STUDY AREA AND METHODOLOGY

Study Area
The Upper Damodar Valley Catchment (UDVC), lying between 23°4′00″ and 24°4′00″ N latitude and 84°7′00″ and 85°7′00″ E longitude in the states of Jharkhand and West Bengal, India (Fig. 1), has a drainage area of 17513.08 km². The UDVC is shaped like a cone, with the confluence of the Barakar and Damodar rivers forming the apex. Because the two rivers flow along parallel paths, their catchments are shaped like long, narrow strips of land running from west to east in the case of the Damodar and from north-west to south-east in the case of the Barakar. The fragile shape of the landform, together with the high-density drainage network within the narrow strip of the Damodar catchment, poses a great threat of erosion of the fertile topsoil and its deposition into the five large reservoirs, namely Tilaiya, Konar, Maithon, Panchet and Tenughat, constructed by the Damodar Valley Corporation.

Topographic Attributes
The topography of the UDVC was characterized using a DEM of 90 m by 90 m resolution. The Terrain Analysis System (TAS) software was used to derive primary topographic attributes (elevation, slope and aspect), secondary topographic attributes (profile curvature, plan curvature, tangent curvature and surface curvature index) and compound topographic attributes (sediment transport capacity index, wetness index and relative stream power index) from the DEM. Table 1 shows the basic statistics of the topographic attributes derived for the selected watersheds.

The slope (in degrees) is a measure of the maximum rate of change of elevation between each cell and its neighbors. Aspect represents the direction of slope and identifies the maximum rate of change in the down-slope direction. Profile curvature is a measure of the rate of change of gradient and controls the acceleration and deceleration of near-surface flows. Plan curvature refers to the rate of change of aspect and determines the divergence and convergence of near-surface flow, identifying features including ridges (positive convexity) and gullies (negative convexity). Tangent curvature is determined by multiplying plan curvature by slope. The Surface Curvature Index is a measure of total curvature and magnitude of slope gradient. It indicates a predominantly convex slope (positive values) or a predominantly concave slope (negative values). The Sediment Transport Capacity Index is considered to be equivalent to the Length-Slope factor in the Revised Universal Soil Loss Equation. The Wetness Index is a measure for predicting zones of saturation during generation of runoff. The Relative Stream Power Index is a measure of the erosive power of the flowing stream network.

Geomorphologic Attributes
On the basis of natural drainage, the UDVC was divided into 39 sub-watersheds with areas ranging from 174.36 km² to 1112.85 km². Based on the guidelines laid down by the All India Soil and Land Use Survey (A.I.S.L.U.S.) for integrated soil and water conservation planning and execution, the UDVC was further divided into 716 micro-watersheds ranging from 100 to 1000 ha in area (Fig. 2). The boundaries of the micro-watersheds and river networks were digitized using ArcView™ software. Thirty-one micro-watersheds and fifteen sub-watersheds were selected for this study (Fig. 3).

TAS was used to generate geomorphologic parameters using the watershed boundaries and the DEM as inputs. The parameters were derived from the watershed shape characteristics (form factor, basin shape, length area, circularity ratio, elongation ratio and lemniscate ratio), relief characteristics (maximum relief, divide average relief, relief ratio, relative relief and hypsometric integral) and stream network characteristics (drainage density, bifurcation ratio, length ratio, slope ratio, area ratio, Strahler order, Shreve magnitude and highest-order channel length) of the watershed. Basic statistics of the derived geomorphologic parameters are given in Table 1. The details of extracting geomorphologic parameters with the TAS software are given by Lindsay (2005). Table 2 shows the formulae implemented in the TAS software for deriving the geomorphologic parameters.

Soil Attributes
Soil texture distribution across the UDVC was obtained from soil maps (1:250 000 scale) procured from the National Bureau of Soil Survey and Land Use Planning (NBSS&LUP), Kolkata, India. The maps were digitized and projected using ArcView™ software. Soil core data for forty-four soil samples were obtained from the Soil Laboratory, DVC, Hazaribagh, India. The data consisted of particle size distribution, organic matter content and bulk density derived from twenty sub-watersheds. Information from the soil cores was interpolated for the entire UDVC based on the location and textural distribution of the sub-watersheds (Table 1).

Vegetation Attribute
The NDVI was used to quantify the vegetation in the UDVC. Two multispectral IRS LISS III images from May 1998 were used to derive the NDVI. The NDVI is a greenness index that is related to the proportion of photosynthetically absorbed radiation. An increase in NDVI signifies an increase in green vegetation. NDVI values range from -1.0 to 1.0.
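For reference, NDVI is conventionally computed per pixel from the red and near-infrared (NIR) bands as (NIR - Red) / (NIR + Red). The sketch below illustrates that standard definition; the function name `ndvi` and the reflectance values are hypothetical and are not taken from the imagery used in this study.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) for one pixel; None where undefined."""
    total = nir + red
    if total == 0:
        return None  # no signal in either band, so the index is undefined
    return (nir - red) / total

# Hypothetical (NIR, red) reflectance pairs for three pixels
pixels = [(0.50, 0.08), (0.30, 0.25), (0.05, 0.10)]
values = [ndvi(nir, red) for nir, red in pixels]
```

Dense green vegetation reflects strongly in the NIR band and absorbs red light, so the first pixel gives a high positive value, while the third pixel, where red exceeds NIR, gives a negative value.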

Hydrologic Attributes
Monthly rainfall-silt load data were collected from silt observation posts (SOPs) of the selected watersheds. These SOPs were installed at the watershed outlets to evaluate the effect of soil conservation measures undertaken in the area. Only the first-year records, taken before soil conservation activities began, were used in this study to capture the natural rainfall-runoff trends in the watersheds. A total of 188 sets of monthly rainfall-silt load observations were obtained from the selected watersheds.

Data Processing
The parameters derived from STGV attributes, along with rainfall and silt load values, were standardized using equation 3.

$$x_{new} = \frac{x_i - x_{min}}{x_{max} - x_{min}} \qquad (3)$$

where $x_i$ is the original value, $x_{new}$ is the standardized value, and $x_{max}$ and $x_{min}$ are, respectively, the maximum and minimum values.

Topographic attributes and NDVI values were averaged for the respective watersheds using zonal operations carried out in ArcView 3.2.
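As an illustration of this step, the sketch below assumes that the standardization is the usual min-max scaling to the range 0 to 1, and it also shows the inverse transform needed later to convert model outputs back to physical units. The function names and rainfall values are hypothetical.

```python
def standardize(values):
    """Min-max scale a list of values to [0, 1]; also return the min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot standardize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def destandardize(scaled, lo, hi):
    """Invert the min-max scaling to recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]

# Hypothetical monthly rainfall values (mm)
rain = [12.0, 85.5, 310.2, 140.0]
scaled, lo, hi = standardize(rain)
restored = destandardize(scaled, lo, hi)
```

The minimum and maximum must be stored for each variable, because the same pair is needed to destandardize the predicted silt load.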

Data reduction of the parameters was carried out by PCA using the SPSS statistical software. The results of the PCA are discussed in the next section. A total of 188 datasets containing the standardized monthly rainfall, soil, topography, geomorphology and NDVI inputs were used for prediction of monthly silt load for the selected watersheds. The data were split into two sets (training and validation). Each training set of 126 datasets was generated by random resampling, and the remaining 62 datasets were used as the validation set. Fifty random replications of the training set were used for bootstrapping. The final output was generated by bootstrap aggregation in Neuropack, which averages the bootstrap estimates over all the iterations. The number of iterations for each prediction was set to 100.
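The bootstrap aggregation procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the Neuropack procedure: a least-squares line stands in for the neural network, the data are synthetic, resampling is assumed to be with replacement from a single 126/62 split, and the names `fit_line` and `bagged_predict` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares line; a stand-in learner for the neural network."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def bagged_predict(train, valid_x, n_boot=50, rng=None):
    """Fit one model per bootstrap replicate and average the predictions."""
    rng = rng or random.Random(0)
    preds = []
    for _ in range(n_boot):
        # Resample the training set with replacement (same size as the original)
        sample = [rng.choice(train) for _ in train]
        xs, ys = zip(*sample)
        model = fit_line(xs, ys)
        preds.append([model(x) for x in valid_x])
    # Bootstrap aggregation: mean prediction across all replicates
    return [sum(col) / n_boot for col in zip(*preds)]

# Synthetic data: 188 (input, output) pairs split into 126 training and 62 validation
rng = random.Random(42)
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in (rng.random() for _ in range(188))]
rng.shuffle(data)
train, valid = data[:126], data[126:]
y_hat = bagged_predict(train, [x for x, _ in valid], n_boot=50, rng=rng)
```

Averaging over replicates reduces the sensitivity of the final prediction to any single training sample, which is the motivation for combining bootstrapping with ANNs.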

Neural network models were developed following a hierarchical approach to the inputs for estimation of monthly silt load. The outputs were destandardized to generate predicted values of silt load for comparison with the observed values. Performance of the neural networks was evaluated by Spearman’s correlation coefficient (r) and the model efficiency factor (COE) for the validation datasets. The uncertainty of model predictions was quantified using root mean square error (RMSE) and mean absolute error (MAE).
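The four performance measures can be computed as in the sketch below. It assumes that COE is the Nash-Sutcliffe form, one minus the ratio of the residual sum of squares to the variance of the observations about their mean, which is the common definition in hydrology but is not stated explicitly here. The function names and sample values are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(obs, pred):
    """Spearman's r: Pearson correlation of the ranks."""
    return pearson(ranks(obs), ranks(pred))

def coe(obs, pred):
    """Coefficient of efficiency (Nash-Sutcliffe form, an assumption)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

# Hypothetical observed and predicted silt loads
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 39.0]
scores = (spearman(obs, pred), coe(obs, pred), rmse(obs, pred), mae(obs, pred))
```

Because Spearman's r depends only on ranks, it rewards a model that orders the months correctly even when magnitudes are off, whereas COE, RMSE and MAE penalize errors in magnitude.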

The optimum number of neurons in the hidden layer for each model was determined by varying the number of neurons and simulating outputs until the best performance was obtained on the validation datasets. The best model for a given number of inputs was selected based on the values of r and COE approaching one.

RESULTS AND DISCUSSION
Principal Component Analysis
Thirty-nine components corresponding to the 39 inputs were generated using PCA. The Varimax method was applied for orthogonal rotation of the components to minimize the number of variables having high loadings on each factor. The first component captured the largest variation in the datasets. The remaining components captured successively less variation as the component number increased. Ten components having eigenvalues greater than one were selected. The remaining components, with eigenvalues less than one, were rejected because each explains less variance than a single original input variable. The ten selected components captured 86.072 % of the cumulative variation of the entire input dataset. Table 3 shows the correlation matrix of the inputs with the selected components. Data reduction of the input space was carried out by selecting the input most highly correlated with each component. The selected inputs corresponding to the principal components were watershed length (L), elongation ratio (Re), bifurcation ratio (RB), area ratio (Ra), coarse sand (Sc), fine sand (Sf), NDVI, elevation (E), slope (S) and profile curvature (Cpr).
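The selection logic described above (retain components with eigenvalues greater than one, then keep the input most strongly correlated with each retained component) can be sketched as follows. The eigenvalues, loadings and function names are made up for illustration and are not the values from Table 3; the PCA and Varimax rotation themselves are assumed to have been carried out beforehand.

```python
def select_components(eigenvalues):
    """Keep components with eigenvalue > 1 and report cumulative variance (%)."""
    total = sum(eigenvalues)
    kept = [i for i, ev in enumerate(eigenvalues) if ev > 1.0]
    explained = 100.0 * sum(eigenvalues[i] for i in kept) / total
    return kept, explained

def representative_inputs(loadings, names, kept):
    """For each kept component, pick the input with the largest absolute loading."""
    chosen = []
    for c in kept:
        column = [abs(row[c]) for row in loadings]
        chosen.append(names[column.index(max(column))])
    return chosen

# Hypothetical example with 4 standardized inputs (rows) and 4 components (columns)
names = ["L", "Re", "NDVI", "S"]
eigenvalues = [2.2, 1.1, 0.5, 0.2]
loadings = [
    [0.91, 0.10, 0.05, 0.02],
    [0.85, 0.20, 0.10, 0.05],
    [0.12, 0.88, 0.20, 0.10],
    [0.30, 0.40, 0.70, 0.30],
]
kept, explained = select_components(eigenvalues)
selected = representative_inputs(loadings, names, kept)
```

For standardized inputs the eigenvalues sum to the number of variables, so an eigenvalue below one means the component carries less variance than one original variable. In this simple sketch two components could in principle select the same input, which would need to be resolved by hand.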

Neural Network Analysis
The input architecture of the best-performing BANN for each level of the hierarchy is shown in Table 4. Model M1 (R-Cpr), with only two inputs, showed the poorest performance (r = 0.778, COE = 0.577). The performance of the models improved as the number of inputs increased from 2 to 6. Model M5 (R-Cpr-Ra-NDVI-L-Sc) showed the best performance (r = 0.897, COE = 0.802) of all the models. A further increase in the number of inputs decreased the prediction ability of the models, as shown by models M6 (R-Cpr-Ra-NDVI-L-Sc-RB) and M7 (all inputs). RMSE and MAE, indicative of the spread of the data, decreased as the values of COE and r increased.

The weights from BANN model M5 (R-Cpr-Ra-NDVI-L-Sc) were used to generate the annual silt load map for the year 2004. Monthly rainfall values for 2004 from 28 SOPs were first interpolated in ArcView software using the Inverse Distance Weighted (IDW) method and then averaged for the microwatersheds using zonal operations. Values of Cpr and Sc were obtained from the DEM and the digitized soil layers. Ra and L values were derived using the microwatershed boundary layer. IRS LISS III remote sensing data from May 2004 were used to derive the NDVI. The monthly silt load maps were combined to produce the annual silt load map of the 716 microwatersheds in the UDVC shown in Figure 8.
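The IDW interpolation step can be illustrated with the sketch below, which estimates rainfall at an unsampled location as a weighted mean of station values with weights proportional to the inverse of distance raised to a power. The power of 2, the station coordinates and rainfall values, and the function name `idw` are assumptions for the example; the settings used in ArcView are not reported here.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: list of (station_x, station_y, value) tuples
    power   : distance exponent (2 is a common default, assumed here)
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical stations: (x in km, y in km, monthly rainfall in mm)
stations = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 150.0)]
estimate = idw(stations, 5.0, 0.0)
```

In a GIS workflow this estimate would be evaluated on every grid cell, and the cell values falling inside each microwatershed would then be averaged to give one rainfall input per microwatershed.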

The annual silt load map was able to capture the spatial variation within zones of uniform rainfall distribution seen in the map of annual rainfall. This map also depicts the indirect influence of the selected parameters on the distribution of silt load within the UDVC. The fine resolution of the topographic and vegetation attributes and their influence on silt load show considerable potential for more accurate mapping of silt load and, subsequently, for prioritization of watersheds.

CONCLUSIONS
The mapping of annual silt load for the UDVC demonstrates the usefulness of incorporating topography and vegetation parameters along with watershed geomorphologic and soil inputs. This study recommends coupling statistical techniques (PCA) and soft computing tools (BANN) with remote sensing and GIS platforms for better simulation of rainfall-silt load relationships.

REFERENCES
ASCE Task Committee, 2000a. Artificial neural networks in hydrology I: preliminary concepts. Journal of Hydrologic Engineering 5 (2), 115-123.

ASCE Task Committee, 2000b. Artificial neural networks in hydrology II: hydrologic applications. Journal of Hydrologic Engineering 5 (2), 124-137.

Carter, B.J., Ciolkosz, E.J., 1991. Slope gradient and aspect effects on soil development from sandstone in Pennsylvania. Geoderma 49, 199-213.

Dawson, C.W., Wilby, R.L., 2001. Hydrological modelling using artificial neural networks. Progress in Physical Geography 25 (1), 80-108.

Dennis, J.E., Gay, D.M., Welsch, R.E., 1981. NL2SOL – An adaptive nonlinear least squares algorithm. ACM Transactions on Mathematical Software 7, 348-368.

Hsu, K.-L., Gupta, H.V., Sorooshian, S., 1995. Artificial neural network modeling in rainfall-runoff process. Water Resources Research 31 (10), 2517-2530.

Lindsay, J.B., 2005. The Terrain Analysis System: A tool for hydro-geomorphic applications. Hydrological Processes 19, 1123-1130.

Maier, H., Dandy, G.C., 2000. Neural networks for the prediction and forecasting of water resources variables: a review of modeling issues and applications. Environmental Modelling & Software 15, 101-124.

Minasny, B., McBratney, A. B., 2002. The neuro-m method for fitting neural network parametric pedotransfer functions. Soil Science Society of America Journal 66, 352-361.

Moore, I.D., Gessler, P.E., Nielsen, G.A., Peterson, G.A., 1993. Soil attributes prediction using terrain analysis. Soil Science Society of America Journal 57, 443-452.

Nagy, H.M., Watanabe, K., Hirano, M., 2002. Prediction of sediment load concentration in rivers using Artificial Neural Network Model. J. Hydraulic Eng. (ASCE) 128 (6), 588-595.

Sarangi, A., Bhattacharya, A.K., 2005. Comparison of Artificial Neural Network and regression models for sediment loss prediction from Banha watershed in India. Agric. Water Manage. 78, 195-208.

Sharma, S.K., Mohanty, B.P., Zhu, J., 2006. Including Topography and Vegetation Attributes for Developing Pedotransfer Functions. Soil Science Society of America Journal 70, 1430-1440.

Zhang, B., Govindaraju, R., 2003. Geomorphology-based artificial neural networks (GANNs) for estimation of direct runoff over watersheds. Journal of Hydrology 273 (1), 18-34.