Home Articles Surface approximation of Point Data using different Interpolation Techniques – A GIS...

Surface approximation of Point Data using different Interpolation Techniques – A GIS approach

Priyakant, Niva Kiran Verma, L.I.M. Rao and A.N. Singh
Remote Sensing Applications Centre, Uttar Pradesh
Kursi Road, Lucknow – 226 021
Email : [email protected]

1.0 Project Background
Uttar Pradesh Sodic Land Reclamation Project (UPSLRP) is a major reclamation programme to improve agricultural productivity over nearly 2.4 lakhs Ha of ‘USAR’ lands (Sodic Lands) in the state. RSAC-UP was assigned task of identification and mapping of B+ (Double Cropped Sodic area), B (Single Crop Sodic area) and C (Barren Sodic Land) categories of sodic land based on Remote Sensing data. The identified plots were transferred on Cadastral maps of the Villages for U.P. Bhumi Sudhar Nigam (UPBSN) to take up for reclamation. In addition to this RSAC-UP were also assigned to other functions like project Durability studies, Expansion/Reduction of Sodic land studies, Environmental monitoring to assess the impact of reclamation on Soils, Ground water, Surface Water and Biodiversity. This paper is based on part of environmental monitoring studies carried out from Ground Water datasets for six years i.e. from 1995-2000 in both pre and post monsoon seasons using Geographical Information System (GIS).

2.0 Introduction
The advent of electronic and computing techniques coupled with GIS has increased the potential of creating and maintaining databases using geographic space as a key field. A GIS database has proved very useful especially for GW studies when vital decisions are required which are going to affect the entire ecology; economic and socio-economic setup of the area. Such application based on the holistic approach through the integration of various spatial and non-spatial elements and understanding of their dynamic inter-relationships.

Quantitative assessment of ground water quality and depth conditions require a highly organized data collection programme that includes statistical evaluation of monitoring results (Nelson and Ward, 1981; Ward, 1989). However, with the use of GIS technique, the study got the potential to complete the monitoring programme in time, accurately and to evaluate the results. This paper describes how GIS have been used in GW monitoring programme to improve the overall effectiveness and minimize the complexity of the results.

Many kinds of geographic data are often collected at irregular intervals. e.g. Ground Water data, elevation data etc. require a great deal of effort. At the same time, it is not possible to make such measurements at all the desired places and hence there is often a need to estimate values for locations where there are no measurements. These estimates are based on the data available and preferably understanding of the spatial variation of the phenomena. GIS is one such tool, which statistically estimates such survey to maximize the amount of information. This information is based on the fact that objects that are near are more important than those that are far away. With the limited numbers of sampled data, the values for places where there are no measurements, can be interpolated using suitable interpolation technique depending on the physiography of the area.

In any interpolation, the basic decision is to choose a model for the statistical relationship between data inputs. The most common models for such relationship are weighted function and Spline (Nagy and Wagle, 1979). Both functions are analytical description of how to use known information to estimate, that which is unknown. Suppose we wish to estimate the Ground Water (GW) depth at a point C in-between the two measurement points A & B (Fig-1). If a line is drawn joining A and B then this line describes a locus of points that corresponds to an explicit interpolation model. This simple model is based on the rule that – Change in GW depth between any two adjacent control points are linear with distance. Thus, it is simple matter to calculate the predicted depth at any point between the two control points.


Figure 1 Principal of Interpolation
In GIS the estimated results are stored as raster data (grids) that represents geographical feature and variable by dividing the world into discrete squares called cells, for which a value is stored. These cells are often called grids. Each cell knows its location implicitly from the origin point and its location relative to origin. The exact location of each cell is not stored, just the origin, cell size and no. of cells from the origin are stored. Continuous variables are represented in GIS as surface, where the value of other locations within the same cell can be interpolated from the cell center and the centers of the neighboring cells. Since real surfaces vary continuously, it is impossible to record all the locations that define them. Therefore the surface models, which approximate them, take representative samples of the infinite number of locations possible, using a mathematical technique called interpolation to fill the gaps between the samples. How well these gaps are filled depends on how good the source data is and type of interpolation technique used. The resulting surface (grid) theme is the best estimate of the trend on the actual surface for each location. The surface interpolator makes certain assumptions about how to determine the best estimated values.

  Surface is an abstract class used to represent continuous spatial phenomena. Although the underlying surface data is continuous, a surface can represent nested holes and islands using masks. Only one z-coordinate value is allowed for each planimetric location on a Surface. In a context of spatial mathematics, each surface object can be represented by a single-valued function z = f (x, y), where z can be an elevation value or any other kind of measurement with respect to the application domain. Factor to be used to correct mismatch between X/Y and Z units. In the context of data structuring, a surface object is a structure that maintains neighborhood relation useful for computational-based functions similar to the finite-element approach. Using the neighborhood relation, a computation unit can be formed to facilitate and simplify various calculation, e.g. spatial gradient (slope), aspect, sun reflectance, flow, isolines, line of sight, surface area, volume, etc.

Different interpolator produces good estimates, however, the selection of appropriate interpolator entirely depends upon the physiography and distribution of points in the area. Resulting surface defined over a two dimensional region, is one of the most important object in GIS. Continuous variables such as terrain elevation and atmospheric temperature are approximately modeled by TIN or grid in GIS (Sadahiro, 2001). Scalar fields modeled as surfaces in GIS are analyzed not only visually but also mathematically and even statistically. There have been proposed numerous methods in the literature which are classified as- a) Method so called geostatistics including variogram, correlogram, and krigging (Isaaks and Srivastava 1989, Cressie 1993) used for spatial analysis as well as for spatial interpolation. b) The second type of method focuses on modeling surfaces by mathematical functions. It includes trend surface analysis (Tobler 1964, Bailey and Gatrell 1995) and its extensions (Hagget 1968, Griffith 1981) used for modeling simple surfaces.

3.0 Methodology:
Visiting every location in a study area to measure the height magnitude or concentration of phenomena is usually difficult or expensive. Instead, strategically dispersed sample point locations are selected and using the surface creation function, an estimated value is assigned to all other locations. The input points can be either randomly or regularly spaced points containing heights, concentration or magnitude measurements.

Ground water depth was measured from operational open as well as tube wells and samples were collected for quality analysis in pre and post monsoon seasons for a period of six years i.e. 1995-2000 to monitor the changes in the given period of time. The analysis for various water qualities like EC (Electrical Condutivity), pH and SAR (Sodium Absorption Ratio) was carried out in the laboratory.

Ground Water sampling locations were first marked over topographic sheets of 1: 50,000 scale and latitude and longitude were calculated for each point. These locations were entered into GIS as point coverage through digitization and transformed into geographical coordinate systems. These points were used as input in interpolation process. Each point location was assigned a unique code in their feature attribute table (PAT). The ground water depth and corresponding chemical data (EC, pH and SAR) for each point were entered as separate database file (.dbf). These associated information were linked to the corresponding point data through a common field (sampling code) for the surface approximation. Arcview (ver.3.2a) GIS software with spatial extension module was used for surface interpolation.

Two types of models were used in the study to represent the surfaces: Grids and TINs so that the results of which best suits our application can be carried further for more advanced studies. Fig: 2 shows the various steps involved in surface approximation using different interpolation techniques.


Figure 2 Flow Chart Showing different steps involved in surface approximation using Grid and TIN methods
3.1 Grid Analysis:
Grids represent surface using a mesh of regularly spaced points. One can estimate value anywhere within the mesh by anything nearby mesh point’s value, giving more weight to those that are close. The resulting grid theme is the best estimate of what the quantity is on the actual surface for each location. Surface interpolator makes certain assumptions about how to estimate the best values, that inturn depends how the sample points are distributed in the area. In this study two-grid surface interpolator (IDW and Spline) were used and the results from which were compared to best suit our application.

3.1.1 Inverse Distance Weighted (IDW):
This interpolation technique weights the contribution of each input (control) points by a normalized inverse of the distance from the control point to the interpolated point. IDW assumes that each input point has a local influence that dimiminishes with distance. It weights the points closer to the processing points, greater than those farther away. A specified number of points, or all points within a specified radius is used to determine the output value for each location. The power parameter in the IDW interpolator controls the significance of the surrounding points upon the interpolated value. A higher power results in less influence from distant points. Each line in a barrier input line theme is used as break that limits the search for input sample points. A line can represent a cliff, ridge or some other interruption in a landscape. A choice of number of barriers will use all points specified in the No. of Neighbors or within the identified radius. (Fig-3e) shows the surface created from sampled GW height data for the year 2000 post monsoon using IDW technique by keeping output cell size of 0.000796 mm, No. of Neighbors = 12, Power = 2 and No Barrier options in order to get the surface containing 250 rows and 313 columns. This result was kept uniform for all the data processed using various techniques.


Figure 3 Comparative view of results obtained from TIN, Spline and IDW
3.1.2 Spline:
This technique fits a minimum curvature surface through the input points. Conceptually, it is like bending a sheet of rubber to pass through the points, while minimizing the total curvature of the surface. It fits a mathematical function to a specified number of nearest input points, while passing through the sample points. This method is best for gently varying surfaces where change in physiography or other phenomenon is not abrupt. It is not appropriate if there are large changes in the surface within a short horizontal distance because it can overshoot estimated values. The regularized method yield a smooth surface as the weight parameters defines the weight of the third derivative of the surface in the curvature minimization expression. The estimated surface for GW depth data for the year 2000 post monsoon (Fig-3d) contains output grid cell size of 0.000796, 250 rows, 313 columns, weight 0.1 and No. of points = 12 to get surface containing 250 rows and 313 columns.

Both the method produces good estimates, but neither estimates every unknown value with perfect accuracy. The interpolation solely depends on the no. of sampling locations and how evenly and effectively it is distributed over the study area. In the present, study an average of 20 sample points were taken in the entire study area.

3.1.3 Validation of Result:
In order to validate the results obtained through spline, of the total available samples, few samples were randomly selected and kept as test samples. Remaining samples were used for surface approximation using spline method. It has been observed that values at test samples were nearly similar to the values at corresponding points over surface generated without considering those test samples. Again this result was found to be similar to those considering all the sampling points. This confirms that the sampling locations available were able to predict or interpolate values at points beyond or at locations where sampling points were not available.

3.2 Triangular Irregular Network (TIN) analyses:
A TIN is an object used to represent a surface. Since representation of a surface can be done in many different ways, TIN (Triangulated Irregular Network) also implies a specific storage structure of surface data. TIN partitions a surface into a set of contiguous, non-overlapping, triangles. A height value is recorded for each triangle node. Heights between nodes can be interpolated thus allowing for the definition of a continuous surface. TINs can accommodate irregularly distributed as well as selective data sets. This makes it possible to represent a complex and irregular surface with a small data set. The TIN Edge (An index of a triangle edge, value 0 to 2) class provides access to information about edges of a TIN for surface analysis purposes. Each TIN Edge object is an edge of a triangle that contains the edge elements of a TIN. Z-Factor is used to correct mismatch between X/Y and Z units.

TIN was created for ground water height for the year 2000 post monsoon period (fig 3b). The second step involved the conversion of TIN into grid in order to visualize the surface keeping the cell size (0.000796) and no. of rows (250) and columns (313) same to that of surface generated from grid analysis (fig 3c).
 
4.0 Results and Discussions:
Overlaying functionality of GIS was used for comparison of results obtained from Spline, IDW and TIN techniques.

4.1 IDW Vs Spline
IDW estimates grid cell values by assigning the values of sample data points in the vicinity of the cell. The closer a point is to the center of cell being estimated, the more influence or weight it has in the averaging process. On the other hand, Spline estimates grid cell values by fitting a minimum curvature surface to the sample data. The results from the two interpolation techniques are different even for the study area, which is very small. Considering the example of surface creation for GW depth for post monsoon 2000, both methods produce good estimates and classify the area into four predefined classes (5m). However, percentage occurrence of each class in the study area was found to be different. GW classes 5m were found to be 8.4%, 22%, 60.8% and 8.8% respectively using IDW whereas they were 9.5%, 25.5%, 52% and 13% respectively by spline method (Fig-4). Results from two techniques suggest that neither estimation process could be generalized for a particular application. Thus, the choice of proper interpolation technique is highly area specific. Again spline suits to those areas where there are little variations in the data within a short horizontal distance. IDW on the other hand suits for such kind of data, in which the value has a local influence that diminishes with distance. Since there are little variations in the GW value in the entire area and with the prior knowledge of the physiography of the area, which is nearly level to gently sloping, the results obtained from spline interpolation is found to be more closer to the expected value. Therefore spline interpolation technique was found more appropriate for our application.


Figure 4 Class view GW depth variation from Spline and IDW Interpolation Techniques
4.2 Grid Vs TIN
Since TIN is a triangulation method it excludes the area outside the point location extent whereas grid assumes entire rectangular area displayed. It means there is no extrapolation in case of TIN. Therefore to use TIN, in addition to well-distributed sampling locations in the area, appropriate number of points should also lie on the periphery or even outside the study area. Grid model is simple and processes on theme tend to be more efficient than TIN, which is two-step process, first generation of TIN and second conversion of TIN into grid.

Results from TIN are nearly similar to that of spline interpolation as compared to that of IDW in the study area. Moreover the rigid mesh structure of grid does not adapt to the variability of terrain (losing information between mesh points), source data may not be captured and reflected properly in resulting analysis like interpolation. However, if there is slight variation in the terrain information grid (spline) analysis is best-suited interpolation technique. The resolution of TIN can vary, that is, they can be more detailed in the areas where the surface is more complex and less detailed in the areas where the surface is simpler. The coordinates of the source data are maintained as part of the triangulation so subsequent analysis like interpolation results in no information loss. Linear feature like roads and streams can be represented by enforcing them in the model as triangle edges. Grids on the other hand being used if source data’s positioned accuracy is not very high or there is no need of using linear feature like roads and streams exactly. On the other hand, if the source data is very accurate and one would like to maintain the accuracy or to represent linear features TIN can be used.

Since the study area has very little variation in topography and very high positional accuracy is not required in the GW study the surface approximation based on grid analysis was carried out for further processing for pre and post seasons from 1995-2000. The resulting surfaces from grid were used for weighted analysis giving overall situation of the area in a span of six years.

4.3 Weighted Analysis
Since the project deals with vast amount of data to be processed, a model was designed to use results from various analysis to get a cumulative result that facilitates the dynamism within the change at the same point of time. Surfaces created for Depth to GW and other GW quality parameters such as EC, pH and SAR in GW in both pre and post monsoon periods from 1995 to 2000 were used in model. The idea was to get a scenario that represents the overall situation of the area in context of above parameters in a span of six years. GIS have not only made it possible to process, analyze and combine such type of data but they have also made easy to organize and integrate spatial data into larger system to accomplish various correlation studies.

Surfaces created for Depth to GW in each year were used as input theme for the model. The input data were reclassified which replaces the input grid values with another set of values depending upon the ranges fixed for ground water heights. In this study, four ranges of GW heights were set which are 5m. When the grid values are reclassified by range (four ranges here) each new value in output grid is assigned a range of numbers based on the values in input grid.

The reclassification step was done for all the six years data in order to get homogeneity in the input values. A weighted overlay function was performed over the reclassified values to create a single output that combines all six years data into a single value depending upon the weightage assigned to each theme. This is based on the fact that usually the theme or factors in the input values are not equally important. Thus one has to prioritize the themes depending upon their importance by assigning weightage to them. The weightage function overlay lets all those issues into considerations. It again reclassifies the values in the input grid theme onto a common evaluation scale of suitability. The input grids are weighted by importance and added to produce an output grid. In this study, all the six-year data have given equal importance (weightage). This implies that 16.5 percent of influence for each year data over the derived output to make the total influence for all the six year themes equal to 100 percent.

The resulting cell values are added to produce the output grid, which represent the overall trend of Depth to GW from 1995-2000.

The same sequence of steps was repeated for Ground Water quality like EC, pH and SAR. Four ranges for all the three categories were set. For EC the range was 2250. For pH the ranges were 9.5. For SAR the ranges were 10.

Fig 5: shows the model constructed to get the weighted map and Fig 6: shows the year wise and weighted overlay map of GW depth in post monsoon period from 1995-2000.


Figure 5 Model designed to compute the weighted average of 5 years GW data

Figure 6 Temporal variation of depth to ground water levels
The resulting weighted grid was converted into vector format and clipped with respect to study area giving rise to overall situation of the study area in a span of six years (1995-2000). The same process was repeated for other GW parameters for both pre and post monsoon periods.

Conclusions:
On the basis of above studies and the results obtained, following concluding remarks can be made:

  • Both grid and TIN can be used for surface approximation.
  • In grid, spline is best suited for gently varying topography in contrast to IDW, which is more suitable for area with larger topographical variation.
  • TIN is computationally complex and is a two-step process as compared to grid, which is one-step process.
  • The spatial variation of results from Spline and TIN are nearly similar in the study area. Only difference being in no data area in TIN, which has been extrapolated in spline.
  • Although both grid and TIN methods produce good estimates, neither estimation process can be generalized for a particular application.
  • The choice of proper interpolation technique is highly area specific.
  • Grids are usually used more for region, small-scale applications, while TINs are used for more detailed and large-scale applications.
  • In the present study Spline interpolation results has been found to be in good agreement with the expected results.
  • Weighted overlay analysis helps in incorporating multiple data sets for various correlation studies.
  • The above study realizes the potential of GIS for various spatial analyses to get a scenario that cannot be perceived with independent primary layers or non-spatial databases and can be easily comprehended to arrive at a proper decision.

References:

  • Batty, M et al. (1994a). Modeling inside GIS. Part 1: Model structures, exploratory spatial data analysis and aggregation. International journal of GIS, 8: 291-307
  • Doucette P et al. (2000). Exploring the capability of some GIS surface interpolators for DEM gap fill. PE & RS, 66: 881-888
  • Hutchinson, M. F. (1995). Interpolating mean rainfall using thin plate smoothing Spline. International Journal of GIS, 9: 385-404
  • (1994b). Modeling inside GIS. Part 2. Selecting and calibrating urban models using Arc/Info. . International journal of GIS, 8: 451-470
  • Mitasova et.al (1995). Modeling spatially and temporally distributed phenomena: new methods and tools for GRASS GIS. International Journal of GIS, 9: 433-446
  • Varekamp, C et al. (1996). Spatial interpolation using public domain software. PE & RS, 62: 845-854.
  • Sadahiro Y (2001). Analysis of surface changes using primitive events. International Journal of Geographical Information Science, 15: 523-538

4.0 Results and Discussions:
Overlaying functionality of GIS was used for comparison of results obtained from Spline, IDW and TIN techniques.

4.1 IDW Vs Spline
IDW estimates grid cell values by assigning the values of sample data points in the vicinity of the cell. The closer a point is to the center of cell being estimated, the more influence or weight it has in the averaging process. On the other hand, Spline estimates grid cell values by fitting a minimum curvature surface to the sample data. The results from the two interpolation techniques are different even for the study area, which is very small. Considering the example of surface creation for GW depth for post monsoon 2000, both methods produce good estimates and classify the area into four predefined classes (5m). However, percentage occurrence of each class in the study area was found to be different. GW classes 5m were found to be 8.4%, 22%, 60.8% and 8.8% respectively using IDW whereas they were 9.5%, 25.5%, 52% and 13% respectively by spline method (Fig-4). Results from two techniques suggest that neither estimation process could be generalized for a particular application. Thus, the choice of proper interpolation technique is highly area specific. Again spline suits to those areas where there are little variations in the data within a short horizontal distance. IDW on the other hand suits for such kind of data, in which the value has a local influence that diminishes with distance. Since there are little variations in the GW value in the entire area and with the prior knowledge of the physiography of the area, which is nearly level to gently sloping, the results obtained from spline interpolation is found to be more closer to the expected value. Therefore spline interpolation technique was found more appropriate for our application.


Figure 4 Class view GW depth variation from Spline and IDW Interpolation Techniques
4.2 Grid Vs TIN
Since TIN is a triangulation method it excludes the area outside the point location extent whereas grid assumes entire rectangular area displayed. It means there is no extrapolation in case of TIN. Therefore to use TIN, in addition to well-distributed sampling locations in the area, appropriate number of points should also lie on the periphery or even outside the study area. Grid model is simple and processes on theme tend to be more efficient than TIN, which is two-step process, first generation of TIN and second conversion of TIN into grid.

Results from TIN are nearly similar to that of spline interpolation as compared to that of IDW in the study area. Moreover the rigid mesh structure of grid does not adapt to the variability of terrain (losing information between mesh points), source data may not be captured and reflected properly in resulting analysis like interpolation. However, if there is slight variation in the terrain information grid (spline) analysis is best-suited interpolation technique. The resolution of TIN can vary, that is, they can be more detailed in the areas where the surface is more complex and less detailed in the areas where the surface is simpler. The coordinates of the source data are maintained as part of the triangulation so subsequent analysis like interpolation results in no information loss. Linear feature like roads and streams can be represented by enforcing them in the model as triangle edges. Grids on the other hand being used if source data’s positioned accuracy is not very high or there is no need of using linear feature like roads and streams exactly. On the other hand, if the source data is very accurate and one would like to maintain the accuracy or to represent linear features TIN can be used.

Since the study area has very little variation in topography and very high positional accuracy is not required in the GW study the surface approximation based on grid analysis was carried out for further processing for pre and post seasons from 1995-2000. The resulting surfaces from grid were used for weighted analysis giving overall situation of the area in a span of six years.

4.3 Weighted Analysis
Since the project deals with vast amount of data to be processed, a model was designed to use results from various analysis to get a cumulative result that facilitates the dynamism within the change at the same point of time. Surfaces created for Depth to GW and other GW quality parameters such as EC, pH and SAR in GW in both pre and post monsoon periods from 1995 to 2000 were used in model. The idea was to get a scenario that represents the overall situation of the area in context of above parameters in a span of six years. GIS have not only made it possible to process, analyze and combine such type of data but they have also made easy to organize and integrate spatial data into larger system to accomplish various correlation studies.

Surfaces created for Depth to GW in each year were used as input theme for the model. The input data were reclassified which replaces the input grid values with another set of values depending upon the ranges fixed for ground water heights. In this study, four ranges of GW heights were set which are 5m. When the grid values are reclassified by range (four ranges here) each new value in output grid is assigned a range of numbers based on the values in input grid.

The reclassification step was done for all the six years data in order to get homogeneity in the input values. A weighted overlay function was performed over the reclassified values to create a single output that combines all six years data into a single value depending upon the weightage assigned to each theme. This is based on the fact that usually the theme or factors in the input values are not equally important. Thus one has to prioritize the themes depending upon their importance by assigning weightage to them. The weightage function overlay lets all those issues into considerations. It again reclassifies the values in the input grid theme onto a common evaluation scale of suitability. The input grids are weighted by importance and added to produce an output grid. In this study, all the six-year data have given equal importance (weightage). This implies that 16.5 percent of influence for each year data over the derived output to make the total influence for all the six year themes equal to 100 percent.

The resulting cell values are added to produce the output grid, which represent the overall trend of Depth to GW from 1995-2000.