Quality control in database input for GIS

M. D. Joshi, R. Sivakumar
Sardar Vallabh Bhai Patel Institute of Technology, Vasad
Indian Institute of Technology Delhi

Abstract
Accuracy is a very important aspect of GIS. It is one of the main factors governing the reliability of information and hence of decision making. To assess the magnitude of errors at the different stages of computerising information from the source data, it is necessary to consider the various factors likely to affect accuracy. The five indicators of accuracy given in the US Spatial Data Standards (Clarke, 1992) are Lineage, Positional Accuracy, Attribute Accuracy, Logical Consistency and Completeness. These have been considered in the present study.

The error analysis was carried out through various tests on the available data, taking the above factors into account. The results suggest that reasonable accuracy can be achieved with the present data conversion in most cases, within the respective limitations. Such accuracy is sometimes difficult to achieve because of deviations inherited with the data or introduced by the conversion methods and techniques adopted, but even then it was found to be achievable to a large extent.

This study attempted to incorporate data conversions from as many different sources as possible, and the results obtained were found to be related to aspects of quality control in a given situation.

Introduction
The reliability of a Geographical Information System (GIS) depends mainly on the accuracy with which the data is arranged and on the way it is integrated and displayed for extracting information for decision making. Since decisions depend on the information content, the importance of its accuracy can hardly be overemphasised. To be sure about the accuracy, the converted data must be tested and compared against the data sources, considering prescribed accuracy standards.

Some aspects of the quality standards are expressed as figures and ratios, whereas others emphasise relative quality. The US Spatial Data Standards (Clarke, 1992) give five quality indicators of accuracy for judging the quality of a GIS.

It is necessary that the data conversion from the data source to the computer database is tested taking these factors into account. This has been attempted in the present study in order to present a case for developing a Quality Control method.

Testing of data and errors
The vector data was tested for accuracy by comparison with the source data. Quality indicators were expressed on a numerical scale ranging from 0 to 9, which gives an idea of the relative accuracy of the data. In this case the vector data was tested, and the source was considered to be accurate.

The above factors were examined in sequence. The basic examinations carried out on the data were as follows:

  • Lineage was the basic factor considered for giving quality indicators.
  • The vector data was tested for accuracy for all the well defined details and found to be 90% accurate within an error of 0.5 mm and 100% within an error of 1.00 mm.
  • Attribute accuracy was tested based on polygon overlay, and it was found that 95% of the polygons tested are accurate.
  • Positional accuracy and completeness were tested by superimposing the raster and vector data at the digitisation stage itself, and were found to be satisfactory.
  • Attribute accuracy was also taken into account while assigning the quality indicator.
  • Logical consistency was checked on the generated database by performing the topological tests that were possible in the MGE environment. Extracting topologically clean data was one of the main aims of the study. This was achieved by ensuring that all chains intersect at nodes and remain consistent around the polygons, and that inner rings embed consistently in closing the polygon. This testing was rigorous and took considerable time, to ensure that the data is topologically clean even in small polygons.
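
The positional test in the list above reports the share of well-defined points that fall within a tolerance measured at map scale. A minimal Python sketch of that kind of check follows. The function names, the sample offsets and the 1:50,000 scale used for the ground conversion are illustrative assumptions, not values or tools taken from the study.

```python
from typing import Sequence


def share_within_tolerance(offsets_mm: Sequence[float], tolerance_mm: float) -> float:
    """Return the percentage of absolute offsets that do not exceed the tolerance."""
    if not offsets_mm:
        raise ValueError("at least one offset is required")
    hits = sum(1 for d in offsets_mm if abs(d) <= tolerance_mm)
    return 100.0 * hits / len(offsets_mm)


def map_mm_to_ground_m(distance_mm: float, scale_denominator: int) -> float:
    """Convert a distance measured on the map (mm) to metres on the ground."""
    return distance_mm * scale_denominator / 1000.0


# ASSUMPTION: hypothetical offsets (mm at map scale) between source and vector positions.
sample_offsets_mm = [0.1, 0.2, 0.3, 0.4, 0.5, 0.2, 0.1, 0.3, 0.4, 0.8]

print(share_within_tolerance(sample_offsets_mm, 0.5))  # share within 0.5 mm
print(share_within_tolerance(sample_offsets_mm, 1.0))  # share within 1.0 mm
print(map_mm_to_ground_m(0.5, 50_000))                 # 0.5 mm expressed in ground metres
```

Converting the tolerance to ground units makes it easier to compare a map-scale test with errors quoted in metres elsewhere.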

The basic step in quality control testing was to generate a check plot on a vector plotter and compare it with the input data, to ensure completeness and integrity of the data. A sample vector plot is provided on which Quality Control was carried out; the common errors found were of the nature of ‘missing features’. The following table gives a sample of two different jobs, one of which passed the minimum assured quality while the other failed.

Table 1: Sample Quality Control Data

Item                         Case I       Case II
Total Features / Files       8561 / 30    43227 / 1297
Sampled Database Features    8561         43227
Database Errors              16           42
Database Errors %            0.19         0.1
Sampled Files                30           128
Sampled Graphic Features     2320         5300
Graphic Errors               4            526
Graphic Errors %             0.18         9.92
Status                       Pass         Fail
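
The percentage rows in Table 1 are errors divided by sampled features. The Python sketch below recomputes them from the counts in the table. The acceptance limit is not stated in the text, so the 1% threshold is a hypothetical placeholder chosen only so that the example reproduces the Pass / Fail pattern of the table; the class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class QcSample:
    name: str
    database_features: int
    database_errors: int
    graphic_features: int
    graphic_errors: int


def error_percent(errors: int, sampled: int) -> float:
    """Errors as a percentage of the sampled features."""
    if sampled <= 0:
        raise ValueError("sampled count must be positive")
    return 100.0 * errors / sampled


# ASSUMPTION: the study does not state its acceptance limit; 1% is a placeholder.
MAX_ERROR_PERCENT = 1.0


def status(sample: QcSample, limit: float = MAX_ERROR_PERCENT) -> str:
    """Pass only if both the database and the graphic error rates are within the limit."""
    db = error_percent(sample.database_errors, sample.database_features)
    gr = error_percent(sample.graphic_errors, sample.graphic_features)
    return "Pass" if db <= limit and gr <= limit else "Fail"


# Counts copied from Table 1.
case_1 = QcSample("Case I", 8561, 16, 2320, 4)
case_2 = QcSample("Case II", 43227, 42, 5300, 526)

for s in (case_1, case_2):
    db = error_percent(s.database_errors, s.database_features)
    gr = error_percent(s.graphic_errors, s.graphic_features)
    print(f"{s.name}: database {db:.2f}%, graphic {gr:.2f}%, {status(s)}")
```

Computed this way, the Case I graphic error rate comes to about 0.17%, marginally below the 0.18 printed in the table; the difference may be due to rounding in the original.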

Error Analysis
The accuracy of data in a GIS is prone to errors from many sources. From the stage of aerial photography onwards, the data is burdened with creeping errors. Considering that maps are derivatives of photogrammetry, the final accuracy of a map may be expressed by the law of propagation of variances:

$$\sigma_{\text{Map}} = \left(\sigma^2_{\text{Control}} + \sigma^2_{\text{Aerial Triangulation}} + \sigma^2_{\text{Orientation}} + \sigma^2_{\text{Plotting}} + \sigma^2_{\text{Cartography}} + \sigma^2_{\text{Printing}}\right)^{1/2}$$

The basic input for the digital topographic database is the map, and hence the accuracy of the digital data, by the law of propagation of errors, will be:

$$\sigma_{\text{Digital Data}} = \left(\sigma^2_{\text{Map}} + \sigma^2_{\text{Digitisation}}\right)^{1/2}$$

For a 1:50,000 scale map, $\sigma_{\text{Map}}$ = 12.5 meters on the ground. Because film positives are used, the digitisation error is smaller than when digitising from the paper map. It is therefore no exaggeration to state that the accuracy of digital vector data derived from film-based colour separates at 1:50,000 scale is of the order of 12.5 meters.
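
The propagation law used above combines independent error sources as a root sum of squares. The Python sketch below works through the arithmetic. The map error of 12.5 m comes from the text; the digitisation error of 0.1 mm at map scale is a hypothetical figure, used only to show how the two terms combine.

```python
import math


def combined_sigma(*sigmas: float) -> float:
    """Root sum of squares of independent standard errors."""
    return math.sqrt(sum(s * s for s in sigmas))


SCALE_DENOMINATOR = 50_000

sigma_map_m = 12.5  # from the text, for a 1:50,000 map

# ASSUMPTION: 0.1 mm digitising error at map scale (not a value from the study).
sigma_digitisation_m = 0.1 * SCALE_DENOMINATOR / 1000.0

sigma_digital_m = combined_sigma(sigma_map_m, sigma_digitisation_m)

print(round(sigma_digital_m, 2))                 # combined error in metres
print(sigma_map_m * 1000.0 / SCALE_DENOMINATOR)  # 12.5 m expressed in mm at map scale
```

Because the terms add in quadrature, a digitisation error that is small relative to the map error changes the total only slightly. Under the assumed figure the total rises by less than one metre, which is in line with the statement that the digital data remains accurate to the order of 12.5 meters.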

The errors in the digital data were investigated for the three basic entities: point, line and area. For point features, sharp and well-defined points were chosen. For each feature type, 10 objects were chosen and the mean difference was tabulated. For this purpose, the coordinates read from the film positives were assumed to be accurate and used for comparison, except where coordinates were available from previous surveys. For some points, GPS coordinates were also recorded, along with the coordinates displayed in the MicroStation design files. The variations are expressed in ground terms, in meters. They were found to differ: for point features, for example, the error was 0.30 in map coordinates, up to 0.28 / 0.30 in raster / vector coordinates, and up to 0.35 when compared with GPS coordinates.
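
The point comparison described above amounts to averaging the coordinate differences between a reference set and a measured set. A minimal Python sketch follows. The coordinates are made up for illustration, only three points are used instead of the ten per feature type mentioned in the text, and the function name is hypothetical.

```python
import math
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]


def mean_offset(reference: List[Point], measured: List[Point]) -> float:
    """Mean planimetric distance between matching reference and measured points."""
    if not reference or len(reference) != len(measured):
        raise ValueError("point lists must be non-empty and of equal length")
    return mean(
        math.hypot(mx - rx, my - ry)
        for (rx, ry), (mx, my) in zip(reference, measured)
    )


# ASSUMPTION: illustrative ground coordinates in metres, not data from the study.
reference_points = [(1000.0, 2000.0), (1500.0, 2500.0), (1800.0, 2100.0)]
vector_points = [(1000.3, 2000.4), (1500.0, 2500.5), (1799.4, 2100.8)]

print(round(mean_offset(reference_points, vector_points), 3))
```

The same function can be reused with raster, vector or GPS coordinates as the measured set, so that each source is compared against the same reference.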

A similar study was conducted for line and area features. For line features, the maximum deviation from the true alignment was taken for each feature, and the same procedure was adopted; the center-line was taken as the true position. The map coordinates were read with a coordinatograph. The deviations varied between -2.00 and +2.00 units when compared with the map coordinates for different features of the map. Similarly, area features were studied, and the areas were found to vary by between -3.45 and +1.80.
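
The line and area measures described above can be sketched in Python as follows. The maximum deviation is taken here as the largest distance from any digitised vertex to the reference center-line, and the area difference uses the shoelace formula. The text reports signed deviations but does not state its sign convention, so the line measure below is unsigned. All names and sample coordinates are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def max_deviation(digitised: List[Point], centre_line: List[Point]) -> float:
    """Largest distance from a digitised vertex to the reference polyline."""
    segments = list(zip(centre_line, centre_line[1:]))
    return max(
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in digitised
    )


def polygon_area(ring: List[Point]) -> float:
    """Unsigned polygon area by the shoelace formula (ring need not repeat its first vertex)."""
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1])
    )
    return abs(twice_area) / 2.0


# ASSUMPTION: illustrative geometry, not data from the study.
centre_line = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
digitised_line = [(0.0, 0.5), (5.0, -1.5), (12.0, 2.0), (20.0, 0.0)]
reference_ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
digitised_ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 9.5), (0.0, 10.0)]

print(max_deviation(digitised_line, centre_line))
print(polygon_area(digitised_ring) - polygon_area(reference_ring))  # signed area difference
```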

The RMSE (Root Mean Square Error) in photogrammetry was found to range up to 4.60 m in plan and 1.30 m in height. The error was largest for forest boundaries running through undulating terrain. SPOT satellite data gave an error of about 38.1 m in plan and 22.2 m in height. On comparison with the DEM, the differences were 26.1 m in plan and -20 m in height.
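
For reference, RMSE in plan and in height can be computed from check-point residuals as in the Python sketch below. The residuals are hypothetical and the function names are illustrative; the plan RMSE is taken here as the root mean square of the radial (x, y) error, which is one common convention and may differ from the one used in the study.

```python
import math
from typing import Sequence, Tuple


def rmse_height(residuals_m: Sequence[float]) -> float:
    """Root mean square of height residuals."""
    if not residuals_m:
        raise ValueError("at least one residual is required")
    return math.sqrt(sum(r * r for r in residuals_m) / len(residuals_m))


def rmse_plan(residuals_m: Sequence[Tuple[float, float]]) -> float:
    """Root mean square of the radial planimetric error from (dx, dy) residuals."""
    if not residuals_m:
        raise ValueError("at least one residual is required")
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in residuals_m) / len(residuals_m))


# ASSUMPTION: hypothetical check-point residuals in metres.
plan_residuals_m = [(3.0, 4.0), (0.0, 0.0), (-3.0, -4.0), (6.0, 8.0)]
height_residuals_m = [1.0, -1.0, 2.0, -2.0]

print(round(rmse_plan(plan_residuals_m), 2))
print(round(rmse_height(height_residuals_m), 2))
```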

Conclusion
From the above study, it was observed that the quality control factors laid down in the US Spatial Data Standards can be used, and that a suitable method can be devised to check adherence to the standards. In the proposed method, a suitable sequence was adopted in which the original source data was treated as accurate, whereas the raster and vector data were treated as erroneous owing to digitisation and other inherited errors. The source data itself may be erroneous by a certain amount, and this needs to be given due consideration. However, in view of the map standards laid down by the USGS, the map accuracy can be considered good enough to accept the positions of ground points as references for such a comparison. In view of this, the results obtained from the present study appear to be satisfactory within the limitations of the data used.

References

  • Clarke, Andrew L., 1992, Data Quality Reporting in the Proposed Australian Data Transfer Standard, Proceedings of the Symposium on Spatial Database Accuracy, Melbourne, Australia, June 19-20, 199, pp 256-259.