
Some aspects of accuracy in GIS

M.D. Joshi
Associate Professor Civil Engineering Department
Sultan Qaboos University MUSCAT (OMAN) 

R. Siva Kumar
Director, Military Operations Directorate of the Army HQ. India 

Uncertainty is a significant problem in GIS because spatial data tend to be used for purposes for which they were never intended, and because accuracy problems in GIS require consideration of both object-oriented and field-oriented views of geographic variation. Map accuracy is a relatively minor issue in cartography, and map users are rarely aware of the problem. Maps use the very constrained technology of pen and paper to communicate a view of the world to their users. Cartographers feel little need to communicate information on accuracy, except indirectly through a map quality statement or in detailed legends. But when the same map is digitised and input to a GIS, the mode of use changes. The new uses extend well beyond the domain for which the original map was intended and designed.

A quantitative analysis of maps is not new to GIS, but GIS brings accuracy issues into focus. Moreover, machines used to make measurements in GIS (digital computers) are inherently far more precise than the machines of conventional map analysis. The bitter truth is that all geographical data are inherently inaccurate, and these inaccuracies will propagate through GIS operations in ways that are difficult to predict.

Some experiments carried out in the laboratory on maps and their accuracy are discussed in the succeeding paragraphs. Error analysis in spatial databases is very important: it has a direct bearing on the accuracy of GIS and hence requires due consideration.

Map accuracy is an important issue consisting of two aspects, viz. (i) attributes, and (ii) spatial position, whether relative or absolute. The accuracy of a spatial database is derived from that of the map, and all results obtained are compared with the accuracy of the map. The map data, raster data and vector data need to be compared for all three graphic primitives, viz. point, line and area. One of the major requirements in the Digital Topographic Data Base (DTDB) of a large country is consideration of a variety of features. These features may vary according to the topographical conditions of the regions of the country, and in a large country there could be many types of topography. In the case of India, for example, there are three different types: snow-clad high hills, vast plains and deserts.

Maintenance of the digital topographic database is an aspect that needs to be investigated. With the new knowledge now available, digital stereo photogrammetry may help in extracting cartographic features with greater accuracy. With digital elevation data available in the form of contours, digital mono plotting can be considered a quicker method of plotting the data for maintenance of the DTDB. Keeping in mind the methodology and the availability of technology, some specific applications for printing of maps are to be exploited.

Map as a Basic Input for DTDB
Considering the available source of data for developing DTDB, topographical maps are most suitable in view of their rich information contents. Along with maps, aerial photography and satellite images are extra sources for collecting data. However, maps remain the main source of data in view of the abundantly available labelled information, an inherent part of maps. The scale of a map is a very important aspect since the information content depends mainly on the scale of the map. It has been found that 1:50,000 maps are the most suitable for making DTDB in Indian conditions as these maps are readily available and the information contents in such maps are sufficient to serve many purposes.

In order to digitise these maps, the easiest method was found to be scanning of paper maps. However, it has its own pitfalls. As the maps printed in India are as much as 70 years old, a thorough investigation was made using paper maps. For this purpose, six copies of the same map, printed in 1929, 1947, 1953, 1966, 1971 and 1975, were used.

Paper used in India for map printing is appropriately called map litho paper. Certain specifications for the paper were decided by the national mapping agency so that the line work printed on a map appears well and there is minimum distortion in the dimension of the paper due to weathering. Usually the weight of the paper used should be 95 grams per square metre (GSM).

  • Thickness : 95 µm
  • Moisture : 8% maximum
  • Ash content : 15% maximum
  • Brightness : 75 to 80%
  • Opacity : 80 to 90%
  • Dimensional stability : 0.4% maximum, with change of humidity from 20% to 75%
  • pH : 5.5 to 6.5

These specifications are standardised by the Indian Standards in IS:12765:1989: Specifications for paper used for printing of maps.

The study was carried out in two ways. Firstly, the dimensions of each map were checked against the theoretical dimensions. Secondly, the map was scanned and the raster data were brought to the theoretical dimensions by rubber sheeting/warping. Then ten each of the different primitives, viz. points, lines and polygons, were measured and the results are tabulated below.

Dimension of the Map: The theoretical dimension of the map on scale 1” = 1 mile, is given in Fig.1.

The actual dimensions of the six maps are tabulated below. As the FPS system was being used in India then, the dimensions are given in inches.

Table 1. Actual Dimensions (inches)

Sl No  Dimension  1929   1947   1953   1966   1971   1975   Theoretical
1      X1         14.67  14.64  14.71  14.74  14.70  14.74  14.677
2      X2         14.62  14.58  14.68  14.68  14.63  14.67  14.637
3      Y          17.16  17.15  17.17  17.19  17.17  17.18  17.18
4      Diagonal   22.53  22.52  22.60  22.63  22.58  22.58  22.58

From the above, it is observed that the maximum change in X1 was 0.063” in the maps printed in 1966 and 1975. The maximum change in X2 was 0.057” in the map printed in 1947. The maximum change in the diagonal was 0.060” in the map printed in 1947.

As seen from the above, the changes in dimension do not form a pattern, owing to the vagaries of nature, the quality of the paper and the mode of preservation. Hence the error pattern cannot be modelled. The maximum distortion across all four dimensions is of the order of 0.06”, which amounts to about 76 metres in ground terms at a scale of 1:50,000.
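The deviations quoted above can be recomputed from Table 1. The following is a minimal sketch in Python, not part of the original study; the names `max_deviation` and `ground_metres` are illustrative. It also converts a map distance to ground distance, both for the 1” = 1 mile scale of the sheets studied and, as an alternative assumption, for a scale of 1:50,000.

```python
# Figures transcribed from Table 1 (inches).
METRES_PER_MILE = 1609.344  # 1 inch on the map = 1 mile on the ground
EDITIONS = [1929, 1947, 1953, 1966, 1971, 1975]
THEORETICAL = {"X1": 14.677, "X2": 14.637, "Y": 17.18, "Diagonal": 22.58}
MEASURED = {
    "X1": [14.67, 14.64, 14.71, 14.74, 14.70, 14.74],
    "X2": [14.62, 14.58, 14.68, 14.68, 14.63, 14.67],
    "Y": [17.16, 17.15, 17.17, 17.19, 17.17, 17.18],
    "Diagonal": [22.53, 22.52, 22.60, 22.63, 22.58, 22.58],
}


def max_deviation(name):
    """Return (largest absolute deviation in inches, editions showing it)."""
    devs = [round(abs(m - THEORETICAL[name]), 3) for m in MEASURED[name]]
    worst = max(devs)
    years = [year for year, dev in zip(EDITIONS, devs) if dev == worst]
    return worst, years


def ground_metres(map_inches, metres_per_map_inch=METRES_PER_MILE):
    """Convert a distance on the map (inches) to metres on the ground."""
    return map_inches * metres_per_map_inch


for name in MEASURED:
    worst, years = max_deviation(name)
    print(name, worst, years, round(ground_metres(worst), 1))
```

At the 1” = 1 mile scale a distortion of 0.06” corresponds to roughly 97 metres on the ground; passing `metres_per_map_inch=50000 * 0.0254` gives the equivalent figure for a 1:50,000 sheet.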

Effect of Graphic Primitive after Warping:
The maps were scanned at a resolution of 300 dots per inch (dpi) and the raster data were warped/rubber sheeted to the theoretical dimensions. Ten well-distributed points, lines and polygons were chosen. The co-ordinates, lengths and areas of the points, lines and polygons are tabulated in Tables 2, 3 and 4 respectively. For brevity, only the last three digits of the co-ordinates (hundreds of metres and below) are shown. All the values in the following tables are in metres.

Point : The X and Y co-ordinates were read from the monitor for the same 10 points on the six editions of one map. From the results in Table 2, it is noticed that the error is larger in the Y direction, where it is 300 metres in ground terms.

Table 2. Variations in Co-ordinates of Points

Point No  1929      1947      1953      1966      1971      1975
          X    Y    X    Y    X    Y    X    Y    X    Y    X    Y
1         238  140  246  158  223  159  235  143  203  158  209  187
2         005  637  021  626  022  648  020  630  037  641  988  616
3         730  809  745  800  732  838  850  784  865  800  774  675
4         976  679  983  666  968  689  963  639  973  637  119  801
5         860  124  873  123  867  128  879  096  882  107  847  027
6         709  248  722  248  720  241  728  235  714  218  767  203
7         334  810  390  814  372  770  373  769  387  764  375  827
8         287  361  302  334  297  359  298  314  290  331  161  302
9         330  051  338  023  326  055  316  992  311  003  352  828
10        838  491  838  462  833  477  837  434  123  062  850  360
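Because only the trailing digits of each co-ordinate are tabulated, a reading such as 988 next to 005 may represent a small shift across a 1000 m boundary rather than a jump of nearly a kilometre. The sketch below compares each later edition with the 1929 edition under that assumption. It is an illustration only: the wrap-around at 1000 m and the helper names `wrapped_diff` and `max_shift` are assumptions, not part of the original study, and shifts of 500 m or more cannot be resolved this way.

```python
# Trailing digits (metres) of X and Y from Table 2, in the edition order
# 1929, 1947, 1953, 1966, 1971, 1975.
POINTS = {
    1: [(238, 140), (246, 158), (223, 159), (235, 143), (203, 158), (209, 187)],
    2: [(5, 637), (21, 626), (22, 648), (20, 630), (37, 641), (988, 616)],
    3: [(730, 809), (745, 800), (732, 838), (850, 784), (865, 800), (774, 675)],
    4: [(976, 679), (983, 666), (968, 689), (963, 639), (973, 637), (119, 801)],
    5: [(860, 124), (873, 123), (867, 128), (879, 96), (882, 107), (847, 27)],
    6: [(709, 248), (722, 248), (720, 241), (728, 235), (714, 218), (767, 203)],
    7: [(334, 810), (390, 814), (372, 770), (373, 769), (387, 764), (375, 827)],
    8: [(287, 361), (302, 334), (297, 359), (298, 314), (290, 331), (161, 302)],
    9: [(330, 51), (338, 23), (326, 55), (316, 992), (311, 3), (352, 828)],
    10: [(838, 491), (838, 462), (833, 477), (837, 434), (123, 62), (850, 360)],
}


def wrapped_diff(value, reference, modulus=1000):
    """Smallest signed difference, assuming the digits wrap every `modulus` m."""
    half = modulus // 2
    return (value - reference + half) % modulus - half


def max_shift(readings):
    """Largest |dX| and |dY| of the later editions relative to the first one."""
    x0, y0 = readings[0]
    dx = max(abs(wrapped_diff(x, x0)) for x, _ in readings[1:])
    dy = max(abs(wrapped_diff(y, y0)) for _, y in readings[1:])
    return dx, dy


for number, readings in POINTS.items():
    print(number, max_shift(readings))
```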

Line : The lengths of different lines, in metres, were measured on the vector data and the results are given in Table 3. The variation in the measured length is at times as high as about 100 m in a length of about 300 metres.

Table 3. Variations in Lengths of Lines

Line 1929 1947 1953 1966 1971 1975
1 581 566 551 566 562 569
2 888 884 894 886 897 834
3 498 503 490 478 495 528
4 1197 1203 1209 1206 1175 1171
5 375 388 445 402 375 471
6 954 950 940 964 937 955
7 842 848 825 842 856 854
8 290 292 282 285 308 354
9 423 421 420 417 428 440
10 1317 1320 1311 1308 1310 1347
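The spread of the measured lengths across the six editions can be summarised per line. The following is a small illustrative sketch, with the values transcribed from Table 3; the function names `spread` and `relative_spread` are hypothetical and the choice of the mean length as the reference is an assumption.

```python
# Line lengths in metres from Table 3, in the edition order
# 1929, 1947, 1953, 1966, 1971, 1975.
LENGTHS = {
    1: [581, 566, 551, 566, 562, 569],
    2: [888, 884, 894, 886, 897, 834],
    3: [498, 503, 490, 478, 495, 528],
    4: [1197, 1203, 1209, 1206, 1175, 1171],
    5: [375, 388, 445, 402, 375, 471],
    6: [954, 950, 940, 964, 937, 955],
    7: [842, 848, 825, 842, 856, 854],
    8: [290, 292, 282, 285, 308, 354],
    9: [423, 421, 420, 417, 428, 440],
    10: [1317, 1320, 1311, 1308, 1310, 1347],
}


def spread(values):
    """Difference between the longest and shortest measurement (metres)."""
    return max(values) - min(values)


def relative_spread(values):
    """Spread as a percentage of the mean measured length."""
    return 100.0 * spread(values) / (sum(values) / len(values))


for number, values in LENGTHS.items():
    print(number, spread(values), round(relative_spread(values), 1))
```

Expressing the spread relative to the mean length makes short and long lines comparable, since a fixed positional error weighs more heavily on a short line.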

Areas : A similar exercise was carried out for measurement of areas/polygons and the outcome is shown in Table 4. The areas are in sq.m.

Table 4. Variation in Areas of Polygons

No. 1929 1947 1953 1966 1971 1975
1 25126 24998 22444 26284 24416 28632
2 731518 683405 682953 580579 575955 569547
3 225167 236327 22866 22761 21853 269547
4 16952 16860 15209 9973 9399 18608
5 27486 28280 24834 29192 27851 25418
6 24145 23066 23523 22093 19358 6986
7 11281 11701 9853 10693 9642 10245
8 22871 19674 18530 16739 20827 17015
9 6188 7444 9187 5981 6115 6213
10 6755 5164 5768 6180 5134 16395

From the results tabulated above, it is noticed that the areas are at times in error by up to 40%. The errors are due not only to the dimensional instability of the paper but also to other causes. These errors cannot be modelled and hence cannot be corrected. Another problem in scanning paper maps is that colour separation is not possible, so all the details appear in the same scanned dataset. This further increases the time needed for digitisation. Hence, ideally, paper should not be used. For transferring the image held on the standing negatives, film-based materials have to be used. Other inexpensive materials, such as sensitised synthetic paper, may be tried out to see whether the transferred image is of the required quality.

Results of Investigation in Scanning:
There is a feeling among a number of professional database experts that one may scan at a higher resolution to capture all the details. The problem with higher scan resolutions is that the quantity of data becomes voluminous and additional noise is picked up, so much more effort is required to clean up the data. A balance therefore has to be struck in choosing the scan resolution. Another crucial aspect of scanning is choosing the threshold value so that unwanted information and noise are not picked up. Different threshold values were tried in order to find suitable figures. The threshold value and scan resolution depend upon the method of digitisation and the features being digitised. After scanning over 200 film positives, the following values were found to be ideal.

Table 5. Scan Resolution for Different Modes of Digitisation

Mode of Digitization Scan resolution in dpi
Manual 300
Semi-automatic 400
Automatic 500

Table 6. Scan Resolution for Different Types of Terrain

Details Scan resolution in dpi
Contour in high hills 600-1000
Contours in hills 500
Contours in undulated terrain 400
Contours in plains 200
Other features which are close 500
Other features which are sparse 300
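The balance between scan resolution and data volume can be illustrated with simple arithmetic. The sketch below is only an illustration under stated assumptions: it takes a map scale of 1:50,000 and a hypothetical sheet of 15 × 17 inches (close to the sheet dimensions measured earlier), and the function names are not from the original work.

```python
INCH_IN_METRES = 0.0254


def ground_pixel_size_m(dpi, scale_denominator=50_000):
    """Ground distance in metres covered by one scanned pixel."""
    return scale_denominator * INCH_IN_METRES / dpi


def pixel_count(width_in, height_in, dpi):
    """Number of pixels in an uncompressed scan of a sheet of the given size."""
    return round(width_in * dpi) * round(height_in * dpi)


# Resolutions taken from Tables 5 and 6; the sheet size is an assumption.
for dpi in (200, 300, 400, 500, 1000):
    print(dpi, round(ground_pixel_size_m(dpi), 2), pixel_count(15, 17, dpi))
```

Doubling the resolution halves the ground size of a pixel but quadruples the number of pixels to be stored and cleaned, which is why the finer resolutions are reserved for dense detail such as contours in high hills.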

Polygon Overlay:
One of the problems associated with spatial data arises in polygon overlay. It is common to find that different themes for the same area share common lines. For example, the shore line of a lake will appear both on the layer showing the limits of land and on the hydrology layer. When these themes are overlaid on each other, the two versions of the shore line should match precisely. In practice, it is often found that they do not. Even if they match perfectly on the input documents, i.e. the film positives, their digital representations in the database differ. As the overlay operation is carried out with high precision in the digital domain, the differences are treated as real, thus forming spurious polygons, more commonly known as sliver polygons. During the course of this work, a method was tried that yielded good results in minimising sliver polygons. In the MicroStation environment, the limit/line is digitised only once, in one particular theme. Subsequently, the same line/limit segment is copied to the other themes, from one level to another. When the data pertaining to four sheets on scale 1:50,000 were compared by overlaying all the layers of data, not a single sliver polygon was detected, for the reasons stated above. However, this method is not feasible if different layers are digitised by different persons. In that case, at the time of merging the layers, one of the lines is deleted and copied from the other level. The results are given in Table 8.

Table 7. Threshold Values for Different Media

Media Threshold Values
Film positives 190 – 200
Film negatives < 100
Paper maps 100 – 110

Table 8. Sliver Polygons Reduction with the New Method

S.No.  No. of Sheets  Sliver Polygons Noticed  No. of Slivers with New Method
1      4              32                       0
2      8              67                       2
3      12             79                       5

The following alternatives were also tried where two different lines appeared after digitisation. Any number of lines lying within the tolerance distance of each other are assumed to be the same line. A similar approach was used for snap-joining lines if the distance between them is less than the set tolerance. However, this at times gives rise to another problem, a conflict between geometry and topology. Because of the tolerance distance, all objects or features smaller than the tolerance become ambiguous or indistinguishable. This problem was overcome by allowing either the topology to correct the geometry or the geometry to correct the topology. Only two such cases were noticed in the pilot database created during the work, and they were corrected simply by allowing the geometry to correct the topology. However, this may have implications for other objects, whose properties may change, with a serious effect on the integrity of the database. The clean command of the software had both an automatic mode and an interactive mode. Completely clean data, without jeopardising other aspects, can be achieved only by using the clean command in interactive mode, in which the errors are flagged when the command is invoked.
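The tolerance idea, and the way it can swallow small features, can be sketched in a few lines. This is a simplified illustration in Python and not the clean command of the software used in the study: it snaps vertices only to the nearest reference vertex, and the function name `snap_vertices`, the co-ordinates and the tolerance of 2 m are all hypothetical.

```python
import math


def snap_vertices(line, reference, tolerance):
    """Move each vertex of `line` onto the nearest vertex of `reference`
    when it lies within `tolerance`; otherwise leave it unchanged."""
    snapped = []
    for vertex in line:
        nearest = min(reference, key=lambda ref: math.dist(vertex, ref))
        if math.dist(vertex, nearest) <= tolerance:
            snapped.append(nearest)
        else:
            snapped.append(vertex)
    return snapped


# The same shore line digitised twice, on two different layers.
shore_on_land_layer = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
shore_on_hydro_layer = [(0.5, 0.8), (99.2, -0.6), (200.4, 0.3)]
print(snap_vertices(shore_on_hydro_layer, shore_on_land_layer, 2.0))

# A genuine feature smaller than the tolerance collapses to a single point.
small_feature = [(100.0, 0.5), (101.0, 0.5), (101.0, 1.5)]
print(snap_vertices(small_feature, shore_on_land_layer, 2.0))
```

The second call shows the conflict described above: all three vertices of the small feature fall within the tolerance of one reference vertex, so its geometry degenerates even though it was a distinct object.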

This study reveals that accuracy depends chiefly upon the source material from which the digitisation is undertaken.

  • Paper maps were not found suitable for achieving the desired accuracy, since the ageing of maps affects their dimensional stability and hence there is a great deal of variation.
  • Points, lines and areas show different accuracy figures when digitised from maps. Hence it is necessary to try other materials, such as synthetic sheets and plastic sheets, for digitisation instead of paper prints.
  • Old maps should not be used for digitisation.
  • On overlaying, it is necessary that sliver polygons are removed from the data. As this is difficult, suitable methods need to be evolved. The method of copying the line from one layer to another appears to be satisfactory, provided the digitising of all the layers of a map is carried out by the same person.