Michael F. Goodchild
Professor of Geography, University of California, Santa Barbara, USA
Email: [email protected]
GIS began as a highly specialized application of information technology, with its own hardware devices for input and output, its own data structures, and its own algorithms for data processing. Through time more and more aspects of GIS have become mainstream, and more and more standard approaches have been adopted to replace earlier specialized ones, taking advantage of the economies of scale inherent in the mainstream. However, there are many reasons for treating geographic information as special, and for educating specialists in GIS concepts, principles, and use. The paper enumerates many of these, and presents the arguments against wholesale adoption of mainstream practices. Special attention is paid to metadata standards and the process of search over distributed archives for GIS data sets. The future health of the GIS industry depends on knowing when to generalize and when to specialize.
In the movie Sleepless in Seattle the two leads, Meg Ryan and Tom Hanks, are linked through a common problem: an inability to sleep at night, and a consequent addiction to all-night phone-in radio. In classic Hollywood style we know the end of the movie (they will meet and live together happily ever after, even though they currently live on opposite sides of the US), but the entertainment lies in the numerous false starts and temporary disappointments of the mating dance.
GIS too has its mating dance with the information technology mainstream, as the two players circle around each other, cautious about surrendering autonomy but conscious of the certainty of the conclusion. We want the benefits that being part of the mainstream would bring, but celebrate the special nature of GIS, and are not at all sure that we want to lose our uniqueness. In this paper I explore both sides of this debate: the arguments for joining the mainstream, together with the impediments that remain in the way; and the arguments for a separate identity for GIS, together with the ways in which that separate identity can be preserved and possibly strengthened.
For the sake of simplicity I refer to the first set of arguments as those of the lumpers, and the second as those of the splitters, and the paper is structured as a debate in which both sides first present their cases, followed by the building of a consensus. I use the term GIS to refer to the entire geospatial complex of data, systems, services, and community.
The problems weren’t so special after all
Some of the earliest roots of GIS are found in the Canada Geographic Information System (CGIS; Foresman, 1998), a massive investment by the Government of Canada in the mid-1960s. CGIS was designed to solve a very specific problem: the compilation of summary statistics from tens of thousands of map sheets. A large part of Canada’s land mass had been mapped at a very detailed scale and in the form of several layers, including the capability of the land resource for various kinds of activities, and the current uses of the land. The statistical analysis involved two tasks, the overlay of layers and the measurement of area, that are notoriously difficult, expensive, tedious, and inaccurate when done by hand. Computerization offered the potential for accurate and fully automated analysis, and even though the costs were spectacular they were still less than those of a traditional manual analysis. The CGIS project resulted in the solution of many important problems in geospatial digitization, data modeling, indexing structures, and algorithm design.
I use the example of CGIS because it illustrates the very special nature of early GIS, relative to what was then the computing mainstream. In the mid-1960s almost no one had thought of using a computer to process the information found on maps; it was far from clear how one could get the information into a computer, and what purpose would be served. Although the project used a standard mainframe and storage devices, the contractors developed a map scanner uniquely designed for large-format maps, with few applications outside GIS. Similarly, the CGIS topological data structure became the basis for many others, including ESRI’s coverage model, but has no obvious analogs outside the geospatial field. Through time, however, the uniqueness of these aspects of GIS has become less and less significant. Few GIS projects today use map scanners, because so much geospatial information is already in digital form, and most paper maps are themselves products of digital databases.
The history of the topological data structure provides a typical example of how GIS has become less special. It was originally devised to solve two related problems associated with the digital representation of a certain type of map commonly found in environmental and resource management, and it is not surprising that these turned out to be the most successful areas for the nascent GIS industry of that period. All of the layers of CGIS looked similar: they divided the space into irregular areas using thin lines, and gave each area a uniform class, drawn from a small set of defined classes. This type has been called the area-class map (Burrough and Frank, 1996), and it is used to characterize soils, land uses, land cover, and vegetation, among other applications. When such maps need to be digitized, the obvious approach is to regard each area as a separate unit, and to create a polygon by digitizing a sequence of points around its boundary. But when applied to an area-class map, this approach will result in the double-digitizing of each internal boundary, adding to what is already a tedious and time-consuming task. Moreover, the two versions of each internal boundary will differ, resulting in an unsightly weaving of the two lines. The topological data structure of CGIS solved this problem by treating each common boundary between two adjacent areas as the record, rather than each area, and treating areas as collections of such boundaries or arcs. With each arc were associated pointers to the areas on each side, and to the junctions or nodes at each end.
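The arc-based record described above can be illustrated with a minimal Python sketch. It is not the CGIS record layout: the names `Arc`, `from_node`, `to_node`, `left_area`, `right_area`, `area_totals`, and `adjacent_pairs` are hypothetical, and the rule that the left area lies to the left of the direction of digitizing is an assumed convention. The example stores two adjacent unit squares, A and B, whose shared boundary is recorded once, and then recovers the area of each and the adjacency between them in a single pass over the arcs, using the standard shoelace formula.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Arc:
    """One boundary record (hypothetical layout, not the CGIS format)."""
    from_node: str                 # junction where the arc starts
    to_node: str                   # junction where the arc ends
    left_area: Optional[str]       # area on the left when walking the arc
    right_area: Optional[str]      # area on the right (None = outside the map)
    points: Tuple[Point, ...]      # digitized coordinates, from_node to to_node


def signed_half_cross(points):
    """Shoelace contribution of one arc: half the sum of cross products."""
    return 0.5 * sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))


def area_totals(arcs):
    """Measure every area in one pass over the arcs.

    An arc runs counterclockwise around its left area (positive
    contribution) and clockwise around its right area (negative).
    """
    totals = defaultdict(float)
    for arc in arcs:
        contribution = signed_half_cross(arc.points)
        if arc.left_area is not None:
            totals[arc.left_area] += contribution
        if arc.right_area is not None:
            totals[arc.right_area] -= contribution
    return dict(totals)


def adjacent_pairs(arcs):
    """Pairs of areas that share a boundary, read directly from the arcs."""
    return {frozenset((a.left_area, a.right_area))
            for a in arcs
            if a.left_area is not None and a.right_area is not None}


# Two unit squares: A covers x in [0, 1], B covers x in [1, 2].
ARCS = [
    # Shared boundary, digitized once, walking north from n1 to n2.
    Arc("n1", "n2", "A", "B", ((1, 0), (1, 1))),
    # Outer boundary of A, walking west, south, then east.
    Arc("n2", "n1", "A", None, ((1, 1), (0, 1), (0, 0), (1, 0))),
    # Outer boundary of B, walking east, north, then west.
    Arc("n1", "n2", "B", None, ((1, 0), (2, 0), (2, 1), (1, 1))),
]

if __name__ == "__main__":
    print(area_totals(ARCS))
    print(adjacent_pairs(ARCS))
```

Because the internal boundary exists as a single record, there is only one version of it to digitize and no possibility of the two adjacent areas disagreeing about where it lies.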