Spatial Products Division, Oracle Corporation
Spatial databases have been an active area of research for over a decade, addressing the growing data management and analysis needs of spatial applications such as Geographic Information Systems (GIS). This research has produced spatial data types and operators, spatial query languages and processing techniques, and spatial indexing and clustering techniques. It also led to several extensions of traditional relational database systems, such as extensible indexing and extensible optimizers. This database technology has made it possible to provide location-based services to web and mobile applications using standard databases. In this paper, we outline some of the features of a spatial database and show how a mobile location-aware application can be supported using spatial database services.
Spatial database management systems aim to make spatial data management easier and more natural for users and for applications such as urban planning, utilities, transportation, and remote sensing. Even though traditional database technology has been evolving for the last thirty years, managing spatial data with a database system poses many challenges. Databases have traditionally been used in business and administrative applications, where the common data types are integer, float, character, monetary unit, and date, and the operations performed on them are simple arithmetic and logical operations such as addition, subtraction, less than, and greater than. This limited set of data types and operations makes modeling real-world spatial applications extremely difficult. Recent database research has therefore focused on efficiently storing and managing complex data such as spatial data. In this paper, we discuss how these extended relational databases can be used to solve the problems posed by spatial data management.
A common example of spatial data can be seen in a road map. A road map is a two-dimensional object that contains points, lines, and polygons representing cities, roads, and political boundaries such as states or provinces. A road map is a visualization of geographic information: the locations of cities, roads, and political boundaries on the surface of the Earth are projected onto a two-dimensional display or piece of paper, preserving their relative positions and, within the limits of the chosen projection, their relative distances. The data that indicates the Earth location (latitude and longitude, or height and depth) of these rendered objects is the spatial data. When the map is rendered, this spatial data is used to project the locations of the objects onto a two-dimensional piece of paper. A GIS is often used to store, retrieve, and render this Earth-relative spatial data. Other types of spatial data include data from computer-aided design (CAD) and computer-aided manufacturing (CAM) systems.
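The projection step described above can be illustrated with a small sketch. It uses the spherical Mercator projection as one possible choice (an assumption; the paper does not name a projection), together with an assumed mean Earth radius and a hypothetical helper `mercator_project`. Mercator preserves local angles but stretches distances increasingly toward the poles, which is why relative distances on such a map are preserved only approximately.

```python
import math

# Assumed mean Earth radius in metres (spherical Earth model).
EARTH_RADIUS_M = 6_371_000.0
# Assumed latitude cut-off: Mercator y grows without bound toward the poles.
MAX_LATITUDE_DEG = 85.0


def mercator_project(lat_deg: float, lon_deg: float) -> tuple[float, float]:
    """Project latitude/longitude in degrees to planar (x, y) in metres using
    the spherical Mercator formulas x = R * lon and y = R * ln(tan(pi/4 + lat/2))."""
    if abs(lat_deg) > MAX_LATITUDE_DEG:
        raise ValueError("latitude outside the range handled by this sketch")
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y
```

Equal steps in latitude map to increasingly large steps in y as latitude grows, which is the distance distortion mentioned above.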
GIS, CAD, and CAM applications all store, retrieve, update, or query some collection of features that have both non-spatial and spatial attributes. Examples of non-spatial attributes are name, soil_type, landuse_classification, and part_number. The spatial attribute, referred to as the geometry, is a coordinate geometry, or vector-based representation of the shape of the feature: an ordered sequence of vertices connected by straight-line segments or arcs. The semantics of the geometry are determined by its type, which may be point, line string, or polygon.
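A minimal sketch of such a feature record, assuming a simple in-memory representation: the `Feature` class, its field names, and the vertex-count rules are illustrative assumptions, not a schema from the paper. Arcs are not modelled; consecutive vertices are joined by straight segments.

```python
from dataclasses import dataclass
from enum import Enum


class GeometryType(Enum):
    POINT = "point"
    LINE_STRING = "line string"
    POLYGON = "polygon"


Vertex = tuple[float, float]


@dataclass(frozen=True)
class Feature:
    """A feature with non-spatial attributes plus one geometry (illustrative only)."""
    name: str
    attributes: dict[str, str]   # non-spatial attributes, e.g. {"soil_type": "loam"}
    geom_type: GeometryType      # decides how the vertices are interpreted
    vertices: tuple[Vertex, ...] # ordered sequence of vertices

    def __post_init__(self) -> None:
        n = len(self.vertices)
        if self.geom_type is GeometryType.POINT and n != 1:
            raise ValueError("a point has exactly one vertex")
        if self.geom_type is GeometryType.LINE_STRING and n < 2:
            raise ValueError("a line string needs at least two vertices")
        if self.geom_type is GeometryType.POLYGON and (
            n < 4 or self.vertices[0] != self.vertices[-1]
        ):
            # A closed ring: at least three corners plus a repeat of the first vertex.
            raise ValueError("a polygon ring must be closed and have at least 4 vertices")


# Usage: a parcel whose geometry is a closed triangular ring.
parcel = Feature("Lot 12", {"landuse_classification": "residential"},
                 GeometryType.POLYGON, ((0, 0), (4, 0), (4, 3), (0, 0)))
```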
What are Spatial Databases?
GIS applications today usually store spatial data and non-spatial (attribute) data separately. The spatial data describing the spatial properties of objects is stored in files managed by a file management system, while the attribute data of these objects is stored in a commercial database, such as a relational database. This split data model has several drawbacks: because the two kinds of data are not managed by the same database engine, it is difficult to maintain data integrity between them. The ideal solution is an information infrastructure that includes a single database system for managing spatial data, with a data structure that is independent of the application. Key benefits of managing spatial and attribute data in a single database include:
- Better management of spatial data. Traditional GIS users gain access to a complete spatial information system based on industry standards, with an open interface (SQL) to their data.
- Spatial data is stored in an enterprise-wide DBMS, making it possible to spatially enable many enterprise applications.
- Reduced complexity of systems management, because the hybrid architecture of GIS data models is eliminated.
- Seamless integration of MIS and GIS data stores, delivering applications that meet the increasingly demanding analysis and reporting needs of a growing geospatial user community.
Challenges of Spatial Databases
Conventional relational databases often lack the technology required to handle spatial data. Unlike traditional database applications, spatial applications require the database to understand more complex data types such as points, lines, and polygons, and the operations on these types are more complex than operations on simple types. Hence, new technology is needed to handle spatial data. Egenhofer (1993) identified four main properties of spatial data that set it apart from traditional relational data.
Geometry is a central property of any kind of spatial data. Geometry deals with the mathematical properties of an object. These properties include measurement (metric), the relationships of points, lines, angles, surfaces, and solids (topology), and order. A simple geometry is usually constructed from geometric primitives such as points, lines, and areas; complex geometries are constructed from collections of simple geometries. In addition, there are a number of geometric relationships between two geometries that are very important for dealing with spatial data. For example, a connectivity relation describes how two geometries are connected (on a road map, how one intersection is connected to another intersection). Metric relationships deal with the distances between two geometries; for example, which cities are located within 10 miles of a given road? Geometry is usually represented using either a vector data model (where each geometry consists of a set of points) or a raster data model (where each geometry is an image).
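The connectivity relation can be illustrated by treating road intersections as graph nodes and road segments as edges. The sketch below, a hypothetical `connected` helper over an assumed adjacency-list representation, answers whether one intersection can be reached from another using breadth-first search. It ignores distances entirely, which is what distinguishes a topological (connectivity) question from a metric one.

```python
from collections import deque


def connected(adjacency: dict[str, list[str]], start: str, goal: str) -> bool:
    """Return True if `goal` can be reached from `start` along road segments.

    `adjacency` maps each intersection id to the intersections directly
    linked to it by a road segment (an assumed representation).
    """
    if start == goal:
        return True
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical road network: intersections A, B, C are linked; D is isolated.
roads = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
```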
Distribution of Objects in Space
Spatial objects are usually distributed very irregularly in space. Consider the case where we model all the cities in the United States as spatial objects (points). The distribution of cities on the East Coast is very dense, whereas the distribution of cities in Arizona and Nevada is very sparse. In addition, different spatial objects have widely varying extents. For example, in a road network model that represents roads as lines and cities as polygons, there are some very large objects (a long highway such as I-95) and some small objects (a small city such as Nashua, NH).
Several GIS applications deal with very large databases, on the order of terabytes. For example, remote sensing applications gather terabytes of data from satellites every day. Data warehousing applications and NASA’s Earth Observing System are other examples of systems with terabytes of spatial data.
Requirements of a Spatial Database System
Any database system that attempts to deal with spatial applications has to provide the following features:
- A set of spatial data types to represent primitive spatial objects (point, line, area) and complex spatial objects (polygons with holes), together with operations on these types such as intersection and distance.
- The spatial types and the operations on them should be part of the standard query language used to access and manipulate non-spatial data in the system. For example, in relational database systems, SQL should be extended to support spatial types and operations.
- The system should also provide performance enhancements for spatial data comparable to those available for non-spatial data, such as indexes to process spatial queries (range and join queries) and parallel processing.
A Solution: Object Relational Databases
Object-relational database management systems are an attempt to incorporate object-oriented capabilities into a database environment. The new constructs added to the core functionality of traditional relational databases include abstract data types, object identity, and the ability to create operations or procedures, through the database programming interface, that work on these objects. An example of a project that proposed object-relational (or extended-relational) systems is POSTGRES. Commercial products include the Universal Servers from Oracle, Informix, and IBM. More interestingly, the ANSI standardization committee for the database language has proposed several extensions to the SQL3 standard (Gardels (1997), OGC (1998)) that incorporate object-oriented features into the SQL language. Any spatial database system should address the following five main areas to support spatial applications: (i) classification of space, (ii) data model, (iii) query language, (iv) query processing, and (v) data organization and indexing.
Classification of Space
For modeling different objects in space, the basic elements are point, line, and area. A point represents an object that has only its location in space (X,Y or X,Y,Z) as its spatial attribute. A point can be used to model a city or a building in a large-scale map. A line represents an object that has location attributes along with an extent. A line can be used to model roads, rivers, or utility lines. A region (or polygon) has location attributes along with an extent and an area. A region can be a polygon with holes, as long as there is only one contiguous area associated with it. Regions can be used to model county boundaries, state boundaries, etc.
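Because a region carries an area, computing that area is one of the basic operations a spatial type needs. The sketch below applies the shoelace formula to each ring and subtracts the holes from the outer boundary. It assumes simple (non-self-intersecting) rings and holes that lie inside the outer ring without overlapping one another; the function names are hypothetical.

```python
from typing import Sequence

Ring = list[tuple[float, float]]


def ring_area(ring: Ring) -> float:
    """Unsigned area of a simple closed ring via the shoelace formula.

    The ring may or may not repeat its first vertex at the end; a repeated
    closing vertex contributes a zero term.
    """
    n = len(ring)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0  # abs() makes the result independent of orientation


def region_area(outer: Ring, holes: Sequence[Ring] = ()) -> float:
    """Area of a polygon with holes: outer area minus the area of each hole."""
    return ring_area(outer) - sum(ring_area(h) for h in holes)


# Usage: a 10 x 10 county with a 2 x 2 hole.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
area = region_area(outer, [hole])
```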
Data Model
In traditional database applications, the data types of attributes are limited to integers, floats, character strings, and dates. Object-relational databases provide a higher level of abstraction for spatial data by incorporating concepts closer to human perception of space. This is accomplished by incorporating the object-oriented concept of user-defined abstract data types (ADTs). An ADT is a user-defined atomic type together with its associated functions. For example, if land parcels are stored in a database, an ADT could combine the atomic type polygon with an associated function, say adjacent, which can be applied to land parcels to determine whether they are adjacent.
Query Language
A query language provides the means to access and manipulate data in the database. The query language should have enough constructs to express queries over a wide variety of data types, yet it must be intuitive and easy to use. A popular query language for relational databases is the Structured Query Language (SQL). Traditional SQL is known to be inadequate for expressing typical spatial queries, which has prompted various efforts to extend SQL with spatial constructs. At the same time, the standards committee is working on a draft that updates SQL and makes it compatible with the generic functionality offered by object-relational database management systems. The Open GIS Consortium (OGC (1998)), led by major GIS and database companies, has put forward its own proposal to include GIS capabilities in SQL. The specification defines a standard set of geometry types based on the OGIS geometry model, together with functions on these types. Common spatial operators such as adjacent, overlap, and inside, and functions such as buffer, perimeter, and area, are included in the specification.
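To make two of these operator names concrete, here are plain-Python stand-ins. `inside` uses the standard ray-casting point-in-polygon test, and `adjacent` uses a deliberately simplified rule (the two boundaries share at least one identical edge), which only works when neighbouring parcels are digitized with shared vertices. Neither function reproduces the OGIS signatures; both are illustrative assumptions, and points lying exactly on a boundary are not handled rigorously.

```python
Point = tuple[float, float]


def inside(p: Point, ring: list[Point]) -> bool:
    """Ray-casting test: count how many ring edges a horizontal ray from p crosses.
    An odd count means p is inside. Boundary points may be classified either way."""
    x, y = p
    result = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of p
                result = not result
    return result


def _edges(ring: list[Point]) -> set[frozenset[Point]]:
    """Undirected boundary edges of a ring, skipping a zero-length closing edge."""
    n = len(ring)
    return {
        frozenset((ring[i], ring[(i + 1) % n]))
        for i in range(n)
        if ring[i] != ring[(i + 1) % n]
    }


def adjacent(ring_a: list[Point], ring_b: list[Point]) -> bool:
    """Simplified adjacency: the boundaries share at least one identical edge.
    Parcels touching only at a corner are not considered adjacent."""
    return bool(_edges(ring_a) & _edges(ring_b))
```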
Spatial Query Processing
Spatial queries are often processed using filter-and-refine techniques (for a survey of spatial query processing techniques, see Güting (1994)). In the first (filter) step, an approximate representation of each spatial object is used to determine a set of candidate objects that are likely to satisfy the given spatial query. Common approximations used in spatial databases are the minimum orthogonal bounding box and multiple bounding boxes. The approximations are chosen such that if the approximations of objects A and B do not satisfy a relationship, then A and B themselves cannot satisfy it either, so the filter step never discards a true result. In the second (refine) step, the candidate set is examined using the exact representations of the objects to find the actual set of objects that satisfy the query.
The purpose of a spatial index is to facilitate spatial selection: in response to a query, the spatial index searches only a subset of the objects embedded in the space to retrieve the query result set. A fundamental idea for spatial indexing is the use of approximations, which allow index structures to manage an object in terms of one or more much simpler geometric objects. The prime example is the bounding box (the smallest orthogonal rectangle enclosing the object). Another method, grid approximation, divides the space into cells using a regular grid and represents the object by the set of cells that it intersects. The use of approximations leads to the filter-and-refine strategy for spatial query processing: first, a filtering step based on the approximations returns a set of candidates that is a superset of the objects fulfilling a spatial predicate; second, the result is refined by checking the exact geometry of each candidate.
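As an illustration of filter-and-refine, consider the query “which roads pass within distance d of point q?”. In the sketch below (hypothetical function names, planar coordinates assumed), the filter step tests q against each road’s bounding box expanded by d; if q lies outside that box, the road cannot be within d, so no true answer is discarded. The refine step then computes the exact distance to each road segment. A real system would obtain the boxes from a spatial index rather than scanning every road.

```python
import math

Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr(line: list[Point]) -> Box:
    """Minimum orthogonal bounding box of a polyline."""
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    return min(xs), min(ys), max(xs), max(ys)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Exact Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(p, a)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def within_distance(lines: dict[str, list[Point]], q: Point, d: float) -> list[str]:
    # Filter: cheap test on approximations (MBR expanded by d).
    candidates = []
    for key, line in lines.items():
        min_x, min_y, max_x, max_y = mbr(line)
        if min_x - d <= q[0] <= max_x + d and min_y - d <= q[1] <= max_y + d:
            candidates.append(key)
    # Refine: exact distance to every segment of each candidate.
    return [
        key for key in candidates
        if any(point_segment_distance(q, a, b) <= d
               for a, b in zip(lines[key], lines[key][1:]))
    ]


roads = {
    "R1": [(0, 0), (10, 0)],
    "R2": [(0, 5), (10, 5)],
    "R3": [(20, 20), (30, 20)],
    "R4": [(0, 10), (10, 0)],
}
# R2 and R3 are rejected by the filter; R4 passes the filter but fails refinement.
nearby = within_distance(roads, (5, 1), 2.0)
```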
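A grid approximation can be sketched as a dictionary from cell coordinates to object identifiers. For simplicity, this hypothetical `GridIndex` registers each object in every cell overlapped by its bounding box, which is a superset of the cells the exact geometry intersects. A query window therefore yields a superset of the true answers, which the refine step must still check.

```python
import math
from collections import defaultdict
from collections.abc import Iterator

Box = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


class GridIndex:
    """Regular-grid index keyed by (column, row) cell coordinates."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: dict[tuple[int, int], set[str]] = defaultdict(set)

    def _cell_range(self, box: Box) -> Iterator[tuple[int, int]]:
        """Yield every cell that the box overlaps."""
        min_x, min_y, max_x, max_y = box
        s = self.cell_size
        for i in range(math.floor(min_x / s), math.floor(max_x / s) + 1):
            for j in range(math.floor(min_y / s), math.floor(max_y / s) + 1):
                yield i, j

    def insert(self, obj_id: str, box: Box) -> None:
        for cell in self._cell_range(box):
            self.cells[cell].add(obj_id)

    def candidates(self, window: Box) -> set[str]:
        """Filter step: a superset of the objects that may intersect `window`."""
        found: set[str] = set()
        for cell in self._cell_range(window):
            found |= self.cells.get(cell, set())
        return found


idx = GridIndex(10.0)
idx.insert("lake", (2, 2, 8, 8))     # one cell
idx.insert("river", (5, 0, 35, 4))   # four cells along the x axis
idx.insert("park", (50, 50, 55, 55))
```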
Commercial Spatial Database Systems
Object-relational database systems (ORDBMS) give GIS applications the ability to fully integrate spatial data with attribute data in the database system. An ORDBMS provides SQL extensions so that spatial data can be accessed and managed like any other attribute data. It also provides user-defined indexes and functions, which let the database take spatial operations into account during query optimization. Oracle Spatial, the Informix 2D DataBlade, and IBM’s DB2 Spatial Extender (Chamberlin, 1997) are examples of commercial systems that provide this functionality. In this section, we review how current commercial database systems handle spatial data, using Oracle Spatial as the main reference.
Over the last few years, Oracle has extended the classical relational database in several areas. One of those extensions concerns the storage, indexing, querying, and delivery of location-based information. Location information is really just another data type that should be stored alongside other information, such as names and dates. It does, however, have specific characteristics (ordering, querying) and specific needs (spatial indexing, coordinate systems, etc.).
The Oracle database provides all of these facilities. It understands and manages location data just like any other data, and allows location-based queries to be expressed in its native language, SQL. It goes beyond the handling of simple location (point) data: it also handles complex geographic entities such as lines, curves, polygons, and areas, and provides advanced geometric operations on those entities, such as intersection and union. For networks (typically road networks), it supports linear referencing, a fundamental requirement for describing locations along a network and providing directions from one location to another.
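Linear referencing locates a position by a measure along a linear feature instead of by coordinates. The sketch below (a hypothetical `locate_along` helper, not Oracle’s API) assumes the measure is simply the distance travelled from the first vertex; linear referencing systems may instead attach explicit measures to vertices, such as mileposts, which need not equal geometric length.

```python
import math

Point = tuple[float, float]


def locate_along(line: list[Point], measure: float) -> Point:
    """Return the point at distance `measure` along `line`, from its first vertex."""
    if measure < 0:
        raise ValueError("measure must be non-negative")
    remaining = measure
    for a, b in zip(line, line[1:]):
        seg_len = math.dist(a, b)
        if remaining <= seg_len:
            # Interpolate linearly within this segment.
            t = remaining / seg_len if seg_len > 0 else 0.0
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= seg_len
    raise ValueError("measure exceeds the length of the line")


# Usage: a route 7 units long; measure 5 falls 2 units into the second segment.
route = [(0, 0), (3, 0), (3, 4)]
position = locate_along(route, 5)
```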
Standards are especially important when it comes to procuring location data and related geographical information such as political boundaries, transportation networks, demographics, etc. Geographical and location information has traditionally been available in a variety of formats, often proprietary. The key body that drives standardization efforts in this area is the Open GIS Consortium, of which Oracle is a principal member. The data model used for storing the spatial and location information inside the database strictly conforms to the standards defined by the Open GIS Consortium.
Location-aware application framework
The explosive growth in the use of intelligent mobile devices (mobile phones and PDAs) opens up new possibilities for delivering location information to mobile devices, and for delivering services that are customized according to each device’s current location. We are currently observing a convergence of PDAs and mobile phones: PDAs are acquiring advanced wireless communication capabilities, and mobile phones are gaining more computation and Internet access features. In the near future, most appliances will also have network interfaces and remote management capabilities. These advances, together with inexpensive position-sensing devices, will make location-aware applications popular. However, without an appropriate “location-aware application development framework”, many basic services will be reimplemented in each application, and cooperation between different location-aware applications will be difficult.
Applications today do not take into account the location of clients or service providers. By providing an infrastructure that captures the mapping of services in an area of interest (we call these areas “regions”) as well as a mapping of clients in the regions, applications can aggregate location information from a variety of remote sensing technologies and provide a single, seamless interface to it. This information makes it feasible to build innovative location-aware applications. Delivering location-dependent services to mobile devices requires two components: storage of location information, and delivery of spatial data to mobile devices.
Locating and organizing location-dependent services
Providing services to mobile devices involves more than converting Web information to match the capabilities of the device (small screen, low bandwidth, etc.). A typical Web portal, such as those provided by companies like Yahoo or provided internally by corporations to their work force, is loaded with as much information as possible, usually in the form of jumping-off points to other information repositories. This approach is too limited for mobile devices, which by definition can be anywhere. However, once the Web server can recognize the geographical location of a mobile device, it can adapt itself and provide services or information relevant to the current location.
Extending this model, a large number of services can be built, each with content that is more or less local to specific locations: for example, one service monitors local buses and trains, while another monitors inter-city trains and commuter airline traffic. The model extends to local restaurants advertising their menus, hotels with room availability, theaters with last-minute tickets for sale, etc. Further along the line, services could themselves be physical devices that simply advertise their availability: public printers on a campus, ATMs, vending machines, etc. Clearly, in such a model, the number of services potentially relevant to a given location can grow enormously and be very dynamic, with new services appearing and disappearing constantly.
Our current line of thought is to associate services with geographical areas, or “regions”. A region can of course be defined as a member of a hierarchical structure (country, state or province, county, city or municipality, district, street, etc.). However, this model is too simple to represent reality, where regions are likely to be defined more flexibly and to cross hierarchical boundaries. Examples include metropolitan areas that spread over state or country boundaries, or company-specific sales areas that span multiple counties. The operators provided by the spatial database can be used to define new service regions based on spatial and non-spatial attributes. For example, a spatial union operator can combine multiple existing service areas into a new service area, and a spatial buffer operator can construct a new service area based on a distance metric.
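The union and buffer operators can be sketched for service regions in a few lines. To keep the example short, each region is represented here as a membership test (does a location fall inside it?), so union becomes a logical OR and a point buffer becomes a distance check; a spatial database would instead compute and store the resulting geometry so that it can be indexed and displayed. All names, and the registry of services, are hypothetical.

```python
import math
from typing import Callable

Point = tuple[float, float]
Region = Callable[[Point], bool]  # membership test: is the location inside the region?


def box_region(min_x: float, min_y: float, max_x: float, max_y: float) -> Region:
    """An axis-aligned rectangular service area."""
    return lambda p: min_x <= p[0] <= max_x and min_y <= p[1] <= max_y


def buffer_point(center: Point, distance: float) -> Region:
    """Buffer of a point: all locations within `distance` of `center`."""
    return lambda p: math.dist(p, center) <= distance


def union(*regions: Region) -> Region:
    """A location is in the union if it lies in any member region."""
    return lambda p: any(r(p) for r in regions)


def services_at(location: Point, registry: dict[str, Region]) -> list[str]:
    """Services whose region contains the client's current location."""
    return [name for name, region in registry.items() if region(location)]


registry = {
    # A metropolitan service area formed from two existing areas.
    "metro_bus": union(box_region(0, 0, 10, 10), box_region(10, 0, 20, 5)),
    # A service area defined by distance from a shop.
    "coffee_shop_ads": buffer_point((3, 3), 1.5),
}
```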
Recent reports have described the accomplishments of spatial database research and prioritized the research needs in this area. Güting (1994) and Worboys (1995) provide broad surveys of spatial database requirements and overviews of research results. Kim et al. (1993) discussed the research needed to improve the performance of spatial databases in the context of object-relational databases. The primary research needs identified were SQL support for spatial types, support for spatial indexing methods, the development of cost models for query processing, and the development of new spatial join algorithms. Many of these research needs have since been addressed.
Current spatial database systems support storing and retrieving the geometric properties of spatial features. They also provide indexing support and operators such as topological and distance functions. Notable features of current systems include support for managing different spatial reference and projection systems, and support for applications such as transportation systems that require linear referencing.
References
- Chamberlin, D. (1997). Using the New DB2: IBM’s Object-Relational Database System. Morgan Kaufmann, San Francisco, CA.
- Defazio, S. et al. (1995). Integrating IR and RDBMS Using Cooperative Indexing. Proceedings of SIGIR, Seattle, Washington.
- Egenhofer, M. J. (1993). What’s Special about Spatial? Database Requirements for Vehicle Navigation in Geographic Space. ACM SIGMOD.
- Gardels, K. (1997). Open GIS and On-Line Environment Libraries. ACM SIGMOD Record, 26(1):32-38.
- Güting, R. H. (1994). An Introduction to Spatial Database Systems. The VLDB Journal, 3(4):357-399.
- Kim, W., Garza, J., and Keskin, A. (1993). Spatial Data Management in Database Systems. 3rd Intl. Symposium on Advances in Spatial Databases, pages 1-13.
- OGC (1998). The Open GIS Consortium.
- Oracle Corporation (1998). Oracle8i Spatial Cartridge User’s Guide and Reference, Release 8i.
- Stonebraker, M. and Kemnitz, G. (1991). The POSTGRES Next-Generation Database Management System. Communications of the ACM, 34(10):78-92.
- Stonebraker, M. and Moore, D. (1997). Object-Relational DBMSs: The Next Great Wave. Morgan Kaufmann.
- Worboys, M. F. (1995). GIS: A Computing Perspective. Taylor and Francis.
- Oracle Spatial White paper –