Shan Yin, LIN Hui
Department of Geography, The Chinese University of Hong Kong
E-mail: [email protected] , [email protected]
FU Wai Chee
Department of Computer Science and Engineering,
The Chinese University of Hong Kong
E-mail: [email protected]
Keywords: Spatial OLAP, Web-based GIS, Data Mining
Huge volumes of geo-referenced data are now available to geographers and other scientists. This data-rich environment differs considerably from the data-poor environment in which Geographic Information Systems (GIS) originated. Furthermore, although Web-based GIS has improved accessibility, its simple query and retrieval functions cannot meet the needs of geographers and other scientists. On-Line Analytical Processing (OLAP) emerged to help decision makers gain insight from large stores of data instead of drowning in a sea of meaningless data. We believe that OLAP should be integrated into GIS so that we can make the most of geographic data.
In this paper, we first give a very brief introduction to OLAP, Web-based GIS and related techniques. We then propose a system architecture for integrating OLAP and Web-based GIS, and subsequently examine the spatial OLAP server. As an example, we take a prototype system, currently under development, that aims to facilitate users' access to OLAP functions through the Internet. A conclusion is given at the end.
With the advance of Web technology, Web-based GIS has become one of the focuses of GIS research. Many Web-based GIS systems have been built around the world, and commercial software packages are now available that make the construction of such systems easier than ever (ESRI, 2000a, 2000b). However, most Web-based GIS systems offer only limited analysis functions, let alone interactive mining of knowledge from precious geo-referenced data. Web-based GIS in the OLAP style makes geo-referenced data more understandable to users, which makes it possible to detect implicit but valuable patterns and associations.
Furthermore, huge volumes of data are now available. A traditional GIS is a good repository of geographical information, but it does little to help scientists gain insight from these data. Spatial OLAP and other data mining techniques can meet this need. Spatial OLAP helps scientists observe geo-referenced data from different perspectives and at various levels of a concept hierarchy.
Data mining and knowledge discovery in databases is a relatively new field of study. It can be understood as the discovery of interesting, implicit, and previously unknown knowledge from large databases (Frawley, 1991). Spatial data mining is a promising branch of data mining that aims to extract implicit knowledge from spatial databases. Research in this field has produced good results (Ester, 1997; Koperski, 1996; Lu, 1993). Issues covered in this research include spatial generalization, spatial clustering, spatial association detection and spatial classification.
OLAP is an indispensable part of spatial data mining, focusing on the end user's analytical requirements and the computation necessary to fulfill them. As in conventional OLAP, the popular operations of spatial OLAP are slicing and dicing, pivoting, roll-up and drill-down. These functions are not found in current Web-based GIS systems. We believe that OLAP should be integrated into Web-based GIS so that the capability of Web-based GIS systems is greatly improved.
Architecture of Integrated System
The architecture of the integrated system is outlined in Figure 1. It is a multitier structure consisting of three tiers. The Presentation Tier publishes data through the Internet and gathers requests from users. The Service Tier processes requests and generates responses, while the Data Management Tier is the repository of geographical information.
Because the system is Web-based, users can access geographical information through the Internet using a popular Internet browser. The Web Server communicates with the Internet Map Server. The Internet Map Server retrieves geographical information from the Spatial Database and transforms it into an appropriate format before sending the data to the Web Server. Complicated requests that the Internet Map Server cannot handle, such as drilling down and rolling up, are handed to the Spatial OLAP Server for processing. The Spatial OLAP Server processes data retrieved from the Spatial Database, combining it with hierarchy information and possibly materialized data, to obtain the results, and then transfers them to the Internet Map Server for publishing through the Internet.
Figure 1. Architecture of Integrated System
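The request flow described above can be illustrated with a minimal sketch. It is not the prototype's code: all class names, the Request fields and the set of operation names are hypothetical, and the servers are reduced to in-process objects so that only the routing decision is visible.

```python
from dataclasses import dataclass

# Hypothetical operation names. The text only names drilling down and
# rolling up as examples of requests the map server cannot handle itself.
OLAP_OPERATIONS = {"roll_up", "drill_down", "slice", "dice", "pivot"}


@dataclass
class Request:
    operation: str  # e.g. "zoom", "pan", "drill_down"
    layer: str      # name of the requested map layer


class SpatialDatabase:
    """Data Management Tier: stands in for the spatial data repository."""

    def fetch(self, layer):
        return {"layer": layer, "features": []}


class SpatialOLAPServer:
    """Service Tier: handles the analytical requests."""

    def __init__(self, database):
        self.database = database

    def process(self, request):
        data = self.database.fetch(request.layer)
        # A real server would combine the data with hierarchy information
        # and materialized views at this point.
        return {**data, "operation": request.operation, "handled_by": "olap"}


class InternetMapServer:
    """Service Tier: answers simple requests, delegates OLAP requests."""

    def __init__(self, database, olap_server):
        self.database = database
        self.olap_server = olap_server

    def handle(self, request):
        if request.operation in OLAP_OPERATIONS:
            return self.olap_server.process(request)
        data = self.database.fetch(request.layer)
        return {**data, "operation": request.operation, "handled_by": "map"}


database = SpatialDatabase()
map_server = InternetMapServer(database, SpatialOLAPServer(database))
simple = map_server.handle(Request("zoom", "districts"))
analytical = map_server.handle(Request("drill_down", "districts"))
```

In this sketch the Web Server is omitted; it would simply forward browser requests to `map_server.handle` and return the formatted result.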
The general architecture is easy to understand. Commercial packages are available for some of the services mentioned above and can accelerate system construction; Microsoft IIS, ArcIMS and ArcInfo are good examples for these services respectively. However, the Spatial OLAP Server deserves more attention, and we focus on it in the next section.
Spatial OLAP Server
The main requirements of an OLAP server include supporting multiple users, handling huge volumes of data efficiently, and supporting rich OLAP operations. Multi-user access is very common nowadays, especially in network applications. Typical OLAP operations are roll-up, drill-down, slice and dice, and pivot; a spatial OLAP server should implement all of them. OLAP generally runs against very large datasets, so efficient and effective access methods are critical. When it comes to spatial data, the situation becomes even more complicated (Egenhofer, 1994).
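The typical operations listed above can be made concrete with a small sketch over an in-memory fact list. The records, the district-to-region hierarchy and the measure name area_km2 are invented for illustration, and a real spatial OLAP server would work on far larger data and on spatial measures as well.

```python
from collections import defaultdict

# Hypothetical fact records with three dimensions and one numeric measure.
FACTS = [
    {"district": "A", "year": 1999, "land_use": "residential", "area_km2": 12.0},
    {"district": "A", "year": 2000, "land_use": "residential", "area_km2": 13.0},
    {"district": "B", "year": 2000, "land_use": "residential", "area_km2": 8.0},
    {"district": "B", "year": 2000, "land_use": "woodland", "area_km2": 40.0},
    {"district": "C", "year": 2000, "land_use": "commercial", "area_km2": 3.0},
]

# Hypothetical concept hierarchy: district -> region.
REGION_OF = {"A": "North", "B": "North", "C": "South"}


def aggregate(facts, key):
    """Sum the measure over groups defined by the key function."""
    totals = defaultdict(float)
    for fact in facts:
        totals[key(fact)] += fact["area_km2"]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Slice: fix a single dimension to one value."""
    return [f for f in facts if f[dimension] == value]


def dice_cube(facts, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [f for f in facts if all(f[d] in values for d, values in allowed.items())]


# Drill-down level: totals per district.
by_district = aggregate(FACTS, lambda f: f["district"])
# Roll-up: climb the hierarchy from district to region.
by_region = aggregate(FACTS, lambda f: REGION_OF[f["district"]])
# Slice on year, then aggregate per region.
region_2000 = aggregate(slice_cube(FACTS, "year", 2000),
                        lambda f: REGION_OF[f["district"]])
# Dice on two dimensions at once.
residential_2000 = dice_cube(FACTS, year={2000}, land_use={"residential"})
# Pivot: the same totals viewed with land use as the leading axis.
by_use_and_region = aggregate(FACTS, lambda f: (f["land_use"], REGION_OF[f["district"]]))
```

Roll-up and drill-down differ only in the level of the hierarchy used as the grouping key, which is why the concept hierarchy has to be available to the server.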
To access spatial datasets efficiently, many methods and models have been proposed from the database perspective (Egenhofer, 1994; Aref, 1991). In addition to these efforts, view materialization (Harinarayan, 1996; Stefanovic, 1993) and indexing are very promising fields. Harinarayan (1996) presents a lattice framework to express dependencies among views. With the help of this framework, a greedy algorithm is developed to choose which views to materialize so that response time is shortened. Stefanovic (1993) adapts this algorithm to the characteristics of spatial measures. Indexing on summary tables (subcubes of the data cube) can further improve query efficiency; Gupta (1997) proposes an index selection algorithm for OLAP that takes efficient use of space into consideration. We employ these results in our system.
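A simplified sketch of greedy view selection over a lattice, in the spirit of Harinarayan (1996), may help fix the idea. It is not the prototype's implementation and does not include the spatial adaptations of Stefanovic (1993). It assumes that a view is identified by its set of dimensions, that a view can be computed from any view whose dimensions include its own, and that the cost of answering a query equals the row count of the view used. The dimension letters and row counts are invented.

```python
# Hypothetical views over three dimensions p, s, c with assumed row counts.
SIZES = {
    frozenset("psc"): 6_000_000,  # top view: the full data cube
    frozenset("pc"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("sc"): 6_000_000,
    frozenset("p"): 200_000,
    frozenset("s"): 10_000,
    frozenset("c"): 100_000,
    frozenset(): 1,               # grand total
}


def greedy_select(sizes, k):
    """Pick k views to materialize in addition to the top view."""
    top = max(sizes, key=len)  # the view containing every dimension
    selected = [top]

    def cost(view, chosen):
        # Cheapest already-chosen view from which `view` can be computed.
        return min(sizes[u] for u in chosen if view <= u)

    def benefit(candidate, chosen):
        # Total saving over all views computable from the candidate.
        return sum(
            max(0, cost(view, chosen) - sizes[candidate])
            for view in sizes
            if view <= candidate
        )

    for _ in range(k):
        candidates = [v for v in sizes if v not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda v: benefit(v, selected))
        selected.append(best)
    return selected[1:]  # the top view is always materialized


chosen_views = greedy_select(SIZES, 3)
```

With these assumed sizes the first pick is the view on (p, s), because it cuts the cost of four views from 6,000,000 to 800,000 rows each; views as large as the top view are never chosen because they bring no benefit.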
Regarding data storage, the spatial data and their attributes are managed by the geographical information system in our system. The non-spatial data, metadata and concept hierarchies are stored in a relational DBMS (RDBMS). The Spatial OLAP server manages the materialized views, which can shorten response time greatly. The data cube, which is constructed and managed by the OLAP server, makes it possible for users to observe data at various concept levels. There are two major directions in implementing OLAP servers, namely Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP). ROLAP extends a traditional relational server to support multidimensional views, while MOLAP manages multidimensional information directly, for example in a multidimensional array. ROLAP integrates naturally with existing technology and standards and is reliable and scalable, whereas MOLAP provides efficiency in storage and operations because of its direct representation of multidimensional data (Zhao, 1997; Shoshani, 1997). The former is adopted in our system.
Spatial OLAP is a new but promising field of research, and its integration with Web-based GIS is an even more interesting area. Web technology makes spatial OLAP more accessible to decision makers, and spatial OLAP equips Web-based GIS with important functions for decision support. Many questions remain open, such as the automatic construction of concept hierarchies, data integration, and the incremental update of spatial data cubes.
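The difference between the two storage approaches can be shown on a toy example. The sketch below is only an illustration under assumptions: the fact table, its column names and the values are invented, SQLite stands in for the relational server, and a nested list stands in for a multidimensional array.

```python
import sqlite3

# Hypothetical dimension members and facts (region, year, area_km2).
REGIONS = ["North", "South"]
YEARS = [1999, 2000]
FACTS = [
    ("North", 1999, 12.0),
    ("North", 2000, 61.0),
    ("South", 2000, 3.0),
]

# ROLAP style: facts live in a relational table and the multidimensional
# view is produced by an ordinary GROUP BY query.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE fact (region TEXT, year INTEGER, area_km2 REAL)")
connection.executemany("INSERT INTO fact VALUES (?, ?, ?)", FACTS)
rolap_totals = dict(
    connection.execute("SELECT region, SUM(area_km2) FROM fact GROUP BY region")
)
connection.close()

# MOLAP style: the cube is a dense array addressed by dimension positions,
# so a cell is reached by index arithmetic instead of a table scan.
cube = [[0.0] * len(YEARS) for _ in REGIONS]
for region, year, area in FACTS:
    cube[REGIONS.index(region)][YEARS.index(year)] += area
molap_totals = {region: sum(cube[i]) for i, region in enumerate(REGIONS)}
```

Both variants return the same totals per region. The array form stores a cell for every combination, including the empty (South, 1999) cell, whereas the table stores only the facts that exist.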
- Aref, Walid G., et al., 1991. Extending a DBMS with Spatial Operations. In: Advances in Spatial Databases, Proceedings of Symposium on Large Spatial Databases, SSD'91.
- Egenhofer, M. J., 1994. Spatial SQL: a query and presentation language. IEEE Transactions on Knowledge and Data Engineering, Vol. 6, No. 1, Feb. 1994, pp. 86-95.
- Thomsen, E., 1997. OLAP Solutions: Building Multidimensional Information Systems, Wiley, US.
- ESRI, May 2000a (Access Date). The ArcIMS 3 Architecture. Available at:
- ESRI, May 2000b (Access Date). ArcIMS 3 Features and Functions. Available at:
- Ester, M., 1997. Spatial Data Mining: A Database Approach. In: Advances in Spatial Databases, SSD'97, Berlin.
- Frawley, W. J., et al., 1991. Knowledge Discovery in Databases: An Overview. In: Knowledge Discovery in Databases. Edited by Piatetsky-Shapiro, G., et al., AAAI/MIT Press, Menlo Park, CA.
- Gupta, H., Harinarayan, V., Rajaraman, A., Ullman, J. D., 1997. Index Selection for OLAP. In: 13th International Conference on Data Engineering, pp. 208-219.
- Harinarayan, V., Rajaraman, A., and Ullman, J. D., 1996. Implementing data cubes efficiently. In: Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data, Montreal, Canada, June 1996, pp. 205-216.
- Koperski, K., et al., 1996. Spatial Data Mining: Progress and Challenges. In: Proceedings of SIGMOD'96 Workshop, Data Mining and Knowledge Discovery (DMKD'96), Montreal, Canada, June 1996.
- Lu, W., et al., 1993. Discovery of General Knowledge in Large Spatial Databases. In: Proc. of 1993 Far East Workshop on Geographic Information Systems (FEGIS'93), Singapore, June 1993, pp. 275-289.
- Stefanovic, N., 1993. Design and Implementation of On-Line Analytical Processing of Spatial Data. M.Sc. Thesis, Computing Science, Simon Fraser University, Canada. Available at:
- Shoshani, A., 1997. OLAP and statistical databases: similarities and differences. In: Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 185-196.
- Zhao, Y., et al., 1997. An array-based algorithm for simultaneous multidimensional aggregates. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 159-170.