Integration across heterogeneous spatial data and applications within a large cyberinfrastructure project


I. Zaslavsky
San Diego Supercomputer Center
University of California San Diego, USA

A. Memon
Chief Software Engineer – GEON Project,

G. Memon
Programmer/Analyst, Science R&D Group,

San Diego Supercomputer Center,
University of California San Diego

Abstract
Grid services are emerging as a flexible middleware architecture that supports interoperation of heterogeneous resources. Data grids and services address syntactic and structural differences across spatial data sets by following a set of common abstractions for data access and functionality invoked over the web. At the same time, the new technology brings several important challenges to the fore, including integration of semantically heterogeneous data, management of very large replicated data sets, and interoperation between grid-managed and external data resources. In this paper, we describe several spatial information integration services that address the challenges encountered in the course of a large cyberinfrastructure project. Specifically, we focus on services for assembling composite maps from heterogeneous distributed sources of spatial data.

Introduction
As a central issue of Geographic Information Science, spatial data interoperability has received considerable attention in recent years. Numerous GIS interoperability research projects and initiatives have addressed format and representational heterogeneity across spatial datasets, spatial metadata supporting data integration, spatial data interchange standards, organization of spatial data infrastructure nodes, data quality and error propagation, and related issues [Bishr 1998; Camara et al. 1999; DeVogele et al. 1998; Visser and Stuckenschmidt 2002].

Beyond academic projects, GIS interoperability solutions have percolated into commercial software. Notable examples include the Open Spatial Enterprise, a collaboration between Oracle, Autodesk, Intergraph, Laser-Scan, and MapInfo; the ESRI Interoperability Extension [https://www.esri.com/software/standards/index.html]; and the growing list of spatial data formats translated by the Feature Manipulation Engine [www.safe.com]. Many companies include support for industry standards developed within the Open Geospatial Consortium (OGC) [https://www.opengeospatial.org/]. Under the aegis of the OGC-sponsored series of interoperability testbeds, the proposed specifications, including the Geography Markup Language (GML) and a set of map and feature server specifications (WMS, WFS, WCS, etc.), have been implemented and tested within various software environments.
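To make the map server specifications concrete, the following minimal sketch builds a WMS 1.1.1 GetMap request URL. The query parameter names (SERVICE, VERSION, REQUEST, LAYERS, STYLES, SRS, BBOX, WIDTH, HEIGHT, FORMAT) come from the WMS specification; the server address, the layer names, and the helper function build_getmap_url are hypothetical and are used here only for illustration.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Return a WMS 1.1.1 GetMap URL for the given layers and extent.

    base_url and the layer names are assumptions; a real server
    advertises its layers in its GetCapabilities response.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),               # drawn in the order listed
        "STYLES": "",                             # empty value requests default styles
        "SRS": srs,                               # spatial reference system of BBOX
        "BBOX": ",".join(str(v) for v in bbox),   # minx,miny,maxx,maxy
        "WIDTH": str(width),                      # image size in pixels
        "HEIGHT": str(height),
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layers, for illustration only.
url = build_getmap_url(
    "https://example.org/wms",
    ["geology", "faults"],
    (-125.0, 32.0, -114.0, 42.0),
    800,
    600,
)
```

Because every compliant server accepts the same request vocabulary, a client can retrieve map images from different vendors' servers by changing only the base address and the layer names.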

As a result, excellent solutions have emerged for handling syntactic and structural differences across spatial data sources, but other types of heterogeneity have received less attention. Driven by both practical challenges and trends in computing, the focus of interoperability research is moving to models and technologies for seamless, automatic, on-demand spatial data integration. The agenda includes, among other items: (1) registering semantically different spatial databases and supporting ontology-aware querying across such sources, (2) generating different types of composite thematic maps on demand, (3) interoperability inside large virtual organizations served by spatial data grids, (4) managing very large datasets and staging them to be consumed by web and desktop clients, (5) developing lossless interoperable solutions, (6) data integration across spatial scales, (7) security and privacy in data sharing, and (8) scalability and quality of service in wide networks.

Developing secure and efficient “interoperability services” represents an attractive technical approach to solving some of these challenges. In this paper, we address several of them, drawing on the experience of several cyberinfrastructure projects being developed with the participation of the data technologies group at the San Diego Supercomputer Center. We primarily base our discussion on the Geosciences Network (GEON), a large collaborative NSF-funded project devoted to the creation of cyberinfrastructure for integrating geologic data and computational resources [www.geongrid.org]. Two related projects are BIRN (the Biomedical Informatics Research Network, an NIH-funded grid project that creates a data sharing and mediation environment for neuroscience data, www.nbirn.net) and BorderSafe (a DHS-funded effort exploring the implications of privacy and data sharing in the law enforcement community). While these projects are in different disciplines, at different stages of maturity, and work at different scales, they share a spatial data integration design and components, in particular the services-based, ontology-aware spatial mediation and map assembly services. Some aspects of these technologies have been described earlier [Gupta et al. 1999; Zaslavsky et al. 2004a, 2004b; Zaslavsky and Memon 2004].


Fig. 1. Organization of GEON software layers
In this paper, we focus on additional issues of composite map generation in the grid environment. Specifically, we describe a services-based architecture for spatial information integration, and outline a collection of abstractions and relevant grid services that enable interoperability across distributed resources. Our focus is on map assembly services as they are extended to handle non-hosted, standards-compliant spatial data servers.
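As a rough illustration of what assembling a composite map from hosted and non-hosted sources can involve, the sketch below plans the requests for an ordered stack of layers. It is not the GEON implementation: the Layer record, the plan_requests function, and the server addresses are all hypothetical. It assumes only that each layer is rendered by exactly one server and that layers are listed bottom first.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Layer:
    name: str     # layer identifier on its server (hypothetical)
    server: str   # base URL of the rendering server (hypothetical)
    hosted: bool  # True for a grid-managed source, False for an external server


def plan_requests(layers):
    """Group consecutive layers that share a server, preserving draw order.

    Returns a list of (server, [layer names]) pairs, bottom layer first.
    Only adjacent layers are merged, so a layer from another server that
    sits between two layers of the same server keeps its place in the stack.
    """
    return [
        (server, [layer.name for layer in group])
        for server, group in groupby(layers, key=lambda layer: layer.server)
    ]


# Hypothetical composite map: two hosted layers, one external, one hosted.
GRID = "https://grid.example.org/map"
EXTERNAL = "https://external.example.org/wms"
stack = [
    Layer("geology", GRID, True),
    Layer("faults", GRID, True),
    Layer("roads", EXTERNAL, False),
    Layer("stations", GRID, True),
]
plan = plan_requests(stack)
```

Each pair in the plan corresponds to one image request; overlaying the returned images in list order would reproduce the intended layer stack. Merging only adjacent layers trades a few extra requests for a guarantee that the draw order is never changed.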