A Service Driven Approach for Integration of Heterogeneous Geospatial Data Repositories

S.K. Ghosh, Manoj Paul
School of Information Technology
Indian Institute of Technology, Kharagpur, India

ABSTRACT
The heterogeneous nature of geospatial data makes it difficult to share across organizations. In government and other agencies, geospatial data is often produced by diverse departments relying on a mix of software and information systems. Each department uses its own system to increase efficiency, but sharing data across the enterprise is nearly impossible. In this paper, we discuss a service-based model for integrating diverse geospatial data repositories. The system follows the publish-find-bind methodology of service-oriented architecture (SOA), realized with web services. The OGC (Open Geospatial Consortium) standards, namely Web Map Service (WMS) and Web Feature Service (WFS), have been used to enable centralized access to spatial data in different formats.

1. INTRODUCTION
In government and other agencies, a Geographic Information System (GIS) is often developed by diverse departments relying on a mix of software and information systems. Each department uses its own system to increase efficiency, but sharing data and applications across the enterprise is nearly impossible. An increasing number of institutions are challenged with implementing robust GIS capabilities for a large number of individuals through information sharing and interconnected networks. In the past, numerous technological roadblocks hampered the successful implementation of an enterprise-wide GIS (E-GIS). With the advent of high-speed networks, increasingly fast computers, intelligent spatial-data serving technologies, improved data architecture, and advances in GIS software, the newest challenge involves integrating the various technological and institutional components and addressing the interoperability problem through OGC (Open Geospatial Consortium) standards (OGC, 2007). E-GIS is an organization-wide approach to GIS implementation, operation, and management. It can also be defined as an effort to design integrated geospatial management techniques to serve a complex institution.

In this paper we follow a service-oriented architecture (SOA) using web services for integrating diverse repositories of spatial data. Two standard web service interfaces proposed by OGC, namely Web Map Service (WMS) (OGC, 2004) and Web Feature Service (WFS) (OGC, 2002), have been used to enable centralized access to spatial data in different formats. WFS allows a client to retrieve geospatial data from multiple Web Feature Services. The OGC WMS is capable of creating and displaying maps that come simultaneously from multiple sources, in standard image formats such as .svg, .png, .gif or .jpg.

2. RELATED WORK
The need for sharing and integrating geospatial data available in various organizations has been recognized for quite some time (Taldoire, 2001). Several ways of integrating spatial data repositories have been proposed in the literature. The database community has done extensive research on integration architectures (Wiederhold, 1992).

(Roth, 1997) proposes a spatial data warehouse based technique that employs middleware technology for data exchange. In this approach, data from multiple repositories is cleaned up and maintained in a warehouse. The warehouse can be composed of a central data repository or of distributed data repositories, depending on its design. (Cluet, 1998) proposes a mediation-based approach to facilitate data integration. Most of these approaches rely on wrappers (Roth, 1997; Cluet, 1998) for accessing the data sources and translating the data into some standard format. The main advantage of the warehouse-based approach is that a local administration is required to maintain the data. The mediator-based approach, on the other hand, handles the queries directly through wrappers and integrates the data locally, which can be costly.

Some work in the geospatial domain uses open standards proposed by the Open Geospatial Consortium (OGC) (Boucelma, 2003; Comert, 2004). (Boucelma, 2003) proposes a Web Feature Service (WFS) based mediation approach with the help of derived wrappers. Its advantage over the other approaches is that it can capture query capabilities available at the source or access a local query capability not available at the source. This enables an enhanced and efficient query language on spatial data. A spatial web application using open standards is also proposed in (Anderson, 2003).

In this paper we propose a web service based approach for the integration architecture. The data repositories are available on the web as services with well-defined interfaces. The approach is based on XML technology: a client can access any data repository, holding data in any format and located on any platform, provided it knows how to communicate with the service provider. The goal is to provide unified access to data from heterogeneous data providers. Any client (user) can submit its query to a geospatial server. The server, in turn, retrieves the data from the multiple sources and returns the result to the client. This helps in realizing interoperability between the heterogeneous data repositories.

3. SERVICE-ORIENTED ARCHITECTURE AND SPATIAL DATA

3.1 General SOA Methodology
The emerging Service Oriented Architecture (SOA) based method using Web Services is gaining considerable interest as a way of seamlessly integrating information systems spread across several organizations. A service is a function that is well defined, self-contained, and does not depend on the context or state of other services. As shown in figure 1 (Champion, 2002), service-oriented architectures involve three different kinds of actors: service providers, service requesters and discovery agencies. The service provider exposes some software functionality as a service to its clients. In order to allow clients to access the service, the provider also has to publish a description of the service. Since service provider and service requester usually do not know each other in advance, the service descriptions are published via specialized discovery agencies. The agencies can categorize the service descriptions and provide them in response to a query issued by one of the service requesters. As soon as the service requester finds a suitable service description for its requirements at the agency, it can start interacting with the provider and using the service.
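The publish-find-bind cycle described above can be illustrated with a minimal in-memory sketch. It is not part of the system presented in this paper: the names `ServiceDescription`, `DiscoveryAgency` and `road_service` are hypothetical, and a plain Python callable stands in for a network endpoint.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceDescription:
    """What a provider publishes so that requesters can find its service."""
    name: str
    keywords: List[str]
    endpoint: Callable[[str], str]  # stands in for a network address


@dataclass
class DiscoveryAgency:
    """Holds published descriptions and answers requester queries."""
    descriptions: Dict[str, ServiceDescription] = field(default_factory=dict)

    def publish(self, description: ServiceDescription) -> None:
        self.descriptions[description.name] = description

    def find(self, keyword: str) -> List[ServiceDescription]:
        return [d for d in self.descriptions.values() if keyword in d.keywords]


def road_service(request: str) -> str:
    # Hypothetical provider functionality: reports which request it served.
    return f"roads:{request}"


agency = DiscoveryAgency()

# Publish: the provider registers a description of its service.
agency.publish(ServiceDescription("road-data", ["roads", "vector"], road_service))

# Find: the requester queries the agency without knowing the provider in advance.
matches = agency.find("roads")

# Bind: the requester interacts with the provider it has found.
response = matches[0].endpoint("GetFeature")
print(response)
```

The requester never refers to `road_service` directly; it reaches the provider only through the description returned by the agency, which is the decoupling the three-actor model is meant to provide.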


Figure 1: Basic Methodology of Service Oriented Architecture (SOA)

A service-oriented architecture consists of a collection of services that communicate with each other. The communication can involve either simple data passing or two or more services coordinating some activity. Some means of connecting services to each other is needed. For many people, the first service-oriented architecture was built with DCOM or with Object Request Brokers (ORBs) based on the CORBA specification. Web services are now the most likely connection technology for SOA; they essentially use XML to create a robust connection. The basic interaction of SOA is shown in figure 2. It shows a service consumer at the right sending a service request message to a service provider at the left. The service provider returns a response message to the service consumer. The request and subsequent response connections are defined in a way that is understandable to both the service consumer and the service provider.


Figure 2: SOA – Interaction between service provider and consumer

3.2 Geospatial Domain
The open and distributed geographic information (GI) domain opens a wide range of possibilities for acquiring, processing and analyzing geographic information without the need for expert GIS knowledge. In an environment where services are not known in advance, a service that is appropriate for answering a given question has to be discovered first from among a large number of available services. Service discovery is thus a crucial task that will become even more important with the emerging Semantic Geospatial Web.

Although SOA technology using web services is evolving in the information system domain, where legacy or already existing systems are integrated to form an organization-wide information system, the application of SOA to the comparatively highly heterogeneous spatial data domain has not been tried much. Given the high heterogeneity of spatial data and the complex query mechanisms required for processing it, a service-based technique could be one of the best solutions for co-operative integration of spatial data.

Two major problems that exist in the highly heterogeneous geospatial domain are as follows:

  • Improper or insufficient documentation makes it difficult for outside users to discover the data sets that are useful for their task.
  • Datasets in different formats often need to be converted before they can be used in another system. This problem is addressed (as proposed by OGC) by providing the data in a vendor-neutral format such as GML (OGC, 2003).

OGC WFS defines a set of protocols that provide standardized service interfaces for geospatial data sources. Through these services, distributed geospatial data can be accessed and processed across administrative and organizational boundaries. As the data sources are only loosely coupled to the integrated system, they can be created and managed locally, which leads to increased quality and efficiency. The integrated system can easily be extended to include new services and/or data sets.

4. INTEGRATING GEOSPATIAL REPOSITORIES
The issue of how to capture data from several highly heterogeneous spatial data sources and integrate them for analysis is important for web-based GIS applications. The development of web technologies and the Internet provides a way to quickly access various geo-databases. The Internet has become immensely valuable and is recognized as an important means of quickly disseminating information and acquiring data from several spatial data repositories (David, 1998; Peng, 2003; Peng, 2004).

To provide unified access to data from multiple data sources, each individual institution or data provider is required to register its data in the central registry, which makes it possible to maintain a catalogue of data on the central server. The catalogue should contain information about data type, feature information, etc. Upon request, the data is searched for in the catalogue, and a request for the corresponding data can be sent to the feature request module. Whatever the input data format, the output data is sent in the Geography Mark-up Language (GML) (OGC, 2003) format for reasons of interoperability. The registry at the central server should also maintain a registry file describing the different types of operation that can be performed on each of the data features.

This project is an integrated Java Enterprise based implementation of the 1.0 Web Feature Server (WFS) (OGC, 2002) and 1.1.1 Web Map Service (WMS) (OGC, 2004) specifications from the Open Geospatial Consortium (OGC). The objective is to enable greater geographic interoperability by reinforcing OGC standards and other web standards and by lowering the barriers to entry for geographic data providers. A request to the WFS server returns the feature data in GML format. A request to the WMS server, on the other hand, serves the data by rendering it graphically, i.e. in map format. The work presented in this paper can integrate data in flat file formats (shapefile and GML) and in a relational database format (Oracle Spatial). With this approach data can be published as maps/images (using the WMS) or as actual data (using the WFS). The main focus is on ease of use and support for open standards, in order to enable anyone to quickly share their geospatial information in an interoperable way. The primary goals of the work are as follows:

  • Standards Compliance: The primary intent of the project is to promote standardization and it must, therefore, adhere to open standards as closely as possible. It also attempts to support as many relevant geographic standards as possible.
  • Data Format Support: In order to make the server useful, it must help translate the current diversity of geographic data formats into a single format (e.g. GML). Therefore, supporting several data formats, both relational and flat file, is of primary importance to the project.
  • Ease of Use: It is targeted at organizations with minimal technical expertise and few technical resources and must, therefore, be easy to configure and run.
  • Efficiency: Given that the volume of data required by geographic applications generally entails severe computational and bandwidth loads, it strives to be as efficient as possible, while achieving other goals.

With the recent research on adopting SOA for integrating geospatial data, the issue of sharing spatial data has taken on a new dimension. With web services it becomes possible for applications to acquire and integrate spatial data from heterogeneous sources in real time over the web. OGC web services provide a vendor-neutral interoperable framework for web-based discovery, access, integration, analysis and visualization of multiple online geospatial data sources. Web Feature Service (WFS) and Web Map Service (WMS), the two important web service standards proposed by OGC, have been adopted as the main technological backbone for this project.

Web Feature Service
Web Feature Service is one of the GIS web service interoperable specifications defined by OGC (OGC, 2002). It is the most powerful data service of OGC Web Services. Web Feature Service allows a client to retrieve geospatial data from multiple geospatial data servers. It also supports INSERT, UPDATE, DELETE, QUERY and DISCOVERY operations on geographic features using HTTP as the distributed computing platform.

WFS defines three primary operations. The GetCapabilities operation describes the capabilities of the web feature service using XML; it indicates which feature types the service can provide and which operations are supported on each feature type. The DescribeFeatureType operation describes the structure of any feature type the service can provide. The GetFeature operation services a request to retrieve feature instances. In addition, the client should be able to specify which feature properties to fetch and to constrain the query spatially and non-spatially.
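As an illustration of how a client might address these three operations, the following sketch builds key-value-pair request URLs with the Python standard library. The server address is the one used in the examples of this paper; the feature type `kgp:road`, the property `the_geom` and the helper `wfs_url` are used here only for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Server address as used in the request examples of this paper.
BASE_URL = "https://localhost:8080/geoserver/wfs"


def wfs_url(request, **extra):
    """Return a WFS 1.0.0 key-value-pair request URL for one operation."""
    params = {"service": "WFS", "version": "1.0.0", "request": request}
    params.update(extra)
    return BASE_URL + "?" + urlencode(params)


# Discovery: which feature types and operations does the server offer?
capabilities_url = wfs_url("GetCapabilities")

# Structure: what does one feature type look like?
describe_url = wfs_url("DescribeFeatureType", typename="kgp:road")

# Data: fetch feature instances, restricted to one property.
feature_url = wfs_url("GetFeature", typename="kgp:road", propertyname="the_geom")

print(capabilities_url)
```

Note that `urlencode` percent-encodes the colon in `kgp:road`; a server decodes it back to the qualified feature name.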

Web Map Service
The OGC WMS is capable of creating and displaying maps that come simultaneously from multiple sources, in standard image formats such as Scalable Vector Graphics (SVG), Portable Network Graphics (PNG), Graphics Interchange Format (GIF) or Joint Photographic Experts Group (JPEG) (OGC, 2004).

It provides three operations. GetCapabilities allows a client to instruct a server to describe its mapping content and processing capabilities and to return service-level metadata. GetMap enables a client to instruct multiple servers to independently craft "map layers" that have identical spatial reference system, size, scale, and pixel geometry; the client can then display these overlays in a specified order and transparency, so that the information from several sources is rendered for immediate human understanding and use. GetFeatureInfo enables a user to click on a pixel to inquire about the schema and metadata values of the feature(s) represented there. Figure 3 shows the access mechanism of geospatial data using OGC web services.


Figure 3: OGC spatial web services for accessing geospatial data

5. ARCHITECTURE OF THE SYSTEM
The overall architecture of the system is shown in figure 4. At the core of the system is a central registry, which holds, in the form of metadata, information about all the data repositories from which a user can obtain data. Data is provided as features. The registry is realized as a catalogue file holding information about each data repository, e.g. the type of data (shapefile, GML file), the feature name (which uniquely identifies a feature and is subsequently used to access the feature data), the namespace of the data (for semantic access), the spatial reference system (SRS), etc.


Figure 4: Architecture of the system

An entry for a data store in the registry is as follows:

<datastore namespace="kgp" enabled="true" id="Ponds">
  <connectionParams>
    <parameter value="file:data/featureTypes/kgp_Ponds/Ponds.shp" name="url" />
  </connectionParams>
</datastore>
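To show how such an entry could be consumed, here is a minimal sketch that reads the data store entry with Python's standard XML parser. The function `read_datastore` and the dictionary layout are assumptions made for illustration; the paper does not describe how the Java implementation parses the registry.

```python
import xml.etree.ElementTree as ET

# The registry entry shown above, with plain ASCII quotes.
REGISTRY_ENTRY = """
<datastore namespace="kgp" enabled="true" id="Ponds">
  <connectionParams>
    <parameter value="file:data/featureTypes/kgp_Ponds/Ponds.shp" name="url" />
  </connectionParams>
</datastore>
"""


def read_datastore(xml_text):
    """Turn one <datastore> element into a plain dictionary."""
    store = ET.fromstring(xml_text)
    # Collect every connection parameter as name -> value.
    params = {p.get("name"): p.get("value") for p in store.iter("parameter")}
    return {
        "id": store.get("id"),
        "namespace": store.get("namespace"),
        "enabled": store.get("enabled") == "true",
        "params": params,
    }


entry = read_datastore(REGISTRY_ENTRY)

# Qualified name in the namespace:name style used by the feature listings.
feature_name = f"{entry['namespace']}:{entry['id']}"
print(feature_name, entry["params"]["url"])
```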

Data can also reside in a standard relational database management system such as Oracle or PostGIS. Oracle is preferred for spatial data storage because of its spatial data storage capability. Each feature is associated with one style file (.sld), which is used to format the graphical display of the spatial data. Styled Layer Descriptors (SLD) are what make maps colourful: they provide the information necessary for the data to be rendered in map form. The interface between the spatial database and the OGC web services is shown in figure 5.


Figure 5: Service driven access of data from spatial database

Once data from multiple data stores is registered in the central registry, a client can discover which feature data is available, which operations can be performed on that feature data, etc., through a GetCapabilities request. This is the discovery phase of service-based computing. Figure 6 shows the output of a GetCapabilities request.
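A client-side view of this discovery step can be sketched as follows. The fragment is modelled on the feature type listing of Figure 6; the enclosing `FeatureTypeList` element and the helper `list_feature_types` are assumptions added so that the fragment parses as a single document. A real capabilities response is larger and namespace-qualified, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on Figure 6; the wrapper element is an assumption.
CAPABILITIES_FRAGMENT = """
<FeatureTypeList>
  <FeatureType>
    <Name>kgp:road</Name>
    <SRS>EPSG:4326</SRS>
    <LatLongBoundingBox minx="-0.0014" miny="-0.0024" maxx="0.0042" maxy="0.0018" />
  </FeatureType>
  <FeatureType>
    <Name>kgp:cultivation</Name>
    <SRS>EPSG:4326</SRS>
    <LatLongBoundingBox minx="0.0014" miny="-0.0011" maxx="0.0042" maxy="0.0024" />
  </FeatureType>
</FeatureTypeList>
"""


def list_feature_types(xml_text):
    """Map each advertised feature name to its SRS and bounding box."""
    catalogue = {}
    for ft in ET.fromstring(xml_text).iter("FeatureType"):
        box = ft.find("LatLongBoundingBox")
        bbox = tuple(float(box.get(k)) for k in ("minx", "miny", "maxx", "maxy"))
        catalogue[ft.findtext("Name")] = {"srs": ft.findtext("SRS"), "bbox": bbox}
    return catalogue


feature_types = list_feature_types(CAPABILITIES_FRAGMENT)
for name, info in feature_types.items():
    print(name, info["srs"], info["bbox"])
```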

A client can access data by requesting a feature through a GetFeature request. The request can be sent to the server using GET or POST; the FeatureRequest object handles both methods in the same way. When the request comes in, the servlet container sends it to the WfsDispatcher, which is the entry point to the server. The entry point is specified in the web.xml file inside the servlet container (Tomcat). The FeatureRequest object then goes to the feature type that was specified in the URL and queries the data. When a DataStore accepts a query, it does not actually return features; instead it returns a FeatureReader that can be used to read the features selected by the query one at a time. The delegate (i.e. the GML2 producer) reads a single feature, converts it to GML2 and sends the result to the output Strategy object.
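The one-at-a-time reading described above can be sketched with a Python generator. This is an analogy only, not the Java classes named in the text: `feature_reader`, `to_gml_member` and `write_response` are hypothetical names, the feature records are made-up sample data, and the emitted markup is a simplified, GML-like encoding rather than valid GML2.

```python
from typing import Dict, Iterator, List
from xml.sax.saxutils import escape

Feature = Dict[str, str]


def feature_reader(features: List[Feature], type_name: str) -> Iterator[Feature]:
    """Yield the features selected by the query lazily, one per iteration."""
    for feature in features:
        if feature["type"] == type_name:
            yield feature


def to_gml_member(feature: Feature) -> str:
    """Encode a single feature as a simplified, GML-like member element."""
    tag = feature["type"]
    return (
        "<gml:featureMember>"
        f"<{tag}><name>{escape(feature['name'])}</name></{tag}>"
        "</gml:featureMember>"
    )


def write_response(reader: Iterator[Feature], output: List[str]) -> None:
    """Convert and emit each feature as soon as it has been read."""
    for feature in reader:
        output.append(to_gml_member(feature))


# Hypothetical data store contents.
store = [
    {"type": "road", "name": "road 1"},
    {"type": "pond", "name": "pond 1"},
    {"type": "road", "name": "road 2 & 3"},
]

chunks: List[str] = []
write_response(feature_reader(store, "road"), chunks)
print("\n".join(chunks))
```

Because the reader is a generator, no list of matching features is built in memory; each feature is encoded and handed to the output before the next one is read, which is the property the FeatureReader design aims for.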

A GetMap request is handled in a similar way, except that instead of transferring the raw data in GML-encoded form, the data is rendered graphically using the Java Advanced Imaging (JAI) module and sent to the client in the form of a map. This request requires some additional parameters such as the boundary of the display (bbox), the SRS, the image format, and the size of the image (height/width).

<FeatureType>
  <Name>kgp:road</Name>
  <Title>road_Type</Title>
  <Abstract>Generated from shape file </Abstract>
  <Keywords>road shape file</Keywords>
  <SRS>EPSG:4326</SRS>
  <LatLongBoundingBox minx="-0.0014" miny="-0.0024" maxx="0.0042" maxy="0.0018" />
</FeatureType>
<FeatureType>
  <Name>kgp:cultivation</Name>
  <Title>cultivation_Type</Title>
  <Abstract>Generated from GML file </Abstract>
  <Keywords>Cultivation GML file</Keywords>
  <SRS>EPSG:4326</SRS>
  <LatLongBoundingBox minx="0.0014" miny="-0.0011" maxx="0.0042" maxy="0.0024" />
</FeatureType>

Figure 6: Discovering a feature data

5.1 Querying Data
Data from the repositories is accessed using the standard query format specified by OGC. The adopted XML-based query method allows selecting a subset of a feature's data, i.e. querying on features. An example of a GetFeature request that accesses a feature and also queries the feature data is as follows.

https://localhost:8080/geoserver/wfs?request=getfeature&service=wfs&
version=1.0.0&typename=roads&
filter=<ogc:Filter xmlns:ogc="https://ogc.org" xmlns:gml="https://www.opengeospatial.org/gml">
  <ogc:BBOX>
    <ogc:PropertyName>the_geom</ogc:PropertyName>
    <gml:Box srsName="https://www.opengeospatial.org/gml/srs/epsg.xml">
      <gml:coordinates>-74.0,40.0 -85.0,40.0</gml:coordinates>
    </gml:Box>
  </ogc:BBOX>
</ogc:Filter>

Different parts of the data access request are:

  • The server address – https://localhost:8080/geoserver/wfs
  • The request type – request=getfeature
  • The service type – service=wfs
  • The version – version=1.0.0
  • The type name – typename=roads

The filter section of the data access request does the actual querying on a feature. The output is the feature data in GML2-encoded form.
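Since the filter is XML carried inside a URL, it has to be percent-encoded before the request is sent. The sketch below assembles a bounding-box filter and a GetFeature URL with the Python standard library. The helpers `bbox_filter` and `get_feature_url` and the coordinate values are hypothetical, and the namespace URIs are the standard OGC ones (`http://www.opengis.net/ogc`, `http://www.opengis.net/gml`), which differ from those printed in the listing above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Server address as used in the request examples of this paper.
BASE_URL = "https://localhost:8080/geoserver/wfs"


def bbox_filter(property_name, min_x, min_y, max_x, max_y):
    """Return an OGC BBOX filter as an XML string using a GML2 gml:Box."""
    return (
        '<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc" '
        'xmlns:gml="http://www.opengis.net/gml">'
        "<ogc:BBOX>"
        f"<ogc:PropertyName>{property_name}</ogc:PropertyName>"
        "<gml:Box>"
        # GML2 coordinates: "x,y" tuples separated by whitespace.
        f"<gml:coordinates>{min_x},{min_y} {max_x},{max_y}</gml:coordinates>"
        "</gml:Box>"
        "</ogc:BBOX>"
        "</ogc:Filter>"
    )


def get_feature_url(type_name, filter_xml):
    """Build a GetFeature URL; urlencode percent-encodes the filter XML."""
    params = {
        "service": "WFS",
        "version": "1.0.0",
        "request": "GetFeature",
        "typename": type_name,
        "filter": filter_xml,
    }
    return BASE_URL + "?" + urlencode(params)


# Hypothetical bounding box, for illustration only.
url = get_feature_url("roads", bbox_filter("the_geom", -74.0, 40.0, -73.0, 41.0))

# Decoding the query string recovers the original filter document.
decoded = parse_qs(urlsplit(url).query)
print(decoded["filter"][0])
```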

A user may need to select a subset of a feature's data that satisfies some condition on a parameter value (e.g. areas with > 50% population density). The request for the data can be combined with a query so that only the required part of the feature data is selected. An example of such a query is as follows:

<wfs:Query typeName="states">
  <wfs:PropertyName>STATE_NAME</wfs:PropertyName>
  <wfs:PropertyName>LAND_KM</wfs:PropertyName>
  <wfs:PropertyName>the_geom</wfs:PropertyName>
  <Filter>
    <PropertyIsBetween>
      <PropertyName>LAND_KM</PropertyName>
      <LowerBoundary>
        <Literal>100000</Literal>
      </LowerBoundary>
      <UpperBoundary>
        <Literal>150000</Literal>
      </UpperBoundary>
    </PropertyIsBetween>
  </Filter>
</wfs:Query>

The filter section of the query performs a selection on the feature data "states". It resembles the XQuery technique of querying XML data. It selects the states whose area (LAND_KM) lies between 100,000 and 150,000. In addition, it returns only STATE_NAME, LAND_KM, and the geometry, instead of all the attributes.
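The effect of this query, a range selection followed by a projection onto three properties, can be mimicked on in-memory records. The sketch below is illustrative only: the state records are made-up sample data, `property_is_between` and `run_query` are hypothetical helpers, and the range bounds are treated as inclusive.

```python
def property_is_between(feature, name, lower, upper):
    """Range test in the spirit of <PropertyIsBetween>; bounds inclusive here."""
    return lower <= feature[name] <= upper


def run_query(features, property_names, predicate):
    """Select the matching features, then keep only the requested properties."""
    return [
        {name: feature[name] for name in property_names}
        for feature in features
        if predicate(feature)
    ]


# Hypothetical feature data for the "states" feature type.
states = [
    {"STATE_NAME": "A", "LAND_KM": 90000.0, "PERSONS": 10, "the_geom": "geom-A"},
    {"STATE_NAME": "B", "LAND_KM": 100000.0, "PERSONS": 20, "the_geom": "geom-B"},
    {"STATE_NAME": "C", "LAND_KM": 125000.5, "PERSONS": 30, "the_geom": "geom-C"},
    {"STATE_NAME": "D", "LAND_KM": 150001.0, "PERSONS": 40, "the_geom": "geom-D"},
]

result = run_query(
    states,
    ["STATE_NAME", "LAND_KM", "the_geom"],
    lambda f: property_is_between(f, "LAND_KM", 100000, 150000),
)

for row in result:
    print(row)
```

The `PERSONS` attribute is dropped from every returned record because it is not among the requested property names, mirroring the role of the `wfs:PropertyName` elements in the query.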

A GetMap request, on the other hand, requests the data with additional formatting parameters such as the bounding box (bbox), height/width, image type, etc. An example GetMap request is as follows:

https://localhost:8080/geoserver/wms?
bbox=-0.0014,-0.0024,0.0042,0.0018&
styles=cite_forests&
Format=image/gif&
request=GetMap&
layers=cite:Forests&width=650&
height=450&
srs=EPSG:4326

Data can also be requested from multiple data repositories. Since data from different sources are available as features, multiple features can be specified in the layers section to obtain multiple feature data as follows:

https://localhost:8080/geoserver/wms?
bbox=-0.0014,-0.0024,0.0042,0.0018&
styles=cite_forests&
Format=image/gif&
request=GetMap&
layers=Forests,road,river,house&
width=650&
height=450&
srs=EPSG:4326
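Both request forms can be generated by one small helper, sketched below with the Python standard library. The function `get_map_url` is hypothetical. Passing an empty style list, so that `styles=` is sent empty and the server is expected to apply its default style to each layer, is an assumption of this sketch and differs from the listing above, which names a single style.

```python
from urllib.parse import urlencode

# Server address as used in the request examples of this paper.
WMS_URL = "https://localhost:8080/geoserver/wms"


def get_map_url(layers, bbox, width, height, styles=(),
                srs="EPSG:4326", image_format="image/gif"):
    """Build a GetMap request URL for one or more layers."""
    params = {
        "request": "GetMap",
        "layers": ",".join(layers),          # several features, comma separated
        "styles": ",".join(styles),          # empty value: assumed server defaults
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "srs": srs,
        "format": image_format,
    }
    # Keep ",", ":" and "/" readable instead of percent-encoding them.
    return WMS_URL + "?" + urlencode(params, safe=",:/")


bbox = (-0.0014, -0.0024, 0.0042, 0.0018)

single = get_map_url(["cite:Forests"], bbox, 650, 450, styles=["cite_forests"])
multi = get_map_url(["Forests", "road", "river", "house"], bbox, 650, 450)

print(single)
print(multi)
```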

The data is displayed to the requester in overlaid form, with proper positioning taken care of by the bbox and SRS attributes of the GetMap request. Figure 7 shows data requested from multiple sources and the corresponding output.
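Why a shared bbox and image size make the layers line up can be seen from the coordinate-to-pixel mapping they imply. The sketch below assumes a simple linear mapping with the image origin at the top-left corner; `world_to_pixel` is a hypothetical helper, not part of the implementation described here.

```python
import math


def world_to_pixel(x, y, bbox, width, height):
    """Map a coordinate inside bbox to (column, row) in the rendered image."""
    min_x, min_y, max_x, max_y = bbox
    col = (x - min_x) / (max_x - min_x) * width
    row = (max_y - y) / (max_y - min_y) * height  # image rows grow downward
    return col, row


# Values taken from the GetMap examples above.
bbox = (-0.0014, -0.0024, 0.0042, 0.0018)
width, height = 650, 450

lower_left = world_to_pixel(bbox[0], bbox[1], bbox, width, height)
upper_right = world_to_pixel(bbox[2], bbox[3], bbox, width, height)
centre = world_to_pixel(
    (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2, bbox, width, height
)

print(lower_left, upper_right, centre)
```

Every layer rendered with the same bbox, SRS, width and height uses this same mapping, so a given ground coordinate falls on the same pixel in each layer and the images can be stacked directly.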


Figure 7: Output of GetMap Request from multiple data sources

The query processing mechanism complies with the OGC WFS and WMS specifications and thus supports interoperability.

6. CONCLUSION
Geographic data is increasingly becoming available on the Internet, allowing a large number of users to share and access the rich databases that are currently being maintained in several organizations. However, GIS data is immensely heterogeneous, being available in various formats and stored in diverse media (flat files, relational databases). In this paper we have discussed a service-oriented architecture for the integration of geospatial data repositories. The problem is not only to integrate these heterogeneous data sources, but also to integrate the query processing and domain-specific computational capabilities supported by these sources, which makes GIS integration a real challenge.

As the spatial data repositories may be located in several organizations, inter-organization sharing of spatial data can be achieved with web service based WMS/WFS. The web service based integration technique for spatial data repositories has been addressed in the paper with a case study. Querying of the data has been achieved through an XML-based query mechanism that resembles XQuery. Data stored as flat files or in a spatial database has been integrated for interoperable access. A further enhancement would be to design a web service client that invokes the services of the server by sending requests in the specified format.

References

  • Anderson, G., and Moreno, S. R., 2003. Building web-based spatial information solutions around open specifications and open source software. Transactions in GIS 7(4): 447-466.
  • Boucelma, O., Garinet, J., and Lacroix, Z., 2003. The VirGIS WFS-Based Spatial Mediation System. In Proceedings of ACM CIKM’03, New Orleans, USA.
  • Cluet, S., Delobel, C., Simeon, J., and Smaga, K., 1998. Your mediators need data conversion. In Proceedings of the ACM SIGMOD Conference, pages 177–188.
  • Comert, C., 2004. Web services and national spatial data infrastructure (NSDI). In Proceedings of Geo-Imagery Bridging Continents, XXth ISPRS Congress, Istanbul, Turkey Commission 4.
  • Champion, M., Ferris, C., Newcomer, E., and Orchard, D., 2002. Web Service Architecture. W3C Working Draft. https://www.w3.org/TR/2002/WD-ws-arch-20021114/
  • David, J.A., Kerry, T., Ross, A., and Stuart, H., 1998. An exploration of GIS architectures for Internet environments. Computers, environment and urban systems 22: 7-23.
  • OGC, 2002. Web Feature Service Implementation Specification, version 1.0.0. [https://www.opengeospatial.org/specs/?page=specs]
  • OGC, 2003. OpenGIS Geography Markup Language (GML) 3.0 Implementation Specification, OGC Recommendation Paper.
  • OGC, 2004. Web Map Service, Version 1.3. [https://www.opengeospatial.org/specs/?page=specs]
  • OGC, 2007. Open Geospatial Consortium. [https://www.opengeospatial.org/]
  • Roth, M.T., and Schwarz, P., 1997. Don’t scrap it, wrap it! A wrapper architecture for legacy data sources. In Proceedings of the International Conference on Very Large DataBases.
  • Peng, Z-R., and Tsou, M-S., 2003. Internet GIS: Distributed Geographic Information Services for the Internet and Wireless Networks. New York, John Wiley & Sons, Inc.
  • Peng, Z-R., and Zhang, C., 2004. GML, WFS, SVG and the future of internet GIS. [email protected] 8(7): 29-32.
  • Pulsani, P., 2005. Seamless Geospatial Computing. GIS DEVELOPMENT, pages 24-27.
  • Taldoire, G., 2001. Geospatial Data Integration and Visualisation Using Open Standards. 7th EC-GI & GIS WORKSHOP, Potsdam, Germany, June 13-15.
  • Wiederhold, G., 1992. Mediators in the architecture of future information systems. IEEE Computer, pages 38–49.
  • Clancey, W. J., 1997. Situated Cognition: On Human Knowledge and Computer Representations. Cambridge University Press, The Edinburgh Building, Cambridge CB2 2RU, United Kingdom. ISBN 05214449004, pp 406.