The integration of spatial datasets for network analysis operations

Sanphet Chunithipaisan1, Philip James2, David Parker3
Department of Geomatics,
University of Newcastle upon Tyne,
Newcastle, UK, NE1 7RU
1 PhD Student, [email protected],
2 Lecturer, [email protected],
3 Professor, [email protected],


One of the most important current challenges for Geographic Information Systems (GIS) is the generation of corporate geo-spatial resources whose full potential can only be realised by making them accessible to a large number of applications and end-users [25]. In the field of facilities management, such as gas, electricity, water, telecommunications and transportation companies, spatial network GIS could provide a useful graphical interface and geographical database for the management of network assets and flows [11]. Utility networks typically impact on many people over vast areas and are generally managed by government departments, large organizations or companies. There is often little collaboration between the organisations despite similarities of interest and in some cases new legal requirements to share data with other utilities to minimise the impact of repairs and new build on both the public and the environment [24]. Thus, there is a growing need both to share the basic network information and in some cases to integrate data sets to carry out more complex network analysis operations.

In the real world, objects are connected to each other: thus an optical cable is connected to a multiplexer that in turn is connected to copper cables connecting into our homes to provide cable TV, telephony and internet access. Using GIS in support of network utility management typically involves many types of features that may have connectivity to each other. Several GIS vendors have developed GIS software whose potential functions can provide for network management and analyses [7], but each system has a proprietary format to deal with the connectivity between geometry or features. Topology in GIS is generally defined as the spatial relationship between such connecting or adjacent features [1,5], and is an essential prerequisite for many spatial operations such as network analysis [23]. There are, in general, three advantages of incorporating topology in GIS databases: data management, data correction and spatial analysis [13]. Topology structures provide an automated way to handle digitising and editing errors, and enable advanced spatial analyses such as adjacency, connectivity and containment [2]. In some systems this relationship is assumed (by the user) whereas in others it makes up part of the structure of the geometry [1,22]. In some systems topology can be built where all arcs intersect or touch [4] and in others rules defining connectivity between feature types can be used to build topology [6]. The latter approach is typically used in the major GIS especially for network utility applications. For example, ArcInfo has a "rule base" to specify the connectivity of the features whereas GE Smallworld has a "manifold" to describe how features connect. In addition, the network may be a directional network depending on the application in question. Each GIS designed for network utility applications has different alternatives to manage the issue of a directional network. 
Some systems have a feature that is part of the data structure, such as a "Turn", to deal with directional links in the network [4], whereas in others directional links can be specified in the application through code [7].

The ability to reuse existing data is a benefit that new applications should be able to take advantage of [21]. This is often not possible because of problems with data integration due to proprietary data formats. Attempts have been made to integrate formats using standards such as GML [16] and tools such as FME [18]. There is, however, a particular problem in network GIS in that topology is not exchanged in general import/export (other than that assumed by the geometry). Some systems do not support topology or network analysis functions at all, and yet these non-topological datasets still contain valuable data. For network analysis to be carried out, the current option is to import data into the tool of choice, coercing it into the required format. If further changes are made to the original dataset then the process needs to be repeated. Moreover, data conversion across systems is not straightforward and is time consuming. Whilst systems exist that handle network topological issues in a structured and efficient manner, these tend to be high-cost systems and it is not usually possible, or desirable, to upgrade from an existing system to one that manages topology. Likewise, a one-off import of all data into such a system to carry out basic network analysis functions is not a practical solution. Furthermore, data concerning the same feature type may be maintained in different systems. Distinguishing the duplicated features when converting to the new database or importing to the new network analysis application is a difficult process. Even at the semantic level, inconsistencies in definition cause problems: for example, a feature, such as a road, may be labelled differently (e.g. as a street) in a different system.

This paper reports on an on-going research project entitled "The development of generic, topology-aware spatial datasets and models". This research has been undertaken to address and solve those problems mentioned above. The research framework comprises three main parts: The first stage is to design a model to incorporate data from various systems and to model attributes, geometry and topology so as to be able to carry out network analysis. Several models were developed to describe real world features and connectivity of features, including the defining rules of connectivity between features. The second stage of the research is to design analytical tools and other tools to manage the data. Several tools are being developed to test the conceptual model designed from the first stage and to support network topology and network analyses. This stage also includes investigation into mechanisms for data integration and dealing with data redundancy. The final stage is to develop an application that can be served across public and private networks to carry out network analysis "on-the-fly". This paper focuses on the first stage where the concepts underpinning the conceptual data model are introduced. A limited implementation of the application for the purpose of testing the data model is also introduced.

Paper Structure
Firstly this paper identifies the research overview and motivation. The overarching concept of the research is then introduced followed by the methodology used. The data model and structures are then discussed. There is an analysis of the implementation thus far and finally some conclusions are drawn on the suitability of the current data model and avenues for further development are presented.

Most major GIS support relational databases in some form, and in many systems a relational database underpins the data structure. Data import/export from and to a relational database is relatively straightforward using built-in functionality, or using macros or scripts to connect to a relational database. Furthermore, ODBC [15] or similar tools for connecting to external data sources are available for most platforms. The widespread support of SQL within relational databases also provides a structured and common interface for addition, update and deletion.

Distribution of applications and data via the World Wide Web and associated technologies is clearly becoming a major trend [12]. Many web GIS applications have been developed over the last few years [8,9], but these applications typically provide only basic GIS functions.

The vision of the research is to enable a web-based application that allows data to be transferred, where appropriate, from proprietary datasets, whether topological or not, into a generic relational database. Topology is then created and rules defined to allow network analyses to take place over all the required data. Where appropriate, data can remain in an existing format, with topology added by setting a semantic schema for the names of features. Figure 1 gives an overview of the process.

Figure 1 The vision of the research

Research Methodology
This research first investigated real-world features and their spatial and aspatial properties. The geometry and topology were treated as properties of the feature. The network connectivity model was designed, including the network connectivity rules. A relational database structure was designed to fit the conceptual model and implemented to hold data from various datasets. The application was implemented using the Java programming language, and JDBC [20] was used for connecting to databases. Several tools were created to support the application, especially for network analysis. A tool for building topology was created and tested against the connectivity rules. The network analysis functions, network tracing and shortest path, were developed to include analysis of directed networks.

Data Model
To understand the real world, the characteristics of real-world features must be studied and modelled. In the real world, features are described by descriptive terms, by physical location and by their ability to connect both physically and logically to other features. Thus a "road feature" could be described by aspatial attributes such as name and length, by a geometry representing its physical location, by topology representing how it is physically connected to other features and by join relationships representing logical connections with other features. The modelling of attributes, geometry and relationships is standard fare for most GIS and many database systems, so the following section concentrates on the modelling of rule-based topological structures to facilitate network analysis functions.

The real world feature
Phenomena in the real world can be modelled as real world features. The real world is composed of many kinds of real-world features. The characteristics of a feature can be represented by its properties. There are four kinds of feature properties: physical, relational, geometrical, and topological.

A physical property is usually alphanumeric data stored as a number or text. Relationship properties are used for representing the logical relationship between features. The shape of a feature is represented by a geometry. For a 2D coordinate system there are three basic types of geometry: point (x, y pair), chain (series of connected x, y pairs) and area (series of connecting x, y pairs making up one or more complete rings). Topology is commonly used to describe the physical connectivity between features.

Figure 2 Real world phenomena
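
The four kinds of feature property and the three basic 2D geometry types can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Java implementation described later in the paper; all class and field names (`Feature`, `Chain`, `physical`, `relations` and so on) are assumptions introduced here for illustration only.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

Coord = Tuple[float, float]

@dataclass(frozen=True)
class Point:
    xy: Coord                              # a single x, y pair

@dataclass(frozen=True)
class Chain:
    coords: Tuple[Coord, ...]              # series of connected x, y pairs

    def length(self) -> float:
        # Sum of the straight-line segment lengths along the chain.
        return sum(math.dist(a, b) for a, b in zip(self.coords, self.coords[1:]))

@dataclass(frozen=True)
class Area:
    rings: Tuple[Tuple[Coord, ...], ...]   # one or more complete rings

Geometry = Union[Point, Chain, Area]

@dataclass
class Feature:
    feature_type: str
    physical: Dict[str, object] = field(default_factory=dict)      # alphanumeric attributes
    relations: Dict[str, List[str]] = field(default_factory=dict)  # logical links to other features
    geometry: Optional[Geometry] = None                            # shape of the feature
    topology: List[str] = field(default_factory=list)              # ids of connected links/nodes

# Hypothetical example: a road with a name, a chain geometry and a derived length.
road = Feature("road", physical={"name": "High Street"},
               geometry=Chain(((0.0, 0.0), (3.0, 4.0), (3.0, 10.0))))
road.physical["length"] = road.geometry.length()
print(road.physical["length"])  # 11.0
```

Keeping geometry and topology as separate properties of the feature mirrors the text: topology can be derived from the geometry later without changing the attribute data.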

Network Connectivity

  • Topology
    Topology is the common term used to describe physical connectivity between features. Topology is generally represented by links and nodes. A feature instance is connected to another feature instance via a connection point. This connection point is described by a node, and the path between two nodes is described by a link. Topology is derived from the underlying geometry.

    Figure 3. Link and Node model

    There are two common properties for the link: cost and direction. Cost is the value that is taken into account to find the best path. Commonly the cost is the length of the link, which is adequate for most simple network analysis problems. Direction specifies the direction in which the network can be traversed along that link. There are also two properties for the node: in/out cost and degree. In/out cost is the accumulated distance from the starting point, which is used to find the next distance value at the other node of the same link. Node degree is the number of links associated with the node.

  • Directional network
    For some applications topological features require direction as well as connection. If we consider the flow of water in a river, the topology must be modelled to take into account the flow direction of the water. However, for other applications such as analysing boat traffic on a river it is more sensible to model the network as non-directional or two-way. Moreover, in a road network a road may be one-way or two-way and, as is the case in some cities, its direction may change depending on the time of day. Thus there are requirements to be able to model the direction of connectivity whilst retaining flexibility to suit the application in question.

    There are several ways to handle the directional flow of a network. Some systems use a special feature to set the directional flow of the link, whereas other systems set the directional flow in the application using additional coding. This research sets the directional flow as a property of a line and provides the database structure for the directional network as a directed line. A Link feature derived from the directed line is a directed link.

  • Connectivity types
    In order to model real world complexity we also need to be able to express the concept of different types of connectivity. Whilst it may be acceptable to allow road features to connect wherever they share the same 2D space, this is not appropriate for all situations, e.g. fibre optic cables or water mains that cross without being joined.

    To enable the different types of feature connectivity, we need to model the three ways of connecting two link features (end-connection, middle-connection and cross-connection) and the two ways of connecting link features to node features (end-connection and middle-connection).

    Figure 4 Connectivity types

    To build an accurate model of the real world situation the type of connectivity between feature types must also be modelled successfully.


  • Network Family
    In the real world there are natural groupings of objects; the various types of roads and paths that make up the road network; rivers, streams, canals, lakes etc. that make up the natural water network; high voltage cables, low voltage cables and transformers etc. that make up an electrical network. With some major exceptions these "families" of objects do not topologically connect with features of other families. The concept of a "network family" is used for establishing the various rules of connectivity between feature types. Features that do not belong to the family cannot connect. This mechanism also provides a simple visual means for the modification of specific connectivity rules and also provides a method for dealing with semantic issues e.g. "street" and "strasse" can both be mapped onto the network family feature "road".

    A family contains a collection of real-world features that may have connectivity to each other in the same network. The example for a simple road network family is shown below.

    Road Family
    Road – Road via junction
    Road – Trunk Road via junction
    Road – Slip Road via junction
    Trunk Road – Trunk Road via junction
    Trunk Road – Slip Road via junction
    Slip Road – Motorway via junction

    A matrix representing the connectivity is shown in Figure 5. The first row and column is a list of line type features that may have connectivity. The inner cells show the Point type feature that facilitates connectivity between them.

    Figure 5. The Matrix Table of Road Family

    The family could also be shown as a tree structure by setting the root feature. The view of the tree structure varies depending on the root selected; however, the relationship between features remains the same. An example of the tree structure is shown in Figure 6.

    Figure 6. The Tree Structure of Road Family

    • Connectivity across network families
      Network analysis across two families may be required for some applications, e.g. a route planning application may require movement between the road and the rail network families. The network can trace across families if there is a common point connection feature in both families. For instance, a rail station is in both the "road" and the "rail" families and therefore a trace can cross between them via a rail station.
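
The rule-based approach described in this section can be sketched as a lookup from a pair of line feature types to the point feature type that may connect them, followed by a pass over the line ends that creates a node only where a rule permits it. The sketch below is in Python rather than the Java used for the application. It handles end-connections only and matches coordinates exactly, with no snapping tolerance. The names `ROAD_FAMILY`, `ALIASES`, `connector` and `build_topology`, and the sample data, are assumptions made for illustration and are not taken from the implementation.

```python
from collections import defaultdict
from itertools import combinations

# Road family rules from the text: an unordered pair of line types maps to
# the point feature type through which the two lines may connect.
ROAD_FAMILY = {
    frozenset(["road"]): "junction",
    frozenset(["road", "trunk road"]): "junction",
    frozenset(["road", "slip road"]): "junction",
    frozenset(["trunk road"]): "junction",
    frozenset(["trunk road", "slip road"]): "junction",
    frozenset(["slip road", "motorway"]): "junction",
}

# Semantic aliases: differently labelled source features share one family type.
ALIASES = {"street": "road", "strasse": "road"}

def family_type(name):
    key = name.strip().lower()
    return ALIASES.get(key, key)

def connector(family, type_a, type_b):
    """Return the point type that joins two line types, or None if no rule exists."""
    return family.get(frozenset((family_type(type_a), family_type(type_b))))

def build_topology(lines, points, family):
    """lines: {id: (type, [coords])}; points: {coord: point feature type}.
    Returns {node coord: set of line ids connected at that node}."""
    ends = defaultdict(set)                      # coordinate -> lines ending there
    for line_id, (_, coords) in lines.items():
        ends[coords[0]].add(line_id)
        ends[coords[-1]].add(line_id)
    nodes = {}
    for coord, ids in ends.items():
        for a, b in combinations(sorted(ids), 2):
            required = connector(family, lines[a][0], lines[b][0])
            # Connect only if a rule exists and the right point feature is present.
            if required is not None and points.get(coord) == required:
                nodes.setdefault(coord, set()).update((a, b))
    return nodes

# Hypothetical sample data: four lines, two junction points.
lines = {
    1: ("Street", [(0, 0), (1, 0)]),
    2: ("trunk road", [(1, 0), (2, 0)]),
    3: ("motorway", [(1, 0), (1, 1)]),
    4: ("slip road", [(2, 0), (3, 0)]),
}
points = {(1, 0): "junction", (2, 0): "junction"}

topology = build_topology(lines, points, ROAD_FAMILY)
degree = {coord: len(ids) for coord, ids in topology.items()}
```

In this example the motorway (line 3) ends at the same coordinate as lines 1 and 2, but no rule links a motorway to a road or a trunk road, so it is left out of the node at (1, 0) and that node has degree 2. The alias table shows how a feature labelled "Street" in one source is treated as the family type "road".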

    Conceptual Model
    Based on the above a conceptual data model was developed. An object-oriented design concept was followed as it allows one to develop models that more closely resemble the real world [19]. The ISO Spatial Schema [10] and the OGC Feature Geometry Specifications [17] were adopted and adapted where necessary. Features have attribute data which is described by alphanumeric data types and treated as a physical property. Geometry and topology are used to describe the shape and continuity of features respectively and are treated as objects relating to a feature. A feature can also have a logical relationship to another feature, and this property is the relationship property. The "Family" contains a collection of features that may connect. Features can be associated with one or more families. Figure 7 gives an overview of the conceptual data model.

    Figure 7. Conceptual Data Model


    Database Model
    Once the conceptual data model had been finalised, a physical model was implemented. A simple relational database such as Microsoft Access was chosen to prove the conceptual model. There are six main tables, created for storing spatial data that link to the spatial object. The Family table is used to store the relationship of network connectivity of each family.

    Figure 8. Logical Relational Database Model
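
As a rough illustration of how connectivity rules might be held in a relational table and queried with standard SQL, the following sketch uses Python's built-in `sqlite3` module with an in-memory database in place of Microsoft Access. It does not reproduce the six tables of Figure 8: the table names (`feature`, `geometry`, `family`), the column names and the helper `connecting_point` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (               -- hypothetical: one row per real-world feature
    feature_id   INTEGER PRIMARY KEY,
    feature_type TEXT NOT NULL,
    name         TEXT
);
CREATE TABLE geometry (              -- hypothetical: ordered x, y pairs of a feature
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    seq        INTEGER NOT NULL,
    x          REAL NOT NULL,
    y          REAL NOT NULL,
    PRIMARY KEY (feature_id, seq)
);
CREATE TABLE family (                -- hypothetical: one row per connectivity rule
    family_name TEXT NOT NULL,
    line_type_a TEXT NOT NULL,
    line_type_b TEXT NOT NULL,
    point_type  TEXT NOT NULL
);
""")

# A subset of the road family rules listed earlier in the paper.
conn.executemany("INSERT INTO family VALUES (?, ?, ?, ?)", [
    ("road", "road", "road", "junction"),
    ("road", "road", "trunk road", "junction"),
    ("road", "slip road", "motorway", "junction"),
])

def connecting_point(conn, family, type_a, type_b):
    """Look up the point type joining two line types, in either order."""
    row = conn.execute(
        """SELECT point_type FROM family
           WHERE family_name = ?
             AND ((line_type_a = ? AND line_type_b = ?)
               OR (line_type_a = ? AND line_type_b = ?))""",
        (family, type_a, type_b, type_b, type_a),
    ).fetchone()
    return row[0] if row else None

print(connecting_point(conn, "road", "trunk road", "road"))  # junction
print(connecting_point(conn, "road", "road", "motorway"))    # None
```

Storing each rule as a row means connectivity can be changed with ordinary INSERT, UPDATE and DELETE statements, which fits the paper's aim of relying on SQL as the common interface.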



Application development
Application development took place in several phases. The first phase was the development of mechanisms for the export of data from a variety of sources and formats. Thus a tool was developed to export data from ArcView via ODBC using SQL. The GE Smallworld Magik application language was used to develop a similar system using COM (Component Object Model, a Microsoft software architecture) [14] and SQL for export from GE Smallworld VMDS (Version Managed Data Store).

The application to display, query and modify the relational database was developed in Java. Several tools were created for supporting the application: data query, manipulation, display and analysis functions.

Figure 9. Application Model

In the main application JDBC was chosen to facilitate database connection and to extract data into the application. The JDBC-ODBC bridge, a Type 1 JDBC driver [20], was used together with an ODBC driver for the database connection.

Many basic functions were developed in the application tools: selection, intersection, snapping, transformation, and so on. The GUI was developed using the AWT and Swing packages provided as part of the Java platform. A tool was developed to build the topology based on the connectivity rules defined in the family and stored in the database. The network analysis functions were also created. There are two basic functions for network analysis: network trace and shortest path. Dijkstra's algorithm [3] was used for both network analysis functions. The GUI of the application is shown in Figure 10.
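
A minimal sketch of how Dijkstra's algorithm can serve both functions on a network with directed links is given below. It is written in Python for compactness and is not the Java code of the application. The link list, the node names and the functions `build_adjacency`, `dijkstra`, `shortest_path` and `trace` are assumptions for illustration. The node "station" stands for a point feature shared by a road family and a rail family, so a path can cross from one to the other.

```python
import heapq
from collections import defaultdict

def build_adjacency(links):
    """links: iterable of (from_node, to_node, cost, one_way)."""
    adj = defaultdict(list)
    for a, b, cost, one_way in links:
        adj[a].append((b, cost))
        if not one_way:                  # two-way links are traversable in both directions
            adj[b].append((a, cost))
    return adj

def dijkstra(adj, start):
    """Return (cost, predecessor) maps for every node reachable from start."""
    cost = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > cost[node]:
            continue                     # stale queue entry, a cheaper route was found
        for nxt, w in adj.get(node, ()):
            nd = d + w                   # accumulated cost from the starting point
            if nd < cost.get(nxt, float("inf")):
                cost[nxt] = nd
                prev[nxt] = node
                heapq.heappush(queue, (nd, nxt))
    return cost, prev

def shortest_path(adj, start, goal):
    cost, prev = dijkstra(adj, start)
    if goal not in cost:
        return None, float("inf")
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], cost[goal]

def trace(adj, start):
    """Network trace: all nodes reachable from start, respecting link direction."""
    return set(dijkstra(adj, start)[0])

# Hypothetical links: (from, to, cost, one_way).
LINKS = [
    ("A", "B", 4.0, False),
    ("B", "C", 3.0, True),           # one-way from B to C
    ("A", "C", 9.0, False),
    ("C", "station", 2.0, False),    # road family link ending at the shared station
    ("station", "D", 5.0, False),    # rail family link starting at the shared station
    ("E", "A", 1.0, True),           # one-way from E to A
]
adj = build_adjacency(LINKS)

print(shortest_path(adj, "A", "D"))  # (['A', 'B', 'C', 'station', 'D'], 14.0)
print(shortest_path(adj, "D", "A"))  # (['D', 'station', 'C', 'A'], 16.0)
```

The two results differ because the link from B to C is one-way: the route from A to D may use it, whereas the return route must take the longer direct link from C to A. A trace from A does not reach E for the same reason, since the only link at E leads away from it.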

The implementation has been tested both with simulated data and a network of transport data extracted from a sample GE Smallworld dataset of the Cambridge area. The network connectivity rule in the "road" manifold was transferred to the network "road" family. The topology of the network family was built up, and then the network analysis was tested.

This paper describes a conceptual generic data model for the handling of spatial and topological feature data. The limited implementation that has been completed has demonstrated the validity of the model using a variety of test data sets from several proprietary formats. The network analysis functions implemented so far have proved the validity of the data model in respect to the handling of topological analysis for directional and non-directional networks. The concept of the network "family" of features presents a user-friendly interface to the assignment of network connectivity and overcomes several issues that face integration of topological and non-topological datasets.

Future work will focus on data integration, and an implementation with extended data sets. The mechanisms for checking data duplication (where the same feature is stored in more than one original dataset) will be investigated. Several scenarios of network applications will be tested. Finally more work is required to serve the final application in an "application server" environment running over a private or public network using web technologies.


  1. Burrough, P. and McDonnell, R. Principles of Geographical Information Systems. Oxford University Press Inc., New York. 2000.
  2. David, M. Understanding Topology and Shapefiles. ArcUser April-June 2001.
  3. Dijkstra, E. W. A note on two problems in connexion with graphs, Numerische Mathematik, 1, 269-271, 1959.
  4. ESRI. Online ArcInfo help. "Clean" and "Network Elements".
  5. ESRI. The GIS Glossary.
  6. GE Smallworld. Online application help. Data Modelling – 5.2 Modelling geometry.
  7. GE SmallWorld Network Solutions.
  8. GeoCON.
  9. Internet GIS Software. /technology/gis/techgi0027.htm .
  10. ISO. ISO/TC 211 Geographic information – Spatial schema.
  11. Kosters, G., Pagel, B.-U. and Six, H.-W. GIS-application development with GeoOOA. International Journal of Geographical Information Science. 11, 307-335, 1997.
  12. Jeff Fitzgerald. CARIS Spatial Fusion: an Internet GIS. https://spatialnews.geocomm.com/whitepapers/fusion.pdf .
  13. Legault. Topology in GIS Software. GEO Asia Pacific December/January 1999.
  14. Microsoft – Microsoft Com Technologies – Information and Resources for the Component Object Model-Based Technologies. January 1999. https://www.Microsoft.Com/Com/ .
  15. Microsoft. What is ODBC.
  16. Open GIS Consortium. Geography Markup Language (GML) 2.0. https://www.opengis.org/ .
  17. Open GIS Consortium. The OpenGIS Abstract Specification – Feature Geometry. https://www.opengis.org/ .
  18. Safe Software. FME: Feature Manipulation Engine. https://www.safe.com/ .
  19. Steve Demartino and Eric Hrnicek. Object-oriented GIS 101: New GIS Data Model Features Intelligent Objects.
  20. Sun. The JDBC API. https://java.sun.com/products/jdbc/overview.html .
  21. T. Devogele, C. Parent, S. Spaccapietra. On Spatial Database Integration. International Journal of Geographic Information Systems, Special Issue on System Integration. Vol. 12, No 3, 1998.
  22. Tarle, T. L. Topology – Why Bother?. GIS '95 Conference Proceedings. Fort Collins 1995. 2:736-738.
  23. Timms, T. Relationship and Behaviour in GIS. Proceedings of the Association for Geographic Information (AGI) 4th National Conference. 1992.
  24. UK Street Works Act.
  25. Valsecchi, P., Claramunt, C. and Peytchev, E. OSIRIS: An inter-operable system for the integration of real time traffic data within GIS. Computers, Environment and Urban Systems. 23(4), 245-257, 1999.