Home Articles Developing an Internet-GIS Application using GML

Developing an Internet-GIS Application using GML

 
Ehsan Mohammadi
M.Sc. Student, Dept. of GIS Eng.
Email: [email protected]

Ali Aien
M.Sc. Student, Dept. of GIS Eng.
Email: [email protected]

Ali A. Alesheikh
Assistant Professor, Dept. of GIS Eng.
Email: [email protected]

K.N. Toosi University of Technology Vali_asr St, Tehran, Iran, P.C. 19697
Tel: +98 21 8789357 Fax: +98 21 877 9476

1. Introduction
The ability to extend the audience of GIS to a large circle of users via Internet is growing as well as growth of Internet technology itself. Progress of the Internet technology is the result of many attempts and through these attempts, a great collection of data formats, programming languages, hardware and software appeared. A large part of Internet progress depends on the suitable approaches of dealing with data and information, and markup languages (MLs) play the main role in this progress. A markup language is a way of describing a document by placing tags in the document. Markup languages differ from many other programming languages, in containing loops, conditional logics, subroutines and some other programming structures. There are many markup languages with different applications these days. The most important MLs are: SGML, HTML, and XML.

SGML stands for Standard Generalized Markup Language and is a powerful markup language in handling large quantities of structured data, but it is complex to use. HTML is the language of the web. It defines the way that images, multimedia and text are displayed in web browsers. It is simple to learn and use, but is not good at describing what data means. It is just a picture of information. XML stands for eXtensible Markup Language and was designed to describe data. It is created to structure, store and to send information. It also allows everyone to create his own information, send anything to anywhere for anybody. It has the capability of SGML and simplicity of HTML. It is also free and clear for any user.

XML also is used to create other markup languages for particular applications. For instance, GML is a very useful and simple markup language for GIS applications that is extended by XML. In order to provide an Internet-GIS application, GML must include spatial and non-spatial information together. Some technologies help GML to show the graphic of spatial information, like SVG. SVG itself is a world of graphical tools, which can be handled with using just text.

Integration of these technologies, that follow the principals of markup languages, helps designers to implement simple and versatile applications via Internet.

2. XML
XML is a computer language for defining MLs and is used to create structured documents. Defining the contents of a document is equivalent to creating structured information in the document.

XML opens a great view for creating other markup languages, and allows other applications to have a specific ML only for themselves. The focus of XML is on data and what data is. It means that the duty of defining tags and data is not predefined for XML and designers can do it according to their aims. Any XMLWriter allows the designer to describe parts of XML document, which composes of two main components: The structure of XML document and DTD or schema.

The first component includes of a prolog and elements. The prolog for an XML document states some information to the parsers. This information expresses that this document is marked up in XML and can contain XML processor instruction. The prolog also includes text encoding, declaration of special pieces of text, and the DTD (Document Type Definition) or schema being used. The elements come after the prolog and let XML tags be written here.

The second component is DTD, the grammar of the XML page. It actually is a tool to create and describe XML tags. Once a DTD is created and a document has been written based on that DTD, the document will be compared to the DTD that will cause the validation of the document. If the XML document follows the rules listed in the DTD, then the document is said to be valid, otherwise it is called invalid. Schema has the capability of DTD but differs in some characteristics, like, it is predefined for a specific application, then, there is no need to define how XML application will work.

To use the XML document, some programs are needed to distinguish between text and information contents in the text. Parsers are programs that facilitate the task for users. Parsers can read XML documents and parse the XML information into data and markup. Internet Explorer 4 was the first web browser that included XML parser. Some of the renowned XML parsers are IE5, Expat, and Lark.

The specific advantages of XML are:

  • XML is straightforwardly useable over the Internet.
  • XML moves most of the processing form the server to the clients.
  • XML supports a wide variety of applications.
  • XML is compatible with SGML.
  • XML documents are human- legible and reasonably clear.
  • The XML design is prepared quickly.
  • The XML document is easy for machine to understand.
  • The XML documents are easy to create.
  • It is easy to write programs that process XML documents, and so on.

In order to develop an Internet-GIS application, some new schemas are introduced by XML. Among these new schemas, GML was chosen to maneuver on.


3-Effective Parameters on the Shortest Path Algorithms
Four parameters that effect on shortest path algorithms are Network Representations, Labeling Method, Selection Rules and Data Structures (Zhan and Noon, 1996). These parameters effect on shortest path algorithms performance and cause to be developed different kind of algorithms.

3-1-Network Representations
The way in which an input network is represented and implemented in a shortest path algorithm is vital to the performance of the algorithm.There are several straightforward ways of representing a general network for computational purposes. Past research has proven that the Forward Star representation is the most efficient data structure for representing networks (Gallo and Pallottino, 1988). Two sets of arrays are used in the forward star data structure. The first array is used to store data associated with arcs, and the second array is used to store data related to nodes. All arcs of a network in question are maintained in a list and are ordered in a specific sequence. That is, arcs emanating from nodes 1, 2, 3, …, are ordered sequentially. Arcs emanating from the same node can be ordered arbitrarily, however. All information associated with an arc, such as starting node, ending node, cost, arc length and capacity are stored with the arc in some way (e.g., corresponding arrays or linked lists).

3-2-Labeling Method
The labeling method is a central procedure in most shortest path algorithms(Gallo and Pallottino, 1988). The output of the labeling method is an out-tree from a source node, s, to a set of nodes. This out-tree is constructed iteratively, and the shortest path from s to i is obtained upon termination of the method. Three pieces of information are maintained for each node i in the labeling method while constructing a shortest path tree: the distance label, d(i), the parent node, p(i), and the node status, S(i). The distance label, d(i), stores the upper bound of the shortest path distance from s to i during iteration. Upon termination of an algorithm, d(i) represents the unique shortest path from s to i. The parent node p(i) records the node that immediately precedes node i in the out-tree. The node status, S(i), can be one of the following: unreached, temporarily labeled and permanently labeled. When a node is not scanned during the iteration, it is unreached. Normally the distance label of an unreached node is set to positive infinite. When it is known that the currently known shortest path of getting to node i is also the absolute shortest path we will ever be able to attain, the node is called permanently labeled. When further improvement is still expected to be made on the shortest path to node i, node i is considered only temporarily labeled. It follows that d(i) is an upper bound on the shortest path distance to node i if the node is temporarily labeled; and d(i) represents the final and optimal shortest path distance to node i if the node is permanently labeled.

3-3-Selection Rules and Data Structures
The performance of a particular shortest path algorithm partly depends on how the basic operations in the labeling method are implemented. Two aspects are particularly important to the performance of a shortest path algorithm: 1) the strategies used to select the next temporarily labeled node to be scanned, and 2) the data structures utilized to maintain the set of labeled nodes (Ahuja et ah,1993).

Strategies commonly used for selecting the next temporarily labeled node to be scanned are FIFO (First In First Out): the oldest node in the set of temporarily labeled nodes is selected first in this search strategy, LIFO (Last In First Out): the newest node in the set of temporarily labeled nodes is selected first in this search strategy, BFS (Best-First-Search): the minimum distance label from the set of temporarily labeled nodes is considered as the best node (Gallo and Pallottino, 1988).

A number of data structures can be used to manipulate the set of temporarily labeled nodes in order to support these strategies.These data structures include arrays, singly and doubly linked lists, stacks, buckets and queues (Sedgewick,1990). A singly linked list contains a collection of elements. Each element has a data field and a link field. The data field contains information to be stored, and the link field contains a pointer pointing to the next element in the list.A doubly linked list contains two pointers. One pointer points to the previous element in the list, and another pointer points to the next element in the list. Stack is another special type of list which only allows removal and addition of an element at one end of the list. The bucket data structure is related to the double bucket implementations of the Dijkstra algorithms. Buckets are sets arranged in a sorted fashion. Bucket k stores all temporarily labeled nodes whose distance labels fall within a certain range. A queue is related to the Graph Growth algorithms.The queue is a special type of list which allows the addition of an element at the tail and the deletion of an element at the head (Ahuja,1993).

4- Reviewing of Algorithm Theories
In this section we breifly review implementation senarios for Dijkstra's Algorithm, Huristic Methods and Genetic Algorithm.


4-1-Dijkstra's Algorithm
Dijkstra's Algorithm (Dijkstra, 1959) for computing the shortest path is based upon the caculation of values in three one dimensional arrays, each of size equal to the number of nodes in the network. Each row of each array corresponds to one of the nodes of the network. As the algorithm proceeds, paths are caculated from the start node to other nodes in the network, paths are compared and the best (minimum weight) paths are chosen, given the state of knowledge of the network at that stage in the progress of the algorithm. At each stage in the computation:
  • The array distance keeps track of the current minimal distances from the start node to the array nodes. (As the algorithm proceeds, these distances become refined to closer approximations to the shortest distance).
  • The array path keeps a record of the preceding nodes on the current best paths from the start node to the array node.
  • The array included marks off the nodes as they are used in the caculation of the minimal distance path from start node to the end node.

The algorithm initialized arrays as follows:

  • Cells in the distance array are initialized to 0 for the start node cell; infinity for the cells whose nodes are not directly connected to the start node; and to their direct distances from the start node to all the other cells.
  • Cells in the path array are initialized to the start node for cells whose nodes are adjucent to start, otherwise undefined.
  • Cells in the included array are initialized to 'no' for all cells except the start node cell.

The algorithm then iterates the following procedure until the end node market 'yes' in included array.

  1. Find the node (say n) whose distance from the start node is smallest amongst all those nodes not yet market 'yes' in included.
  2. Mak n as 'yes' in included.
  3. For each node m not yet market 'yes' in included:
    If n and m are adjucent and if the current distance (in the distance array) from start to n plus the edge weight of nm is less than the current distance from start to m then update the distance array to the new and smaller distance from start to m; also update the path matrix to show the proceding node in the path to m to be n.

Eventually, the end node will be included in the computation and the algorithm halts, returning the shortest path from the start node to the end node in the end node cell of distance. The time complexity of Dijksta's is O(n^2), where n is the number of nodes (Worboys M.F.,1995).

4-2-Huristic methods
Multiple goals, factors and constraints in transportation and location real problems make very difficult their treatment with the traditional optimization methods. On other hand, there are different heuristic and probabilistic methods that no always guarantee the optimal solution but are able to find a possible solution space, taking advantage of the particular attributes of the target problem. In a recent study, a set of two shortest path algorithms that run fastest on real road networks has been identified. These two algorithms are: 1) the graph growth algorithm implemented with two queues, 2) the Dijkstra algorithm implemented with double buckets, we reviews and summarizes these two algorithms (Zhan and Noon, 1996).

4-2-1- The Graph Growth Algorithm Implemented With Two Queues
The Graph Growth algorithm implemented with two queues (TQQ) was introduced by Pallottino in 1984. TQQ is an improved version of the Graph Growth implementation developed by Pape (PAP) in 1974. There are four basic operations in the basic procedure for constructing a shortest path tree in this method: Queue_Initialization(Q) initialize queue; Queue_Removal(Q, i) remove node i from queue Q; Queue_Insertion(Q, j) insert node j into queue Q; and Q=Null? check whether queue Q is empty.

In the implementations of TQQ, nodes are partitioned into two sets: the first set of nodes are those nodes whose current distance labels have not already been used to find a shortest path and the second set contains the remaining nodes. The first set of nodes is maintained by a queue Q. Nodes in the second set are further split into two categories: 1) the unreached nodes which have never entered Q, i.e., nodes whose distance labels are still infinite, and 2) labeled nodes, i.e., the nodes that have passed through Q at least once, and the nodes whose current distance labels have already been used. In TQQ we used a data structure called Two-Queue (Q) to maintain the first set of nodes in Q. A Two-Queue consists of a FIFO queue (Q') and a FIFO queue (Q''). Nodes can be inserted at the end of Q' and Q", and they can be removed from the head of Q' and Q".It follows that for any node that is not already in Q, the node is inserted at the end of Q' if it is unreached, or the node is inserted at the end of Q" if it is temporarily labeled (Pallottino,1984).

4-2-2- The Dijkstra Algorithm implemented with double buckets
The original Dijkstra algorithm partitions all nodes into two sets: temporarily and permanently labeled nodes. At each iteration, it selects a temporarily labeled node with the minimum distance label as the next node to be scanned. Once a node is scanned, it becomes permanently labeled. The Dijkstra algorithm terminates when all nodes become permanently labeled. detailed procedure of the Dijkstra algorithm has been described in section 4-1. In Dijkstra's original algorithm, temporarily labeled nodes are treated as a nonordered list (Dijkstra,1959). This is equivalent to treating the queue Q in the above general procedure for shortest path tree construction as a nonordered list. This is of course a bottleneck operation because all nodes in Q have to be visited at each iteration in order to select the node with the minimum distance label. A natural enhancement of the original Dijkstra algorithm is to maintain the labeled nodes in a data structure in such a way that the nodes are sorted by distance labels. The bucket data structure is just one of those structures. Buckets are sets arranged in a sorted fashion. Bucket k stores all temporarily labeled nodes whose distance labels fall within a certain range. Nodes contained in each bucket can be represented with a doubly linked list. A doubly linked list only requires O(1) time to complete an operation in each distance update in the bucket data structure. These operations include: 1) checking if a bucket is empty, 2) adding an element to a bucket, and 3) deleting an element from a bucket.

Dial (1969) was the first to implement the Dijkstra algorithm using buckets. In Dial's implementation, bucket k contains all temporarily labeled nodes whose distance labels are equal to k. Buckets numbered 0, 1, 2, 3, …, are checked sequentially until the first nonempty bucket is identified. Each node contained in the first nonempty bucket has the minimum distance label by definition. One by one, these nodes with the minimum distance label become permanently labeled and are deleted from the bucket during the scanning process. The position of a temporarily labeled node in the buckets is updated accordingly when the distance label of a node changes (Ahuja et ah,1993).

Two levels of buckets, high-level and low-level, are maintained in the DKD implementation. A total of d buckets in the low-level buckets are used. A bucket i in the high-level buckets contains all nodes whose distance labels are within the range of [i*d, (i+1)* d-1]. In addition, a nonempty bucket with the smallest index L is also maintained in the high-level buckets. A low-level bucket d(j)-L*d maintains nodes whose distance labels are within the range of [L*d, (L+1)* d-1]. Nodes in the low-level buckets are examined during the scanning process. After all nodes in the low-level buckets are scanned, the value of L is increased. When the value of L increases, nodes in the nonempty high-level buckets are moved to its corresponding low-level buckets, and the next cycle of scanning process begins (Cherkassky, 1993).
 


4-3-Genetic Algorithm
In this context, Genetic Algorithm is a metaheuristic technique (Diaz, 1996) that could provides robust tools with optimal or quasi-optimal designing and programming of transportation networks and node locating. Because its excellent flexibility, robustness and adaptability characteristics, genetic algorithm has been successfully applied in the non-linear and complex optimization problem solutions, and also it is very appropriated to face the noisy combinatorial problems associated to the real systems optimization and transportation networks.

Individual representation as paths, composed of routes (stretch joining two adjacent nodes), conforming a trajectory from an origin node to a goal node, and operators utilization on sub-paths are novel features presented in this paper. Different solutions with Genetic Algorithms to the same problems have been well studied in (Gen, 1997). A route is the two-neighbor nodes union and is represented as a matrix containing the connecting arcs of each node. The possible amount of paths in each generation corresponds to the population. Besides, not there are, as traditional solution methods, arc constraints; in other words, directed and not directed arcs could be present in the model. Fitness is assigned comparing each path with the others, and is the method to determine if a path has possibility of survive to next generation. Fitness is computed as the sum of products of the route lengths and their associated costs. Selection operation extracts first allocated elements of the sorted population and they are transferred to the first positions of the next generation population, based on a pre-established selection rate.A simple crossing operator is used: two paths from the previous generation are randomly chosen, two cross points are randomly picked in each path string, then the marked fragments of the paths are extracted and interchanged in both paths, previously doing a checking for incompatible nodes. Mutation operator is used in a similar way of crossing one. A path and its mutation points are randomly chosen. A new legal sub-string is random generate. This sub-string will occupied the marked sub-path. Control parameters of the model are defined as a condition set to guide in an efficient way the algorithm, like: Generations number, generation paths number, path length, finishing criteria, selection, mutation, and crossing rates. The finish criteria used to be dependent of the predetermined generation number, or the objective function valuation stability.In the case of n nodes a number of n generations was selected. The more adequate selection, crossing and mutation rates for the proposed model were found as 34%, 42%, and 34% of the population paths, respectively.

The shortest path problem to solve is a generalization of the traditional problem of shortest path. This is, optimal (upon fitness function evaluation) path searching connecting an origin node with a goal node, passing through an intermediary node set, linking each other by non directed routes, from which, length, transportation costs, average altitude, time attributes, etc, are known, and used to evaluate the fitness function.The genetic algorithm allows an initial path population by random generation. Each path-individual has a fitness function facilitating to be differentiated with others, and then through genetic operators participate in next generation.