CEO, Orkash Technologies
Head, smart-systems technology
In a world where about 80% of data has a geographical component, the geospatial context forms an important aspect of human knowledge, information interpretation, reasoning and decision-making.
It is also the underlying common thread for a vast majority of events, relationships, human behaviour, and natural and man-made phenomena. Location and geospatial context play an enormous role in human cognition and in how we create, interpret and consume information.
Information on the World Wide Web largely remains unstructured and devoid of the means of extracting its geospatial context. This, along with lack of semantic understanding capabilities, has been the biggest gap in the ‘search engine’ based information discovery and exchange models that revolutionised the world in the last few decades.
With the advent and wide availability of Web 2.0 and Web 3.0 technologies, the structuring, extraction, search, transformation and analysis of Web-based information sources promises to become increasingly automated. Yet, the gap in semantic and geospatial contextualisation remains a non-trivial challenge. The two fields are directly interdependent: the geospatial contextualisation of unstructured data requires a semantic understanding of location-based relationships between events and named entities. So far, there has been only limited adoption of technologies that would put semantic search and geospatial contextualisation capabilities in the hands of the common user. The barriers have been both the complexity of the architecture and the scale required for creating and delivering such capabilities.
To put customisable Internet-based intelligence and decision-support platforms in the hands of the common user for everyday use, much as we use search engines, Orkash developed a solution with powerful applications for enterprises across a wide range of areas such as knowledge extraction, market and competitive intelligence, homeland security, location risk management, disaster management, supply chain and logistics, and battlefield C3I systems. This first-of-its-kind architecture for automating the creation of intelligence from Web-based unstructured data sources (e.g. websites and blogs) is achieved through a semantically enhanced expert-engine platform that is Web 3.0 compatible and integrated with business intelligence (BI) and GIS platforms. This enables the delivery of semantic and geospatial analysis capabilities using Web-based information sources.
That the discovery and exploitation of geographic information provides a useful new paradigm for the navigation and retrieval of Web information is well established. The integration of Web-mining and natural language processing (NLP) engines in the backend of Orkash’s prototype makes it possible to identify events and relationships, and their geographic context, in unstructured data residing on the Web.
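A minimal way to picture the geographic-context step is gazetteer-based geoparsing: matching place names in free text against a list of known locations. The sketch below is an illustration only, not Orkash’s actual engine; the place names, coordinates and function names are hypothetical, and a production system would use full NLP named-entity recognition and disambiguation rather than substring matching.

```python
# Illustrative gazetteer of place names -> (lat, lon); entries are made up
# for this example, not taken from the prototype.
GAZETTEER = {
    "Gurgaon": (28.46, 77.03),
    "New Delhi": (28.61, 77.21),
    "Hyderabad": (17.39, 78.49),
}

def extract_locations(text):
    """Return (place, (lat, lon)) pairs for gazetteer names found in text."""
    found = []
    for place, coords in GAZETTEER.items():
        # Naive substring match; a real NLP engine would tokenise and
        # disambiguate (e.g. person names vs. place names).
        if place in text:
            found.append((place, coords))
    return found

hits = extract_locations("A protest was reported in New Delhi on Monday.")
print(hits)  # [('New Delhi', (28.61, 77.21))]
```

The extracted coordinate pairs are what allow an unstructured report to be plotted on the earth-browser and correlated with other events at the same location.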
This prototype is built around an earth-browser (e.g. NASA World Wind – Figure 2) as the means to architect an integrated platform for delivering Web 3.0-style analytical Web services and data visualisation (raster and vector data). It incorporates semantic search, geospatial contextualisation, NLP-based contextualisation of locations, events and relationships, and a data mining/BI decision-support platform that accesses disparate databases (local, LAN-based or Internet-based). Another important aspect of the design is the set of plug-ins for the earth-browser GIS front-end, which enhance the functionalities and features of the application. The application thus need not rely on a single source or resource, which improves its redundancy, robustness and scalability.
As a new technology platform, the prototype has limitations. However, it has clearly established its capabilities as an extremely powerful means of providing, almost in real time, semantically and spatially contextualised intelligence derived from the Web. The earth-browser-based visualisation platform does not attempt to supplant the analytical capabilities of a GIS application; instead, it provides end-users with the means to visualise spatial data and to query Web-enabled RDBMS and BI applications. The architecture also facilitates the integration of Web services, data and user-created content in the form of mashups, XML, KML, SOAP etc.
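As a concrete illustration of the KML side of that integration, the snippet below generates a minimal KML placemark for a contextualised event using only the Python standard library. It is a sketch, not code from the prototype; the function name and the example event are hypothetical.

```python
import xml.etree.ElementTree as ET

# The official KML 2.2 namespace.
KML_NS = "http://www.opengis.net/kml/2.2"

def event_to_kml(name, lat, lon):
    """Serialise one named event as a minimal KML document string."""
    ET.register_namespace("", KML_NS)  # emit KML without a namespace prefix
    kml = ET.Element("{%s}kml" % KML_NS)
    pm = ET.SubElement(kml, "{%s}Placemark" % KML_NS)
    ET.SubElement(pm, "{%s}name" % KML_NS).text = name
    point = ET.SubElement(pm, "{%s}Point" % KML_NS)
    # Note: KML coordinates are written lon,lat (not lat,lon).
    ET.SubElement(point, "{%s}coordinates" % KML_NS).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

print(event_to_kml("Warehouse fire", 28.46, 77.03))
```

A document like this can be loaded directly by KML-aware earth-browsers, which is what makes KML a convenient interchange format between the analysis backend and the visualisation front-end.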
The “Base application (Geospatial)” box (in Figure 3) is the core component of the prototype; it provides the capability to visualise geospatial data using maps. The maps can be raster or vector, and users can add further vector data and visualise it on the earth-browser GIS platform. For example, a business user can plot the locations of their offices, with appropriate information attached, and share this with other users either through the centralised data services or by transferring the information directly. The “Data analysis” component, as the name suggests, analyses data created by users or by other applications, or crawled from the Web. It includes artificial-intelligence-based expert systems, e.g. natural language processing (NLP), algorithms, a knowledge base in the form of ontologies, and data mining, and it can generate and query geospatial data as well as conduct semantic search. To make this effective, analysis engines analyse data and information from different sources and integrate them on the basis of features and profile- or domain-specific criteria.
The NLP Contextualisation Engine contextualises the information or analysed data with respect to location and extracts the geospatial context of events and relationships. The NLP engines also analyse the semantic structure of the information and write it to a database. Users can then query the contents semantically and visualise the data through the earth-browser geospatial platform.
The “Web Application &amp; Services” component integrates existing Web applications and services such as mashups, SOAP, XML, RSS and REST, all accessible via URLs. The system can also generate services using Web 3.0 techniques for interoperability, permitting the interchange of data between different sources of information and giving users real-time access to data from other applications.
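One common format for geotagged feeds of the kind this component consumes is GeoRSS-Simple, where each RSS item carries a `georss:point` element. The sketch below parses such an item with the standard library; the feed content is invented for illustration, and a real deployment would fetch the feed over HTTP rather than from an inline string.

```python
import xml.etree.ElementTree as ET

# A minimal GeoRSS-Simple feed; the item and coordinates are illustrative.
FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel><item>
    <title>Flood warning issued</title>
    <georss:point>26.85 80.95</georss:point>
  </item></channel>
</rss>"""

NS = {"georss": "http://www.georss.org/georss"}
root = ET.fromstring(FEED)
for item in root.iter("item"):
    title = item.findtext("title")
    # GeoRSS-Simple points are "lat lon", space-separated.
    lat, lon = map(float, item.findtext("georss:point", namespaces=NS).split())
    print(title, lat, lon)
```

Because the geometry rides inside an ordinary RSS item, the same feed can serve both conventional news readers and the GIS front-end.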
The Collective Intelligence Extraction and “User Created Content” module facilitates the creation of content as per users’ requirements and its integration with user-generated content already on the Web (e.g. social networking sites). This feature is intended to provide the functionality to share, comment on, modify and recreate content as a means of creating ‘collective intelligence’. It also facilitates communication, collaboration and the sharing of user-created content.
Users can hold proprietary data on their local systems in standard formats such as XML, DOC, PDF, KML and GML (georeferenced data), and transfer the data over a secure link or network. On the receiving side, the transferred data can be brought into the system by accessing the local system and parsing the information with the plug-ins built into the prototype. There is also a centralised storage system that holds content subject to user and group access rights, and facilitates the integration of other information residing on the server side.
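The parsing step such a plug-in performs can be pictured as follows for the KML case: read each placemark and return name-plus-coordinate records ready for the centralised store. This is a hedged sketch, not the prototype’s plug-in code; the sample document and the `parse_kml` function are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"kml": "http://www.opengis.net/kml/2.2"}

def parse_kml(kml_text):
    """Extract (name, lat, lon) records from a KML document string."""
    root = ET.fromstring(kml_text)
    records = []
    for pm in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = pm.findtext("kml:name", namespaces=NS)
        coords = pm.findtext(".//kml:coordinates", namespaces=NS).strip()
        # KML stores lon,lat[,alt]; keep the first two fields and swap.
        lon, lat = map(float, coords.split(",")[:2])
        records.append((name, lat, lon))
    return records

SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document><Placemark>
    <name>Head office</name>
    <Point><coordinates>77.03,28.46,0</coordinates></Point>
  </Placemark></Document>
</kml>"""

print(parse_kml(SAMPLE))  # [('Head office', 28.46, 77.03)]
```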
BI Visualisation – Different mechanisms pass information to the GIS section of the dashboard, e.g. Web services, GML, KML and GeoRSS. The dashboard and graphs provide spatial analysis of the vast amounts of data available either in the centralised database or on the local machine. By simply loading the data from its source and defining the display parameters, the system can synchronise the results with spatial data on GIS maps. Users can save analysed results and distribute them to other users for collaborative work and team-based decision-making on the same spatial view. Users can also create spatial data layers dynamically from data analysed on the dashboard, using the facts (cubes) and dimensions.
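The last step above, rolling up fact rows by a location dimension into a map layer, can be sketched as follows. The fact rows, field names and counts are invented for illustration; an actual BI backend would draw these from its cubes rather than an in-memory list.

```python
from collections import defaultdict

# Illustrative fact rows: one row per analysed record, keyed by a
# location dimension ("city") with coordinates attached.
facts = [
    {"city": "Gurgaon",   "lat": 28.46, "lon": 77.03, "incidents": 3},
    {"city": "Gurgaon",   "lat": 28.46, "lon": 77.03, "incidents": 2},
    {"city": "Hyderabad", "lat": 17.39, "lon": 78.49, "incidents": 4},
]

def build_layer(rows):
    """Aggregate a measure by the location dimension into point features."""
    totals = defaultdict(int)
    coords = {}
    for r in rows:
        totals[r["city"]] += r["incidents"]
        coords[r["city"]] = (r["lat"], r["lon"])
    # One point feature per city, ready to hand to the map front-end.
    return [{"city": c, "lat": coords[c][0], "lon": coords[c][1],
             "total_incidents": n} for c, n in totals.items()]

layer = build_layer(facts)
print(layer)
```

Each aggregated feature can then be serialised (e.g. as KML or GeoRSS, per the mechanisms listed above) and rendered on the shared spatial view.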
Some of the challenges of this prototype include:
• Information sharing and analysis of content in a collaborative environment
• Providing users with adequate tools to quickly access and filter relevant information
• ‘Concept’-level semantic searching
• Keeping huge spatial data sets up-to-date at ever-shortening cycles
• Techniques for automated contextualisation and metadata extraction of initially captured data and of updates, in order to discover knowledge from the data, accelerate update cycles and deliver current information on-the-fly
• Spatial data mining to deliver information from huge amounts of raw data
• Discovering relationships between spatial and non-spatial data, construction of spatial knowledge bases, query optimisation, and characterisation of spatial data
• Optimisation of code for running on large parallel clusters

This is the condensed version of the paper that bagged the Best Paper Award at Map World Forum 2009. For the full paper, please check www.gisdevelopment.net