Geospatial: The original Big Data

In the API service economy, businesses are able to make the most of their data by creating compelling customer experiences and opening new revenue channels.

Geospatial data, be it remotely sensed data, weather data, or digital aerial photography, was ‘Big Data’ well before the term took hold. The framing usually credited to Gartner dates to 2001, first with three Vs, and was further popularized in 2012 with the four Vs: volume, variety, velocity and veracity.

Before this modern era of Big Data, we simply accepted that geospatial data meant large datasets, heterogeneous data sources, high streaming rates from hyperspectral and video imagery, and timeliness and accuracy issues arising from rapid data aging and from homogenizing datasets with different formats, structures and quality. API-based geospatial services, Cloud-computing platforms, and the availability of curated geospatial data now allow programmers and analysts to leverage geospatial data more easily than ever before.

Leveraging the API Service Economy

There was a time when, if you created a service, you also had to create all the underlying components that made up that service. These days, companies such as Uber leverage the API Service Economy. Uber did not reinvent the wheel: it orchestrated existing services to create an exceptional user experience, which allowed it to disrupt an entrenched business with minimal investment in resources. Geospatial services are a key component of Uber’s success. The company consumes location-based mapping services, payment services, and other component services, all provided by third parties as monetized API-based services.

In the API economy, application programming interfaces act as the digital glue that links services, applications and systems. This approach allows companies to take these components and orchestrate them into an app to provide a service like Uber, without having to deal with the underlying complexities of each element. Thus, businesses are able to make the most of their data by creating compelling customer experiences and opening new revenue channels.

Cloud-based geospatial platform

Leveraging the API Service Economy necessitates a scalable Cloud-based development platform. Uber grew an astonishing 38-fold in just four years. Without a scalable platform and the ability to quickly pivot and change out technologies, supporting such growth would have been impossible. Geospatial platforms have historically been purpose-built for geospatial data. The challenge with purpose-built platforms is that they tend to be closed ecosystems, with limited support for non-geospatial API-based services.

Conversely, traditional Cloud platform as a service (PaaS) environments tend to be limited to simple JavaScript API map renderings and do not support more complex geospatial analytics. IBM’s Cloud PaaS is Bluemix.

Bluemix is an open-standard, Cloud-based platform for building, running, and managing applications. It features a catalog of IBM, third-party, and open-source services (including geospatial) that allow developers to stitch an application together quickly. It also provides a rich suite of Internet of Things (IoT) services and offers a free service tier for developers. Bluemix is the easiest means to access IBM’s cognitive Watson services. Bluemix developers can build a new generation of cognitive geospatial apps that enhance, scale, and accelerate human expertise by embedding Watson services and content through APIs. Services include user modeling, machine translation, text-to-speech, speech-to-text, relationship extraction, visualization rendering, and more.

Curated geospatial data

If there is one fly in the geospatial ointment, it has to be the limited availability of easy-to-consume, curated digital content. With our increased dependency on geospatial services for everything from augmented reality gaming to personalized transportation services, there is an even greater demand for accurate, timely, curated geospatial data. Currently, the vast majority of geospatial information is raw and without context. There are diverse data sources, multiple resolutions, coverage gaps, varying data types, varying update frequencies and various levels of data accuracy. The data needs to be in a form that is consumable.

Data curation means integrating, enriching, aligning, reclassifying, and re-projecting the data. An examination of many analytics engagements shows that 70% of the time and cost invested is spent simply finding, organizing, aligning and classifying data. Often, the resulting datasets could drive value across multiple teams, enterprises and industries.
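To make two of these curation steps concrete, here is a minimal sketch of reclassifying and re-projecting records from mixed sources. It is an illustration only, not part of any IBM service: the record layout, the `RECLASS` label mapping and the function names are all assumptions, and the projection is the standard spherical Web Mercator formula.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, as used by spherical Web Mercator


def to_web_mercator(lon_deg, lat_deg):
    """Re-project a WGS84 lon/lat pair (degrees) to Web Mercator (EPSG:3857) metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Hypothetical mapping from source-specific labels to one shared classification scheme.
RECLASS = {"bldg": "built-up", "urban": "built-up", "h2o": "water", "lake": "water"}


def curate(records):
    """Align records from heterogeneous sources: reclassify labels, re-project coordinates."""
    curated = []
    for rec in records:
        x, y = to_web_mercator(rec["lon"], rec["lat"])
        curated.append({
            "source": rec["source"],
            # Labels missing from the mapping are flagged rather than guessed.
            "class": RECLASS.get(rec["label"].lower(), "unclassified"),
            "x_m": round(x, 1),
            "y_m": round(y, 1),
        })
    return curated


if __name__ == "__main__":
    sample = [
        {"source": "survey_a", "label": "Lake", "lon": 151.21, "lat": -33.86},
        {"source": "survey_b", "label": "bldg", "lon": 151.28, "lat": -33.80},
    ]
    for row in curate(sample):
        print(row)
```

Even in this toy form, most of the code is spent on aligning labels and coordinates rather than on analysis, which is consistent with the proportion of effort described above.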

Greater automation in the curation process is needed. One possibility is leveraging artificial intelligence (AI) technologies, such as IBM’s Watson, to automate geospatial data curation. IBM has already started training Watson to analyze visual data for medical imaging. Watson will be given access to 30 billion medical images that IBM acquired in its purchase of health-tech company Merge, in order to learn how to distinguish a normal medical image from an abnormal one.

Real-time location-monitoring

The following example demonstrates real-time maritime risk profile monitoring for customs, which leverages some of the new geospatial services on Bluemix.

The application uses Automatic Identification System (AIS) radio data, which ships use for identification and location reporting. All commercial vessels and larger private ones are required to transmit an AIS signal so that others know where they are and can avoid being in exactly the same place at the same time. Ground stations can use the AIS data to monitor local traffic in busy areas, support maritime security or even help coordinate search and rescue efforts. The Geospatial Analytics service on Bluemix allows you to monitor devices (cars, boats, people, even your dog) as they move around the globe and enter or exit geographic regions of interest.

Figure 1: 7 Geofences along with the current location and heading of vessels in Sydney Harbour

For the purposes of this demonstration, seven regions of interest were created across Sydney Harbour, aligned with one of the most travelled ferry routes: Manly to Circular Quay. These regions allow you to see when a ferry arrives at or leaves port, and to receive updates on progress as the vessel travels between the two locations. The IBM Geospatial Analytics service allows you to define different conditions for each geofence, triggering either on entry only or on both entry and exit. Simple logic was used to format the text string sent to Pushover for notification, and to provide a REST API for remotely enabling or disabling the Pushover notifications. The map (Figure 1) shows the seven geofences (the red polygons) along with the current location and heading of vessels in Sydney Harbour.
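The entry and exit logic described here can be sketched in a few lines of plain Python. This is an illustrative approximation, not the IBM Geospatial Analytics API: the `GeofenceMonitor` class, the `notify_on` setting, the fence coordinates and the message format are all hypothetical, and a standard ray-casting point-in-polygon test stands in for whatever the service uses internally.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon given as [(lon, lat), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


class GeofenceMonitor:
    """Tracks which devices are inside which fences and reports entry/exit events."""

    def __init__(self, fences):
        # fences: {name: {"polygon": [...], "notify_on": {"entry"} or {"entry", "exit"}}}
        self.fences = fences
        self.inside = set()  # (device_id, fence_name) pairs currently inside a fence

    def update(self, device_id, lon, lat):
        """Process one position report and return the notification strings it triggers."""
        messages = []
        for name, fence in self.fences.items():
            key = (device_id, name)
            now_inside = point_in_polygon(lon, lat, fence["polygon"])
            was_inside = key in self.inside
            if now_inside and not was_inside:
                self.inside.add(key)
                event = "entry"
            elif was_inside and not now_inside:
                self.inside.discard(key)
                event = "exit"
            else:
                continue  # no boundary crossing for this fence
            if event in fence["notify_on"]:
                verb = "entered" if event == "entry" else "left"
                messages.append(f"{device_id} {verb} {name}")
        return messages


if __name__ == "__main__":
    # Rough, made-up rectangle near Circular Quay; not the fences used in the demo.
    quay = [(151.20, -33.87), (151.22, -33.87), (151.22, -33.85), (151.20, -33.85)]
    monitor = GeofenceMonitor({"Circular Quay": {"polygon": quay, "notify_on": {"entry", "exit"}}})
    for lon, lat in [(151.28, -33.80), (151.21, -33.86), (151.28, -33.80)]:
        for text in monitor.update("ferry-1", lon, lat):
            print(text)  # in the demo, a string like this would be sent on to Pushover
```

Keeping the per-device state in a set means a notification fires only when a vessel crosses a boundary, not on every position report it sends while inside a fence.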

This shows how easy it is to start leveraging geospatial data on IBM Bluemix with the Geospatial Analytics service. Given that the service is built on top of InfoSphere Streams, there would be no issue with scaling this application to cover a far larger geographic region, many more regions of interest, and significantly more device updates per second.

Fred C. Collins
IBM Distinguished Engineer, IBM Global Business Services