Location is the universal key

Steven Hagan
Vice President – Product Development
Oracle

What, in your view, is the future direction for geospatial technologies?
Oracle is a platform and applications provider. My views on geospatial technologies are therefore from the perspective of a platform provider.

Geospatial information is increasingly available and increasingly voluminous. Projects in utilities or land management now often involve terabytes and petabytes of data, collected at high input rates and with greater accuracy and precision. So the future is about storing, processing and providing access to this volume of data at the desired speed. This includes near real time requirements for ingesting data, processing and filtering it, and answering queries in milliseconds or even microseconds.

There are four basic drivers influencing the geospatial industry. These are big data, big software, real time analytics and big hardware.

Big data is the outcome of the increasing number of data devices: collectors, sensors and servers that are growing at an enormous rate. Then there are collectors such as LiDAR and imaging sensors producing raster data, which generate an increasing volume of data per device. There are also three-dimensional models, city models and terrain models for government agencies.

These can be combined with the spatial data infrastructures that facilitate access to all the data. In this scenario, interoperability becomes crucial. To support this interoperability, standards such as INSPIRE are vital, and the semantic capabilities of the platform become extremely important.

Where big software is concerned, all the classic applications such as ERP, customer relationship management and business intelligence are becoming integrated and spatially enabled. They have to be, or they won’t be relevant.

In real time analytics, we must process huge streams of data and respond in near real time – responding to events that are happening on the ground, or looking for correlations among events.

And finally, the platform shift. We have all heard about the cloud. There will be cloud-capable platform software, but underneath the cloud there must be hardware that is actually fast enough, and scales well enough, to support it. That hardware may be invisible to users, but there must be a solid foundation underneath. There are many instances where the government is required to rapidly disseminate information or applications to its stakeholders in the areas of defence, intelligence, land management, environment, water management and agriculture.

Then there is the integration: take the operating sensors and add video, audio and text coming from applications or websites. Location information helps integrate such unstructured content. Even social networking sites, e.g. Facebook or Twitter, now have location as an integral element. All of these data have to be integrated and analysed. This in turn drives the increasing volumes of data and the processing power needed to analyse them.

The reality is that location is the true universal key. Once you have location you can mash up and correlate any information attached to it. Location is part of the infrastructure, whether it is the phone or car or in networks. Location has made it possible to relate information. That’s why the infrastructure we provide is so relevant and widely used – it handles location as an integral component; whether it is within a database, or documents, other applications or generated as part of a business workflow.

This spatial information management is a key feature of Oracle’s capabilities as a database and an application development and deployment platform. Location is just a column in the database. Your address, latitude-longitude, your enterprise information and other properties are a normal part of information management to us. So it becomes an attribute of the business entities in your enterprise applications such as procurement, financials, CRM, HR, manufacturing or marketing – part of your IT infrastructure.
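
To make the idea of location as just another column concrete, here is a minimal sketch in Python. It uses SQLite purely so the example is self-contained and runnable; Oracle Spatial’s native geometry type and spatial indexes go well beyond plain latitude/longitude columns. The table, names and coordinates are invented for illustration.

```python
import sqlite3

# Minimal sketch: location stored as ordinary columns alongside business data.
# (Oracle Spatial uses a native geometry type and spatial indexes; plain
# lat/lon columns are used here only to keep the example self-contained.)
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id       INTEGER PRIMARY KEY,
        name     TEXT,
        revenue  REAL,
        lat      REAL,
        lon      REAL
    )
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Acme GmbH",   1.2e6, 52.52, 13.40),   # Berlin
        (2, "Globex BV",   0.8e6, 52.37,  4.90),   # Amsterdam
        (3, "Initech Ltd", 2.1e6, 51.51, -0.13),   # London
    ],
)

# Location becomes just another predicate in an ordinary business query:
# customers inside a rough bounding box over part of continental Europe.
rows = conn.execute(
    "SELECT name, revenue FROM customer "
    "WHERE lat BETWEEN 50 AND 54 AND lon BETWEEN 2 AND 15"
).fetchall()
print(rows)   # [('Acme GmbH', 1200000.0), ('Globex BV', 800000.0)]
```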

With the increasing ubiquity of sensors and collection of location information comes the requirement of real time analytics: the ability to respond to changing events, e.g. tracking or monitoring mobile assets in near real time. This holds true not just for people, but also for cars, trains, airplanes and UAVs. We process streams of data using complex event processing to look for interesting events, compute quality metrics, aggregate the data and find correlations and trends – and do all this in real time to give feedback to customers.
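
As an illustration of this kind of stream filtering, the following Python sketch scans a simulated stream of asset positions and passes through only the “interesting” events, here defined (as an assumption for the example) as an asset moving outside a geofence. It is not Oracle’s complex event processing engine, just the filtering pattern described above; all names, coordinates and thresholds are made up.

```python
import math
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, float, float]   # (asset_id, lat, lon)

def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geofence_alerts(events: Iterable[Event],
                    centre: Tuple[float, float],
                    radius_km: float) -> Iterator[Event]:
    """Yield only the events in which an asset is outside the geofence."""
    for asset_id, lat, lon in events:
        if distance_km(lat, lon, *centre) > radius_km:
            yield (asset_id, lat, lon)

# Simulated position stream; in practice this would be an unbounded feed.
stream = [
    ("truck-7", 48.137, 11.575),   # inside the fence
    ("truck-7", 48.400, 11.900),   # drifted well outside
    ("truck-9", 48.140, 11.580),   # inside
]
for alert in geofence_alerts(stream, centre=(48.137, 11.575), radius_km=20.0):
    print("ALERT:", alert)         # only the second truck-7 reading is reported
```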

How is cloud influencing geospatial practices?
The cloud as a platform and architecture has become an industry buzzword. Oracle supports all flavours of the cloud. In general, there are two primary approaches from the users’ point of view – public and private clouds. Some customers are okay with a public cloud where the provider has total control of the data and applications. That is, the control of the applications and information lies outside of the government agencies or private enterprises that own it. Many organisations, however, have security and reliability concerns, so they need a private cloud. Vendors should be able to support either approach and let people or organisations decide whether they want a private cloud or public cloud. Oracle has invested significantly in designing and shipping new hardware with tightly integrated software, which we call Exadata and Exalogic, for this marketplace. Using these platforms, you can blend your data warehouse, your OLTP data or your spatial data and efficiently perform all the required processing. Therefore the different processing needs are met by the same secure and scalable platform.

We can process all these data types in the same box. This requires some special analysis in hardware that we have implemented, and the offloading of some processing into the disk controllers. For example, a key topic for the geospatial community is raster data processing. Raster is huge; but part of these boxes’ capability is flash memory. In addition to classic main memory, we can put five terabytes of flash memory in one of these boxes, alongside rotating disk storage.

This supports image processing and all related analysis at real time speed rather than getting blocked on an I/O channel. The same goes for the virtualisation software: one can productively use virtualisation across these machines. These are among the things we have designed for this market. More is in the offing, for example further improvements in real time analytics and in processing data and feeding it to any other target database. The key point is that it is a standards-based, secure, modular, scalable platform for deployment either as a private cloud or a public cloud.

Is there a limit to the amount of data we can handle?
In theory there is always such a limit. Right now we can already handle petabytes and exabytes. We will be able to handle the next big thing that comes along. There are no limits in our system that anyone in the real world is approaching at this time. We already have some petabyte databases in progress.

Looking at it from an application point of view, does it make sense to have this much data? Since you are talking in terms of real time, one could also have real time filtering of data…
Well, there are two answers to that. One is on the real time side – it’s processing your stream on the way in and filtering out extraneous details. This is where technologies like the Coherence Grid or Hadoop come into play: using thousands of commodity servers in parallel, users reduce the data using the criteria – ‘this is what I want, and I may also load the results into the Oracle database for further analysis.’ That is built to handle the arrival rate at the sensor / collector access point.

We also have features like automated storage management (ASM) and partitioning, within the database, to manage data storage and access. What often happens is that customers load the database regularly, but eventually much of the data becomes outdated and hence no longer relevant to current analytical needs. However, the data must be retained for legal reasons. Partitioning is a capability where users can load new data into partitions and also move other partitions with old data to cheaper slow speed disks, or off to tapes, or off to some other flavours of archives. They can do this online so that the data does not disappear. But now older data does not slow down their application, and it is also on cheaper storage. Users can now load more and more new, or relevant, data into their application with good performance.

If they only use classic B-Tree indices with no partitioning, access will get slower. As data volume gets larger and larger, their B-Trees have to divide further and further. This can slow down performance. So we provide ways in which users can divide data and indexes, with partitioning. For example, all the data more than a year or two old is put onto archives by partitions. While logically it is still part of the database, it does not slow down users’ queries or application performance.
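
The following Python sketch illustrates the partition-pruning idea described above under simplifying assumptions: rows are grouped by year, a query scans only partitions that can contain matching rows, and old partitions can be detached to cheaper storage while the data persists. It is a conceptual illustration only; Oracle’s partitioning is declared and managed inside the database itself.

```python
import datetime as dt
from collections import defaultdict

# Minimal sketch of partition pruning: rows grouped by year, queries touch
# only partitions that can hold matching rows, and old partitions can be
# shipped to cheaper storage without the data being deleted.
partitions = defaultdict(list)          # year -> list of (timestamp, reading)

def load(timestamp: dt.date, reading: float) -> None:
    partitions[timestamp.year].append((timestamp, reading))

def query(since: dt.date):
    """Scan only partitions that may hold rows newer than `since`."""
    for year in sorted(partitions):
        if year < since.year:           # pruned: never scanned
            continue
        for timestamp, reading in partitions[year]:
            if timestamp >= since:
                yield timestamp, reading

def archive(before_year: int) -> dict:
    """Detach old partitions (e.g. to tape); the data persists but no longer
    slows down queries against current data."""
    return {y: partitions.pop(y) for y in list(partitions) if y < before_year}

load(dt.date(2009, 3, 1), 17.2)
load(dt.date(2011, 6, 5), 18.9)
cold = archive(before_year=2010)
print(sorted(cold))                              # [2009] now on cheap storage
print(list(query(since=dt.date(2011, 1, 1))))    # only the 2011 partition scanned
```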

And therefore your data also becomes persistent?
Yes, this is persistent data. Some users just keep on adding arbitrary data, which is not necessarily relevant for their application. This can hamper the application performance. This is exactly the point that Oracle raises; we clearly understand information processes. Our database management system understands how to let users have access to information any time and how to improve its utilisation, yet lower costs. Users want to manage their cost. Besides that, there is also the question of users wanting to filter data by time and space. That is what we have enabled.


Oracle essentially looks at data in general. Do you have anything special for geospatial data?
Geospatial data is essentially just another data type to Oracle, and this distinguishes us from any other database system. What this means is that Oracle stores geospatial data as a column in the database, like any other data type, and lets users build indices on it. This gives users much faster response if they use location as one of the keys of their data. Location can be the primary key. Oracle not only handles 2D vector data, it handles three-dimensional vector data, raster data and topology data. All these different flavours are stored natively in the database.

In addition, there are other flavours of indexes. For example, if you are managing roads, utilities, etc., there is a network data type where the base relationship is between nodes and links, so that one can do a variety of connectivity and path analysis in the physical world using this model. Power companies, utilities and government agencies, such as transportation departments, do this kind of analysis on large volumes of data.

But because this is a native part of Oracle, it scales well. In addition, with Oracle’s Automated Storage Management system, users can add more storage online. We’ll automatically restripe the data and users will never see any outages. The indices are remapped accordingly, so the performance remains excellent. The largest spatial installations worldwide are already using Oracle Spatial because Oracle is the only system that can scale to petabytes. We have a lot of customers in the government areas – all the other data was already there, they just needed to add location. With Oracle, they can automatically add spatial capabilities to their information systems and business processes.
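
As a rough illustration of node/link network analysis, the sketch below runs a shortest-path computation (Dijkstra’s algorithm) over a tiny invented road network in Python. Oracle’s network data model keeps this structure and the analysis inside the database and scales to far larger networks; this only shows the underlying idea.

```python
import heapq

# Node/link model: each node maps to its outgoing links with a traversal cost.
# Network and costs are invented for illustration only.
links = {
    "depot":      [("junction", 4.0), ("bypass", 9.0)],
    "junction":   [("substation", 3.0), ("bypass", 2.0)],
    "bypass":     [("substation", 5.0)],
    "substation": [],
}

def shortest_path(start, goal):
    """Dijkstra's algorithm: return (total cost, node path) from start to goal."""
    queue = [(0.0, start, [start])]
    settled = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue
        settled[node] = cost
        for neighbour, link_cost in links.get(node, []):
            heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return float("inf"), []

print(shortest_path("depot", "substation"))
# (7.0, ['depot', 'junction', 'substation'])
```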

In essence, you define different spatial objects?
Yes, just as you have numbers, characters, date and time, you also have points, areas, lines, rasters and networks as object types. Advancements in database platforms, such as Exadata, and the cloud also add to users’ geospatial capabilities. Users do not have to take a GIS application and extensively modify it to use the cloud. Oracle users are just utilising what is already a part of that infrastructure, in the database machine and in the cloud.

So you also have the corresponding applications, or rather processes, built up?
We have analytic functions for doing cluster analysis, interpolation, proximity and connectedness computations as part of the database engine. These are available, and used, across the Oracle platform – that is, in its application development and deployment tools and infrastructure and its enterprise applications.
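
As a small example of the proximity computations mentioned here, the hedged Python sketch below assigns each incident to its nearest depot using a simple planar distance. Names and coordinates are invented; in the Oracle platform such computations run as analytic functions inside the database engine rather than in application code.

```python
import math

# Invented reference data: service depots and reported incidents (lat, lon).
depots = {"north": (52.6, 13.4), "south": (52.3, 13.5), "west": (52.5, 13.1)}
incidents = [("flooding", 52.55, 13.42), ("outage", 52.35, 13.48)]

def nearest_depot(lat: float, lon: float) -> str:
    """Proximity query: depot with the smallest planar distance to the point.
    (Planar distance on lat/lon is a simplification, fine at this scale.)"""
    return min(depots, key=lambda name: math.hypot(depots[name][0] - lat,
                                                   depots[name][1] - lon))

for label, lat, lon in incidents:
    print(label, "->", nearest_depot(lat, lon))
# flooding -> north
# outage -> south
```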

A large chunk of people are of the view that most security leaks really happen from the inside. Given that, does it make sense to have a private cloud?
No one will ever stop the rogue employee who, for example, gets into trouble outside of work and is blackmailed and steals things. There are people (police etc) who address this situation. However, within a private cloud you can still enforce all of your security roles and processes to minimise opportunities for compromising your data. Some of the leaks depend on the clearance level of the person. Other leaks can involve an employee who tries to hack into the database and get information. Oracle provides built-in security features to mitigate risk in these various scenarios.

For example, we have a technology called Data Vault. With Data Vault, the entire database is secured with encryption keys. So, unless the database administrator (DBA) is allowed to access that specific data, he/she cannot see it. Basically, even a rogue DBA can’t get to the sensitive data. We have what we call “rings of protection” within the data that stop even DBAs from getting access. You have to be someone who has proper clearance. This is partly for military security but also for normal enterprises. Human resource people don’t want an employee finding others’ salaries and/or medical conditions. If the CEO has clearance, he/she can access all data. We have a large security group whose core job is to make the database more and more secure – not only the data but also any access to it.
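
The sketch below is a conceptual illustration only, not Oracle Data Vault’s actual mechanism or API: it shows the “rings of protection” idea that administrative rights alone do not grant access to protected business data, which requires a separate clearance. All column names, roles and clearances are invented for the example.

```python
# Conceptual sketch of separation of duties: holding the DBA role is
# deliberately NOT sufficient to read protected columns; a separate
# clearance is required. This is not Oracle Data Vault's API.
PROTECTED_COLUMNS = {"salary", "medical_condition"}

class AccessDenied(Exception):
    pass

def read_column(user_roles: set, user_clearances: set, column: str, row: dict):
    """Return a column value only if the caller holds the required clearance.
    Roles (including 'dba') are intentionally not consulted for protected data."""
    if column in PROTECTED_COLUMNS and "hr_sensitive" not in user_clearances:
        raise AccessDenied(f"{column} requires the hr_sensitive clearance")
    return row[column]

row = {"name": "A. Example", "salary": 90_000, "medical_condition": "none"}
print(read_column({"dba"}, set(), "name", row))       # allowed: not protected
try:
    read_column({"dba"}, set(), "salary", row)        # denied despite DBA role
except AccessDenied as exc:
    print("denied:", exc)
```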

Security is a major investment for us because many of our customers are intelligence agencies around the world. They regularly meet with us and express their requirements.

As part of our internal training, everyone in engineering must undergo security training, regularly and repeatedly, on the subjects of developing secure code, understanding how people hack into systems etc. Information security is a primary objective of the company and regular, up-to-date training reinforces the commitment and capability of all employees in meeting this goal.