If geospatial systems are to remain relevant in a fast-changing world, then data sources that go beyond imagery and maps must become a part of the analysts’ armory. Big Data, its analytics in the Cloud and, finally, the Internet of Things are what the future holds.
Q: How many big data scientists does it take to screw in a lightbulb?
A: Just a minute. Let me run the algorithm.
Fast-changing, human-driven events like expansion of cities and creation of assets for transportation are very vulnerable to old data. Any person who has been misled by car navigation systems can testify to the fact that the digital road network on their device is often out of date and does not show new features. The need of the hour therefore is for speed of data delivery and crunching. Where does this data come from and how can it be used in real-time or near-real-time for decision making?
Natural resources as well as social, political and economic activities have a strong bearing on the outcome of projects such as growth of cities, building of infrastructure and even a farmer’s decision to plant a specific crop. If geospatial systems have to remain relevant in a fast-changing world, then data sources that go beyond imagery and maps must become a part of the analyst’s armory.
Enter the world of Big Data, Big Data Analytics and Internet of Things.
More data is not always more intelligent data
“The rate at which we are generating data is rapidly outpacing our ability to analyze it,” says Dr. Patrick Wolfe, Data Scientist at University College London. “The trick here is to turn these massive data streams from a liability into a strength.” The extent to which we are missing extraordinarily valuable data analytic opportunities is incredible: right now, only 0.5% of our information is analyzed. We have more data, but it is not always more intelligent data. Part of the problem with Big Data is that it is not valuable until it is understood. “You have to start with a question and not with the data,” stresses Andreas Weigend, Lecturer at UC Berkeley. “The fact that data gets collected is a good thing,” he adds, but what we really need is to figure out what problems we can solve with it.
The promise of Big Data is exciting. Big Data improves sustainability by reducing power use, and less use of resources also means savings: $200 billion per year, according to one estimate. Chicago and New York City are now being called “smart cities” in the press for integrating Internet of Things (IoT) sensors with analytics to streamline spending and improve infrastructural efficiency.
All technologies are there to solve the world’s problems, which can scale from big to small applications. “There may be problems at the scale of the city’s infrastructure, and to make sure a city works more effectively and efficiently, it might require larger environment monitoring, like floods and climate; or it can focus down on the individual,” says Ed Parsons, Geospatial Technologist, Google.
So how does a technology make your life better? How does it save you a few minutes every day? How does it make you feel a little bit happier in your life dealing with the things that you have to deal with? “We must be driven by user needs saying that — here’s a problem that we can solve and it might make just a small incremental gain but that scaled across everyone on the planet makes a huge difference,” Parsons adds.
“Our world is ever changing and fresh and dynamic applications that are a combination of content, workflow, analytics and experience can be used in any area of application where we need to sense this change,” elaborates Atanu Sinha, Director, Hexagon Geospatial, India & SAARC. Hexagon, for instance, already has Smart M.Apps to analyze green space, road areas, crime incidents, snow cover, forest burn ratio, iron oxide index in rocks, crop health, UAV data processing and so on.
Taner Kodanaz, Director, DigitalGlobe, adds there is a large applicability in economic monitoring, supply chain and logistics fields, commodity trading markets, environmental research and monitoring, the shipping and maritime industry, forestry and agriculture, land management, real estate and real estate investment, and energy markets.
As location intelligence becomes increasingly relevant across industries, Big Data in the form of consumer-generated data tightly integrated with location data is driving marketing benefits. Advertising and marketing is one big area that benefits from spatial analytics. Tony Boobier, Insurance Leader, EMEA Business Analytics, IBM, UK highlights that weather forecasting uses data from sensors all over the world. Such forecasts can be used in the insurance sector. They can be used in financial services to understand the impact of the volatility of assets and liabilities. They can also be used in the retail sector to help understand the pattern of product sales at a particular time of the year.
Geospatial Big Data
Big Data is characterized by five Vs: Volume, Velocity, Variety, Veracity and Value. Volume, velocity, variety and veracity are easily understood; value lies in the ability to take fast-moving data and convert it into something useful through analytics. Traditional geospatial data, which includes remotely sensed data, is structured and stored for analysis post facto in analytical systems like GIS. However, modern data with useful geospatial content like photos, social media chats, video, voice and messages now constitutes almost 80% of the total data. In its unstructured form, it cannot be used in conventional analytic systems like GIS because the sheer volume far exceeds the data storage capacity available. It also has a high velocity, and its veracity may require curation.
Sinha substantiates this view when he says, “There was always a tussle between advancement and availability of technology in terms of how much and how fast can we capture, curate, manage, search, share, transfer, analyze and visualize versus the sheer amount, complexity and disparity of the available geospatial content.” Even today, despite vast increases in computing speed and storage capacity, it is still true that our capacity to acquire geographic information is orders of magnitude greater than our capacity to examine, visualize, analyze, or make sense of it. “Today datasets are available from satellites, UAVs, ground-based sensors, smartphones and social media in near-real-time, offering the potential of almost immediate discoveries and predictions. So we can say that there is definitely velocity and variety in the geospatial data itself. However, this is not true for traditional GIS technologies, and hence there is a need to effectively make this data manageable and available,” he adds.
Kodanaz echoes the same sentiment: “Even if one only considered traditional satellite imagery products as solely encompassing geospatial big data (which I do not), the near-term future holds significant potential growth in both variety and velocity from both industry leaders such as DigitalGlobe and new entrants working feverishly to launch their own assets”. He goes on to add that in 2014, DigitalGlobe alone produced 70 TB of data per day as against 600 TB produced by Facebook. If we add the imagery data produced by other entities and to be produced by new entrants, then the total data velocity will exceed that of social media and other non-traditional sources.
Apart from this, everything from traditional GIS datasets like roads, terrain maps, places of interest, boundaries and transportation networks, to location information from mobile device movement, to geo-tagged social media content created by users, to UAS/UAV photos and videos created by commercial or private drones, to IoT data from non-stationary devices could be considered part of the geospatial Big Data family. Even considering remote sensing alone, satellite, aerial and UAS/UAV sensors capture a plethora of content every day, representing a significant variety of geospatial Big Data.
Parsons supports this view through an example. Google collects people’s movements anonymously, and analyzes them to show emerging patterns. For example, if you look at a business in Google Maps like a hotel or a restaurant, Google will show you a little graph of when that business is likely to be busy, derived from the content that people contribute. “It is a simple process but it is analytics scale and I think that is where the geospatial industry can add particular value because we can do these large-scale pieces of analysis viewing things through that sort of geographic lens.”
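The kind of aggregation Parsons describes can be sketched in a few lines. This is only an illustration and not Google's method: it assumes anonymised visit records already reduced to a (place, hour) pair, and the function name `busy_hours` and the sample records are hypothetical.

```python
from collections import Counter

def busy_hours(visits, place):
    """Return each hour's share of all visits to `place` as {hour: fraction}."""
    counts = Counter(hour for p, hour in visits if p == place)
    total = sum(counts.values())
    if total == 0:
        return {}  # no data for this place
    return {hour: n / total for hour, n in sorted(counts.items())}

# Hypothetical anonymised records: (place id, hour of day of the visit)
visits = [("cafe", 8), ("cafe", 8), ("cafe", 13), ("hotel", 22), ("cafe", 8)]

profile = busy_hours(visits, "cafe")
peak_hour = max(profile, key=profile.get)  # hour with the largest share of visits
```

The per-hour shares are what a "busy times" graph would plot; a real system would also need far more data and privacy safeguards than this sketch shows.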
Ron Bisio, Vice President, Trimble Geospatial, takes a more traditional view: “From the GIS viewpoint, Big Data describes datasets that are so large — both in volume and complexity — that they require advanced tools and skills for management, processing and analysis”.
Bhoopathi Rapolu, Head of Analytics – EMEA, Cyient, UK points out that 80% of corporate data is spatially relevant. “We have been using this data without the spatial context all along, but now that we have enough technology to bring the spatial component, and tightly integrate with the corporate dataset, we can make spatial sense out of it. So, with that we can see that the broader insight is being generated with the spatial element.”
Parsons thinks, “It is about things that change in time and space. It is about geography. Geography is interested in what changes in the world and that is the distribution of things over space and the distribution of those things over space and time.” The aim, then, is to capture that level of detail and move from being a static viewer of the world to one with a higher temporal resolution and a higher cadence. “We have heard a lot about the potential of daily satellite coverage. We think about that and the combination of real-time location of people and facilities then that is where the real advances are going to be made. It is going to be that temporal aspect that drives it.”
According to Bisio, “By combining multiple data sets, it is possible to develop 4D models that enable users to view conditions over time.” This approach provides the ability to detect and measure changes and provides important benefits to applications such as construction, earthworks, agriculture and land administration. A fifth dimension, cost, also can be included with spatial information. The resulting model enables users to improve efficiency and cost effectiveness for asset deployment.
To be able to use such data we need real-time or near-real-time engines that analyze the data on the fly to curate the data and establish patterns which are stored and used with conventional geospatial structured data. As an analogy, consider a conventional GIS which applies different analytics on a stored database to realise meaningful reports. With Big Data the stored database consists of analytic modules rather than data. These modules work simultaneously on a variety of data streams and deliver meaningful patterns.
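A minimal sketch of this idea, under stated assumptions, is given below. Events arrive one at a time, a curation step discards implausible records, and an analytic module keeps only the pattern (a count per grid cell) rather than the events themselves. The names `is_plausible`, `CellCounter` and `process`, the 0.5-degree cell size and the sample events are all hypothetical.

```python
import math
from collections import Counter

def is_plausible(event):
    """Curation step: reject records whose coordinates cannot exist."""
    return -180 <= event["lon"] <= 180 and -90 <= event["lat"] <= 90

class CellCounter:
    """Analytic module: keeps a count per grid cell, not the raw events."""
    def __init__(self, cell_deg=0.5):
        self.cell_deg = cell_deg
        self.counts = Counter()

    def update(self, event):
        cell = (math.floor(event["lon"] / self.cell_deg),
                math.floor(event["lat"] / self.cell_deg))
        self.counts[cell] += 1

def process(stream, modules):
    """Feed each curated event to every module; return the number rejected."""
    rejected = 0
    for event in stream:
        if not is_plausible(event):
            rejected += 1
            continue
        for module in modules:
            module.update(event)
    return rejected

# Hypothetical stream of geotagged events (the last one is corrupt)
stream = iter([
    {"lon": 77.21, "lat": 28.61},
    {"lon": 77.29, "lat": 28.64},
    {"lon": 72.88, "lat": 19.07},
    {"lon": 500.0, "lat": 28.60},
])
counter = CellCounter()
rejected = process(stream, [counter])
```

Only `counter.counts` is retained after the stream has passed; that stored pattern is what could then be joined with conventional structured GIS layers.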
Handling structured Big Data
Sinha thinks maps of the future need to be fresh, portable and dynamic, and to make sense. Hexagon’s ECW (Enhanced Compression Wavelet) technology in its Imagine and Geomedia products can effectively manage the volume of Big Data, while its enterprise and Cloud offerings can manage the sheer velocity and variety of Big Data respectively.
Kodanaz feels communications and storage speed, storage access and Web-based services (APIs) to access the data are all improving at dynamic rates. GPU processing approaches allow users to access areas of interest versus the current model of accessing the entire binary file even if the area of interest represents a small portion of the overall file. These methods, as well as machine learning approaches that operate on the raw data within hours of acquisition, shortening to near-real-time speeds in the coming years, are all being applied to move geospatial big data to real-time or near-real-time access.
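The benefit of fetching only an area of interest can be illustrated without any GPU. The sketch below is not the approach Kodanaz refers to; it only shows the underlying idea of seeking to the required pixels instead of reading the whole file. It assumes a headerless, row-major raster with one byte per pixel, and the function name `read_window` and the tiny test raster are hypothetical.

```python
import os
import tempfile

def read_window(path, width, row0, col0, n_rows, n_cols):
    """Read an n_rows x n_cols window from a row-major, 1-byte-per-pixel raster."""
    window = []
    with open(path, "rb") as f:
        for r in range(row0, row0 + n_rows):
            f.seek(r * width + col0)          # jump to the window's first pixel in row r
            window.append(list(f.read(n_cols)))
    return window

# Build a small hypothetical 6 x 8 raster whose pixel value is row*10 + col
width, height = 8, 6
with tempfile.NamedTemporaryFile(delete=False, suffix=".raw") as tmp:
    tmp.write(bytes(r * 10 + c for r in range(height) for c in range(width)))
    path = tmp.name

# Fetch a 2 x 3 area of interest without reading the other 42 pixels
aoi = read_window(path, width, row0=2, col0=3, n_rows=2, n_cols=3)
os.remove(path)
```

Real imagery formats add headers, tiling and compression, so production code would normally rely on a raster library's windowed-read support rather than raw seeks.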
Curating unstructured data
Sadly, a very common problem with both traditional and new data sources is quality, or the lack of it.
“That is where validity or veracity of geospatial Big Data comes in, but unfortunately the fourth V (validity) would be a property that geospatial Big Data lacks, which is often undocumented, lacking in metadata, and without clearly identified provenance,” stresses Sinha.
Automated machine learning approaches are used to identify and categorize objects within images as a simplistic example of curation, points out Taner Kodanaz. This is sometimes referred to as search space reduction or area reduction. Semi-automated methods are used to determine ‘aesthetic’ benefits of an image like cloud cover, image quality, atmospheric distortions, etc. Finally, manual review methods are typically used by analysts to identify the best images to address specific use cases and may include leveraging the automated and semi-automated processes as well.
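One way to picture how the automated and manual stages fit together is a simple triage over image metadata. The sketch below is an assumption-laden illustration, not DigitalGlobe's workflow: the fields `cloud_cover` and `quality`, the thresholds and the function name `triage` are all hypothetical.

```python
def triage(images, max_cloud=0.2, min_quality=0.7, reject_cloud=0.8):
    """Split image metadata records into accepted, review and rejected id lists."""
    accepted, review, rejected = [], [], []
    for img in images:
        if img["cloud_cover"] > reject_cloud:
            rejected.append(img["id"])    # mostly cloud: not worth an analyst's time
        elif img["cloud_cover"] <= max_cloud and img["quality"] >= min_quality:
            accepted.append(img["id"])    # passes the automated checks
        else:
            review.append(img["id"])      # borderline: leave to manual review
    return accepted, review, rejected

# Hypothetical metadata, with cloud cover and quality scored between 0 and 1
images = [
    {"id": "A", "cloud_cover": 0.05, "quality": 0.9},
    {"id": "B", "cloud_cover": 0.35, "quality": 0.8},
    {"id": "C", "cloud_cover": 0.95, "quality": 0.9},
    {"id": "D", "cloud_cover": 0.10, "quality": 0.5},
]
accepted, review, rejected = triage(images)
```

The automated pass shrinks the pile that analysts must inspect, which is the point of search space reduction.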
“There’s a lot of work we need to do on natural language processing, on greater understanding of semantics to try and pull out the meaning from those pieces of social media,” says Parsons. But the social media also represents a more human view of the world. If you are talking on social media, you talk much more in terms of places than spaces. You don’t see coordinates expressed in tweets or in Facebook statuses, you see place names. “A better understanding of how we as humans interact, create place names and define space — that is a really interesting insight and I think a lot of that is driven from this unstructured data. I think we do need it in the systems that we have developed to better reflect how we as humans see the world around us.”
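The gap between place names and coordinates can be made concrete with a toy gazetteer lookup. This is a deliberately naive sketch and not the natural language processing Parsons calls for: the two-entry `GAZETTEER`, its approximate coordinates and the function name `geocode_mentions` are hypothetical.

```python
import re

# Hypothetical mini-gazetteer: place name -> approximate (lat, lon)
GAZETTEER = {
    "connaught place": (28.63, 77.22),
    "marine drive": (18.94, 72.82),
}

def geocode_mentions(text, gazetteer=GAZETTEER):
    """Return (name, coords) for every gazetteer place named in the text."""
    lowered = text.lower()
    hits = []
    for name, coords in gazetteer.items():
        # Match whole words only, so a name inside a longer word is ignored
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            hits.append((name, coords))
    return hits

post = "Huge traffic jam near Connaught Place this evening"
mentions = geocode_mentions(post)
```

Exact matching like this misses nicknames, misspellings and ambiguous names, which is why a better understanding of how people name places matters.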
Big Data analytics and IoT
The Internet of Things is another opportunity to streamline operations in many sectors where interaction between machines and machines (M2M) and machines and humans (M2H) can be improved. A case in point is the concept of a smart city. In such a city sensors can control traffic lights as well as detect traffic jams to alert authorities like the police. Sensors can also alert municipal waste management services when refuse bins become full and need replacement. The technology for such intelligent systems is already available but adoption is slow because the concept as visualized by the vendors involves connecting all areas of city management to a centralized data infrastructure. Jascha Franklin-Hodge, Boston city’s Chief Information Officer, thinks the movement is overhyped and that more targeted, less centralized IoT big data applications can be more effective.
The interesting intersection of IoT and geospatial Big Data is going to be the ground truth of sensors coupled with the near-real-time modelling of additional visible spectrum data from remote sensing. “In other words, from micro to macro tied together to describe the world in ways never before possible,” feels Kodanaz.
The data size is humongous and, as Rapolu puts it, we are not even joining the dots today but creating the dots in bits and pieces to understand the world. As intelligent applications connect up different databases we will see the IoT emerge. “… it’s about connecting the entire intelligent things and then making location sense out of it”.
Boobier takes a contrary view: “I think the individual and organizations and perhaps governments also will place certain restrictions on the amount of information, which is commonly available. Organizations already tend to turn up the security levels around the level of information which is available to employees. We are talking in terms of analytics being democratized. The democratization of information is one of the big ethical questions I think of the big data environment going forward.”
In the end, Big Data, the Cloud and the Internet of Things are all parts of a continuum. It is hard to think about the Internet of Things without thinking about the Cloud, and it is hard to think about the Cloud without thinking about the analytics.
“It goes without saying that if you are going to have lots and lots of devices creating data, that data is going to exist in the Cloud, and because you have got large volumes of data the only way to analyze those is to use analytical models, identify the current state of the world but then also predict it saying ‘if we see this pattern emerging this is what we can expect to happen’,” sums up Parsons.
Prof. Arup Dasgupta