Systems R&D Engineer
IWS Cloud Platform Group, IPG R&D Lab, Hewlett-Packard. India
Prof. Arup Dasgupta
Managing Editor, Geospatial World
Geospatial technology has a symbiotic relationship with computer science and this is to be expected. Modern geospatial systems arose out of the need to automate repetitive processes and computer technology provided the necessary means. Almost every aspect of geospatial systems, from data acquisition to information dissemination, makes use of the latest in electronics, communications and computer systems and indeed adds to them through novel systems like LIDAR. As advances are made in the core technologies, they are picked up, adopted and adapted for use in the geospatial world.
The ICT world itself is undergoing change and this change must necessarily get reflected in the geospatial world. Computation systems have grown from massive mainframes to massively parallel systems and clusters, to distributed systems, grid and now the cloud. Cloud computing is the buzzword today in the IT world. The most appropriate definition of cloud computing is provided by Borko Furht of Florida Atlantic University, who defines it as "a new style of computing in which dynamically scalable and often virtualised resources are provided as a service over the Internet."
Alexander Lenk, FZI University of Karlsruhe, offers another definition: "an emerging model in support of 'Everything-as-a-Service', XaaS, where virtualised physical resources, virtualised infrastructure, as well as virtualised middleware platform and business applications are being provided and consumed as services in the cloud." Bastian Baranski of con terra GmbH explains that, from a provider perspective, the key aspect of the cloud is the ability to dynamically scale and provide computational power, storage and other applications, even complete infrastructure, in a cost-efficient and secure way over the internet. From a client perspective, the key aspect of a cloud is the ability to access the cloud facilities on demand without having to manage the underlying infrastructure and deal with the related investment and maintenance costs.
Different people define cloud computing differently, but there is at least a common understanding that a layered architecture exists, much like the OSI reference model, in which each lower layer provides services to the layer above it. However, unlike the OSI reference model, the number of layers is not fixed and the foundation is versatile hardware. According to Steven Hagan, VP, Product Development, Oracle, "There will be cloud platform capable software but underneath the cloud, there must be hardware that actually is fast enough and scales well enough to support it. These may look invisible but underneath there must be a solid foundation."
The reference stack defined by Alexander Lenk, shown in Figure 1 (adapted from Alexander Lenk, et al, CLOUD'09, May 23, 2009, Vancouver, Canada), is interesting because it maps each layer to the enabler technology and provides examples. According to this stack, the lowest layer comprises the hardware; on top of it comes the software platform, on which the software layer is built. All these layers expose their functionality to the layer above as a service via well-defined APIs. These layers are called, starting from the lowermost, Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). Lenk's stack also includes an additional layer of Human-as-a-Service (HuaaS), which is an interesting concept because crowd sourcing is becoming a very important distributed and ad hoc means of data acquisition. Data is one of the most important components of this stack and providing data as a service has huge potential, especially for the geospatial world.
Figure 1: The reference stack
(Courtesy: Alexander Lenk, et al, CLOUD'09, May 23, 2009, Vancouver, Canada)
At the lowest layer is the hardware layer, comprising servers, network devices, storage arrays etc. This layer is vendor dependent and is generally not exposed directly to the outside world, but through a virtual resource layer built on top of it. The virtual resource layer is also part of the IaaS layer and masks the actual hardware using hypervisor technologies.
The layer on top of the IaaS layer is the PaaS layer. This layer generally comprises development and execution platforms for which development tools such as an SDK (software development kit) are hosted on the cloud and are accessible over the Internet through a browser. The SDK could be as simple as a small set of REST APIs, such as the Twitter REST API, or a full SDK such as the Google App Engine SDK.
All the software applications that are hosted either on a PaaS layer or directly on top of the IaaS layer fall into the SaaS layer. Here the application software providing specific functionality to the cloud user is made accessible over the Web through a browser. Google Maps, Google Docs and Microsoft Office Live are examples of such services. Enterprise applications like CRM, HRM and Content Management are also provided over the internet through a browser. The cloud user needs only an internet connection and a web browser to use these services and pays only for actual usage.
Data as a Service (DaaS) is less talked about than the above three layers. As many vendors now offer cloud based services in one or all of these layers, and since the research community is keen to utilise the potential of the cloud, the availability of data as a service is of great interest, especially for geographical data. The concept of DaaS is to provide valuable data as a service over the internet on a pay per use basis.
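The layering described above can be pictured with a toy model in Python. This is only an illustrative sketch, not Lenk's formal reference stack: the class name `Layer` and its `request` method are hypothetical, and the layer descriptions simply paraphrase the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    """One layer of the cloud stack; it consumes services from the layer below."""
    name: str
    offers: str
    below: Optional["Layer"] = None

    def request(self) -> List[str]:
        """Return the chain of layers a request passes through, top to bottom."""
        chain = [self.name]
        if self.below is not None:
            chain.extend(self.below.request())
        return chain


# Build the stack bottom-up: infrastructure first, applications last.
iaas = Layer("IaaS", "virtualised servers, storage and network")
paas = Layer("PaaS", "development and execution platforms", below=iaas)
saas = Layer("SaaS", "end-user applications in a browser", below=paas)

# An application may also sit directly on IaaS, skipping the platform layer.
saas_direct = Layer("SaaS", "end-user applications in a browser", below=iaas)

print(" -> ".join(saas.request()))         # SaaS -> PaaS -> IaaS
print(" -> ".join(saas_direct.request()))  # SaaS -> IaaS
```

The optional `below` link reflects the point that the number of layers is not fixed: an application can be hosted on a platform or directly on infrastructure.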
ADVANTAGE OF CLOUD TO GEOSPATIAL
Geospatial systems are a class of information systems, and any development that benefits general information systems in principle also benefits geospatial systems. Geospatial systems can have special needs, as geographic data can be complex in nature depending on what it represents, for example radar backscatter from SAR or point clouds from LiDAR. In general, each layer of the cloud has special significance for geospatial systems.
How DaaS benefits geospatial
The cost of geographic data collected by government agencies and private industries is very high. Also, the temporal aspect of data is critical in terms of trend analysis. Efficient reuse of data is essential to prevent duplication and ambiguity, and this is achieved through DaaS. In the DaaS model, a user can access but cannot download data. Though DaaS does not trivialise traditional data delivery, the major benefit comes from the pay per use model and the freedom from maintaining and securing a massive data archive. Google Maps is a classic example of DaaS. Using REST APIs, users can use Google Maps data and build tools around it.
How SaaS benefits geospatial
According to Bastian Baranski, spatial data infrastructure (SDI) is mostly focused on data retrieval and data visualisation on a desktop. The migration of data processing from desktop applications to a distributed environment is the next step. OGC started a movement for constructing Web based distributed geospatial applications, but until cloud computing gained momentum, the industry support for these remained low. However, companies such as Esri and 52° North have started offering cloud based applications. Esri's ArcGIS is now available in an online version, on ArcGIS.com.
How PaaS benefits geospatial
Initiatives taken by OGC have resulted in the development of many open source and some commercial Web map services, feature services and coverage services. These are server software and so need hosting or deployment environments. They are generally deployed on high-end servers, but such deployments often have difficulty scaling to accommodate a growing user base. Cloud computing can provide a highly scalable hosting environment for such services. Companies such as Skygone provide cloud based hosting for popular technologies such as ArcGIS Server, MapServer and GeoServer. Another example of PaaS also emphasises the importance of the DaaS layer in the cloud computing stack. Google Map Maker, Open Street Map and Google's recently announced Google Earth Builder are classic examples of PaaS where the development tools are for data creation and editing and not only for creating applications.
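To make the idea of a cloud-hosted Web map service concrete, here is a minimal Python sketch that builds an OGC WMS 1.3.0 GetMap request. The query parameters are the standard GetMap ones; the host name, layer name and bounding box are hypothetical and do not refer to any service mentioned in this article.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height):
    """Build an OGC WMS 1.3.0 GetMap URL for one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 uses lon/lat order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",          # empty value requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical cloud-hosted endpoint and layer name.
url = build_getmap_url(
    "https://maps.example.com/geoserver/wms",
    "demo:rivers",
    (68.0, 6.0, 98.0, 36.0),
    800,
    800,
)
print(url)
```

Because the request is just a URL, the client does not need to know whether the server behind it is a single machine or a scalable pool of cloud instances.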
How IaaS benefits geospatial
Huge data sets, heavy processing requirements, a large user base and unpredictable Web traffic are some of the reasons why geospatial systems cannot easily be made available over the Web. But with cloud computing promising a dynamically scalable architecture, many of these services can be hosted on cloud infrastructure. Esri hosting ArcGIS on Amazon's EC2 cloud computing infrastructure is a good example of this.
SHARING OF DATA AND SERVICES
The underlying philosophy behind the cloud is the sharing of data and services. There are two approaches to this from users' view: public and private clouds. In a public cloud, the provider has total control of data and services; users can access data and services for realising their applications. The control on data and services and the right to access are decided by the provider. Public clouds are usually off-premises and run by third parties. Google Maps is perhaps the most visible example of such a public cloud.
However, there are many instances where an organisation needs to be on the cloud but has security and reliability concerns. In such cases, private clouds are the answer. A private cloud is on-premises and run by a wing of the organisation. It is accessible to users within that organisation. Many people criticise the concept of a private cloud because it provides none of the benefits of cloud architecture except Web access through virtualisation. In a community cloud, several related organisations get together and create a cloud infrastructure for their own use; a typical example is Google's Apps.gov for US government departments.
There can be instances where an organisation may need access to some services in a 'public' manner while retaining its control over mission-critical data. This requirement gives rise to the hybrid cloud, where a part is off-premises and public and the rest is on-premises and restricted. Many vendors offer this solution as the best approach to sharing data and services while the organisation retains control over its confidential data. There are compatibility issues between the on-premises and off-premises systems because no standards have been developed for such interoperability. There is therefore a risk of vendor lock-in.
Gartner Global IT Council for Cloud Services has outlined the rights and responsibilities for cloud computing services which address many of the issues pertaining to sharing and security. The main points are:
- The right to retain ownership, use and control one's own data
- The right to service-level agreements that address liabilities, remediation and business outcomes
- The right to notification and choice about changes that affect the service consumers' business processes
- The right to understand the technical limitations or requirements of the service up front
- The right to understand the legal requirements of jurisdictions in which the provider operates
- The right to know what security processes the provider follows
- The responsibility to understand and adhere to software license requirements
It is felt that a clear understanding of these rights and responsibilities will help the service providers and consumers to work together more productively.
Esri offers its ArcGIS as a Web based cloud service. Users can access maps including imagery, topography and street based maps, and tasks such as geocoding. ArcGIS Server can now be deployed on Amazon EC2, enabling developers to publish custom GIS mapping applications. Software such as ArcLogistics and Business Analyst Online (BAO) is available as SaaS. Esri has also announced the availability of ArcGIS Mobile in the cloud. Soon field staff, business professionals and consumers will be able to access GIS capabilities and data on the move.
GIS Cloud is a new company founded in 2008, dealing with development and implementation of cloud powered GIS. It provides full desktop GIS features in a browser enriched with new Web capabilities.
Skygone is a large group that provides services and consultancy for deploying and hosting many well-known GIS applications on the cloud. Skygone has created an open App Store powered by the Skygone cloud, where one can select applications such as GeoServer, MapServer and ERDAS Apollo, and the type of virtual machine (called an instance) such as Slim, Small, Medium, Large and Extra Large, where Slim is a single-CPU, 1 GB RAM node and Extra Large is an 8-CPU, 32 GB RAM node. Charges are on a per hour or per month basis.
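Choosing among instance sizes like those listed above amounts to picking the cheapest one that satisfies the workload. The Python sketch below illustrates this under loudly stated assumptions: only the Slim and Extra Large specifications come from the text, while the Small, Medium and Large specifications and every price are made-up placeholders, not Skygone's actual tariff.

```python
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    name: str
    cpus: int
    ram_gb: int
    usd_per_hour: float  # placeholder price, not a real tariff


# Slim and Extra Large follow the article; the middle sizes and all prices
# are invented placeholders for illustration only.
CATALOGUE = [
    Instance("Slim", 1, 1, 0.05),
    Instance("Small", 1, 2, 0.10),
    Instance("Medium", 2, 4, 0.20),
    Instance("Large", 4, 16, 0.40),
    Instance("Extra Large", 8, 32, 0.80),
]


def cheapest_fit(cpus_needed: int, ram_needed_gb: int) -> Optional[Instance]:
    """Return the cheapest instance meeting both requirements, or None."""
    fits = [
        inst for inst in CATALOGUE
        if inst.cpus >= cpus_needed and inst.ram_gb >= ram_needed_gb
    ]
    return min(fits, key=lambda inst: inst.usd_per_hour) if fits else None


# A hypothetical map server needing 2 CPUs and 8 GB of RAM.
choice = cheapest_fit(cpus_needed=2, ram_needed_gb=8)
print(choice.name)  # Large
print(round(choice.usd_per_hour * 24 * 30, 2), "USD for a 30-day month")
```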
European Union's Environmental Map Service
The European Union's (EU) European Environment Agency (EEA) is working closely with Esri to improve the agency's cloud environment map services. According to Guenther Pichler of Esri, EEA and Esri will work tightly to develop cloud architecture that serves EEA initiatives and European Union directives. Data sharing will be in line with the principles of INSPIRE and SEIS and a collaborative plan that supports the Eye on Earth initiative will be followed.
Crowdmap, Arc2Earth and GeoIQ Connect
Ushahidi, a not-for-profit technology company, announced Crowdmap, a hosted service providing Ushahidi Crisis Mapping software out of the box with nothing to install.
Arc2Earth is an innovative technology company focused on the interface of GIS and the Web. It tries to bridge the gap between ArcGIS and Google Earth. Arc2Earth Cloud Services provides an alternative for deploying GIS applications to the Web without the hassles and costs of setting up and configuring a GIS server. An Arc2Earth cloud service instance uses Google App Engine as its cloud platform and has defined REST APIs for Datasource, Tileset and Static Map. These REST APIs can be used to interact with an Arc2Earth service deployed on the Google cloud.
GeoIQ Connect is a geospatial product connecting enterprise internal data with external data stored in the cloud. GeoIQ Connect fuses multiple data sources into one interface, simplifying the access, analysis and visualisation of geospatial data. GeoIQ Connect supports traditional relational databases like Oracle (spatial and non-spatial), PostgreSQL and PostGIS, and MySQL, as well as NoSQL stores such as HBase, Hadoop and MongoDB. There are plans to add support for Riak and Cassandra; these are all typical of the data storage technologies promoted by cloud computing.
Cloud computing is a service consumption and delivery model that can help improve business performance, control costs and ultimately transform business models. Cloud computing can bring opportunities to many, ranging from businesses that consume IT infrastructure to providers of such infrastructure, general users and governments.
Cloud and IT service providers
For an information service provider, cloud computing brings plenty of opportunities, according to Sadgopan, Director, IIIT Bangalore. Over time, cloud technologies will enable IT service providers to deliver end-to-end services regardless of the platforms, applications and technologies involved. The competition and need for continuous improvement will increase for cloud service providers as buyers will move away from long-term contracts and prefer shorter-term contracts with more flexibility to quickly buy new services. It also brings opportunities for infrastructure manufacturing companies such as HP, IBM and Cisco who will need to bring out new product lines for servers, data storage technologies, network technologies that can efficiently and quickly be used in building cloud platforms and also meet the needs of scalability, multi-tenancy, performance and automation.
Geospatial service providers
As we have already seen, many new implementations of geospatial services and Web GIS are being hosted on various cloud platforms, benefiting geospatial service and data providers. Cloud computing enables anyone with an Internet connection and a Web browser, using any device such as a desktop, a laptop, a tablet or a Web-enabled mobile phone, to access geospatial services on the cloud, thus increasing the reach of the provider. This will enable providers to capitalise on economies of scale while decreasing the operational and maintenance costs of IT infrastructure and IT processes.
Data is the most important aspect of geospatial systems, and data collection is an expensive and time consuming process. Geospatial data providers can take advantage of crowd sourcing since they can increase their reach using cloud computing. Geospatial services on the cloud can provide easy, Web based, no-installation-required tools for editing geospatial data and maps, with which a common user can add more data layers. This is what Google Map Maker and Open Street Map are doing. However, this raises control and quality concerns about geospatial data and maps, while also opening new opportunities for innovation.
Governments have taken to cloud in many countries like the USA, Canada, Japan and the UK. While there are some practical implementations like the General Services Administration in the US, many are in exploratory stages like the CIO Council in the UK. Swisstopo, the Swiss Federal Office of Topography, is responsible for Switzerland's geographical reference data and all associated products. The Federal Spatial Data Infrastructure (FSDI) of the Federal Coordination Centre for Geographical Information, a division of Swisstopo, operates a server park, which consists of more than 50 servers, the majority of which exist in the Amazon Web Services cloud. Swisstopo now operates a significant part of the FSDI integration and production environments on Amazon EC2, while test environments run on-premises on the Swisstopo intranet.
Users of cloud computing technology, that is, enterprises and organisations that would like to take advantage of this disruptive technology, constitute the industry segment. Undoubtedly there are clear benefits of using the cloud and some of these are:
Cost reduction is perhaps the most talked about benefit; however, it is actually relative to the size of the business. For big businesses that require huge IT infrastructure or providers who service millions of users, cloud computing reduces operational and maintenance costs. For small or medium size businesses which are growing rapidly, cloud computing can reduce the marginal cost of adding new functionality and more infrastructure. In dynamic situations, these costs can be cut back again if business needs go down or a particular service offering does not meet market requirements. However, for very small businesses, the effectiveness is uncertain: if only one or two servers are needed to run the service and it is not growing fast, it is easier to maintain it locally than in the cloud.
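The point about very small, steady workloads can be checked with simple arithmetic. The Python sketch below compares the cumulative cost of owning one server with renting an always-on cloud instance. Every figure in it (purchase price, monthly upkeep, hourly rate) is invented for illustration, and the model ignores factors such as staff time and hardware refresh.

```python
def on_premises_cost(months, purchase, upkeep_per_month):
    """Cumulative cost of owning a server: one-off purchase plus monthly upkeep."""
    return purchase + upkeep_per_month * months


def cloud_cost(months, rate_per_hour, hours_per_month=720):
    """Cumulative cost of renting an always-on instance billed by the hour."""
    return rate_per_hour * hours_per_month * months


def break_even_month(purchase, upkeep_per_month, rate_per_hour, horizon=120):
    """First month in which owning is cheaper than renting, or None within horizon."""
    for month in range(1, horizon + 1):
        owned = on_premises_cost(month, purchase, upkeep_per_month)
        rented = cloud_cost(month, rate_per_hour)
        if owned < rented:
            return month
    return None


# All figures are invented for illustration.
print(break_even_month(purchase=3000, upkeep_per_month=50, rate_per_hour=0.25))  # 24
print(break_even_month(purchase=3000, upkeep_per_month=50, rate_per_hour=0.05))  # None
```

With these placeholder numbers, a server that runs unchanged for more than two years is cheaper to own at the higher hourly rate, while at the lower rate renting stays cheaper throughout the horizon. The outcome depends entirely on the rates assumed.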
User demands are increasing by the day and it is difficult to keep up with them. With the cloud, development tools are simple and the interaction is REST based; the burden of deployment is offloaded to cloud platforms, resulting in less time to deliver more software as a service. Because of easy programming tools, a common user with very little knowledge of programming is able to develop applications and put them on the Web.
With the Web, the service consumption pattern becomes unpredictable. If sufficient computational resources are not available in times of high demand, it can result in service unavailability or poor customer satisfaction. Acquiring more IT infrastructure can be costly, and computational resources will remain idle when load is not high, leading to high overhead costs. The dynamic scaling aspect of cloud computing, which allows one to add any number of virtual resources in times of high demand and release them under normal conditions, puts the cloud ahead of conventional technologies.
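The dynamic scaling described above can be sketched as a simple rule: run just enough instances to cover the current load, within a floor and a ceiling. The Python example below is a minimal illustration, not any provider's autoscaling algorithm; the function name, the per-instance capacity and the load profile are all hypothetical.

```python
import math


def instances_needed(requests_per_sec, capacity_per_instance,
                     min_instances=1, max_instances=20):
    """Number of instances to run for the given load, clamped to [min, max]."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    wanted = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, wanted))


# Hypothetical load over five periods, with one traffic spike.
load_profile = [40, 120, 950, 300, 60]
plan = [instances_needed(load, capacity_per_instance=100) for load in load_profile]
print(plan)  # [1, 2, 10, 3, 1]

# Instance-periods used when scaling dynamically vs. provisioning for the peak.
print(sum(plan), "vs", max(plan) * len(plan))  # 17 vs 50
```

In this toy profile, a fixed installation sized for the spike would sit mostly idle, which is the overhead cost the paragraph above refers to.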
Instant and automated
Whenever one requires a computational resource, typically a computer node, it is just sufficient to specify the power and OS and software needed and the computer node is ready from the moment the payment is made. Since the infrastructure and platform are part of cloud, the burden related to deployment, versioning and software upgrades is automatically taken care of by the cloud provider. This automation is one of the biggest attractions of cloud computing.
Cloud computing provides solutions to many problems in the areas of distributed computing and utility computing. It also opens up new research opportunities for both academia and industry. The Center for Intelligent Spatial Computing, George Mason University, recently completed a research project on "Geoprocessing services and standalone client to utilise the cloud computing and WPS services to support Earth and Geography Science communities", funded by the Federation of Earth Science Information Partnership (ESIP). The Center has developed the GEOSS clearinghouse, a metadata catalogue system, which is deployed and maintained on Amazon's EC2 cloud.
Marlon Pierce and group at Community Grids Lab, Pervasive Technology Institute at Indiana University have conducted a project called Flood Grid that facilitates and improves flood planning, forecasting, damage assessment, and emergency responses using grid computing which was later shifted to cloud computing.
Byron Ludwig and Serena Coetzee at the Department of Computer Science, University of Pretoria, compared PaaS clouds against the requirements of a geoprocessing service. Theodor Foerster and his group at the Institute of Geoinformatics, University of Muenster, have used the 52° North Web Processing Service with Open Nebula to build a private cloud which often outsources some processing to Amazon EC2 as a public cloud.
YANG Jinnan and WU Sheng of the Spatial Information Research Center, Fuzhou University, have also studied possible applications of cloud computing techniques in GIS and listed the challenges. Naphtali Rishe's group has applied MapReduce to two important problems in spatial databases, bulk construction of R-Tree indexes and aerial image quality computation, and has confirmed that the MapReduce framework shows excellent scalability when applied to parallelisable problems. A group at the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, has developed a geoprocessing service that integrates geoprocessing functions and Microsoft Cloud Computing technologies to provide geoprocessing capabilities in a distributed environment.
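For readers unfamiliar with the MapReduce pattern mentioned above, here is a small single-process Python sketch of its three phases (map, shuffle, reduce) applied to a simple spatial task: counting points per grid cell. It is not the R-Tree construction or image quality computation from the research cited, and it runs on one machine; a real framework would distribute the map and reduce calls across many nodes. The function names and sample coordinates are hypothetical.

```python
from collections import defaultdict


def map_point(point, cell_size=1.0):
    """Map phase: emit (grid cell, 1) for one (lon, lat) point."""
    lon, lat = point
    cell = (int(lon // cell_size), int(lat // cell_size))
    yield cell, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: combine the values for one key into a single count."""
    return key, sum(values)


# Hypothetical sample points as (lon, lat).
points = [(77.2, 28.6), (77.9, 28.1), (72.8, 19.0), (77.5, 28.9), (72.1, 19.7)]

mapped = [pair for p in points for pair in map_point(p)]
result = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
print(result)  # {(77, 28): 3, (72, 19): 2}
```

The task parallelises well because each point is mapped independently and each cell is reduced independently, which is the property the scalability finding above depends on.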
Borko Furht has identified the following challenges in cloud computing. Let us discuss these challenges in terms of geospatial cloud computing.
In terms of performance, cloud computing can be very efficient for normal or even mid-size applications, but geospatial data is typically in terabytes and processing it on the cloud can often prove inefficient. Virtual machines also have overheads compared to applications which run directly on the hardware using the low-level instruction set. Large datasets can be processed by a distributed data processing framework such as MapReduce, though it is not suitable for all kinds of datasets. If the result of the processing is also in terabytes or gigabytes, a user located a long distance from the cloud provider may experience high latency and delays.
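The delay in moving large results out of the cloud can be estimated with back-of-the-envelope arithmetic. The Python sketch below assumes decimal units (1 GB = 8,000 megabits) and a hypothetical `efficiency` factor for protocol overhead and congestion; the link speed and dataset size are illustrative, not measurements.

```python
def transfer_hours(size_gb, bandwidth_mbps, efficiency=0.8):
    """Hours needed to move size_gb over a link of bandwidth_mbps (megabits/s).

    efficiency is an assumed fraction of the nominal bandwidth actually achieved.
    """
    megabits = size_gb * 8 * 1000          # decimal units: 1 GB = 8000 megabits
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


# A hypothetical 1 TB result over a 100 Mbps link.
print(round(transfer_hours(1000, 100), 1))  # 27.8
```

At these assumed figures a single terabyte takes more than a day to download, which is why keeping the processing next to the data, and returning only small derived products, is often preferable.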
Security & Privacy
Security remains the major concern for companies using the cloud. Users worry about attacks when information and valuable data are outside the company firewall. Even though security challenges exist, cloud computing is fairly secure, in some cases more secure than a local environment. Chris Matthews and Yvonne Coady explain that the isolation provided by virtualisation has security benefits. In case of an attack on the system, virtualisation narrows the attack area to the explicitly exposed and shared resources. Treating the virtualised system as a black box, assertions can be made about the system's interactions.
Privacy is an even bigger concern for cloud users as valuable data resides at an unknown place. The concern is that off-premises cloud infrastructure is shared between organisations. However, there are two false worries about privacy. First, just because data is not on your premises, it does not automatically become public. No one can simply enter a data centre, log into a machine and look at the database; in fact, this is more likely to happen in a small setup than in a huge data centre. Second, just because shared hardware is used, it does not mean that privacy is compromised. Strong isolation of virtual machines ensures privacy of the content.
Loss of control could be a major concern for the geospatial industry, as data is collected and owned by a particular organisation at huge cost. When data is put onto the cloud, there is less organisational control as the cloud infrastructure is not designed to meet specific organisational needs. To have better control, an organisation will have to build tools around its data and provide access via these tools.
By putting data and services onto the cloud, the ownership, operational and maintenance costs are reduced. But when dealing with large volumes of data, network bandwidth costs can be high.
Reliability is a concern, both in terms of data and service availability. Outages occur and a service can become unavailable for a few hours. This is illustrated by a recent case where an outage struck the EC2 service at Amazon's northern Virginia site, affecting a number of websites. Also, it is hard to rely on data and information available on the cloud if its source is not known.
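An outage of a few hours can be put in perspective by converting it to an availability figure, the form in which service-level agreements usually state reliability. The Python sketch below shows the arithmetic; the six-hour outage and the 99.9% target are hypothetical examples, not figures from the incident mentioned above.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period during which the service was reachable."""
    return 1 - downtime_hours / period_hours


def downtime_budget_hours(target, period_hours=HOURS_PER_YEAR):
    """Hours of outage permitted in the period by an availability target."""
    return (1 - target) * period_hours


# Hypothetical figures: one six-hour outage in a year, and a 99.9% target.
print(f"{availability(6):.4%}")                      # 99.9315%
print(f"{downtime_budget_hours(0.999):.2f} hours")   # 8.76 hours
```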
Prognosis for the future
The cloud is the next disruptive technology to impact the geospatial world. It is resulting in paradigm shifts in the way we interact with data, software and platforms. As data becomes more versatile and therefore costly to buy and archive, data providers will need to relook at their marketing strategies and models. Similarly, software developers and value adders will need to look at their marketing models. Reliability and security will remain issues, and strategies to ensure them will have to be revisited. At the regulatory level, a total re-look at policies is needed to take into account the new ways of acquiring and processing geospatial data and disseminating the results. Finally, the individual as a data sensor and the individual as a consumer of geospatial services will become more prominent, turning geospatial into a consumer item.