Whether driven by functional need, budgetary alignment, or top-down pressure, every company can benefit at some level from running parts of its workflow in the public cloud. For an organization with less than, say, 50 terabytes of data to manage, it’s easy to move everything there. That is not the case for organizations in the geospatial industry.
With GIS systems consuming and processing petabytes of data, moving wholesale to the cloud is challenging and, in most cases, unrealistic. The business value of public cloud infrastructure is desirable, but such large volumes of data make it hard to get there. “Lift and shift” strategies that mimic on-site infrastructure in the cloud are often not viable when petabytes of data are involved, and many businesses need to keep at least some data on premises. Luckily, the use of public and private infrastructure does not have to be an either/or decision.
Figure 1: The business dynamics of public infrastructure are desirable, but with so much data to manage, it’s hard to figure out how to get there.
Fortunately, you can realize many of the business benefits of the public cloud in your own data centers. Elimination of silos, data that’s globally accessible, and pay-as-you-grow pricing models are all possible on-premises, behind your firewall. The “hybrid cloud” approach is not simply having some apps running in your data center and other apps running in Amazon or Google. Workflows do not have to wholly reside within either private or public infrastructure – a single workflow can take advantage of both. True hybrid cloud is when public and private resources can be utilized whenever it’s best for the application or process.
Here are four key steps to accelerate your journey to the cloud.
Step 1: Go Cloud-Native
The sheer volume of stored data is the primary inhibitor preventing movement toward the public cloud, and cloud architectures in general. Data has classically been siloed – stuck in separate repositories and locked down by the specific access methods that specific applications require. This makes it impossible, or at least extremely expensive, to effectively manage, protect, share, or analyze that data.
“Classic” applications use older file protocols to access data, while newer cloud-native applications use object-storage interfaces. Converting everything to a cloud-native format will save time, money, and headaches in the long run. This does not have to be a massive project; you can start small and phase out last generation’s technology over time.
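To make the contrast concrete, here is a minimal, purely illustrative sketch of the cloud-native model: a classic file-server path maps onto an S3-style bucket/key, and user metadata travels alongside the object rather than living in a separate application database. The class and function names below are invented for this example, not the API of any particular storage product.

```python
# Illustrative sketch only: ObjectStore is an in-memory stand-in for an
# S3-style object store, not a real product's API.

class ObjectStore:
    """Minimal in-memory stand-in for an S3-style object store."""
    def __init__(self):
        self._objects = {}

    def put(self, bucket, key, data, metadata=None):
        # Cloud-native storage keeps user metadata alongside the object itself.
        self._objects[(bucket, key)] = {"data": data, "metadata": metadata or {}}

    def get(self, bucket, key):
        return self._objects[(bucket, key)]


def path_to_object_key(posix_path):
    """Map a classic file-server path onto a flat object key."""
    return posix_path.strip("/").replace("\\", "/")


store = ObjectStore()
key = path_to_object_key("/gis/imagery/site42/tile_0001.tif")
store.put("imagery", key, b"...raster bytes...",
          metadata={"sensor": "lidar", "region": "site42"})

obj = store.get("imagery", key)
print(key)                        # gis/imagery/site42/tile_0001.tif
print(obj["metadata"]["sensor"])  # lidar
```

Because the metadata rides with the object, any application that can reach the store can read it – no application-specific database required.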
Figure 2: Start on your journey to the cloud by leveraging cloud-native storage on-premises.
Once you’re cloud-native, not only is your data ready to take advantage of public cloud resources, but you immediately start seeing benefits in your own environment.
Step 2: Go According to Policy
Figure 3: Use policies to place data where it’s needed, across private and public cloud.
On-premises data on cloud-native storage can be easily replicated to the public cloud in a format all your applications and users can work with. But remember, we’re talking about hundreds of terabytes or more, with each data set having different value and usability.
Data management policies – rules that decide where data should be placed based on the applications and users that need it – let you run parts of your workflow behind your firewall and other parts in the public cloud. For example, you may be working with large image-centric applications but want to use the massive, on-demand processing resources of Google Cloud Platform to analyze a LiDAR image instead of tying up local hardware. Set a policy in your cloud storage software to replicate that image file to the public cloud and let Google do the work; set another policy to move the resulting file assets back down when the job completes, ready for the next step in the workflow. Note that you will need cluster nodes in multiple geographic regions to protect and location-optimize data.
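A placement policy of this kind can be sketched as an ordered list of rules evaluated top to bottom. The rule set, attribute names, and placement targets below are hypothetical, chosen to mirror the LiDAR example above; they are not the syntax of any particular data-management product.

```python
# Hypothetical policy sketch: predicates over file attributes, checked in
# order, with the first match deciding where the data lives.

POLICIES = [
    # (predicate over file attributes, placement target)
    (lambda f: f["type"] == "lidar" and f["stage"] == "analysis", "gcp"),
    (lambda f: f["stage"] == "archive", "public-cold"),
    (lambda f: True, "on-prem"),  # default: keep data behind the firewall
]


def place(file_attrs):
    """Return the placement target of the first matching rule."""
    for predicate, target in POLICIES:
        if predicate(file_attrs):
            return target


print(place({"type": "lidar", "stage": "analysis"}))  # gcp
print(place({"type": "ortho", "stage": "ingest"}))    # on-prem
```

The ordering matters: the catch-all rule at the end guarantees every file gets a home, so new data stays on premises unless a more specific rule sends it elsewhere.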
Don’t worry – the cloud data management software “views” the entire infrastructure as a single pool, universally accessible, regardless of the backing storage hardware or location.
Step 3: Go Cloud to Cloud
Policies help automate and orchestrate services to your applications based on business requirements (e.g., cost, capacity, performance, and security) and the different capabilities of your on-premises and cloud resources. This also means data is efficiently discoverable and accessible across multiple clouds – the cloud data management platform weighs the differences in the services each cloud provides and moves or copies data to the right one.
As an example, data may be machine-generated in the field, but the analysis can occur in Google Cloud Platform. The data is then synced to Amazon Web Services when the results are ready to be shared with outside parties.
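One way such a platform might weigh those differences is a capability table plus a cost-aware chooser: pick the cheapest location that satisfies every required capability. The table entries and per-gigabyte figures below are illustrative assumptions, not real cloud pricing.

```python
# Hypothetical capability table; real clouds differ in detail and price.
CLOUDS = {
    "on-prem": {"cost_per_gb": 0.00,  "burst_compute": False, "public_share": False},
    "gcp":     {"cost_per_gb": 0.02,  "burst_compute": True,  "public_share": True},
    "aws":     {"cost_per_gb": 0.023, "burst_compute": True,  "public_share": True},
}


def choose_cloud(needs):
    """Pick the cheapest location that satisfies every required capability."""
    candidates = [
        (caps["cost_per_gb"], name)
        for name, caps in CLOUDS.items()
        if all(caps.get(need) for need in needs)
    ]
    return min(candidates)[1]


print(choose_cloud(["burst_compute"]))  # gcp
print(choose_cloud([]))                 # on-prem
```

With no special requirements, data stays on premises at zero incremental cost; when a workflow step needs burst compute or external sharing, it is routed to the cheapest cloud that offers it.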
Step 4: Go Deep
When data placement policies enable a true hybrid cloud workflow, not constrained by physical infrastructure, you can unlock more capabilities. You can start to use metadata – the data about the data – as what we call the organizing principle. Cloud-native data holds its own metadata right alongside it, not in a separate database only its own specific application can read. Your metadata can now be globally indexed and made available to many applications and groups of users. This also allows you to perform large-scale intelligence projects. It’s fair to say agencies with big datasets use them in a more rigorous way in terms of intelligence than a commercial enterprise would – metadata is key to these functions, and key to being able to find very specific information in a large dataset.
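As a rough sketch of metadata as the organizing principle, the snippet below builds a tiny inverted index over per-object metadata, so any application can find very specific objects in a large dataset without opening the objects themselves. All identifiers and sample records are illustrative.

```python
# Sketch of a global metadata index: each key/value pair points at the set of
# objects carrying it, so queries never touch the underlying data.

from collections import defaultdict

index = defaultdict(set)


def register(object_id, metadata):
    """Index every metadata key/value so any application can search it."""
    for key, value in metadata.items():
        index[(key, value)].add(object_id)


def find(**criteria):
    """Return object ids matching every key=value criterion."""
    matches = [index[(key, value)] for key, value in criteria.items()]
    return set.intersection(*matches) if matches else set()


register("tile-001", {"sensor": "lidar", "region": "coastal", "year": 2023})
register("tile-002", {"sensor": "optical", "region": "coastal", "year": 2023})
register("tile-003", {"sensor": "lidar", "region": "inland", "year": 2022})

print(sorted(find(sensor="lidar", region="coastal")))  # ['tile-001']
```

Because the index is built from metadata that lives with the data itself, it can span every tier of the hybrid infrastructure rather than one application’s private database.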
Whether you like it or not, you will be in the cloud in some capacity. Follow these steps not only to make the transition to public infrastructure hassle-free, but to bring many of the business dynamics of the cloud – consumption-based pricing, massive scalability, collaboration, and more – into your on-premises data center and increase the value of your data.