Data analysis: The Big Data Explosion

Big Data is real and is here to stay. The challenge before the armed forces is to develop tools that enable extraction of relevant information from the data for mission planning and intelligence gathering. And for that, the armed forces need data scientists like never before.

Big Data describes massive volumes of both structured and unstructured data, so large that they are difficult to process using traditional database and software techniques. While the term refers to the volume of data, it also covers the technology, tools, processes and storage facilities required to handle it. When dealing with such datasets, organisations struggle to create, manipulate and manage the data. Big Data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyse massive datasets.

The amount of digital data available globally grew from 150 exabytes in 2005 to 1,200 exabytes in 2010. It is expected to grow at about 40 per cent annually over the next few years, several times the rate of growth of the world's population; at that rate, digital data is expected to increase more than 40 times by 2020, doubling approximately every two years. Data from different sources, for example the web, sales, social media and mobile devices, is typically loosely structured, often incomplete and hard to access; collectively, it can also be called enterprise Big Data.
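The growth figures above can be sanity-checked with simple compounding arithmetic. The sketch below uses only the numbers quoted in the article (1,200 exabytes in 2010, 40 per cent annual growth); the projection itself is illustrative, not a forecast.

```python
# Sketch: compounding a data volume at a fixed annual growth rate.
# Starting figure (1200 EB in 2010) and the 40% rate are from the article.

def project(volume_eb: float, annual_growth: float, years: int) -> float:
    """Compound a starting volume over `years` at `annual_growth`."""
    return volume_eb * (1 + annual_growth) ** years

# 40% annual growth is close to doubling every two years: 1.4**2 ~ 1.96
print(round(1.4 ** 2, 2))

# Projection from the 2010 figure of 1200 EB out to 2020 (10 years):
# roughly a 29x increase, i.e. about 34,700 EB
print(round(project(1200, 0.40, 10)))
```

Note that 1.4 raised to the tenth power is about 29, so the "doubling every two years" and "40 per cent annually" figures are mutually consistent, while the "more than 40 times" figure implies a somewhat higher rate.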


Analyst Doug Laney defined data growth challenges and opportunities as three-dimensional: increasing volume (amount of data), velocity (speed of data in and out) and variety (range of data types and sources). Most of the industry still uses this ‘3Vs’ model to describe Big Data. Gartner’s updated definition, however, reads: “Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimisation.” A few organisations additionally add a fourth V, ‘Value’.

Why This Sudden Noise?
From the early days of digitisation, data has always been big in terms of volume, processing speed and storage: for an organisation, datasets measured in terabytes were once considered big, compared with today’s datasets in the order of petabytes or exabytes. The challenge lies in capturing, curating, storing, searching, sharing, transferring, analysing and visualising the data. The trend towards larger datasets stems from the additional information derivable from analysing a single large set of related data, as compared to separate smaller sets holding the same total data. This allows correlations to be found that spot trends, gauge the quality of research, predict the future course of events, prevent disasters and control business processes by extracting value in real time. In other words, Big Data spurs big decisions through smart analytics. Another factor contributing to the data revolution is a fundamental shift in the pattern of data generation: vendors and service providers used to be the source of data, and clients its consumers. With exponential growth in the number of mobile devices connected to the internet, and almost every type of service moving to digital platforms accessed through smartphones, clients and vendors alike have become both sources and consumers of digital data.

GIS applications and services are set to be a major game changer in Big Data analytics, with location services assisted by or coupled with GIS imagery and data becoming the order of the day in almost all spheres of modern activity.

Big Data Characteristics

Types of data
Relational data (tables/transactions/legacy data); text data (the web); semi-structured data (XML); graph data, such as social networks and the semantic web (RDF); and streaming data. These bring big volume with simple (SQL) or complex (non-SQL) analytics, big velocity, and big variety, with a large number of diverse data sources to integrate.

These big datasets, mostly unstructured, require advanced tools, software and systems to capture, store, manage and analyse them, all within a timeframe that preserves the intrinsic value of the data.
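The distinction between relational and semi-structured data listed above can be made concrete with a short sketch. The records below are hypothetical: semi-structured sources routinely add or omit fields from one record to the next, which is exactly what a fixed relational schema handles poorly and flexible tooling handles by design.

```python
import json

# Hypothetical semi-structured records: fields vary between records,
# unlike rows in a fixed relational schema.
raw = [
    '{"id": 1, "source": "web", "clicks": 42}',
    '{"id": 2, "source": "mobile", "clicks": 17, "geo": [28.6, 77.2]}',
    '{"id": 3, "source": "social"}',  # incomplete record
]

records = [json.loads(line) for line in raw]

# Tolerant access: supply defaults rather than failing on missing fields
total_clicks = sum(r.get("clicks", 0) for r in records)
with_geo = [r["id"] for r in records if "geo" in r]

print(total_clicks)  # 59
print(with_geo)      # [2]
```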

Architecture and Technologies
NewSQL, MapReduce and Hadoop are database platforms that are becoming increasingly popular. The architecture of the Government of India’s ambitious Aadhaar project is based on Hadoop, handling 200 trillion biometric matches per day, 2 PB of raw data stored, 100 million authentication requests per day, a terabyte-scale data warehouse of 200 million records, 50 million messages per day and 100 million database transactions per day. The technology spectrum assembled to handle massive parallel processing, streaming data reads, data-locality computing, low-latency reads, data integrity and the challenges of distributed data includes the Hadoop stack (HDFS, HBase, Hive), MySQL and SEDA; search (MongoDB, sharded solutions); a compute grid (Spring, GridGain); and monitoring (custom-built tools, Nagios).
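The MapReduce pattern at the heart of the Hadoop stack can be illustrated in a few lines: map emits key-value pairs from each input split, the framework shuffles the pairs into groups by key, and reduce aggregates each group. This is a single-machine sketch of the pattern, not the Hadoop API itself; on a real cluster each phase runs in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(document: str):
    # Map: emit a (word, 1) pair for every word in the input split
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into a single result
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data is big", "data at rest"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"], counts["data"])  # 2 2
```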

The challenge lies in building a biometric database of 1.2 billion people, supporting multilingual applications, and deployment: reaching out to everyone in the country has involved 27,000 installations to date, along with the logistics of managing enrolments, letter delivery, online authentication and financial transactions in the order of millions.

Cloud Computing and Big Data Alliance
With the growth of the cloud, organisations of all sizes and industries are producing more data than ever, up to terabytes per second. Hidden in this data are insights with potential business value. The challenge lies in organising and analysing the data to create new business strategies and make organisational decisions. Big Data, data structures, databases, data mining and warehousing technologies need to sync with those of the cloud and move in the same direction. The cloud can be a powerful and cost-effective way to deliver capabilities around Big Data analytics and Big Data management. Without the right tools and architecture, organisations will not be able to use the information they have collected effectively.

Big Data Analytics
Big Data analytics can be done with the software tools commonly used in advanced analytics disciplines such as predictive analytics and data mining. But the unstructured data sources used for Big Data analytics may not fit in traditional data warehouses, and traditional relational databases cannot handle semi-structured, unstructured and highly variable data the way open-source and other alternatives can. A new class of Big Data technology has therefore emerged and is being used in many Big Data analytics environments. The technologies involved include specialised databases, Hadoop and MapReduce, which form the core of an open-source software framework supporting the processing of large datasets across clustered systems.
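One reason such environments differ from traditional warehouses is velocity: when data arrives faster than it can be stored for batch querying, analytics must be incremental. A minimal sketch of this idea, using a constant-memory running mean over a stream (the sensor feed here is purely illustrative):

```python
class RunningStats:
    """Constant-memory running mean over a stream of readings --
    the kind of incremental computation streaming analytics relies on,
    since the full dataset is never held in memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> None:
        self.count += 1
        # Incremental mean update: no history of readings is kept
        self.mean += (x - self.mean) / self.count

stats = RunningStats()
for reading in [10.0, 14.0, 12.0]:   # stands in for an unbounded feed
    stats.update(reading)
print(stats.count, stats.mean)  # 3 12.0
```

The same incremental style scales to variance, histograms and sketching structures, which is why it underpins stream-processing systems.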

Big Data and the Armed forces
Technological change is reshaping combat systems and defence network strategy, requiring the armed forces to develop a close, symbiotic relationship with emerging technology and industrial innovation, and to embrace both to harness the potential of Big Data. With advances in sensor-based smart combat systems, the armed forces are at the forefront of driving the hardware and software of the future. A fully integrated decision support system or network-centric warfare (NCW) platform has to shift to Big Data architecture and analytics on its own dedicated cloud. Though developed nations are leading the charge, developing nations cannot be written off as having missed the bus: they have the advantage of waiting, watching and picking the best policy based on their economic considerations and priorities, weighed against the quantum of legacy systems in their inventories.

The typical Big Data issues of a decision support system encompass managing and controlling levels of operations; transactional data; historical or point-in-time data; data optimised for inquiry rather than updating; and loosely defined or ad hoc use of the system by commanders and analysts to understand the tactical scenario in real time and make smart judgments. For the military, the amorphous term ‘Big Data’ covers everything from signals intelligence, mobile phone and electronic warfare interceptions to satellite images and video imagery. The proliferation of sensors has helped fuel the Big Data glut.

The challenge is to create software tools that analyse and refine data so that military users can extract relevant information for mission planning and intelligence gathering. Getting a handle on Big Data requires striking a balance between growing military requirements and shrinking budgets. Another challenge for defence intelligence analysts is that open-source data does not follow a predictable pattern, and digging deep into the mountain of data is of little use in combat. The armed forces require data scientists like never before. Militaries collect too much data even in peacetime, and the infatuation with unmanned vehicles and the sensors mounted on them has spurred a wave of data collected on the battlefield. Hence the need to focus on its processing, exploitation and dissemination.

According to one survey, there are more than 1.2 billion broadband connections, and 80 per cent of them do not have net banking access. Roughly 20-30 per cent of the global population can be termed data users; by 2020, with m-commerce exploding, this figure is expected to cross 50 per cent.

Big Data is real and is here to stay; no nation can afford to close its eyes and ignore the phenomenon. India, with its vast pool of IT talent and industry, is in the right place at the right time to join the Big Data bandwagon: only the right intent is required.

References:

  • Wikipedia; Google Images; McKinsey Quarterly (Oct 2011)
  • Catalog, uid_doc_3001
  • Michael Stonebraker, ‘What Does “Big Data” Mean and Who Will Win?’