Big Data: Not Simple Analytics

In this age of big data, data about individuals is collected in a number of ways. But what if the agencies collecting it, government or private, misuse this data, or give away our details to a third party? Are there laws in India to protect the individual? How safe, after all, is our information?

Big Data as a term lacks a stable definition. What seemed like ‘big data’ a decade ago is considered manageable today, and this relativity makes it difficult to define Big Data precisely. With an increasing number of sources to obtain data from, the growing use of the Internet and electronic modes of communication, and the increasing size of data and capacity to store it, any definition built around a fixed quantity would soon become obsolete. Big Data can therefore be described in terms of three Vs: the Volume of data collected, the Velocity with which it is collected and the Variety of the data collected1, with the quantity that qualifies as Big Data changing rapidly.

In India, the concept of Big Data is still at a nascent stage. While corporations are slowly beginning to understand both the value of Big Data and the processes involved in making the data useful to them, the Indian Government, one of the largest stakeholders in the data generated in India, remains largely unaware of it. Today, India has approximately 150 million Internet users2, 900 million mobile users3 and a domestic Big Data market that is growing at 83 per cent annually4. With digital information in India poised to grow to 2.3 petabytes in the next decade5, India cannot be ignored as a valuable market for the development of Big Data. Further, with a population that is both sizeable and diverse, there are opportunities not only to create a distinctive Big Data database but also to use the insights derived from it for targeted consumer marketing and for innovation in the Indian marketplace, to create sustainable management systems, to improve data security systems in India and to optimise the outputs generated today6.

Big Data can comprise both structured and unstructured data, which, when captured, curated, processed and analysed using various analytical methods, helps identify non-linear relationships and user habits, predict outcomes, assess behaviour and make recommendations for the business models in use. With many multinational corporations in India taking initiatives with respect to Big Data, and over 50 per cent of companies exploring it, it is becoming imperative to acknowledge the part that Big Data will play in the country.

Big Data Collection

The premise behind Big Data is that a government, organisation or company collects as much data as it can, from unconnected, publicly available sources as well as directly and indirectly through an individual’s usage, and then sifts through it using advanced analytics to gain insight. Before the Big Data model, by contrast, organisations acquired directly from the source only as much data as they required, which restricted the quantity of data acquired. Big Data can therefore be, and today is, collected without the express knowledge or consent of the data subject.

How Are Corporations Collecting Data?

Under the guise of free services, many companies today ask individuals for personal information in exchange for access, and individuals are only too willing to part with it as a tradeoff for not having to pay for the service. Most people in India do not understand the value of their personal information and remain unaware of the many ways in which such data can be used to create value. In India, while big data is being collected in a number of ways, the analysis of such data is still embryonic. Further, because India is a populous and growing economy, almost any data generated in the country today can qualify as Big Data. While companies are becoming more prudent about collecting data and more specific about their purposes, most Indian companies do not have adequate systems in place that allow them to collect data while addressing the privacy and intellectual property concerns that arise when dealing with data of such magnitude, which is, most of the time, personal information.

Big Data is collected every time you use a website or an application. It is collected either directly, through the details you provide to the company, such as your name, address, email IDs, preferences, location, photographs, bank account details and credit card numbers, or indirectly, by websites and applications using technical means such as cookies, in which case companies collect data while the consumer remains in the dark about how exactly it will be used.

Companies today may also collect data from consumers through devices and technology built into the products they use, which route information directly to the company without requiring any approval from consumers. For example, by installing GPS devices in the cars it manufactures, an automobile company can obtain a great deal of information about its customers, such as where they travel and whether they prefer travelling at night or in daylight. This information, while useful to the company itself, can also be sold, either as raw data or in analysed form, to third parties who may have different uses for it.

How Is the Indian Government Collecting Data?

The Aadhaar card scheme in India, as of today, gives the central and state governments, various law enforcement agencies and the private players that assist in the data collection process access to an individual’s name, date of birth, gender, address, photographs, fingerprints and iris scans, among other information. The scheme aims to provide Aadhaar cards to all citizens of the country, which envisages a database holding over one billion names, addresses, dates of birth, genders and photographs. It further expects to hold over two billion iris scans and twenty billion fingerprints. The scheme was initiated to combat poor governance in India and to ensure that government services are delivered to every citizen according to their needs, which requires the government to use data analytics to deliver targeted social benefits directly to its citizens.

With social media gaining traction among Indians, the Indian government has been censoring, and conducting or attempting to conduct surveillance of, the social media and internet activities of its citizens. It has further been attempting to get corporations to part with the data of Indian citizens, citing national security concerns and proprietary rights over such data1. In an attempt to secure an unrestricted right over its citizens’ data, the government has also been initiating the Central Monitoring System (CMS), an internal surveillance programme that allows it to monitor all internal communication undertaken by its citizens, again in the name of national security.

The CMS derives its justification from the Indian Telegraph Act, 1885, which gives the government freedom to monitor its citizens’ communications in the interest of public safety and allows specific government agencies to do so without requiring any authorisation from the courts, the legislature or even Parliament. Currently, the government deploys interception and monitoring systems that allow it to monitor internet traffic, personal email accounts, web browsing and any other internet-based activity undertaken by Indian citizens9, and the scope of such monitoring is only poised to increase.

Legal Position in India

Privacy and Data Protection

The results of Big Data analytics have a far-reaching impact on how businesses function today and on how a government can implement policies better, especially in rural areas. Research has suggested that a person’s date of birth, gender and postcode are often enough to identify them uniquely; Sweeney estimated that this combination was unique for about 87 per cent of the population of the United States10. In India, at a minimum, people willingly share their names, dates of birth, gender, postcodes and email IDs at restaurants, shops and information booths, as well as on websites, on social media accounts and with the corporations they transact with. In the age of Big Data and data analytics, however, these everyday activities raise serious privacy concerns.
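The uniqueness point can be made concrete with a small, illustrative check. The Python sketch below counts how many records in a dataset share each combination of date of birth, gender and postcode; any record whose combination occurs only once can be singled out. The records, field names and the helpers `quasi_key` and `unique_fraction` are hypothetical and chosen only for illustration; the sketch does not reproduce Sweeney’s study or any Indian dataset.

```python
from collections import Counter

# Hypothetical sample records; names, dates and postcodes are invented for illustration.
records = [
    {"name": "A", "birth_date": "1985-03-12", "gender": "F", "postcode": "560001"},
    {"name": "B", "birth_date": "1985-03-12", "gender": "F", "postcode": "560001"},
    {"name": "C", "birth_date": "1990-07-01", "gender": "M", "postcode": "110001"},
    {"name": "D", "birth_date": "1978-11-23", "gender": "F", "postcode": "400001"},
]

# The quasi-identifiers discussed in the text: date of birth, gender and postcode.
QUASI_IDENTIFIERS = ("birth_date", "gender", "postcode")


def quasi_key(record, fields=QUASI_IDENTIFIERS):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[f] for f in fields)


def unique_fraction(rows, fields=QUASI_IDENTIFIERS):
    """Fraction of rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(quasi_key(r, fields) for r in rows)
    unique = sum(1 for r in rows if counts[quasi_key(r, fields)] == 1)
    return unique / len(rows)


print(unique_fraction(records))               # 0.5: C and D are singled out
print(unique_fraction(records, ("gender",)))  # 0.25: only C, the sole "M"
```

Each additional field splits the groups that share a combination into smaller ones, so more records become unique; this is the intuition behind the finding cited above, not a measurement of it.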

The Information Technology (Reasonable Security Practices and Procedures and Sensitive Personal Data or Information) Rules, 2011 (the Rules)11 deal with personal data and information collected today and require organisations and corporations to lay out clear privacy and data protection policies, in an attempt to make it more transparent to consumers how their information will be used. The Rules broadly cover personal data, financial information and medical records. However, they exclude from the definition of sensitive personal data any information that is freely available or accessible in the public domain, or furnished under the Right to Information Act, 2005.

The Rules require corporations, websites and organisations to publish on their websites privacy policies that are clear, precise and easily accessible to the consumer. These policies should state the data to be collected, the purpose of collection and how the data is to be used. Further, under the Rules, disclosure of information to a third party requires prior permission from the person whose information is to be disclosed, unless such disclosure has been agreed to by contract, which many websites achieve through their online terms of use and privacy policies. However, corporations and organisations with brick-and-mortar establishments still do not follow these policies, even though they are legally mandated.

Further, the Rules require every corporation, website and organisation to have in place reasonable security practices and procedures, including information security programmes, and permit the collection of only such data as is required for a lawful purpose connected with a function or activity of the body corporate and is considered necessary for that purpose. They also require the body corporate, or any person acting on its behalf, to ensure that the person whose data is sought to be collected is aware of the collection, the purpose for which the information is being collected, its intended recipients, and the name and address of the agency collecting the information and of the agency retaining it.

However, in Big Data collection as it stands today, the majority of corporations do not adhere to these Rules, placing users’ privacy rights in jeopardy at every step. Further, the Information Technology Act, 2000 and the rules made under it deal only with a ‘body corporate’ possessing, dealing with or handling sensitive data or information, and do not impose any liability on the Indian government for any breach of privacy or loss of sensitive data collected by it or under its instructions. Since the Aadhaar scheme and the CMS project were initiated by the central government, and the data is collected on its behalf, there is currently no legal mandate or statute governing this activity, which leaves the government free of any privacy-law constraints in this respect.

Also, many corporations claim that they anonymise data before using it or mask it before any use, but neither anonymisation nor data masking is a foolproof process. While the Report of the Group of Experts on Privacy12 recommended collection and purpose limitation, so that only as much data is collected as is necessary, such principles are difficult to implement in the world of Big Data, where the aim is to obtain as much data as possible and then extract value from it.
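To illustrate why masking alone may fall short, the following Python sketch assumes a hypothetical “masked” dataset in which names have been replaced by truncated SHA-256 hashes while date of birth, gender and postcode are kept, together with a hypothetical identified list of the kind people fill in at shops or restaurants. Joining the two on the shared fields (often called a linkage attack) re-attaches names to the masked records. All names, values and the helpers `mask` and `link` are invented for illustration and do not describe any particular company’s process.

```python
import hashlib

# Fields shared by both datasets (the quasi-identifiers discussed in the text).
KEYS = ("birth_date", "gender", "postcode")


def mask(name):
    """Hypothetical masking step: replace a name with a truncated SHA-256 digest."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]


# Hypothetical "masked" dataset: names hashed, quasi-identifiers left intact.
masked_purchases = [
    {"id": mask("Priya"), "birth_date": "1985-03-12", "gender": "F",
     "postcode": "560001", "item": "insulin"},
    {"id": mask("Ravi"), "birth_date": "1990-07-01", "gender": "M",
     "postcode": "110001", "item": "books"},
]

# Hypothetical identified list, e.g. details left at a shop or restaurant.
public_list = [
    {"name": "Priya", "birth_date": "1985-03-12", "gender": "F", "postcode": "560001"},
    {"name": "Ravi", "birth_date": "1990-07-01", "gender": "M", "postcode": "110001"},
]


def link(masked, identified, keys=KEYS):
    """Re-attach names to masked rows whose shared fields match exactly one person."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for row in masked:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the masked record
            matches[names[0]] = row["item"]
    return matches


print(link(masked_purchases, public_list))  # {'Priya': 'insulin', 'Ravi': 'books'}
```

The hash never needs to be reversed: the retained fields alone carry the identity across datasets, which is why masking direct identifiers is not the same as anonymisation.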

Copyright Law

Most corporations, the government and individuals assume that since data has been collected after seeking a user’s permission, the collector owns all of it, and thus the only legal concern left to address is privacy. However, certain intellectual property rights need to be taken into account before any Big Data is collected, analysed or used.

The first question that arises when dealing with Big Data and intellectual property law is: who owns the data being collected? In most cases, the individual owns the data that is collected, especially structured data, and even under a privacy policy or terms of use, the corporation or government merely gets a right to use it, akin to a licence, and not an assignment of the copyright itself. With Big Data, the volume of data is so huge that it becomes almost impossible to know who owns what. But when data is identifiable, due permission needs to be taken before using it, even if the corporation eventually masks the data.

The Copyright Act, 1957 defines a ‘computer’ as any electronic or similar device having information processing capabilities13, and a ‘computer programme’ as a set of instructions expressed in words, codes, schemes or in any other form, including a machine readable medium, capable of causing a computer to perform a particular task or achieve a particular result14. Computer programmes and computer databases are included within the meaning of ‘literary work’ under the Act15. Therefore, where the government or a corporation hires third parties to collect data, as in the case of the Aadhaar scheme, or purchases data from data brokers, it is important for it to have the copyright in the data assigned to it before it claims proprietary rights over the data.

Further, the services of data scientists and analysts who are not employees of the corporation may be procured to create value from the raw data; when a derivative work is created in the process, a new copyright comes into being, which needs to be assigned as well.


To make sure that the collection and use of Big Data complies with these legal principles, companies and the government should adopt the following practices.

Have Privacy Policies: Put in place privacy and data policies that inform individuals about what information is being collected and how the company or government proposes to use it.

Give Notice: Notify individuals of the data collection and protection practices to be followed, and of the purpose for which the collected data will be used.

Offer Choice and Consent: Give every individual the choice to opt in to or opt out of provisions requiring them to provide personal information, and an opportunity to withdraw any consent given previously.

Disclose Only with Consent: Disclose information to third parties only after taking the individual’s express consent for such disclosure. The current lack of visibility, where an individual consents to their information being shared with third parties for analysis but has no control over how those third parties use it, needs to be addressed.

Be Accountable: While the Information Technology Act, 2000 makes a body corporate accountable for the data it collects or secures, it imposes no such mandate on the government or government agencies.
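The choice, consent, withdrawal and disclosure practices above can be pictured as checks an organisation runs before using or sharing personal data. The Python sketch below is a hypothetical consent ledger, assuming one record per individual that stores the purposes they have opted in to and the third parties they have expressly approved. The class `ConsentRecord`, its methods and the purpose and recipient names are invented for illustration; the sketch is not a model of what the Rules or any statute actually require.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical consent ledger for one individual (illustrative only)."""
    purposes: set = field(default_factory=set)    # purposes the person opted in to
    recipients: set = field(default_factory=set)  # third parties expressly approved

    def opt_in(self, purpose):
        self.purposes.add(purpose)

    def withdraw(self, purpose):
        # Withdrawal removes a previously given consent; unknown purposes are ignored.
        self.purposes.discard(purpose)

    def approve_recipient(self, recipient):
        self.recipients.add(recipient)

    def may_use(self, purpose):
        return purpose in self.purposes

    def may_disclose(self, recipient, purpose):
        # Sharing needs an express approval of the recipient AND live consent for the purpose.
        return recipient in self.recipients and self.may_use(purpose)


record = ConsentRecord()
record.opt_in("order_delivery")
record.approve_recipient("courier")
print(record.may_disclose("courier", "order_delivery"))     # True
print(record.may_disclose("ad_network", "order_delivery"))  # False: recipient not approved
record.withdraw("order_delivery")
print(record.may_disclose("courier", "order_delivery"))     # False: consent withdrawn
```

Keeping consent per purpose, rather than as a single yes/no flag, is what lets a withdrawal take effect for one use without silently cancelling or preserving every other use.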

Big Data analytics is a growing field, and in India the growth potential is unparalleled. While knowledge of privacy and copyright laws will tell stakeholders which compliance mechanisms need to be followed, it is only when companies and the government educate their consumers, keep them informed about their policies and adhere to those policies that Big Data can be treated as a boon and not a bane.


  1. 3-D Data Management: Controlling Data Volume, Velocity and Variety; Doug Laney; February 6, 2001
  2. 2013 India Internet Outlook; Sandeep Aggarwal; February 1, 2013; https:// india-internet-outlook/
  3. aspx?relid=85669
  4. Big Data and Enterprise Mobility: Growing relevance of technology themes: the India perspective; Ernst & Young; 2013
  5. Id.
  6. Id.
  7. Id., p. 9
  9. national/govt-violates-privacy-safeguards-to-secretly-monitor-internet-traffic/article5107682.ece
  10. Simple Demographics Often Identify People Uniquely; L. Sweeney; Carnegie Mellon University, Data Privacy Working Paper 3, Pittsburgh; 2000
  11. dit/files/senstivepersonainfo07_02_11.pdf
  12. Report of the Group of Experts on Privacy; October 16, 2012
  13. Section 2(ffb) of the Copyright Act, 1957
  14. Section 2(ffc) of the Copyright Act, 1957
  15. Section 2(o) of the Copyright Act, 1957