New York, US: Organisations are taking information they have collected and analysing it to gain better insights into their customers’ behaviour, but the ultimate opportunity lies in analysing geolocational data to figure out where people will be at a given time, observed Jeff Jonas, an IBM distinguished engineer and chief scientist at IBM Entity Analytics.
Jonas was delivering the morning keynote address at GigaOM’s Structure Big Data conference in New York City, US. While using “space-time-travel” would be an enormous opportunity, it would also unravel secrets and challenge existing notions of privacy, he said.
“Surveillance society is irresistible. And you are doing it,” he opined during his presentation. He noted the use of location-based services such as FourSquare, free e-mail, and social networking tools such as Twitter and Facebook. Despite privacy concerns, Jonas was very enthusiastic overall about big data, noting that having large data sets meant companies were able to make more accurate predictions. There were fewer false negatives and fewer false positives, he said. The computing time required to obtain the data also decreases, meaning the enterprise has access to more data, faster, he said.
“Every two days now, we create as much information as we did from the dawn of civilisation up until 2003,” Jonas said during his presentation, quoting Google CEO Eric Schmidt. Now no one wants to wait to sift through huge amounts of data to get the “smart answer,” he said.
Data is useless unless it is placed in context with other information in order to discover relevance, he added. Comparing collected data to puzzle pieces, Jonas said there is no way to tell what the individual pieces mean or represent without actually trying to put them all together. Including space and time observations removes ambiguity from collected data, because the same thing cannot be in two places at once, Jonas emphasised. “For example, the last 10 years of address history, taken in context, can tell if a person is the same or not, when digging through billions of rows of data,” he said.
According to Jonas, new observations can also reverse earlier assumptions and conclusions. However, once a data set reaches a certain size, there is a “tipping point” after which confidence in what the analysis reveals improves while the computational effort decreases.
Cellphones are generating a “staggering amount” of geolocational data, over 600 billion transactions per day in the United States alone, Jonas said. The data quickly reveals where people spend most of their time and who they spend it with, he said. “De-identified” does not mean “true anonymisation,” especially in large data sets, Jonas said.
It is possible to predict with “87 percent certainty” where someone will be at a certain time in the future, he said. A government intelligence service could pre-empt the next mass protest in real time based on geolocation data alone, he said.
Privacy advocates have been saying for years that users are, inadvertently or voluntarily, giving companies large amounts of tracking data. Jonas agreed, noting that if the government were collecting the same information, users would be horrified.