Imperative to build public trust for ethical use of data

Heather Savory, Co-chair, UN Global Working Group on Big Data

Today, powerful data sources are readily available, with scalable cloud compute and storage resources, tools, techniques and skills, which empower us to generate more insight. 

Data has an essential role to play in sustainable development, as citizens, businesses and governments become more actively interested in how our actions impact the planet: locally, regionally and internationally. Global issues such as climate change, natural capital, agriculture, poverty and levels of trade between nations are increasingly at the top of public and political agendas. Many businesses are also starting to look at the overall impact of their actions, seeking to operate more sustainably and to invest in a more impactful way. A new era is dawning, marked by the data revolution.

Powerful data sources are readily available, together with scalable cloud compute and storage resources, tools, techniques and skills, which empower us to generate more and more insight. The barriers to entry into the data space — in terms of the cost and availability of these data and technologies — are lower than ever.

Today, many people recognize the opportunities the ‘Data Era’ presents for sustainable development. Data for good projects and programs are underway across many countries and sectors, with different organizations, including governments, charities and NGOs, starting to work more innovatively with data.

To understand the world better, we need independent measures of the planet (environment, climate) and the societies people live in (how they live and the economies supporting these societies). These measures need to be trustworthy — data should be used ethically, individual privacy should be respected, methods of calculation should be open and results should be published for all.

We inhabit a physical world and so need to know where things are happening. Geography tells us this. Geo-referencing is a fundamental component of any useful data infrastructure, allowing us to know how the different factors we are interested in combine at any given physical location. Geospatial data can provide us with high-frequency, high-resolution insight into many things; it is probably the most powerful source of data we have.

Understanding Earth’s extremities

I recently attended an evening on ‘Sustainability and the Climate Change Emergency’ hosted by Valtech UK at the Royal Geographical Society. The speakers for the evening included Paul Rose, recipient of the Society’s 2018 Founder’s Medal for scientific expeditions and enhancing public understanding. Paul has been leading expeditions for over 30 years, collaborating with the world’s top field scientists to unlock the secrets of some of the Earth’s most extreme areas. He spoke about his work and we saw pictures of the tented research stations set up on vast ice plains in Antarctica and of the team diving beneath ice floes in sub-zero seas.

One of the major projects, the Antarctic Seabed Carbon Capture Change Project, involves trying to measure how much carbon is held per unit area of the seabed per year, and how this varies in time and space. What is interesting about this work from a data perspective is that it brings together the most disparate data sources you can possibly imagine. Directly sampled physical data is collected by diving in frozen seas to find (and later replace in the exact same locations) small shellfish. This local, physical data is combined with satellite data for analysis.

Local biodiversity

At the event, the National Biodiversity Network spoke about the data they collect on the flora and fauna of the United Kingdom and their innovative NBN Atlas, which combines multiple sources of information about UK species and habitats, providing the ability to interrogate, combine and analyze data at a single location. The NBN Atlas lets you type your location into a map and shows you the species around you. In my case, 1,984 separate species were reported within 1 km of my home despite my London location. Again, physical data combines with geospatial data to deliver useful insight.

These are just two small examples of why we need geospatial data as data for good. It needs to be trusted data; we need to know its provenance and we need it to be as open as possible. Geospatial data is needed in its own right and also to be combined with other data sources such as administrative data, business data and survey data to answer the questions which, ethically, we should be asking about the world. The challenge is not so much ‘can we do this?’ as ‘should we do this?’, and how to build, guarantee and maintain trust in data for public good.

Public interest in data

As public interest in the use of data grows, anyone working in the sustainable development sector, or in other sectors, should expect to be challenged more and more strongly about how they are using data: where that data comes from, how it is being analyzed and what they have discovered from it. They should also expect to be asked how they can prove what they have discovered, and what they have done with that information.

In the data for good space, the onus is on us to welcome these questions and to answer them openly, in order to build public trust in the ethical use of data for public benefit. Data for good needs to pay more attention to the wider questions around data usage. Open data and open information should be the norm unless the data identifies individuals, in which case it must be properly protected. We must be able to prove who is accessing which data for what purpose.

The work I am doing with the United Nations Global Working Group (GWG) on Big Data for Official Statistics is bringing such principles and practices together. Under the auspices of the United Nations, we have developed the UN Global Platform (UNGP), an ecosystem which facilitates international collaboration in the ethical use of data for public benefit. The UNGP brings together National Statistics Offices, UN development programs, data scientists, academics, data providers and technology suppliers with an operational model which implements strong ethics, active governance and robust working practices, and delivers benefits to all parties.

We should not be competing in the data for good space, and the UNGP facilitates collaborative co-working across multidisciplinary international teams. Since the work is for public benefit, results will be shared openly. So, when one team has developed a robust method to measure an indicator using satellite data, the method and the data source are there for use by a neighboring country. The UNGP also has learning materials and virtual communication channels to deliver cross time-zone support for skill development and capacity building, which are especially important for small and developing nations, many of which face significant challenges to measure and deliver the SDGs.

Those who cannot prove that their use of data is ethical, properly governed and fully operationally controlled will, in time, lose their license to operate, across all sectors.

The public questioning has started. As I write this column, the United States House Committee on Financial Services is grilling Mark Zuckerberg. They have asked directly why “he and Facebook should be trusted after years of privacy scandals related to data breaches and the Cambridge Analytica affair”. All of us who work with data should take this seriously. This is not just about a successful and wealthy entrepreneur, or about Facebook and other big corporations. These and other questions will be publicly leveled at us too in the future. The more important the questions we are asking and the insights we are generating, the more we need to be above criticism; we need strong ethics, active governance and robust working practices.

But the key benefits of the UN Global Platform are not the data sources, technology, services, tools and content; they are the ecosystem and operating model which reduce the barriers to entry to advanced data analysis for all parties whilst maintaining strong ethics, active governance and robust working practices. These aspects, underpinned by openness and collaboration, will deliver solid outcomes for data for good.

