Global-scale water monitoring in the cloud

Water plays many roles in our lives. At the most basic level, we need clean, safe, accessible drinking water each day. But we also use fresh water for cooking, sanitation, recreation, power storage and generation, and irrigation. Water, however, can be fickle. The location and availability of naturally occurring fresh water vary with the weather and are likely to change in the future due to human influence and the changing climate. We have long sought to tame rivers and stabilize our access to water through feats of engineering, leading to the construction of nearly 7,000 dams around the world.

Today, we have an additional tool at our disposal: massive amounts of satellite imagery and the compute power to process it, which gives us the ability to monitor water availability worldwide.

Training Computers to See Water

Here at Descartes Labs, we have developed a platform for geospatial analysis that gives our scientists easy access to petabytes of satellite imagery, which they use to build models with machine learning and artificial intelligence tools. At first glance, mapping and monitoring water might seem easy: just look for the blue pixels, right? But the reality is much more complex. Water can appear blue, brown, green, or black depending on the lighting and on how much sediment or algae it contains. And some things that are not water, like shadows, can share spectral characteristics with water.

To build our water model, we sampled over half a million 30-meter pixels from around the world, drawn from Landsat missions 4 through 8 and Sentinel-2 and spanning 1984 to the present. Using a publicly available water occurrence dataset as our “ground truth,” we trained a boosted random forest model to predict the probability of water. In addition to six native bands of the satellites (red, green, blue, near infrared, and shortwave infrared 1 and 2), we used two transformations into hue-saturation-value space, as well as normalized differences like the normalized difference water index (NDWI). Finally, we employed high-resolution slope information, which helped us distinguish terrain shadows from water. Because clouds often obscure our view of the ground, we combined our predictions for each image across three-month seasons (e.g. March, April, and May) to produce a quarterly water mask.
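The feature engineering and seasonal compositing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production code: the function names are hypothetical, reflectances are assumed to be scaled to the range 0 to 1, NDWI uses the common green/near-infrared formulation, only one hue-saturation-value transform (of the visible bands) is shown, and the seasonal mask is assumed to be a thresholded mean of the cloud-free predictions.

```python
import colorsys


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def pixel_features(red, green, blue, nir, swir1, swir2):
    """Build an illustrative feature dict for one pixel.

    All inputs are assumed to be reflectances scaled to [0, 1].
    """
    # Assumed NDWI variant: green versus near infrared.
    ndwi = normalized_difference(green, nir)
    # Hue-saturation-value transform of the visible bands only.
    hue, saturation, value = colorsys.rgb_to_hsv(red, green, blue)
    return {
        "red": red, "green": green, "blue": blue,
        "nir": nir, "swir1": swir1, "swir2": swir2,
        "ndwi": ndwi,
        "hue": hue, "saturation": saturation, "value": value,
    }


def seasonal_water_mask(probabilities, threshold=0.5):
    """Combine per-image water probabilities for one pixel over a season.

    None marks an image where the pixel was obscured (e.g. by cloud).
    Returns True or False, or None if no clear observation exists.
    The mean-then-threshold rule is an assumption for illustration.
    """
    clear = [p for p in probabilities if p is not None]
    if not clear:
        return None
    return sum(clear) / len(clear) >= threshold


# Example: a dark, water-like pixel and three images from one season.
features = pixel_features(0.05, 0.10, 0.12, 0.02, 0.01, 0.01)
print(features["ndwi"] > 0)                     # positive NDWI suggests water
print(seasonal_water_mask([0.9, None, 0.7]))    # cloudy image is ignored
```

A positive NDWI is only a hint, which is why it is one feature among many rather than the classifier itself.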

Time Traveling through Terabytes

Once our model was trained, it was time for the fun part: deploying it at scale! We chose to focus on about a dozen interesting dams and reservoirs across the world, as well as the continental United States as a whole. The latter task is truly one of “big data”: the continental United States has a surface area of over 8 million square kilometers, which corresponds to nearly 9 billion 30-meter pixels, and we were looking over a period of 34 years. Thanks to the hard work of our software engineering team, however, it was possible to perform this feat in not much longer than a weekend by spinning up thousands of computers in the cloud at a time and sending a subset of the imagery to each computer. In the end, our models crunched more than 100 terabytes of satellite imagery across thousands of machines in Google Cloud.
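For a sense of scale, here is a back-of-the-envelope check of the pixel count, along with a minimal sketch of how a large raster might be cut into tiles so that each machine receives its own subset. The function names, the example raster dimensions, and the 2048-pixel tile size are assumptions chosen for illustration; they do not describe our actual pipeline.

```python
def pixel_count(area_km2, pixel_size_m):
    """Number of square pixels of the given size needed to cover an area."""
    square_meters = area_km2 * 1_000_000
    return square_meters / pixel_size_m ** 2


def tile_windows(width, height, tile_size):
    """Yield (col_start, row_start, col_end, row_end) windows covering a raster.

    Edge tiles are clipped to the raster bounds, so every pixel belongs
    to exactly one window.
    """
    for row in range(0, height, tile_size):
        for col in range(0, width, tile_size):
            yield (col, row,
                   min(col + tile_size, width),
                   min(row + tile_size, height))


conus_pixels = pixel_count(8_000_000, 30)   # roughly 8.9 billion pixels
seasons = 34 * 4                            # quarterly masks over 34 years
print(f"{conus_pixels / 1e9:.1f} billion pixels per seasonal mask")
print(f"{conus_pixels * seasons / 1e12:.1f} trillion pixel-seasons in total")

# Each window could be handed to a separate worker.
windows = list(tile_windows(width=5000, height=3000, tile_size=2048))
print(len(windows), "tiles")
```

Because the windows do not overlap, the per-tile results can simply be mosaicked back together once every worker has finished.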

By peering into the past through historical satellite imagery, we can get a front-row view of some dramatic changes to water bodies over time. For instance, we can see the natural course of the Yangtze River in Hubei, China in 1987 based on Landsat 5 data (below, water mask shown in light blue). The landscape was dramatically altered after the construction of the Three Gorges Dam, which led the Yangtze to extend beyond its original banks and flood parts of the surrounding mountainous landscape. While the dam provides an enormous amount of non-fossil-fuel energy for China, the large changes that we can see from the satellite imagery had even larger impacts on the ground: 1.3 million people were displaced from their homes to allow for the inundation of the reservoir, and its construction has led to an uptick in massive landslides that threaten people and infrastructure.

Water extent (shown in blue) in the region surrounding Sandouping, China before (1987) and after (2016) the construction of the Three Gorges Dam. Note that the background of both images is a median composite from 1987 in order to emphasize the changes in the surface water.

We can also examine more rapid changes in surface water. Spain has experienced a major drought over the past two years, due to the combination of low precipitation and high temperatures. Many of the reservoirs that provide water for agriculture have reached dramatically low levels, in some cases exposing the original cities that were flooded when the dams were built, a reminder of the human impact of dam building. Using satellite imagery, we can watch as one reservoir along the Duero River, the Almendra, recedes from March 2016 to November 2017. This is not, however, the first time that the Almendra Reservoir has shrunk to these low levels: we can see similar behavior in imagery from the late 1980s and early 2000s.

Water extent (shown in blue) of the Almendra Reservoir in northwest Spain. Each frame is based on three months of imagery, beginning in March 2016 and ending in November 2017. Note that the background of all images is a median composite from September 2017 in order to emphasize the changes in the surface water.

Finally, let’s zoom out to look at the big picture — the really big picture. After applying our water mask to the full continental United States, we can begin to understand how surface water changes over seasons and years. Watch as lakes and rivers fill up with spring rains and snowmelt, then die down again over dry summers, or as the salt flats west of the Great Salt Lake flood and dry out again.

Water extent (shown in blue) across the continental United States from 1984 to the present. Each frame is based on three months of imagery. The original 30m resolution dataset has been downsampled to 300m for display purposes using a maximum resampler in order to emphasize small water bodies. Missing data, due to clouds or a lack of imagery, is infilled via the four nearest seasonal neighbors (i.e. missing data for summer 2000 can be inferred from summers 1998, 1999, 2001, and 2002). Pixels that remain missing after infilling are in dark gray.
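The two display steps in the caption above, maximum resampling and seasonal infilling, can be sketched as follows. This is a minimal pure-Python illustration with hypothetical function names. Water is encoded as 1, land as 0, and missing data as None. The caption does not say how the four neighboring seasons are combined, so the majority vote used here (with ties counted as water) is an assumption.

```python
def max_downsample(grid, factor):
    """Downsample a 2-D grid by taking the maximum of each factor x factor block.

    With water = 1 and land = 0, a single water pixel marks the whole
    coarse cell as water, which keeps small water bodies visible.
    Missing pixels (None) are ignored; an all-missing block stays None.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, rows))
                     for c in range(c0, min(c0 + factor, cols))
                     if grid[r][c] is not None]
            out_row.append(max(block) if block else None)
        out.append(out_row)
    return out


def infill_season(by_year, year):
    """Fill a missing value from the same season one and two years either side.

    by_year maps year -> 1, 0, or None for a single pixel and season.
    Assumed rule: majority vote of the available neighbors, ties -> water.
    Returns None when no neighbor has data (shown as dark gray).
    """
    if by_year.get(year) is not None:
        return by_year[year]
    neighbors = [by_year.get(year + offset) for offset in (-2, -1, 1, 2)]
    known = [v for v in neighbors if v is not None]
    if not known:
        return None
    return 1 if sum(known) / len(known) >= 0.5 else 0


# A factor of 10 would take 30m pixels to 300m; a factor of 2 keeps the demo small.
fine = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, None, None],
        [0, 0, None, None]]
print(max_downsample(fine, 2))

summers = {1998: 1, 1999: 1, 2000: None, 2001: 0, 2002: 1}
print(infill_season(summers, 2000))
```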

The Future of Water

These visualizations are just the beginning: with this algorithm and dataset, we can ask and answer important questions about how water availability is changing with human use and the weather at large spatial scales.

To address the environmental challenges that we increasingly face, we first must actively measure and monitor our natural resources, and then make the necessary global, regional, and local changes to protect them. For the first time, we can do so with a global view through the use of satellite data. The information we glean from our analyses can help us understand, for example, whether a water shortage is related to local variability in weather and use, or indicative of a larger, continent-wide trend. The answers to questions like these are not only of scientific interest, but also critical for governments, nongovernmental organizations, and businesses that are committed to building a more sustainable future.

Just as we use our massive trove of satellite data and computational power to track global supply chains and estimate agricultural yields, we are committed to using it to monitor our precious natural resources. Increased transparency and improved measurements are critical first steps toward the development of policies that will ensure that we all have a water-secure future.