Urban Mapping: 54+M buildings in Russia

12/20/2019

Geoalert, a startup company, has completed the very first version (0.0.1) of its most ambitious big map data project, Urban Mapping, using an automated computer vision (CV) pipeline: 54 364 789 buildings all over Russia, available through the platform’s API. In the demo below, we transformed the building polygons into points (centroids) and compared them with OpenStreetMap “state of the map” data, using vector tiles to visualize both layers.
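To illustrate the polygon-to-centroid step mentioned above, here is a minimal sketch in plain Python. It assumes each footprint is a simple (non-self-intersecting) ring of planar (x, y) coordinates. The function name `polygon_centroid` and the sample footprint are hypothetical, and this is not necessarily how the Geoalert pipeline computes centroids.

```python
def polygon_centroid(ring):
    """Return the area-weighted centroid (x, y) of a simple polygon ring.

    `ring` is a sequence of (x, y) vertices; the closing vertex may be omitted.
    Uses the standard shoelace-based centroid formula.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Hypothetical 4 x 2 rectangular footprint in planar units.
footprint = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(footprint)
print(centroid)
```

Replacing each polygon with a single point makes a country-scale layer much lighter to render, at the cost of losing the footprint shape.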

Below is a story in and around the project…

How to count people from space?

It is interesting how scientists estimate the world population. If a report tells you there are more than 7 billion people and that this number is projected to grow by a few billion more in the coming years, where does the raw data come from? Is it only the census data supplied by the UN member countries, and how is it verified and updated for each country?

Counting people worldwide turns out to be a big deal, since differences in accuracy at the country or region level introduce significant errors into the regression and weighted models used to calculate the population of a specific area. What impresses us most is how remote sensing imagery has become a key part of global projections like these. Satellite imagery allows us to delineate urban areas at different levels of resolution and to generate high-resolution population maps based on objective and continuously updated data.
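As a rough illustration of what a weighted model can look like, the sketch below spreads a census total over grid cells in proportion to the built-up area detected in each cell. This is an assumed, simplified weighting scheme for illustration only, not the method of any project named in this article. The function name `allocate_population` and the sample numbers are hypothetical.

```python
def allocate_population(census_total, built_area_by_cell):
    """Distribute a census total over cells, weighted by built-up area.

    `built_area_by_cell` maps a cell id to the built-up area detected in it.
    Cells without any detected buildings receive zero population.
    """
    total_area = sum(built_area_by_cell.values())
    if total_area == 0:
        return {cell: 0.0 for cell in built_area_by_cell}
    return {
        cell: census_total * area / total_area
        for cell, area in built_area_by_cell.items()
    }


# Hypothetical region: 1000 people reported by the census, three grid cells.
cells = {"a": 1200.0, "b": 300.0, "c": 0.0}
estimate = allocate_population(1000, cells)
print(estimate)
```

Any error in the census total, or in the detected built-up area, propagates directly into every cell, which is why accuracy at the regional level matters so much.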

However, the satellite data used in population modeling has its own limitations in sensor capabilities, which can also lead to inaccuracy. Many researchers point out the limitations of Landsat 30 m imagery in rural areas, where this resolution is insufficient to detect the sparse and scattered man-made objects that define populated areas. A partial solution comes in the form of nightlight imagery. For example, the Global Rural Urban Mapping Project uses it to add urban and rural boundaries. But this raises another challenge, as in the pictures below: why is there so little light (and population?) in the northern part of the Korean peninsula but a lot at the seaside? Or why are there some very bright spots the size of large cities in northern Siberia?

Recent global projects

Last year the Facebook AI team presented a method for generating high-resolution population density maps at a global scale. They used a tailored CNN model to detect man-made structures in satellite imagery of 0.5 m resolution. So far, they have released population density maps, allocated to 1 arcsecond cells, for ~30 countries, mostly in Africa.
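To get a feel for how large a 1 arcsecond cell is on the ground, here is a small sketch that converts the angular size to meters. It assumes a spherical Earth with the mean radius given in the code, so the numbers are approximations. The function name `arcsecond_cell_size_m` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, spherical approximation


def arcsecond_cell_size_m(lat_deg, arcseconds=1.0):
    """Return the (east-west, north-south) cell size in meters at a latitude."""
    angle_rad = math.radians(arcseconds / 3600.0)
    north_south = EARTH_RADIUS_M * angle_rad
    # Meridians converge towards the poles, so the cell narrows with latitude.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


for lat in (0, 30, 60):
    ew, ns = arcsecond_cell_size_m(lat)
    print(f"lat {lat:>2}: {ew:.1f} m x {ns:.1f} m")
```

Under this approximation a cell is roughly 31 m on a side at the equator and about half as wide at 60 degrees latitude.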

You may have heard about the MS buildings project. The most recent update of Facebook’s RapiD, a fork of the iD editor, can display these buildings where they are not yet present in OpenStreetMap. A selected feature can be added and edited by the user.

Auto mapping at the building scale

In the meantime, a small team in Moscow decided “we are in the game” and started auto-mapping buildings all over Russia.

The entire Russian territory became our primary focus for the following reasons:

- The size of the territory is challenging: we have to figure out how to reduce our processing time and costs.
- We hope to have an impact on Russian spatial data services, such as surveying and insurance, which are still far from complete and ubiquitous.
  “…Estimates of the housing stock also vary significantly. According to the Federal Statistics Service there are 2,125,211 multi-apartment buildings in Russia, but 1,009,696 houses on the “Reform GKH” portal, and 1,586,048 houses in “GIS GKH…”*
- Most of our training and validation data covers Russian cities. It is better to start from the place you know best.

The company applied a segmentation CNN to low-resolution satellite imagery to get a primary binary classification: whether an area is probably populated.
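The coarse-to-fine idea can be sketched as follows, assuming the common XYZ (slippy map) tile scheme in which every tile splits into four children at the next zoom. The zoom levels, the function names and the `is_populated` callback (a stand-in for the low-resolution classifier) are all hypothetical; the article does not describe Geoalert's actual implementation.

```python
def children_at_zoom(z, x, y, target_z):
    """Yield all (z, x, y) tiles at `target_z` covered by the tile (z, x, y)."""
    if target_z < z:
        raise ValueError("target zoom must not be lower than the source zoom")
    scale = 2 ** (target_z - z)
    for cx in range(x * scale, (x + 1) * scale):
        for cy in range(y * scale, (y + 1) * scale):
            yield target_z, cx, cy


def tiles_to_process(coarse_tiles, is_populated, target_z):
    """Keep only high-zoom tiles that lie inside coarse tiles flagged as populated."""
    for z, x, y in coarse_tiles:
        if is_populated(z, x, y):
            yield from children_at_zoom(z, x, y, target_z)


# Hypothetical example: two coarse tiles, only one flagged as populated.
coarse = [(10, 0, 0), (10, 1, 0)]
populated = {(10, 1, 0)}
selected = list(tiles_to_process(coarse, lambda z, x, y: (z, x, y) in populated, 12))
print(len(selected))  # 16 tiles instead of 32
```

Every coarse tile that the classifier rejects removes a whole block of high-zoom tiles from the expensive second stage.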

This way Geoalert got a ~30x reduction in the total number of tiles to be processed at the HDM zoom. That amounts to about 130 mln tiles of 256×256 pixels. Scaling the Geoalert data processing workflow was an important step in achieving acceptable overall performance, so the HDM model runs on a GPU cluster that is part of Skoltech’s high-performance computing infrastructure (https://www.skoltech.ru/en/2019/01/zhores-supercomputer-presented-at-skoltech/).
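A quick back-of-the-envelope calculation puts these figures in perspective. It assumes that the 130 mln figure is the number of tiles remaining after the filtering step and that the reduction factor is exactly 30, so the results are order-of-magnitude estimates only.

```python
tiles_after_filter = 130_000_000  # "about 130 mln tiles" (assumed: after filtering)
reduction_factor = 30             # "~30x reduction" (approximate)
tile_side_px = 256

# Tiles that would have to be processed without the low-resolution filter.
tiles_before_filter = tiles_after_filter * reduction_factor

# Pixels the high-zoom model actually has to look at.
pixels_to_process = tiles_after_filter * tile_side_px ** 2

print(f"tiles without filtering: {tiles_before_filter:,}")
print(f"pixels to process:       {pixels_to_process:,}")
```

Under these assumptions the filter avoids processing several billion tiles, while the remaining workload is still on the order of trillions of pixels.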

Geoalert’s buildings on top of the Black Marble nightlight imagery

The organization keeps working in the following directions:

- Improving CNN output using validation data. The main approach is merging auto-mapping results with spatially referenced external data such as OpenStreetMap and Reforma GKH. About 2 mln buildings have been auto-validated, and Geoalert looks forward to getting more. Validation, buildings “heighting” (https://medium.com/geoalert-platform-urban-monitoring/buildings-height-estimation-7babe6420893) and semantic enrichment of the data are a huge topic that is worth a separate story.

- Post-processing issues such as polygonization that can be done much better and with fewer “stupids”.
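One simple way to picture auto-validation against external data is to match each detected footprint to the reference footprint it overlaps most. The sketch below does this with bounding boxes and an intersection-over-union (IoU) threshold. The box representation, the 0.5 threshold and the function names are assumptions made for illustration, not Geoalert's actual matching rules.

```python
def bbox_iou(a, b):
    """Intersection over union of two boxes given as (min_x, min_y, max_x, max_y)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = min(ax1, bx1) - max(ax0, bx0)
    inter_h = min(ay1, by1) - max(ay0, by0)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union


def auto_validate(detected, reference, threshold=0.5):
    """Return (detected_id, reference_id, iou) for detections confirmed by reference data."""
    validated = []
    for det_id, det_box in detected.items():
        best_id, best_iou = None, 0.0
        for ref_id, ref_box in reference.items():
            score = bbox_iou(det_box, ref_box)
            if score > best_iou:
                best_id, best_iou = ref_id, score
        if best_id is not None and best_iou >= threshold:
            validated.append((det_id, best_id, best_iou))
    return validated


# Hypothetical detections and one hypothetical reference footprint.
detected = {"d1": (0, 0, 10, 10), "d2": (50, 50, 60, 60)}
reference = {"osm1": (1, 1, 11, 11)}
matches = auto_validate(detected, reference)
print(matches)
```

The nested loop is fine for a toy example; at the scale of tens of millions of buildings a spatial index would be needed to limit the candidate pairs.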

All of this is implemented in the Geoalert platform, which allows setting up multiple workers and organizing them into workflows that connect data sources and scale up processing.