Now, 3D modeling of cities can be done using images alone

The three-dimensional model of Zurich was created using image data alone. (Visualisation: ETH Zurich)

Taking photos or videos of a favorite spot in a city, or capturing a building's striking architecture, and then sharing them with friends on social media is a common phenomenon. But what if those photos and videos were used to map every street and building of a city? That is exactly what the VarCity project has demonstrated with Zurich, Switzerland.

This computer vision research project, developed over the last five years, takes images from numerous sources — social media, public webcams, transit cameras, aerial shots, drones — and analyzes them to create a 3D map of the city. This is how the team came up with an extensive and detailed 3D map of Zurich. The best way to understand it is to imagine the inverse of Google Street View: here, the photos do not illustrate the map, they are the source of the map itself.

Using machine learning algorithms, the model can differentiate between buildings, streets, vegetation, and bodies of water. (Visualisation: ETH Zurich)

VarCity’s vision

The main objective of VarCity is to digitize and understand cities and their daily dynamics automatically, using image data alone and without undermining people's privacy. The project combined millions of images and videos into a three-dimensional, living model of the city of Zurich. The new technology has many possible applications – for example, it can analyze where and when pedestrians are on the move and when parking spaces become free.

Funded by the European Research Council (ERC), the project — the brainchild of the team at ETH Zurich's Computer Vision Lab — was conceptualized and written by Prof. Luc Van Gool in 2011–2012. “In this project, the goal was to bring together new capabilities of computer vision and machine learning and bring them to the next level, thanks to the recent and future availability of an incredible amount of images of a city environment. VarCity was then created as a group of young researchers,” says Dr. Hayko Riemenschneider, VarCity project leader and researcher.

Over the last five years, the team has developed algorithms and tools to create virtual city models automatically from images. In fact, the algorithms can also tell the difference between sidewalk and road, pavement and grass, and so on. The idea is that you could run these algorithms on other large piles of data and create a similarly rich 3D model of other cities automatically.
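To illustrate the idea of assigning semantic labels like road, grass, or water — and only the idea; this is not the project's actual algorithm, and the reference colors below are invented for the example — here is a toy nearest-centroid classifier that labels a pixel from its RGB color alone:

```python
# Toy "semantic labeling" of pixels by color, for illustration only.
# The class colors are hypothetical, not taken from VarCity.
CLASS_COLORS = {
    "building": (128, 118, 110),
    "road":     (90, 90, 95),
    "grass":    (60, 140, 60),
    "water":    (40, 90, 160),
}

def label_pixel(rgb):
    """Return the class whose reference color is closest to the pixel."""
    def dist2(a, b):
        # Squared Euclidean distance in RGB space.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(CLASS_COLORS, key=lambda c: dist2(rgb, CLASS_COLORS[c]))
```

Real semantic segmentation uses learned features and spatial context rather than raw color, but the output is the same kind of per-pixel class map.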

The result can be seen as a dynamic 3D version of Google Street View, augmented with automatically deduced knowledge about the city — what it is composed of, how many green areas there are, what the architectural highlights of a building are, etc. However, VarCity maps are designed with privacy in mind: the images serve as the basis for creating the maps, not for direct display.

The model also recognizes façades and windows (highlighted in yellow and red in the buildings to the front left of the picture). (Visualisation: ETH Zurich)

Sources used

The project’s research is based on multiple data sources but only image and video data in each case. “We used aerial images collected in 2008 to get a coarse overview of the entire city. Then we collected street-side images in 2013 to map drivable streets with higher quality. And finally, we filled in the empty spots like pedestrian-only and tourist areas over the timespan of the whole project using images from public webcams, aerial drones, or social media. The more data is available, the more precise our city model will become,” says Dr. Riemenschneider.

However, all the data used were public and came with usage authorizations — the data the team crawled online had been shared under Creative Commons licenses that allow such use.

VarCity can also evaluate individual buildings, such as height of windows, surface areas and sun exposure

Technology use and its benefit

The project builds on recent advances in machine learning, including deep learning, as well as structure-from-motion, 3D surface reconstruction, and scene understanding for object detection, tracking, and segmentation (in particular pedestrian and car detection, as well as semantic segmentation of images). The question that arises now is: how does this help the average person?
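To give a flavor of one of these building blocks, here is a minimal sketch of linear (DLT) triangulation — the step in structure-from-motion that recovers a 3D point from its projections in two calibrated cameras. The camera matrices and the point below are synthetic toy values, not data from the project:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2 are 3x4 camera projection matrices; x1, x2 are the point's
    2D image coordinates in each view.
    """
    # Each image observation contributes two linear constraints on X.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two synthetic cameras: one at the origin, one shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_est = triangulate(P1, P2, x1, x2)
```

Repeated over millions of matched image points, this kind of triangulation is what turns flat photographs into a 3D point cloud of a city.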

With these hi-tech methods, VarCity can instantly evaluate a city's composition — buildings, vegetation, water, and so on. It can also evaluate individual buildings, such as the height of windows, surface areas, and sun exposure. Beyond that, it can provide a global traffic model and micro-simulations of specific areas.

So, imagine being able to get a sunlight estimate for every building's facade, or for the light that penetrates its windows, or finding out how many cars pass through a street by watching a condensed simulation of it — the list goes on.
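The geometric core of such a facade-sunlight estimate can be sketched very simply — this is purely illustrative, ignores shadowing and atmosphere, and is not VarCity's actual model:

```python
import math

def facade_irradiance_factor(facade_azimuth_deg, sun_azimuth_deg, sun_elevation_deg):
    """Cosine factor (0..1) of direct sunlight on a vertical facade.

    Azimuths are measured clockwise from north; the facade azimuth is the
    direction its outward normal faces. Purely geometric: no shadows,
    no atmosphere. Illustrative only.
    """
    az_f = math.radians(facade_azimuth_deg)
    az_s = math.radians(sun_azimuth_deg)
    el_s = math.radians(sun_elevation_deg)
    # Outward facade normal (horizontal) and unit vector toward the sun.
    n = (math.sin(az_f), math.cos(az_f), 0.0)
    s = (math.sin(az_s) * math.cos(el_s),
         math.cos(az_s) * math.cos(el_s),
         math.sin(el_s))
    dot = sum(a * b for a, b in zip(n, s))
    return max(0.0, dot)  # facades facing away from the sun receive nothing
```

A south-facing facade with the sun due south at 30° elevation gets a factor of about 0.87, while a north-facing one gets zero; a full city model would additionally account for occlusion by neighboring buildings.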

VarCity can play a vital role in city planning — zoning-law analysis, simulating how new buildings block sunlight for their neighbors, and so on. While such applications already exist in the market, they require a huge amount of manual work by skilled people to model the whole city in 3D with specialized software, and the whole process is very expensive. VarCity offers a quicker solution to these problems. “What we have proposed here is to make things as automatic as possible, using available image data, in the hope that such tools will facilitate making 3D city modeling available to the public,” says Dr. Riemenschneider.

Braving the odds

Several startups have already emerged from the project: Spectando and Casalva offer virtual building inspections and damage analysis. Parquery monitors parking spaces in real time through its 3D knowledge of the city. UniqFEED (on a different note) monitors broadcast games to tell advertisers and players how long they are featured in the feed.

However, the VarCity project did have initial hiccups, with many technical, scientific, and data-related challenges. But because the project had long-term funding, the team was able to sail through them.

“We crawled various cloud-based photo-sharing websites, yet there is no specific platform which we provide ourselves, as of now. We invite people to upload more images with as many geo- and textual tags. Other ongoing research projects such as REPLICATE may produce such a platform in the future,” adds Dr. Riemenschneider.

The project used common reference frames, such as the Swiss coordinate system known as CH1903+, which helped integrate all of the different results into the same place. For this, the team converted the partial GPS measurements available in the image metadata into that common coordinate system.
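Such a conversion can be sketched with swisstopo's published approximate formulas, which map WGS84 GPS coordinates to CH1903+/LV95 projection coordinates with roughly meter-level accuracy (the sample coordinates below are an arbitrary point in central Zurich, not project data):

```python
def wgs84_to_lv95(lat_deg, lon_deg):
    """Approximate WGS84 -> CH1903+/LV95 conversion (swisstopo formulas, ~1 m accuracy)."""
    # Auxiliary values: latitude/longitude offsets from the Bern
    # fundamental point, expressed in units of 10000 arc-seconds.
    phi = (lat_deg * 3600.0 - 169028.66) / 10000.0
    lam = (lon_deg * 3600.0 - 26782.5) / 10000.0
    # East and north coordinates in meters (LV95 frame).
    e = (2600072.37 + 211455.93 * lam
         - 10938.51 * lam * phi
         - 0.36 * lam * phi ** 2
         - 44.54 * lam ** 3)
    n = (1200147.07 + 308807.95 * phi
         + 3745.25 * lam ** 2
         + 76.63 * phi ** 2
         - 194.56 * lam ** 2 * phi
         + 119.79 * phi ** 3)
    return e, n

e, n = wgs84_to_lv95(47.3769, 8.5417)  # a point in central Zurich
```

With every data source expressed in one projected frame like this, aerial, street-side, and crowd-sourced reconstructions can all be merged into a single city model.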