Semi-Automatic building extraction from LIDAR Data and High-Resolution Image

N. Ekhtari
MSc student
Faculty of Geodesy & Geomatics Engineering
K. N. Toosi University of Technology, Tehran, Iran
Email: [email protected]

M.R. Sahebi
Associate professor
Faculty of Geodesy & Geomatics Engineering
K. N. Toosi University of Technology, Tehran, Iran
Email: [email protected]

M.J. Valadan Zoej
Associate professor
Faculty of Geodesy & Geomatics Engineering
K. N. Toosi University of Technology, Tehran, Iran
Email: [email protected]

A. Mohammadzadeh
PhD Student
Faculty of Geodesy & Geomatics Engineering
K. N. Toosi University of Technology, Tehran, Iran
Email: [email protected]

This paper proposes a semi-automated building extraction system that combines LIDAR point cloud data with high-resolution imagery. The system comprises two phases: detection and extraction.

In the detection phase, the system detects building blobs. To do this, a Digital Terrain Model (DTM) is first produced from the LIDAR data. Then a normalized DSM is produced by subtracting the interpolated height of the corresponding DTM grid cell from the height of each LIDAR point. Building blobs are then detected using a multi-step classification process which divides all points into two general classes: building and non-building.

In the extraction phase, the system first extracts long edge lines from the image. An operator then selects some of the edge lines that lie in the building blobs and uses them to sketch the building shape. The accuracy of the system was evaluated using some datasets and proved satisfactory.

The production of 3D building models is growing rapidly today, due to their applications as well as recent improvements in their acquisition. 3D urban models are widely used in urban planning and management, set-up of telecommunication networks, utility services monitoring, air pollution control, etc. The production of an urban model requires object extraction, which detects the object of interest and extracts its geometric boundary from remotely sensed data (Sohn and Dowman, 2007). Details can then be added to the model to improve its applicability. The level of detail incorporated in a 3D city model determines its production cost.

The current state of automation in the reconstruction of buildings from aerial images is still low due to the complexity of the reconstruction itself (Suveg and Vosselman 2003). This suggests that to improve the state of automation in such tasks, at least one complementary data source should be integrated with the image.

Another data source which can be used on its own to extract buildings is the LIDAR point cloud. During the last decade some research has been done on the detection and extraction of buildings from high-resolution IKONOS images (Fraser et al, 2002; Lee et al, 2003), from stereo images (Baillard and Maitre, 1999), and from LIDAR data (Hu, 2003; Miliaresis and Kokkas, 2007), each of which has its own benefits and weaknesses.

Some research has also been done on the extraction of buildings in urban areas from the combination of image and LIDAR datasets (Haala and Brenner, 1999; Rottensteiner and Jansa, 2002; Zhou et al, 2004; Sohn and Dowman 2007) and even the fusion of LIDAR and image (Rottensteiner et al, 2005).

Since data from new sensors, such as high-resolution imagery, SAR and LIDAR, have become available, the semi-automated and automated production of 3D building models with higher levels of detail and accuracy is becoming feasible. The integrated use of the data provided by two different sensor types enables us to avoid the weaknesses of either sensor.

Our system integrates LIDAR and aerial image as two complementary data sources to detect and extract building boundaries. The detection phase of our system is explained in Section 3, while Section 4 is dedicated to the building extraction phase. Conclusions are stated in Section 5.

2.Test dataset
The LIDAR point cloud data used to evaluate our system comprises two recorded laser pulse returns: First Pulse and Last Pulse return points. FP (First Pulse return) points are those recorded from the first reflection of the laser pulse. As a result, they might belong to the edges or surfaces of objects on the terrain instead of the ground beneath them. LP (Last Pulse return) points, in contrast, are more likely to belong to the terrain, especially in vegetated areas and near the walls of buildings. That is why we prefer LP points for creating the DTM and FP points for creating the DSM.

The aerial image used here is a 3-channel (RGB) image with a 25 cm ground pixel size and 8-bit unsigned pixels. FP and LP 3D points with a spacing of 1.2 m across-track and 10 cm along-track are used in combination with the aerial image.

The data used for the evaluation of our system is provided by the ISPRS Commission III Working Group 8 website, and is available online at the following URL: https://isprs.ign.fr/packages/zone3/package3_en.htm

Before starting the process, these datasets should be co-registered. Since the provided LIDAR data has a known coordinate system, we used it as the reference and registered the aerial image to it.

3.Building Detection
In our system, building detection is accomplished by the combined use of the LIDAR point cloud and the aerial image. As mentioned before, to detect buildings we should first produce a Digital Terrain Model from the LIDAR points lying on the ground. Similarly, a DSM is produced by rasterizing the FP points. This results in an image in which every pixel has a value equal to the interpolated height at its center. Then a normalized DSM is calculated by subtracting the height value of each DTM pixel from that of the corresponding DSM pixel. Then a multi-step classification is applied to the nDSM and the aerial image. As a result, FP points are classified into two groups: building and non-building points. Finally, building blobs are generated; these are polygons slightly larger than individual building roofs, each of which covers an individual building.

3.1. DTM Generation
A DTM should be produced from the LP points that lie on the terrain. So we should detect ground points and separate them from the rest of the point cloud. This is achieved by first resampling the aerial image and combining it with the DSM image to generate a color orthoimage, and then applying ISODATA clustering to this color orthoimage as explained in Haala and Brenner (1999). The basic idea of their algorithm is to use geometric and radiometric information simultaneously by applying a pixel-based classification (Haala and Brenner, 1999). Although the result of ISODATA clustering is not satisfactory for building detection purposes, it is appropriate for terrain detection. Since this algorithm classifies pixels with respect to both their radiometric and geometric characteristics, smooth regions with seamless texture are assigned to a single cluster. So where the color of a roof is similar to that of the ground or asphalt, as illustrated in Fig.1, the algorithm fails to classify building pixels thoroughly and accurately. Red pixels in the cluster image of Fig.1 represent the terrain. But the roof of the building in the center of the scene is classified in the same cluster, because its color is similar to that of the asphalt-covered area and both the street and the roof are smooth. Green pixels represent the grass-covered regions, which also belong to the surface of the earth.

Figure 1- The illustration of ISODATA clustering. Original aerial image (on the left) versus the results of clustering (on the right)

Now a human operator compares the clusters with the aerial image and identifies those clusters lying on the terrain. If any editing is necessary so that the clusters completely cover the terrain and do not cover non-ground pixels, the human operator performs it. The edited clusters are assigned a “Ground” label. Then a point-in-polygon analysis is performed on the LP point dataset and the “Ground” clusters. The result is a set of “ground” points, which lie on the ground.

Finally, the DTM is produced by interpolating the “ground” points to a raster. Since a DTM represents the terrain, which is fairly smooth, especially in urban areas, an IDW (Inverse Distance Weighted) interpolation method is a suitable choice.
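The IDW interpolation of ground points to a raster can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function names, the power of 2, the use of the k nearest points and the grid layout (origin at the lower-left corner, square cells) are all assumptions made for the example.

```python
import math


def idw_height(x, y, ground_points, power=2.0, k=4):
    """Estimate the terrain height at (x, y) from (px, py, pz) ground points.

    The power and the number of neighbours k are illustrative assumptions.
    """
    # Keep the k ground points closest to the query location.
    nearest = sorted(
        (math.hypot(px - x, py - y), pz) for px, py, pz in ground_points
    )[:k]
    weighted_sum = 0.0
    weight_total = 0.0
    for dist, z in nearest:
        if dist == 0.0:
            return z  # the query location coincides with a measured point
        w = 1.0 / dist ** power  # closer points get larger weights
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total


def idw_raster(ground_points, x0, y0, cell, n_rows, n_cols, **kwargs):
    """Build a DTM grid by evaluating IDW at every cell centre.

    (x0, y0) is the lower-left corner of the grid; row 0 is the row at y0.
    """
    return [
        [
            idw_height(x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell,
                       ground_points, **kwargs)
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]
```

Because every cell value is a weighted average of several neighbouring ground points, the resulting surface is smooth, which matches the assumption made about urban terrain. For a real point cloud the brute-force neighbour search would need to be replaced by a spatial index.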

3.2. nDSM Calculation
After the DTM is produced, we should detect the points of high-rise objects in the point cloud. To do so, first a Digital Surface Model (DSM) is generated by interpolating all the points in the FP return dataset to a raster grid. Since surface models cover all objects above the terrain and these objects usually form a rough surface, the interpolation method used here is Nearest Neighbor (NN) interpolation. Consequently, such a model is a raster dataset in which every pixel takes the height of the point nearest to it.

Then a normalized DSM (nDSM) is created by subtracting each DTM pixel from the corresponding DSM pixel. Like the DTM and DSM, an nDSM is a raster in which pixel values correspond to elevation, but this elevation is normalized: it is the height above the terrain. A normalized DSM depicts every object as if it were standing on a flat area. In other words, the nDSM is a DSM from which the effect of topography has been removed.
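Nearest Neighbor rasterization of the FP points can be illustrated with the short sketch below. The function name and the grid layout (lower-left origin, square cells) are assumptions for the example, and the exhaustive search over all points is only practical for small inputs.

```python
def nearest_neighbour_dsm(fp_points, x0, y0, cell, n_rows, n_cols):
    """Rasterize (px, py, pz) points: each cell takes the height of the
    point nearest to its centre. (x0, y0) is the lower-left grid corner."""
    grid = []
    for r in range(n_rows):
        cy = y0 + (r + 0.5) * cell
        row = []
        for c in range(n_cols):
            cx = x0 + (c + 0.5) * cell
            # Compare squared distances; the square root is not needed
            # to decide which point is closest.
            _, z = min(
                ((px - cx) ** 2 + (py - cy) ** 2, pz)
                for px, py, pz in fp_points
            )
            row.append(z)
        grid.append(row)
    return grid
```

Unlike a weighted average, this keeps the original measured heights, so sharp height jumps at roof edges are not smoothed away.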

3.3. Multi-step Classification
In this section we find the building points in the FP point dataset. Both the nDSM and the aerial image are used in a multi-step classification.

Step1. In this step we classify the nDSM so that high-rise regions form one class, relying on the fact that high-rise pixels in the nDSM can be detected by thresholding. The selection of an effective height threshold depends on the characteristics of the buildings in the region of interest. For the present dataset we selected a 3 m height threshold, so every object taller than 3 meters can be a building. Our system classifies the nDSM into two polygon classes, “low-rise” and “high-rise”, using the above threshold.
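As a small sketch of the nDSM calculation and the thresholding of Step1, assuming the DSM and DTM are rasters of identical size stored as nested lists. The function names are illustrative, and the result is a raster mask; converting the mask into “high-rise” polygons is not shown.

```python
def normalized_dsm(dsm, dtm):
    """nDSM = DSM - DTM, cell by cell (height above the terrain)."""
    return [
        [surface - terrain for surface, terrain in zip(dsm_row, dtm_row)]
        for dsm_row, dtm_row in zip(dsm, dtm)
    ]


def high_rise_mask(ndsm, height_threshold=3.0):
    """True where the normalized height exceeds the threshold (3 m here)."""
    return [[h > height_threshold for h in row] for row in ndsm]
```

For example, a cell with a surface height of 105 m over terrain at 100 m has a normalized height of 5 m and is marked as high-rise, while a cell 1 m above the terrain is not.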

Step2. In this step we find the high-rise points, which are the FP points completely contained in the “high-rise” polygons. This is a matter of point-in-polygon analysis, which assigns the “high-rise point” label to such points. Similarly, low-rise points are those lying entirely inside the “low-rise” polygons. Since we do not need these points in the building detection phase, they are not actually assigned a “low-rise point” label.
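The point-in-polygon analysis of Step2 can be sketched with the classic ray-casting test. This is one common way to perform the test and not necessarily the one used here; the function names are illustrative, polygons are assumed to be simple rings given as lists of (x, y) vertices, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray
    starting at (x, y) and pointing in +x direction crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge can only be crossed if its endpoints lie on
        # opposite sides of the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def select_points(points, polygons):
    """Keep the (x, y, z) points that fall inside any of the polygons."""
    return [
        p for p in points
        if any(point_in_polygon(p[0], p[1], poly) for poly in polygons)
    ]
```

Calling `select_points` with the FP points and the “high-rise” polygons would return the candidates for the “high-rise point” label.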

Step3. Since trees are usually taller than the height threshold, high-rise points can belong to trees as well as to building roofs. In this step the “high-rise” points belonging to trees should be masked out. To this end, first the “high-rise” polygons defined in Step1 are mapped onto the aerial image to serve as regions of interest (ROIs). Then a Maximum Likelihood (ML) classification is performed on the ROIs in the aerial image to classify them into the “Tree” and “Building” classes. Then the “high-rise points” lying inside polygons of the “Building” class are assigned the “Building point” label and stored in a separate file with a similar name.

The rest of the FP points are assigned the “non-building” label, which covers ground points, low-rise points, and high-rise points lying on trees. Fig.2 illustrates this multi-step classification. The background image is the produced nDSM. Because the LIDAR points are dense in the along-track direction, the “High-rise points” mapped on the nDSM trace out their scan lines.
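A Maximum Likelihood classification of RGB pixels can be sketched as below. The training procedure and features used for this dataset are not specified in the text, so this is only an illustration under stated assumptions: each class is modelled as a Gaussian with independent bands (a diagonal covariance, which is a simplification of the full-covariance ML classifier), equal prior probabilities are assumed, and the function names and sample values are hypothetical.

```python
import math


def fit_class(samples):
    """Per-band mean and variance of the RGB training pixels of one class."""
    n = len(samples)
    bands = list(zip(*samples))
    means = [sum(band) / n for band in bands]
    # A small floor keeps the variance strictly positive.
    variances = [
        max(sum((v - m) ** 2 for v in band) / n, 1e-6)
        for band, m in zip(bands, means)
    ]
    return means, variances


def log_likelihood(pixel, model):
    """Log of the Gaussian density of a pixel, bands treated as independent."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        for v, m, var in zip(pixel, means, variances)
    )


def classify(pixel, models):
    """Return the class name whose model gives the pixel the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(pixel, models[name]))
```

With `models = {"Tree": fit_class(tree_pixels), "Building": fit_class(roof_pixels)}`, each pixel inside an ROI can be labelled by `classify(pixel, models)`, and an ROI can then be assigned the class of the majority of its pixels.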

Figure 2- produced nDSM with the detected “High-rise” points mapped on it.

3.4 Blob detection
So far we have detected “building points”. Now we should detect the buildings themselves. A building blob is a polygon that completely covers an individual building. Blobs are produced by grouping building points into isolated objects (Sohn and Dowman, 2007), so that every building blob represents one building. To this end, first a buffer analysis is performed on the points recorded in the “Building points” file produced in the previous step. The radius of the buffer circles for the present dataset is set to 0.7 m, since the point spacing in this point cloud is 1.2 m across-track. This ensures that the buffer circles of adjacent points on a building roof overlap and cover the roof completely, and that the blobs are larger than the roof areas. Then a union operator merges the overlapping buffer circles, producing the building blobs. Finally, the building blobs are simplified by removing extraneous bends in their boundaries, and an area threshold discards blobs smaller than 100 square meters.
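The grouping effect of the buffer and union operations can be illustrated without any GIS library: two buffer circles of radius r overlap when their centres are at most 2r apart, so the merged blobs correspond to the connected groups of points under that distance rule. The sketch below finds these groups with a union-find structure. It is an illustration only; the function name is hypothetical, the pairwise comparison is quadratic in the number of points, and the polygon outlines, boundary simplification and 100 square meter area threshold are not computed.

```python
import math


def group_into_blobs(points, radius=0.7):
    """Group (x, y) building points whose buffer circles overlap."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Circles of equal radius overlap when the centres
            # are no more than one diameter apart.
            if math.dist(points[i], points[j]) <= 2 * radius:
                parent[find(i)] = find(j)

    blobs = {}
    for i, p in enumerate(points):
        blobs.setdefault(find(i), []).append(p)
    return list(blobs.values())
```

With a radius of 0.7 m, points 1.2 m apart fall into the same group because 2 × 0.7 m = 1.4 m exceeds the across-track spacing, whereas points on separate buildings several meters apart end up in different blobs.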

4. Building Extraction
So far we have detected buildings, which means that we have obtained the regions of the scene where buildings exist. In this section we extract the boundary shapes of buildings as 2D plans. To this end, edge line segments are first extracted from the high-resolution aerial image and the nDSM image. Then the edge lines lying inside the building blobs are kept and the others are discarded. This reduces the computational burden of the line simplification and line segment grouping processes, since line segments lying outside the building blobs would distract these algorithms. After that, the remaining edge line segments are simplified, grouped, and stored in a vector file format. Finally, a human operator generates building plans using the data in this file.

4.1 Edge detection and Extraction from aerial image
To detect edge lines in the aerial image we use the Roberts cross-gradient operators. These operators are two masks of size 2×2 which compute the first-order partial derivatives at every pixel (Gonzalez and Woods, 2002).

After applying the Roberts edge detector, an image sharpening filter is applied to the resulting edge image to enhance it. Then a binary threshold (e.g. equal to 70 in our case) is applied to the sharpened gradient image to classify its pixels into two classes: “Edge pixels” and “Background pixels”. The resulting binary image is called the “Edge image”.
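The Roberts operators and the binary threshold can be sketched as follows for a single-band image stored as nested lists. Several points are assumptions for the example: the RGB image is taken to have been converted to one band beforehand, the gradient magnitude is approximated by the sum of the absolute values of the two diagonal differences, the sharpening step is omitted, and the function names are illustrative.

```python
def roberts_gradient(image):
    """Approximate gradient magnitude using the two 2x2 Roberts masks."""
    rows, cols = len(image), len(image[0])
    grad = [[0.0] * cols for _ in range(rows)]
    # The 2x2 masks need a pixel below and to the right, so the last
    # row and column are left at zero.
    for r in range(rows - 1):
        for c in range(cols - 1):
            gx = image[r + 1][c + 1] - image[r][c]   # main diagonal difference
            gy = image[r + 1][c] - image[r][c + 1]   # anti-diagonal difference
            grad[r][c] = abs(gx) + abs(gy)
    return grad


def edge_image(grad, threshold=70):
    """Binary image: 1 for edge pixels, 0 for background pixels."""
    return [[1 if g > threshold else 0 for g in row] for row in grad]
```

On an image containing a vertical brightness step from 0 to 100, only the column immediately to the left of the step receives a non-zero gradient and survives the threshold of 70.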

Edge lines should represent the boundaries of buildings. To extract them we should first vectorize the edge image, which yields a set of line segments. Since there are many brightness differences in the image, a very large number of line segments is extracted. As stated earlier, not all of these line segments belong to building boundaries. Therefore, to discard extraneous line segments, we first map the edge line segments onto the building blobs determined in the previous section. Only the line segments lying inside the building blobs are kept. Afterwards, the line segments are filtered by a length threshold, which removes short line segments. Finally, the remaining line segments are grouped using the Fast Line Segment Grouping method developed by Jang and Hong (2002). This method finds longer and more favorable line segments, which are exported to a vector file named “primary border lines”.
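The two filtering steps (blob containment and minimum length) can be sketched as below. The value of the length threshold is not given in the text, so `min_length` is a free parameter here. Testing only the two endpoints of a segment is an approximation of "lying inside" that is exact for convex blobs only, and the function names are illustrative. The grouping method of Jang and Hong is not reproduced.

```python
import math


def _inside(x, y, polygon):
    """Ray-casting point-in-polygon test for a ring of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def filter_segments(segments, blobs, min_length):
    """Keep segments that are long enough and whose two endpoints
    fall inside the same building blob."""
    kept = []
    for (x1, y1), (x2, y2) in segments:
        if math.hypot(x2 - x1, y2 - y1) < min_length:
            continue  # too short to be a useful border line
        if any(_inside(x1, y1, b) and _inside(x2, y2, b) for b in blobs):
            kept.append(((x1, y1), (x2, y2)))
    return kept
```

Applying the cheap length test first avoids running the containment test on the many short segments produced by image texture.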

4.2 Edge detection and Extraction from nDSM image
As stated in Section 3.2, the nDSM is stored in a raster format. Therefore abrupt changes in the elevation of nDSM grid cells (i.e. the brightness of nDSM pixels) can be detected as edge line segments. The process of line segment detection and extraction from the nDSM image is the same as that described for the aerial image. The only difference is that the line segments extracted from the nDSM image belong to boundaries between two areas with a significant height difference, while the line segments extracted from the aerial image are located between any two regions with a significant radiometric difference. Finally, the longer line segments in the edge image of the nDSM that lie inside building blobs are stored in the vector file “primary border lines” produced in the previous section.

The final result of our building extraction system is illustrated in Fig.3, in which building polygons are shown with a transparency level of 50%, superposed on the aerial image.

Figure 3- The 2D extracted buildings (Blue regions) superposed on the aerial image.

5.Conclusion and further research
This paper presented a semi-automatic building extraction process based on the integrated use of high-resolution aerial imagery and LIDAR point cloud data. Our system was introduced as a two-phase process: building detection and building extraction. In the detection phase the LIDAR data enabled the automatic production of the DSM, and the semi-automatic generation of the DTM by an ISODATA clustering approach with the simultaneous use of the aerial image. Then an nDSM was calculated from the DTM and DSM.

Afterwards, “Building points” were detected by thresholding the nDSM image, which yielded the “high-rise” regions; a point-in-polygon analysis, which determined the “High-rise points”; and a classification of the aerial image, which eliminated the “tree points”. Finally, building blobs were generated from the remaining points as polygons that roughly represent buildings.

Then in the extraction phase, the system first extracted long edge line segments from both the nDSM and the aerial image. Then the long line segments lying in building blobs helped an operator sketch the building outlines. The resulting building outlines are depicted in Fig.3.

As explained above, DTM generation and building outline reconstruction from line segments are carried out manually in our system. Therefore further research should focus on the full automation of the system.

The next step can be the reconstruction of 3D building models using the results of our building extraction phase, together with the edge lines and the height data of the LIDAR points lying inside the building outlines.


  • Baillard, C., Maitre, H., “3-D Reconstruction of Urban Scenes from Aerial Stereo Imagery: A Focusing Strategy”, Computer Vision and Image Understanding (Vol. 76, No. 3, December), pp.244-258, 1999
  • Fraser, C.S., Baltsavias, E., Gruen, A., 2002, “Processing of IKONOS imagery for submetre 3D positioning and building extraction”. ISPRS Journal of Photogrammetry & Remote Sensing 56 (2002) pp.177-194
  • Gonzalez, Rafael C., Woods, Richard E., 2002, Digital Image Processing (second edition), pp.577-578, Prentice-Hall, Inc
  • Haala, N., Brenner, C., 1999, “Extraction of buildings and trees in urban environments”, ISPRS Journal of Photogrammetry & Remote Sensing 54 (1999) pp.130-137, 25 February 1999
  • Hu, Y., 2003, “Automated Extraction of Digital Terrain Models, Roads and Buildings Using Airborne LiDAR Data”. University of Calgary, pp206
  • Jang, J., Hong, K., 2002, “Fast line segment grouping method for globally more favorable line segments”. Pattern Recognition 35 (2002) 2235-2247.
  • Lee, D.S., Shan, J., Bethel, J.S., 2003. “Class-guided building extraction from IKONOS imagery”. Photogrammetric Engineering and Remote Sensing 69 (2), 143-150.
  • Leung, Y., Yan, J., 2004, “Point-in-Polygon Analysis Under Certainty and Uncertainty”. GeoInformatica journal, Springer Netherlands, pp.93-114
  • Miliaresis, G., Kokkas, N., 2007, “Segmentation and object based classification for the extraction of the building class from LIDAR DEMs”. Computers & Geosciences (2007)
  • Rottensteiner, F., Jansa J., 2002, “Automatic Extraction of Buildings from LIDAR Data and Aerial Images”, ISPRS Commission IV, Symposium 2002 Ottawa, Canada, July 9-12, 2002
  • Rottensteiner, F., Trinder, J., Clode, S., Kubik, K., 2005, “Using the Dempster-Shafer method for the fusion of LIDAR data and multi-spectral images for building detection”. Information Fusion, Volume 6, Issue 4, December 2005, pp 283-300
  • Sohn, G., Dowman, I., 2007, “Data fusion of high-resolution satellite imagery and LiDAR data for automatic building extraction”, ISPRS Journal of Photogrammetry & Remote Sensing 62 (2007) pp.43–63, 15 February 2007
  • Suveg, I., Vosselman, G., 2003, “Reconstruction of 3D building models from aerial images and maps”. ISPRS Journal of Photogrammetry & Remote Sensing 58 (2004) pp.202-224
  • Zhou, G., et al. “Urban 3D GIS from LiDAR and digital aerial images”, Computers & Geosciences 30 (2004) pp.345–353, August 2003