Subpixel Estimation of Impervious Surface Using Regression Tree Model: Accuracy of the Estimation at Different Spatial Scales

M. Rafee Majid
Dept. of Urban and Regional Planning
Universiti Teknologi Malaysia
81310 UTM, Skudai
[email protected]

As a key indicator of the level of urbanization, impervious surface has widely been used as an environmental indicator linking development to its impact on the environment. Many have, for example, suggested that there is a direct correlation between the amount of impervious surface and the environmental characteristics of an urban area such as water quality and local climate. Quantification of impervious surface, nevertheless, remains tedious if not difficult, and lately more efforts have been focused on methods employing remote sensing and GIS technologies for this purpose. One of the methods used is the regression tree model, which has generated promising results at various pixel levels. This paper discusses an attempt at applying the regression tree model in estimating the amount of impervious surface based on Landsat ETM+ images. Using data from the remote sensing images and high-resolution aerial photography, a regression tree model is first developed for estimating subpixel impervious surface. The accuracy of the estimated impervious surface using the model is then evaluated at different spatial scales, from pixel to regional scale. While the results reveal that the estimated impervious surface may not be as accurate at pixel scale, they show encouraging accuracy at subdivision and regional scales. Thus there is potential for the use of such a method at larger spatial scales. A discussion on the advantages and pitfalls of using the estimation method then concludes the paper.

Impervious surface is not a single homogeneous quantity but when used as a landscape indicator it is typically presented as a percentage of the land that is covered with impervious materials. Arnold & Gibbons (1996) defined impervious surfaces as any material covering the ground that prevents infiltration of water into the soil. While the most prevalent impervious surfaces are man-made materials such as pavement and building rooftops, there are also natural surfaces that are so heavily compacted as to be functionally impervious. Examples of these are compacted soil in construction areas, dirt roads and, even to a certain extent, grass turf in residential areas (Arnold & Gibbons, 1996; Schueler, 1995). In this study, impervious surface is defined as any fixed man-made material in a residential subdivision that has the potential to prevent infiltration of water into the soil. This includes rooftops and patios; transportation-related impervious surfaces such as roads, parking lots and sidewalks; recreational surfaces such as tennis courts and swimming pools; and infrastructure-related impervious surfaces such as water tanks and the like.

This paper discusses the application of a regression tree model on remote sensing images for estimating the amount of impervious surface. Specifically, the paper investigates the accuracy of using the model on moderate-resolution Landsat ETM+ images in estimating impervious surface aggregated at three spatial scales: pixel, subdivision (local) and regional. Even though there have been studies reporting the accuracy of the method at the pixel level (see Smith, 2000; Ward et al., 2000; Wu & Murray, 2000; Yang et al., 2003), no accuracies have so far been reported at the subdivision and regional scales. In an effort to fill the gap, this paper attempts to investigate the accuracy of the method across the three spatial scales. Accuracies are assessed by the mean absolute error (MAE) obtained when comparing the predicted amount of impervious surface to the actual amount. The amount of impervious surface obtained through visual interpretation of 0.3m orthophotos is adopted in this study as the actual amount of impervious surface. Due to its relatively high accuracy, imperviousness estimated through visual interpretation of the high-resolution orthophotos can be considered as the actual value of imperviousness in the field (Lee, 1987; Harvey, 1985; Kienegger, 1992).

Early impervious surface mapping efforts using remotely sensed data were mainly conducted through visual interpretation of aerial photography, which was both time consuming and prohibitively expensive when performed over a large area. In addition, available aerial photographs were collected at differing scales and on different dates, thus requiring time-consuming rectification, digitization, and interpretation. Digital satellite imagery later began to provide a synoptic view of the Earth’s surface capable of producing regular, repeatable, spatially intensive land cover maps. Significant reductions in the amount of labor necessary for impervious surface delineation came with computer-automated spectral analysis of satellite data. These methods were capable of obtaining results comparable to aerial photo interpretation in considerably less time and with a significant reduction in cost (Ragan & Jackson, 1975).

Another advantage of satellite imagery over aerial photographs is that satellite sensors have spectral bands that match the spectral reflectance properties of certain land covers. The Landsat TM (Thematic Mapper) sensor, for example, has six reflective bands whereas color and color-infrared aerial photography is limited to three spectral bands, and black and white photographs have only one band. There are, however, some limitations to satellite imagery depending on the resolution (pixel size) of the image. Landsat TM images, for instance, have a pixel size of 30m x 30m, which is large enough to encompass a diversity of land-cover conditions of differing imperviousness. At the other end of the resolution scale, however, finer-resolution imagery such as IKONOS (4m x 4m for multispectral images, 1m x 1m for panchromatic images) is cost prohibitive for a large study area requiring multiple scenes.

Many of the earlier methods using spectral information from satellite sensors are based on supervised and unsupervised classification techniques and other forms of spectral clustering, thresholding, and modeling. Products are often presented as maps portraying the presence or absence of impervious features at the single pixel scale. Other estimates of impervious cover, meanwhile, rely on lookup tables (conversion factors) derived from surrogate measures of parcel size (Monday et al., 1994) and land use and land cover information (Deguchi & Sugio, 1994; Williams & Norton, 2000; Ward et al., 2000). Forster (1985), however, warned against classifying MSS and TM pixels found in the urban settings as one specific land cover class due to a mismatch in resolutions; the sensor resolution being too coarse compared to the fine spatial resolution of features in the urban environment.

More recent studies adopted advanced machine learning algorithms and spectral mixture analysis that allowed the derivation of imperviousness at the subpixel level. Flanagan & Civco (2001), for example, conducted a subpixel impervious surface mapping using artificial neural network and an ERDAS Imagine® subpixel classifier. The overall accuracy at the binary impervious-non-impervious detection level varied from 71 to 94% with a root mean square error (RMSE) of 0.66 to 5.97. Ji and Jenson (1999), Wu and Murray (2002), Ward et al. (2000), and Rashed et al. (2003) meanwhile have experimented with spectral mixture modeling to derive information about the amount of impervious cover in a single pixel. Wu and Murray reported an overall estimation RMSE of 10.6 percent imperviousness. In another approach, modeling techniques using decision tree models have been successfully implemented in subpixel quantification of impervious surface. A decision tree model dealing with discrete data is known as a classification tree model and that dealing with continuous data is referred to as a regression tree model. Smith (2000), for instance, used classification tree with the overall within-class accuracy of about 84%. Yang et al. (2003) went a step further by using regression tree, thus modeling the impervious surface output as a continuous rather than discrete variable.

Regardless of the method used for quantification of impervious surface (spectral mixture analysis, artificial neural networks or machine learning algorithms), the subpixel processing techniques have proven effective at increasing classification accuracy of impervious surface (Slonecker et al., 2001). Civco & Hurd (1997) also concluded that the information derived about impervious surfaces from subpixel classification methods was superior to traditional land use/land cover based method.

A decision tree method is a multistage or hierarchical decision scheme that recursively partitions a data set in binary fashion into smaller and smaller subdivisions until the final subdivisions can no longer be partitioned or they satisfy some user-defined criteria. The procedure is called a decision tree since it is done in an upside-down tree-like structure; starting with the full data set at the root, followed by a series of partitioning at internal nodes or splits before ending with the final subdivisions at the terminal nodes or leaves (Figure 1). The variables used for partitioning exercise at both the root and internal nodes are predictor or independent variables while the variable partitioned is the dependent or target variable. The decision tree method is called classification tree model if the target variable is discrete or categorical and regression tree model if the target variable is continuous.

Numerous approaches to the decision tree method have been developed in the past thirty or so years (Friedl & Brodley, 1997). Early tree construction approaches were limited to utilizing data sets that were both well understood and well behaved, partly due to the limitations of computing technology at the time. Tree construction at the time depended solely on analyst expertise to set up a priori threshold values used for splitting the tree nodes. Because the threshold values tend to vary across both time and space, specifying them on the basis of user knowledge alone is very difficult to implement in practice. With advanced computer technology nowadays, however, a statistical procedure is more commonly used on a set of training data to estimate the threshold values. The specific techniques used for this work are called learning algorithms, which have been developed within the machine-learning and pattern-recognition communities (Quinlan, 1993). The techniques require high-quality training data from which relations among the independent and dependent variables present within the data can be “learned”. A classic example of this learning algorithm approach is the classification and regression tree (CART) model described by Breiman et al. (1984). In CART, a tree is constructed by recursively splitting the data at each node on the basis of a statistical test that increases the homogeneity of the training data in the resulting descendant nodes. There are now a number of statistical software packages that function specifically to handle CART or incorporate CART as part of their package. Among the widely used ones are See5, Cubist and S-PLUS, the last of which is the statistical software used in this study.

Figure 1: Decision tree

Starting with a training data set or learning samples, three steps are necessary in building an optimal regression tree: 1) tree building; 2) tree pruning; and 3) selection of the optimal tree. Tree building based on binary recursive partitioning begins at the root node using all the learning samples, where the CART software finds the best possible variable to split the node into two child nodes. To find the best variable, the software checks all possible splitting variables (independent variables), as well as all possible values of the variable to be used to split the node. The selected split at each node is a split that partitions the data into two parts such that it minimizes the sum of the squared deviations from the mean (of the dependent variable) in the separate parts. The splitting process then goes on to successive nodes until terminal nodes are reached under one of the following criteria: (1) a node reaches a user-specified minimum node size (i.e. number of training samples at the node); (2) all observations within each node have the identical distribution of predictor variables, making splitting impossible; or (3) the deviance among the samples at a node is lower than a user-specified value.
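
The split search described above can be sketched in a few lines. This is a minimal illustration only, not the S-PLUS implementation used in the study: the function names `sse` and `best_split`, the use of midpoints between distinct values as candidate thresholds, and the `min_node_size` parameter are all assumptions made for the example.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, min_node_size=1):
    """Search every variable and threshold for the split with the lowest total SSE.

    rows    -- list of predictor tuples, one per training sample
    targets -- list of dependent-variable values (e.g. percent impervious)
    Returns (variable_index, threshold, total_sse), or None if no valid split.
    """
    best = None
    n_vars = len(rows[0])
    for j in range(n_vars):
        distinct = sorted(set(row[j] for row in rows))
        # Assumption: candidate thresholds are midpoints between distinct values.
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2.0
            left = [t for row, t in zip(rows, targets) if row[j] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[j] > threshold]
            # Respect the user-specified minimum node size.
            if len(left) < min_node_size or len(right) < min_node_size:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (j, threshold, total)
    return best
```

Applying `best_split` recursively to the two resulting parts, until one of the stopping criteria is met, would grow the maximal tree.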

Since the resulting ‘maximal’ tree is constructed from training samples, it follows every idiosyncrasy in the learning data set and as a result generally suffers from overfitting. Overfitting results in less generalization capability and may degrade the regression accuracy of the tree when applied to unseen data. Correcting the regression tree for overfitting is the second step in the construction of a regression tree and is called ‘pruning’. A pruning process is generally adopted using a fresh set of samples known as a validation data set. The objective of pruning is to minimize the variance of the output (dependent) variable in the validation data. In the pruning process, the last grown node of the maximal tree is removed first, followed by more and more nodes (of increasing importance), resulting in simpler and simpler trees. Each of these simpler trees is a candidate for the appropriately fit final tree or optimal tree, i.e. the one with the lowest or near-lowest output variable variance. The detailed process of tree pruning is described in Quinlan (1993) and Breiman et al. (1984).

Each of the simpler trees generated after pruning may have different combinations of independent variables and different qualities. Generally one wants a tree that is parsimonious in its use of independent variables yet low in its error rates. The quality of a regression tree is measured by the mean absolute error R(T) expressed by:

$$R(T) = \frac{1}{N} \sum_{i=1}^{N} \left| Y_i - \hat{Y}(x_i) \right|$$

where $\hat{Y}(x_i)$ represents the regression plane through the example set, N is the number of samples used to establish the tree and $Y_i$ is the actual value of the predicted variable. Thus, mean absolute error can be used as the basis for selecting the optimal tree.
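
As a small illustration of the mean absolute error defined above, the following sketch computes it for paired actual and predicted values. The function name `mean_absolute_error` and the sample numbers are hypothetical and are not taken from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all paired samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical percent-impervious values for three pixels.
actual = [10.0, 50.0, 90.0]
predicted = [20.0, 45.0, 80.0]
mae = mean_absolute_error(actual, predicted)  # (10 + 5 + 10) / 3
```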

The use of decision trees, both classification and regression, has steadily increased in the remote sensing field (Hansen et al., 1996; Friedl & Brodley, 1997). Compared to the traditional supervised classification procedures used in remote sensing such as maximum likelihood classification, a classification tree has several advantages. A decision tree is, for example, strictly nonparametric and does not require assumptions regarding the distributions of the input data. It also handles nonlinear relations between features and classes, allows for missing values, and is capable of handling both numeric and categorical inputs in a natural fashion (Friedl & Brodley, 1997). A decision tree also has a significant intuitive appeal because the classification structure is explicit and therefore easily interpretable. Hansen et al. (1996) tested a classification tree with the use of remotely sensed data and their results showed that the tree performed comparably to a maximum likelihood classifier. In another study, Huang and Townshend (2003) reported that the accuracy and predictability of regression tree models were better than those of simple linear regression models.


Study Area
The study area was Wake County in the State of North Carolina, USA (Figure 2). With a land mass of 860 square miles, the county housed an estimated population of 678,651 persons in 2002. Wake County measures about 46 miles from east to west and 39 miles north to south. Based on the 1998 land use distribution data from North Carolina’s Center for Geographic Information and Analysis (Figure 2 and Table 1), urbanized area covers about 7.3% of the county. Forest cover in the form of evergreen, deciduous, and mixed forests as well as woody wetlands makes up the largest percentage of the land use at 71% of the total area. Agricultural land in the form of crop agriculture and pasture places in second with 18.7% of the

Figure 2: Study area showing land use distribution

Data and Image Preprocessing
The study used two forms of remote sensing imagery and planimetric data for a small section of the study area. The first type of remote sensing imagery consisted of scenes from Path 15/Row 35 and Path 16/Row 35 of the Landsat ETM+ images captured by the Landsat 7 satellite. The images, which were captured on 25 April 2002, had already been orthorectified and projected to the UTM Zone 17 coordinate system with a NAD83 datum (Figure 3). The second type of remote sensing imagery acquired was the 0.3m rectified orthophotos from the USGS Earth Resources Observation System Data Center taken on 28 March 2002 (Figure 4). The orthophotos had been orthorectified and projected to the UTM Zone 17 coordinate system with a NAD83 datum. The high resolution of the orthophotos made it possible to visually distinguish between pervious and impervious materials as well as between different types of impervious materials. For that reason, the orthophotos were utilized to update the planimetric data used in establishing the training and validation data set for impervious surface. To avoid potential errors due to temporal differences in image dates, it is best that all the images are of the same date or at least close enough to assume no changes in land use/land cover properties.

Table 1: Distribution of land use in Wake County

Digital planimetric data for a small urbanized section of the county was acquired and utilized as the main source of training and validation data sets for impervious surface in building the regression tree model. It is important to note here that although the planimetric data contained rich information in vector format delineating as-built boundaries of building footprints, parking lots, roads, footpaths and other structures, its use for determining the amount of impervious surface was limited by its spatial availability. In this case, the planimetric data covers only about ten percent of the study area. On top of that, the planimetric data was out of date, requiring updating of information using the latest high-resolution aerial photos. With its information updated using the 0.3m orthophotos, the planimetric data provided accurate and current training and validation data sets for impervious surface.

Figure 3: Landsat ETM+ images

Direct estimation of imperviousness at subdivision level using aerial photos required usage of GIS spatial data in addition to the 0.3m orthophotos described above. The GIS vector data layers include: 1) a subdivision map for identification and random selection of subdivision samples, where a total of 115 samples out of the available 3280 subdivisions were selected; 2) planimetric data, used in conjunction with the orthophotos, to expedite digitizing of impervious surface in subdivisions for which the planimetric data was available; and 3) a digital road network to expedite calculation of impervious surface originating from streets. All of the data used in this part are digital spatial data in ArcView shapefiles.

Figure 4: 0.3m orthophoto

Before they could be used, the remote sensing images must go through a few steps of preprocessing. The original 28.5m ETM+ images were co-registered to the 0.3m orthophotos to within 0.5 pixel root mean squared error (RMSE) before being resampled to 30m pixels using the nearest neighbor resampling method. The 0.3m orthophotos themselves were beforehand co-registered to the planimetric data. Co-registration is a process of superposing two or more images guided by ground control points so that equivalent geographic points coincide. Accurate co-registration between the images is important since even a slight mis-registration could result in potentially large differences between actual and predicted values of imperviousness. All of these steps were done using the ERDAS Imagine 8.6 software.

Using the GIS software ArcView® 3.2 equipped with its spatial and image analysis extensions (Spatial Analyst and Image Analyst), the digital numbers (DN) of all six reflective bands of the Landsat ETM+ images (Bands 1-5,7) were then converted to at-satellite reflectance values as described by Landsat Project Science Office (2002). Then the Normalized Difference Vegetation Index (NDVI) values for the images were calculated, followed by the Tasseled-cap values of brightness, greenness and wetness, using at-satellite reflectance-based coefficients described by Huang et al. (2002). The ratio of Band 5:1 was also added as a possible soil moisture indicator helpful in discriminating between concrete and exposed soil. To summarize, the final layers that would be used as independent variable inputs were grid layers of at-satellite reflectance of ETM+ visible bands, NDVI values, Tasseled-cap values and Band 5:1 ratio (Table 2).
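
Two of the derived layers above can be illustrated with a short sketch. It assumes the standard NDVI definition, (NIR - red) / (NIR + red), with ETM+ Band 4 as near-infrared and Band 3 as red; the study itself does not spell out the formula, and the function names and reflectance values below are hypothetical. The Tasseled-cap transform is omitted because its coefficients are not given in the text.

```python
def ndvi(red, nir):
    """NDVI from red (ETM+ Band 3) and near-infrared (ETM+ Band 4) reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat zero-signal pixels as NDVI 0
    return (nir - red) / total


def band_ratio(numerator, denominator):
    """Simple band ratio, e.g. Band 5 / Band 1; None where undefined."""
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical at-satellite reflectance values for one vegetated pixel.
pixel_ndvi = ndvi(red=0.1, nir=0.5)
pixel_b5_b1 = band_ratio(0.3, 0.1)
```

Vegetated pixels tend toward high NDVI while paved surfaces tend toward low values, which is why the index is a plausible predictor of imperviousness.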

Table 2: Independent variables for regression tree method


Training and Validation Data
Building the regression tree model to estimate imperviousness per Landsat ETM+ pixel required substantial training and validation data. These training and validation data of impervious surface were the dependent variable of the regression tree model. The main source of the data was the planimetric data that had been updated and verified using the 0.3m orthophotos. All coverages of the impervious surfaces from the planimetric data (buildings, roads, parking, utilities, etc.) were merged into one vector dataset. Four 1800m x 1800m windows of the planimetric data were visually selected to cover the spectral variations of impervious surfaces and degrees of urbanization that best represented the study area. Each of the four 1800m x 1800m windows was then divided into nine equal-sized blocks of 600m x 600m, of which six were randomly selected for use as training blocks and the remaining three as validation blocks. Campbell (1981) and Friedl et al. (2000) suggested that using randomly selected pixel blocks rather than individual pixels as test data should reduce possible bias in model accuracy assessment due to spatial autocorrelation between training and test data. The 1800m x 1800m vector windows were then rasterized into 0.3m pixels in ArcView and reclassified into binary categories of impervious and pervious. A zonal summary of the 0.3m pixels of the impervious category within each 30m pixel of the training and validation areas was then carried out using the Spatial Analyst function of ArcView 3.2 to give the percentage of impervious surface within those 30m pixels.
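
The zonal summary step can be sketched as a block aggregation of the binary grid. This is an illustrative stand-in for the ArcView Spatial Analyst operation, not a reproduction of it; the function name `percent_impervious` is hypothetical. With 0.3m cells inside a 30m pixel the aggregation factor would be 100, i.e. 10,000 fine cells per Landsat pixel; the example uses a factor of 2 to stay readable.

```python
def percent_impervious(binary_grid, factor):
    """Aggregate a fine binary grid (1 = impervious, 0 = pervious) into coarse cells.

    factor -- number of fine cells along one side of a coarse cell
              (100 when going from 0.3m cells to 30m pixels)
    Returns a grid of percent-impervious values, one per coarse cell.
    """
    rows, cols = len(binary_grid), len(binary_grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            count = sum(
                binary_grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            )
            out_row.append(100.0 * count / (factor * factor))
        out.append(out_row)
    return out


# Tiny hypothetical grid: 4 x 4 fine cells aggregated into 2 x 2 coarse cells.
fine = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
coarse = percent_impervious(fine, factor=2)
```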

Regression Tree Analysis
Collection of the values for the independent and dependent variables was first carried out before starting the regression tree analysis. The task was carried out in ArcView with the help of an Avenue Extension called StatMod developed by Garrard of Utah State University (Garrard, 2002). StatMod was used to collect the grid values of both the independent and dependent variables from their respective grid layers. The independent variables were those listed in Table 2 and the dependent variable was the percentage of impervious surface within each 30m pixel of the four 1800m x 1800m training and validation areas. A total of 9600 data points (grid values) per variable were collected for use as training data and 4800 data points per variable as validation data. The data were then exported from ArcView into S-PLUS® to build a maximal regression tree using the values of the dependent and independent variables from the training area (Figure 5). Then, pruning of the maximal tree was carried out to produce a series of simpler trees, each of which was a candidate for the optimal tree. In order to select an optimal tree, the quality of each candidate tree was judged on its mean absolute error of prediction (using the validation data). In addition to mean absolute error, the correlation coefficient was also calculated for each candidate tree. Since several trees were close in their qualities, the tree that used the fewest independent variables, a parsimonious model, was selected. A parsimonious tree model is desirable since it requires less data volume as well as less computing time.
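
The selection rule described above, preferring the tree with the fewest variables among those of similar quality, might be sketched as follows. The `tolerance` threshold is an assumption, since the text says only that several trees were "close in their qualities", and the candidate names and error values are hypothetical rather than those of Table 3. The first lines also check that the reported sample counts follow from the block layout.

```python
# Sample counts implied by the block layout: 600m blocks of 30m pixels.
pixels_per_block = (600 // 30) ** 2           # 400 Landsat pixels per block
training_points = 4 * 6 * pixels_per_block    # 4 windows x 6 training blocks
validation_points = 4 * 3 * pixels_per_block  # 4 windows x 3 validation blocks


def select_parsimonious(candidates, tolerance):
    """Pick the candidate with the fewest variables among those whose MAE is
    within `tolerance` of the lowest MAE; ties go to the lower MAE."""
    best_mae = min(c["mae"] for c in candidates)
    close = [c for c in candidates if c["mae"] - best_mae <= tolerance]
    return min(close, key=lambda c: (len(c["variables"]), c["mae"]))


# Hypothetical candidates for illustration only (not the values in Table 3).
candidates = [
    {"name": "full", "mae": 5.0, "variables": ["v1", "v2", "v3", "v4", "v5", "v6"]},
    {"name": "reduced", "mae": 5.2, "variables": ["v1", "v2", "v3"]},
    {"name": "minimal", "mae": 6.5, "variables": ["v1"]},
]
chosen = select_parsimonious(candidates, tolerance=0.3)
print(chosen["name"])  # reduced
```

A larger tolerance trades accuracy for simplicity; with a tolerance of zero the rule reduces to picking the lowest-error tree.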

Figure 5: Maximal regression tree without the variables (a) and details of a portion of the tree generated using S-PLUS (b).

The selected regression tree model or the optimal model was then used to estimate the imperviousness of all Landsat ETM+ pixels within the study area. This was done in S-PLUS by providing the regression tree model with the pixel values of the relevant independent variables for all pixels within the study area. The resulting imperviousness of each pixel was then exported back into ArcView for visual display. The whole process consumed a lot of computing time and resources since it involved more than 4.3 million pixel values per variable (30m pixel) for a study area of this size, i.e. 860 square miles. This is one reason why a parsimonious model was preferred. In ArcView, the pixel imperviousness was also aggregated at several levels for further analysis. The levels were 2×2 pixel windows (60m x 60m grid), 3×3 pixel windows (90m x 90m grid) and, of course, at subdivision level for the selected subdivisions.

Digitizing Impervious Surface of Subdivisions
Quantification of impervious surface at the subdivision level for later comparison with the predicted values of the regression tree model was done using the manual on-screen (or head-up) digitizing of the 0.3m orthophotos integrated in GIS with the vector data of subdivision boundaries. This method has been successfully carried out and described by various people among whom are Lee (1987), Harvey (1985) and Kienegger (1992). The procedure began with overlaying of the subdivision digital map onto the geo-referenced 0.3m orthophotos in ArcView. From there, impervious surfaces as schematically shown in Figure 6 were digitized from the orthophotos. The process entailed tracing each identifiable feature’s impervious footprint from the orthophotos and summing its total amount according to subdivision. Imperviousness of each subdivision was then calculated which was the percentage of the total subdivision area covered with impervious surface.

In digitizing impervious surface within a subdivision, all areas of pavement, sidewalks and nonresidential-lot impervious surface were digitized whereas only samples of driveways and rooftops were digitized. Stratified random sampling was used to sample lots within a subdivision. This involved separating lots into two groups, lots served by cul-de-sacs and lots served by through streets, before random samples were taken from each group, proportionate to each group’s share of the total lots. Once the lot samples were selected, impervious surfaces from rooftops, lot driveways and right-of-way (r.o.w) driveways adjacent to the selected lots were then digitized. Undeveloped lots were excluded from sample selection and assumed in this study to have the average amount of imperviousness of other lots with similar lot size and location. Altogether there were 13,828 residential lots in all 115 subdivisions and a total of 3,107 lots were sampled for digitizing of impervious surfaces. The total samples thus represent approximately twenty-two percent of the total lots. The percentage of samples, however, differs from subdivision to subdivision depending on the homogeneity of lot size within the subdivisions. The range of sample percentages was from five percent for subdivisions with homogeneous lots to as high as thirty percent for subdivisions with variable lot sizes.
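
The proportionate stratified sampling of lots can be sketched as below. This is a minimal illustration under stated assumptions: the function name `stratified_sample`, the stratum labels, the lot identifiers and the rounding rule are all hypothetical, and the study does not describe how fractional allocations were handled.

```python
import random


def stratified_sample(strata, total_samples, seed=None):
    """Draw a random sample from each stratum in proportion to its size.

    strata -- dict mapping a stratum name to a list of lot identifiers
    Returns a dict mapping each stratum name to its sampled lots.
    Note: rounding means the sample sizes may not sum exactly to total_samples.
    """
    rng = random.Random(seed)
    total_lots = sum(len(lots) for lots in strata.values())
    sample = {}
    for name, lots in strata.items():
        k = round(total_samples * len(lots) / total_lots)
        k = min(k, len(lots))  # never ask for more lots than the stratum holds
        sample[name] = rng.sample(lots, k)
    return sample


# Hypothetical subdivision with 20 cul-de-sac lots and 80 through-street lots.
strata = {
    "cul_de_sac": list(range(20)),
    "through_street": list(range(100, 180)),
}
picked = stratified_sample(strata, total_samples=20, seed=1)
```

With these inputs the cul-de-sac group receives 4 of the 20 samples and the through-street group 16, mirroring their 20 and 80 percent shares of the lots.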

Figure 6: Components of impervious surface in residential subdivisions


Selection of an optimal model
Table 3 lists accuracy estimates for some promising model options based on different combinations of independent variables. The mean absolute errors for the models range from 7.8 to 8.4%, and the correlation coefficients fall within a narrow range of 0.69 to 0.71. These results are close to those reported by Yang et al. (2003) when they used the same model to estimate impervious surface. They reported mean absolute errors of 9.2 to 11.4% and slightly higher correlation coefficients of 0.82 to 0.89.

Table 3: Performance of selected models using different combinations of predictive variables

The small differences in accuracy estimates among the models encouraged adoption of a simpler, parsimonious model requiring the fewest independent variables. The relative importance of the independent variables was assessed based on the position of each variable within the rule-sets (the tree) of the model. Within the rule-sets, independent variables are ordered in decreasing relevance to the dependent variable, with the most important independent variable positioned at the top of the tree. Figure 5(b) shows a portion of the maximal regression tree generated in S-PLUS, showing the relative importance of each independent variable in the tree. Inspection of the rule-sets of the models revealed that the most important variables in descending order were NDVI, wetness, B1, B7 and B4. The insignificance of the other variables excluded from the models was not surprising since there were high correlations among the variables as indicated in Table 4. The selected regression tree model was therefore the one developed using only NDVI, wetness, B1, B4 and B7 (Model 4 in Table 3).

Table 4: Correlations among independent variables

Model accuracy across spatial scales

a) Accuracy at pixel scale
Validation of pixel imperviousness estimated by regression tree models using Landsat ETM+ images was poor on a pixel-by-pixel basis due to the geometric registration errors between the Landsat images and the orthophotos. Figure 7 shows the plot of predicted versus actual imperviousness on a pixel-by-pixel basis. In general, image-to-image registration can rarely be less than half a pixel off in both horizontal and vertical directions. When comparing the subpixel impervious surface from these two sources on a pixel-by-pixel basis, there is less than a quarter of a pixel overlap. This small overlap is the reason why a small mismatch in the registration can lead to large errors in accuracy assessment (Dai and Khorram, 1998).

Figure 7: Predicted versus actual imperviousness per pixel for the actual 30m pixel.

The impacts of mis-registration on validation, however, can be reduced when working on aggregated window basis. Two window sizes were therefore chosen in this study, i.e. 2 pixels by 2 pixels or 2×2 window (60m pixel) and 3 pixels by 3 pixels or 3×3 window (90m pixel). Figure 8 shows the plots of predicted versus actual imperviousness after aggregation at 2×2 and 3×3 window sizes. The impact of mis-registration decreases as window size increases, leading to better agreement between the modeled and the actual impervious surface fractions.
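
A rough geometric sketch may help explain why aggregation reduces the effect of mis-registration. It assumes, purely for illustration, that the mis-registration is a simple shift of $\delta$ Landsat pixels in both the horizontal and vertical directions and that windows are squares of $n \times n$ pixels; neither assumption comes from the study itself. Under these assumptions the fraction of a window that still overlaps its true ground footprint is

$$f(n) = \left(1 - \frac{\delta}{n}\right)^{2}$$

With $\delta = 0.5$, this gives $f(1) = 0.25$ for a single 30m pixel, $f(2) \approx 0.56$ for the 2×2 window and $f(3) \approx 0.69$ for the 3×3 window, consistent with the quarter-pixel overlap noted for the pixel scale and with the improved agreement at larger windows.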

b) Accuracy at local (subdivision) scale
The accuracy of model prediction at the pixel level is important from a scientific perspective and, as shown earlier, even a slight mis-registration between images could result in large errors. From the management perspective, however, the assessment of imperviousness is more meaningful if done on a landscape management unit such as a watershed or a subdivision. Therefore, the pixel-based imperviousness predicted by the selected regression tree was summarized at subdivision level for the selected 115 subdivisions and compared to the digitized values obtained from the visual interpretation of the 0.3m orthophotos. Summarization of the pixel-based predicted imperviousness was carried out in ArcView only after the water and farm masks had been applied. This eliminated the possibility of misinterpreting water bodies and fallow fields as impervious surface, but the potential for misinterpreting bare soils in non-farm land was still present. Figure 9 shows the plot of model-predicted imperviousness versus digitized imperviousness at subdivision level. The results were encouraging, with the mean absolute error decreasing to only 4.8% and the correlation coefficient increasing to 0.9. There was, however, still a tendency for the model to overpredict imperviousness at low values. This can be attributed to confusion in Landsat images between bare soils and impervious surface.
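
The subdivision-level summary and comparison can be sketched as follows. This is an assumption-laden illustration rather than the ArcView workflow: masked pixels are treated as zero percent impervious (the study states this for the farm mask; applying it to the water mask is an assumption), and the function names, zone labels and values are hypothetical.

```python
import math


def zonal_mean(values, zones, mask=None):
    """Mean predicted imperviousness per zone (e.g. per subdivision).

    values, zones, mask are parallel flat lists with one entry per pixel.
    Pixels where mask is True are counted as 0 percent impervious.
    """
    sums, counts = {}, {}
    for i, (value, zone) in enumerate(zip(values, zones)):
        if mask is not None and mask[i]:
            value = 0.0  # assumption: masked water/farm pixels are pervious
        sums[zone] = sums.get(zone, 0.0) + value
        counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pixels in two subdivisions; the third pixel falls in a mask.
predicted = [50.0, 30.0, 90.0, 10.0]
zones = ["A", "A", "B", "B"]
masked = [False, False, True, False]
by_subdivision = zonal_mean(predicted, zones, masked)
```

The per-subdivision means can then be paired with the digitized values and passed to `pearson_r`, alongside the mean absolute error, to reproduce the kind of comparison plotted in Figure 9.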

Figure 8: Predicted versus actual imperviousness per pixel for (a) 60m pixels (2×2 window) and (b) 90m pixels (3×3 window)

Figure 9: Model-predicted imperviousness versus digitized imperviousness at subdivision level

c) Accuracy at regional scale
Another way to assess the accuracy of the selected model is through visual inspection of predicted imperviousness over the entire study area. Application of the selected regression tree model over the entire study area produced a reasonable spatial pattern of impervious surface, with some weaknesses that could be overcome to a certain degree. The most obvious weakness was the misinterpretation of water bodies as impervious surface, but this was overcome in this study by applying a water mask to the study area. A water mask can easily be extracted from a classification of the remote sensing images. The second and more difficult weakness was the spectral confusion between bare soils (especially from fallow fields) and man-made impervious surface, which might have caused the overprediction of low imperviousness. This is, however, more a weakness of the remote sensing images than of the model itself. In this study, this weakness was partly overcome by including a farm mask extracted from the parcel map and assigning zero as the imperviousness value of the masked area. For urban areas, however, no data are available for such a mask, and the assumption that the extent of bare soils in urban areas is relatively minimal may or may not be reasonable.
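The masking step can be illustrated with a few lines of NumPy. This is a minimal sketch under stated assumptions: the function name `apply_masks` and the 2×2 arrays are hypothetical, and setting water pixels to zero is an assumption made here for symmetry with the farm mask, since the text states the zero assignment explicitly only for the farm mask.

```python
import numpy as np

def apply_masks(predicted, water_mask, farm_mask):
    """Return a copy of `predicted` with masked pixels set to 0% impervious."""
    masked = np.array(predicted, dtype=float)  # copy, so the input is untouched
    # Boolean masks: True where the pixel is water or farm land.
    masked[np.asarray(water_mask) | np.asarray(farm_mask)] = 0.0
    return masked

# Hypothetical model output (percent impervious) for a 2 x 2 patch.
predicted = np.array([[80.0, 35.0],
                      [12.0, 60.0]])
water = np.array([[False, True],    # e.g. a pond misread as impervious
                  [False, False]])
farm = np.array([[False, False],
                 [True,  False]])   # e.g. a fallow field with bare soil

print(apply_masks(predicted, water, farm))
```

Bare soil outside the farm parcels is not touched by either mask, which is why the overprediction at low imperviousness can persist after masking.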

Figure 10 shows the results of applying the final model over the entire study area with the water and farm masks discussed above incorporated. Visual inspection of the outputs indicates a reasonable representation of the pattern of impervious surface within the study area. Major urban centers, the airport, commercial centers and even major transportation routes are well represented with very high imperviousness. High-density residential areas are also well differentiated from the areas of low residential density surrounding them. These results are adequate for analysis at the regional level.

Figure 10: Imperviousness level of the whole county.

The study concerned the application of remote sensing technology in urban planning work. The objective was to investigate the accuracy of using medium-resolution Landsat ETM+ images in estimating impervious surface aggregated at three spatial scales. Images from Landsat ETM+ were used together with GIS-ready planimetric data updated with high-resolution orthophotos to develop a regression tree model that predicts the imperviousness percentage of each Landsat pixel. A zonal summary of the imperviousness percentages of the relevant pixels gives the percentage of impervious surface within any spatial zone, such as a subdivision, a city or even a county. The model was found to have several limitations, some of which could be overcome as discussed earlier. However, certain weaknesses seemed to be inherent to the model or to the procedures involved in developing it. One such weakness was the difficulty in co-registering the images used in the model, which affected the accuracy of pixel-to-pixel model validation. Nevertheless, this difficulty was overcome by validating the results on an aggregated-window basis, and the resulting prediction error of about 8% was comparable to those reported in past studies.

More useful from the management perspective, however, was the aggregation of the predicted imperviousness at subdivision level, which resulted in higher accuracy when compared to the digitized values. The mean absolute error was about 5%, but there was still a tendency for the model to overpredict imperviousness at low values due to confusion with bare soils. Although a mean absolute error of 5% is encouraging, the tendency to overestimate low imperviousness can generate biased results. Through visual inspection, the accuracy of the model was judged acceptable at the regional scale, where the model managed to separate areas of high imperviousness from those with low or no impervious surface. Overall, the model has potential for a quick and synoptic estimate of imperviousness over large areas, provided that the areas have little or no bare soil or that a procedure is available to eliminate bare soil interference in the model’s prediction. The convenience of using remote sensing images for impervious surface estimation should therefore be taken advantage of. Caution, however, should be exercised in matching the objectives of a study to the resolution of the remote sensing images used, and the spectral confusion between impervious surface and bare soils or other similar natural features still needs to be resolved or, at the least, duly noted.


  • Arnold, C.L. and C. J. Gibbons. 1996. Impervious surface: The emergence of a key urban environmental indicator. Journal of the American Planning Association 62(2): 243-258.
  • Breiman, L., J. Friedman, R. Olshen and C. Stone. 1984. Classification and Regression Trees. Chapman and Hall, New York. 358pp.
  • Campbell, J. 1981. Spatial correlation effects upon accuracy of supervised classification of land cover. Photogrammetric Engineering & Remote Sensing 47(3):355-63.
  • Civco, D.L. and J.D. Hurd. 1997. Impervious surface mapping for the State of Connecticut. Proceedings of the 1997 ASPRS Annual Conference, Seattle, WA. pp124-135.
  • Dai, X. and S. Khorram. 1998. The effects of image misregistration on accuracy of remotely sensed change detection. IEEE Trans. Geoscience and Remote Sensing 36:1566-1577.
  • Deguchi, C., and S. Sugio. 1994. Estimations for impervious areas by the use of remote sensing imagery. Water Science and Technology 29(1-2):135-144.
  • Flanagan, M., and D.L. Civco. 2001. Subpixel impervious surface mapping. Proceedings of the 2001 ASPRS Annual Convention, 23-27 Apr. 2001, St. Louis, MO.
  • Forster, B.C., 1985. An examination of some problems and solutions in monitoring urban areas from satellite platforms. International Journal of Remote Sensing 6(1):139-151.
  • Friedl, M.A. and C.E. Brodley. 1997. Decision tree classification of land cover from remotely sensed data. Remote Sensing of Environment 61:399-409.
  • Friedl, M.A., C. Woodcock, S. Gopal, D. Muchoney, A.H. Strahler and C. Barker-Schaaf. 2000. A note on procedures used for accuracy assessment in land cover maps derived from AVHRR data. International Journal of Remote Sensing 21:1073-1077.
  • Garrard, C. 2002. StatMod. Available online at . Accessed on 11/20/2004.
  • Hansen, M., R. Dubayah and R. DeFries. 1996. Classification trees: an alternative to traditional land cover classifiers. International Journal of Remote Sensing 17(5):1075-81.
  • Harvey, R.B. 1985. The use of orthophotography and GIS technology to conduct a storm drainage utility impervious surface analysis: A case study. Proceedings ASPRS/ACSM Annual Meeting, 10 – 15 Mar 1985. Washington DC. pp271-78.
  • Huang, C. and J.R.G. Townshend. 2003. A stepwise regression tree for nonlinear approximation: Applications to estimating subpixel land cover. International Journal of Remote Sensing 24(1):75-90.
  • Huang, C., B. Wylie, L. Yang, C. Homer and G. Zylstra. 2002. Derivation of a tasseled-cap transformation based on Landsat 7 at-satellite reflectance. International Journal of Remote Sensing 23(8):1741-1748.
  • Ji, M. and J.R. Jensen. 1999. Effectiveness of subpixel analysis in detecting and quantifying urban imperviousness from Landsat thematic mapper imagery. Geocarto International 14(4):33-41.
  • Kienegger, E.H. 1992. Assessment of Wastewater Service Charge by integrating Aerial photography and GIS. Photogrammetric Engineering and Remote Sensing 58(11):1601-1606.
  • Landsat Project Science Office. 2002. Landsat 7 science data user’s handbook. Goddard Space Flight Center. Available online at handbook_toc.html. Accessed on 10/28/2004.
  • Lee, K.H. 1987. Determining impervious area for stormwater assessment. Proceedings ASPRS/ACSM Annual Convention, 29 Mar – 3 Apr 1987. Baltimore, MD. pp17-23.
  • Monday, H.M., J.S. Urban, D. Mulawa and C.A. Benkelman. 1994. City of Irvine utilizes high resolution multispectral imagery for NPDES compliance. Photogrammetric Engineering & Remote Sensing 60(4): 411-16.
  • Quinlan, J.R. 1993. C4.5: Programs for machine learning. Morgan Kaufmann Publishers. San Mateo, CA. 302 pp.
  • Ragan, R.M. and T.J. Jackson. 1975. Use of satellite data in urban hydrologic models. Journal of the Hydraulics Division ASCE 101: 1469-75.
  • Rashed, T., J.R. Weeks, D. Roberts, J. Rogan, and R. Powell. 2003. Measuring the physical composition of urban morphology using multiple endmember spectral mixture models. Photogrammetric Engineering & Remote Sensing 69(9):1011-20.
  • Schueler, T.R. 1995. The peculiarities of perviousness. Watershed Protection Techniques 2(1):233-39.
  • Slonecker, E.T., D.B. Jennings and D. Garofalo. 2001. Remote sensing of impervious surfaces: A review. Remote Sensing Reviews 20:227-255.
  • Smith, A.J. 2000. Subpixel estimates of impervious surface cover using Landsat TM imagery. M.A. Scholarly paper. Unpublished. Department of Geography, University of Maryland. College Park, MD.
  • Ward, D., S.R. Phinn and A.T. Murray. 2000. Monitoring growth in rapidly urbanized areas using remotely sensed data. Professional Geographer 52(3):371-86.
  • Williams, D.J., and S.B. Norton. 2000. Determining impervious surfaces in satellite imagery using digital orthophotography. Proceedings of the 2000 ASPRS Annual Conference. 22-26 May 2000. Washington, D.C.
  • Wu, C. and A.T. Murray. 2002. Estimating impervious surface distribution by spectral mixture analysis. Remote Sensing of Environment 84:493-505.
  • Yang, L., C. Huang, C.G. Homer, B.K. Wylie and M.J. Coan. 2003. An approach for mapping large-area impervious surfaces: synergistic use of Landsat-7 ETM+ and high spatial resolution imagery. Canadian Journal of Remote Sensing 29(2):230-240.