
Optimum Feature Selection for Classification of LIDAR Data Using Genetic Algorithms

F. Samadzadegan
Department of Geomatics Engineering,
Faculty of Engineering, University of Tehran
[email protected]

A. Javaheri
Department of Geomatics Engineering,
Faculty of Engineering, University of Tehran
[email protected]

Due to the low resolution of available commercial LIDAR systems, it is difficult to correctly classify objects from LIDAR range data alone. To improve the performance of the classification process, additional data should be considered, mainly the first and last LIDAR pulse, the intensity of the returned laser beam, and a color aerial image. By using different combinations of this information, a large number of features (pattern descriptors) can be derived. Nevertheless, there are no theoretical guidelines that suggest which features are appropriate for a specific classification situation.

The presented method uses a genetic algorithm for feature selection. Genetic algorithms (GAs), a form of inductive learning strategy, are adaptive search techniques that have demonstrated substantial improvement over a variety of random and local search methods.

We have taken the most popular classifier, the maximum likelihood classifier, to evaluate the quality of the output of the proposed optimum feature selection method. A framework for quality assessment, based on similarity measures between classified data and reference data, has been proposed and tested. The numerical investigation of the obtained results demonstrates the high capability of the proposed method for determining the optimum features for classification of LIDAR data, and shows that image classification with the optimum feature subset increases the overall accuracy.

1. Introduction
Recognition and reconstruction of objects in the real world is a major goal in many fields of research, such as photogrammetry, machine vision, and vision metrology. In this context, an object can be described by its textural, structural, and spectral properties. Textural properties relate to the fact that images of real objects often do not exhibit regions of uniform intensity; image texture is defined as a function of the spatial variation in pixel intensities (Tuceryan & Jain, 1998). Structural features describe the geometry of an object. Finally, the electromagnetic radiation reflected by objects of the same nature is broadly similar, so such objects have similar spectral properties.

The simplest way to classify an image is to use all extractable features simultaneously in the classification algorithm, but there are a number of inter-related reasons why feature selection is desirable:

  1. Using a smaller feature set may improve classification accuracy by eliminating noise-inducing features (Jain & Zongker, 1997; Siedlecki & Sklansky, 1989).
  2. Small feature sets should be more generalizable to unseen data. If training data is in short supply, the use of a small number of features may reduce the risk of “overfitting” the parameters of a classifier to the training data (Yang & Honavar, 1998).
  3. The use of a small feature set raises the credibility of the estimated performance of the classifier (Siedlecki & Sklansky, 1989).

“Feature selection” is the process of selecting an optimum subset of features from the enormous set of potentially useful features that may be available in a given problem domain (Gose, Johnsonbaugh & Jost, 1996). The main goal of feature subset selection is therefore to reduce the number of features used in classification while maintaining, or even increasing, the classification accuracy. This process is a very important step in designing a classifier. No theoretical approach can determine the optimal combination of features in advance; in principle, the only way to select the optimal feature subset is to evaluate all possible combinations of the features.

Feature subset selection algorithms can be classified into two categories. If feature selection is done independently of the learning algorithm, the technique is said to follow a filter approach; otherwise, it follows a wrapper approach (Sebban & Nock, 2002). The filter approach is computationally more efficient, but its major drawback is that an optimal selection of features may not be independent of the inductive and representational biases of the learning algorithm that is used to build the classifier. On the other hand, the wrapper approach involves the computational overhead of evaluating a candidate feature subset by executing the selected learning algorithm on the database for each feature subset under consideration. Wrapper-based algorithms can be categorized into three groups: sequential, exponential, and random search. The genetic algorithm is a type of randomized search strategy. The applicability of GAs to the optimum feature subset selection problem is obvious, and there has been considerable interest in this area in the last decade. In this paper, genetic algorithms are applied to optimum feature subset selection.

2. Genetic Search
Genetic Algorithms (GAs) are adaptive heuristic search algorithms premised on the evolutionary ideas of natural selection and genetics. The basic concept of GAs is to simulate the processes in natural systems that are necessary for evolution. The main operators a genetic algorithm uses to search the pool of possible solutions are crossover, mutation, and elitism.

The usual approach to the use of GAs for feature selection involves encoding a set of d features as a binary string of d elements, in which a 0 in the string indicates that the corresponding feature is to be omitted and a 1 that it is to be included. This coding scheme represents the presence or absence of a particular feature in the feature space (Figure 1); the length of the chromosome is therefore equal to the dimension of the feature space.

Figure 1. Designed chromosome for subset selection
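As a concrete illustration of this encoding, the minimal sketch below (Python/NumPy rather than the MATLAB used in this work, with hypothetical feature names standing in for the eight candidate features of Section 4) builds a random chromosome and decodes it back into a feature subset:

```python
import numpy as np

# Hypothetical names for the eight candidate features considered in Section 4.
FEATURES = ["first_intensity", "last_intensity", "first_range", "last_range",
            "red", "green", "blue", "nddi"]

def random_chromosome(n_features, rng):
    """Binary string of length d: 1 = include the feature, 0 = omit it."""
    return rng.integers(0, 2, size=n_features)

def decode(chromosome):
    """Map a binary chromosome back to the names of the selected features."""
    return [name for name, bit in zip(FEATURES, chromosome) if bit == 1]

rng = np.random.default_rng(0)
chromosome = random_chromosome(len(FEATURES), rng)
print(chromosome, "->", decode(chromosome))
```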

3. Maximum Likelihood Classifier
Classification can be defined as the association of a land-use/land-cover attribute with every pixel of an image (Duda & Hart, 1973). Supervised maximum likelihood (ML) classification begins by computing statistics from user-selected training feature vectors of the land cover classes and then uses this statistical summary to classify the image. To classify the image, the probability of each feature vector belonging to each of the classes is calculated, and the image pixel is assigned to the class for which this probability is highest.

The corresponding discriminant function for class i is given by:

$$G_i(x) = -\tfrac{1}{2}\ln\lvert\Sigma_i\rvert - \tfrac{1}{2}(x-\mu_i)^{T}\Sigma_i^{-1}(x-\mu_i) \qquad (1)$$

where $\mu_i$ is the mean vector and $\Sigma_i$ is the covariance matrix of class $i$. If $G_i(x) > G_j(x)$ for all $j \neq i$, then pixel $x$ is assigned to class $i$.
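A minimal sketch of this decision rule (Python/NumPy, not the authors' MATLAB implementation; equal class priors are assumed, and the class means and covariances are assumed to have been estimated from the training samples) is:

```python
import numpy as np

def discriminant(X, mean, cov):
    """Gaussian discriminant G_i(x) of Eq. (1) for every feature vector in X."""
    diff = X - mean                                        # shape (n_pixels, d)
    inv_cov = np.linalg.inv(cov)
    maha = np.einsum("nd,de,ne->n", diff, inv_cov, diff)   # Mahalanobis distances
    return -0.5 * np.linalg.slogdet(cov)[1] - 0.5 * maha

def ml_classify(X, means, covs):
    """Assign each pixel to the class whose discriminant value is largest."""
    scores = np.stack([discriminant(X, m, c) for m, c in zip(means, covs)])
    return np.argmax(scores, axis=0)                       # class index per pixel

# means[i] and covs[i] would be estimated from the training pixels of class i:
# means = [Xi.mean(axis=0) for Xi in training_sets]
# covs  = [np.cov(Xi, rowvar=False) for Xi in training_sets]
```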

4. Data set
The airborne LIDAR data used in the experimental investigations were recorded over a city in Germany. The pixel size of the range images is one meter, so the point density is about one point per m². Intensity images for the first and last pulse data were also recorded, with the intention of using them in the experimental investigations as well. Furthermore, a colored aerial image was available to describe the spectral properties of the objects. The feature space therefore has eight members; the pool of possible features contains:

  • First and Last Intensity
  • First and Last Range
  • Red& Green& Blue
  • Normalized Difference of Range Image(NDDI)

Because of the capability of the LIDAR pulse to penetrate vegetation, this is a good feature for discriminating vegetation pixels from the others. It is computed from the first- and last-pulse range images as

$$\mathrm{NDDI} = \frac{R_{\text{first}} - R_{\text{last}}}{R_{\text{first}} + R_{\text{last}}} \qquad (2)$$

where $R_{\text{first}}$ and $R_{\text{last}}$ denote the first- and last-pulse range values.
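By way of illustration, a small sketch of this computation (Python/NumPy, assuming the two range images are available as co-registered arrays of equal size) is:

```python
import numpy as np

def nddi(first_range, last_range, eps=1e-6):
    """Normalized difference of the first- and last-pulse range images (Eq. 2).
    Large absolute values indicate pulse penetration, i.e. likely vegetation."""
    first = first_range.astype(float)
    last = last_range.astype(float)
    return (first - last) / (first + last + eps)   # eps avoids division by zero
```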

Figure 2. First-pulse and last-pulse range images

Figure 3. Colored aerial image and NDDI image

Figure 4. First-pulse and last-pulse intensity images

5. Our Work
The classification process is composed of the following steps:

Step 1: Data preparation, including co-registration of the LIDAR data and the aerial image, noise reduction, and filtering of the LIDAR data.

Step 2: Generation of the pool of possible solutions.
Step 3: Optimum feature selection.

The optimum feature subset selection process is illustrated in the following diagram.

Step 4: After the selection process, the whole image is classified with the optimum feature subset using the ML classifier.
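To tie the steps together, the compact sketch below (Python; all helper names such as prepare_data, build_feature_pool, select_features_ga and ml_classify_image are hypothetical placeholders for the operations described above) outlines the workflow:

```python
import numpy as np

def classify_with_optimum_features(lidar, aerial, training_samples):
    # Step 1: co-register LIDAR and aerial image, reduce noise, filter LIDAR.
    data = prepare_data(lidar, aerial)                    # hypothetical helper

    # Step 2: stack the eight candidate feature layers of Section 4.
    feature_stack = build_feature_pool(data)              # hypothetical helper

    # Step 3: GA-based wrapper selection driven by the kappa fitness (Sec. 5.1).
    best_mask = select_features_ga(feature_stack, training_samples)

    # Step 4: classify the whole image using the optimum subset only.
    selected = feature_stack[..., np.asarray(best_mask) == 1]
    return ml_classify_image(selected, training_samples)  # hypothetical helper
```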

5.1. Objective Function
The goal of supervised image classification is to assign the image pixels to classes with the highest possible accuracy, and the optimal feature set is the one that provides this condition.

With this concept in mind, the fitness evaluation is the mechanism used to determine how close an optimized solution comes to this highest accuracy.

In this work we use the confusion matrix for accuracy assessment and extract the kappa coefficient, which can be computed as

$$\kappa = \frac{N\sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\,x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+}\,x_{+i}} \qquad (3)$$

where $r$ is the number of classes, $x_{ii}$ are the diagonal elements of the confusion matrix, $x_{i+}$ and $x_{+i}$ are the row and column totals, and $N$ is the total number of test pixels.

The maximum value of the kappa coefficient is 1, and because the genetic algorithm used here minimizes the fitness value, the fitness function is defined as

$$\text{fitness} = 1 - \kappa \qquad (4)$$
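Under these definitions, the fitness evaluation can be sketched as follows (Python/NumPy; the confusion matrix is built from classified versus reference test pixels, and train_and_classify is a hypothetical helper wrapping the ML classifier of Section 3):

```python
import numpy as np

def kappa(confusion):
    """Kappa coefficient of Eq. (3) from an r x r confusion matrix."""
    c = confusion.astype(float)
    n = c.sum()
    diag_sum = np.trace(c)                              # sum of x_ii
    marg_sum = (c.sum(axis=1) * c.sum(axis=0)).sum()    # sum of x_i+ * x_+i
    return (n * diag_sum - marg_sum) / (n ** 2 - marg_sum)

def fitness(chromosome, features, labels, train_idx, test_idx, n_classes):
    """Eq. (4): classify with the selected features only and return 1 - kappa,
    so that the minimizing GA favours subsets with high kappa."""
    selected = features[:, chromosome == 1]
    if selected.shape[1] == 0:
        return 1.0                                      # empty subset: worst fitness
    # Hypothetical helper: trains the ML classifier on train_idx pixels and
    # returns predicted class labels for the test_idx pixels.
    predicted = train_and_classify(selected, labels, train_idx, test_idx)
    confusion = np.zeros((n_classes, n_classes))
    for ref, pred in zip(labels[test_idx], predicted):
        confusion[ref, pred] += 1
    return 1.0 - kappa(confusion)
```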

5.2. Parameter Setting
Our experiment used the following parameter setting for genetic algorithm.

  • Initial population size: 150
  • Population size: 15
  • Number of generations: 70
  • Crossover probability: 0.8
  • Mutation probability: 0.2
  • Elite count: 1

These settings were obtained from several trial runs of the genetic program.
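As an illustration, a simplified generational loop with these settings could look like the sketch below (Python/NumPy rather than the MATLAB toolbox actually used; the crossover and mutation values are interpreted as probabilities, and only the steady population of 15 individuals is evolved, which is an assumption):

```python
import numpy as np

def run_ga(fitness_fn, n_features=8, pop_size=15, n_generations=70,
           crossover_rate=0.8, mutation_rate=0.2, elite_count=1, seed=0):
    """Elitist GA over binary chromosomes that minimizes fitness_fn (1 - kappa)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_features))

    for _ in range(n_generations):
        scores = np.array([fitness_fn(ind) for ind in pop])
        order = np.argsort(scores)
        new_pop = [pop[i].copy() for i in order[:elite_count]]    # elitism

        while len(new_pop) < pop_size:
            # Binary tournament selection of two parents.
            a, b = rng.choice(pop_size, size=2, replace=False)
            p1 = pop[a] if scores[a] < scores[b] else pop[b]
            a, b = rng.choice(pop_size, size=2, replace=False)
            p2 = pop[a] if scores[a] < scores[b] else pop[b]

            # Single-point crossover with probability crossover_rate.
            child = p1.copy()
            if rng.random() < crossover_rate:
                point = rng.integers(1, n_features)
                child[point:] = p2[point:]

            # Bit-flip mutation with probability mutation_rate per gene.
            flip = rng.random(n_features) < mutation_rate
            child[flip] = 1 - child[flip]
            new_pop.append(child)

        pop = np.array(new_pop)

    scores = np.array([fitness_fn(ind) for ind in pop])
    return pop[np.argmin(scores)]          # best chromosome found
```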

6. Experiment and Result
The main goal of these experiments is to optimize the feature set presented in the feature space section, in order to reduce the complexity of the pattern recognition problem and increase the overall accuracy. The above concepts were implemented in MATLAB 7.1.

The classification results are shown in Figures 5 and 6.

Figure 5. Recognition result for the tree and road classes

Figure 6. Recognition result for the grassland and building classes

During the analysis of the classification results, quality assessment was performed by comparing the overall accuracy and the kappa coefficient. In general, classification with the optimum features leads to an overall accuracy of about 92.8%. The results show that this overall accuracy is 3% higher than that obtained using all of the features. Furthermore, the improvement in accuracy for the building class is larger than for the grassland class. Table 1 shows the confusion matrix of the image classification with the optimum feature subset.

Table 1. Result of the ML classifier with test data

7. Conclusion
We have presented the results of applying the ML classification technique to LIDAR data and an aerial image for 3D and 2D object recognition. The results show the capability of using these datasets simultaneously, and furthermore that an optimum feature subset leads to an improvement in classification accuracy.

8. References

  1. Sebban, M., Nock, R., "A hybrid filter/wrapper approach of feature selection using information theory", Pattern Recognition 35 (2002) 835-846.
  2. Charaniya, A.P., Manduchi, R., Lodha, S.K., "Supervised Parametric Classification of Aerial LiDAR Data", University of California, Santa Cruz.
  3. Pei, M., Goodman, E.D., Punch, W.F., "Feature Extraction Using Genetic Algorithms".
  4. Zhang, P., Verma, B., Kumar, K., "Neural vs. statistical classifier in conjunction with genetic algorithm based feature selection", 2004.
  5. Raymer, M.L., Punch, W.F., Goodman, E.D., Kuhn, L.A., Jain, A.K., "Dimensionality Reduction Using Genetic Algorithms", IEEE Transactions on Evolutionary Computation, Vol. 4, No. 2, July 2000.
  6. Raman, B., Ioerger, T.R., "Instance Based Filter for Feature Selection", Journal of Machine Learning Research 1 (2002) 1-23.
  7. Cantú-Paz, E., Newsam, S., Kamath, C., "Feature Selection in Scientific Applications", Center for Applied Scientific Computing, Lawrence Livermore National Laboratory.
  8. Samadzadegan, F., "Automatic 3D object recognition and reconstruction based on artificial intelligence and information fusion concepts", University of Tehran (2002).
  9. Duda, R.O., Hart, P.E., Pattern Classification and Scene Analysis, John Wiley & Sons, NY, 1973.
  10. Tuceryan, M., Jain, A.K., "Texture Analysis", The Handbook of Pattern Recognition and Computer Vision (1998).