Fast k-Nearest Neighbour Classification for Remote Sensing Hyperspectral Data

Fast k-Nearest Neighbour Classification for Remote Sensing Hyperspectral Data


Xiuping Jia
School of Electrical Engineering, University College
The University of New South Wales, Australian Defence Force Academy,
Campbell, Canberra, ACT 2600, Australia
Tel: 61 2 6268 8202, Fax: 61 2 6268 8443
Email: [email protected]

Hyperspectral remote sensing data, generated by imaging spectrometers, cover the full solar reflected portion of the spectrum with high spectral resolution. They provide rich information on ground cover types and make possible detailed study and monitoring of the Earth. However, data analysis and image classification become difficult due to the high dimensionality resulted by using hundreds of spectral bands.

Parametric classification techniques, such as the Gaussian maximum likelihood, have been widely used in multispectral remote sensing image analysis. However, with hyperspectral data, the performance of this classification approach largely relies on two factors. One is that the class data must be approximately unimodal. Secondly, a large enough number of training samples for each class is required in order to obtain reliable statistics estimation. The number of training pixels is usually limited in remote sensing image classification. When the number of training samples is inadequate, Gaussian maximum likelihood classification accuracy will not always increase with the number of features used; it starts to decrease when the ratio of the number of training samples to the dimensionality is low. This has been referred to as the curse of dimensionality. The k-NN decision rule can overcome these problems. However, the price to pay is that it requires a large memory size and a large computational load. In this paper, a cluster-space classification is developed, which can implement k-NN method efficiently and make it feasible for high-dimensional data classification.

The cluster-space classification consists of three steps. Firstly, a large number of clusters (spectral classes) are generated from all the training data of the classes of interest, using a clustering algorithm, such as ISODATA. Each cluster can be interpreted as a neighbourhood region. Secondly, the original training data are then labelled into the clusters generated based on Euclidean distance measure. For each class, a histogram is constructed which is the number of patterns assigned to a given cluster versus the cluster index. The normalized histogram forms the cluster-space representation of the class data, which describes quantitatively the class conditional probability distribution in the cluster-space. Using Bayes’ theorem, each cluster (neighbourhood region) can be associated to the information class automatically. This process is consistent with a majority vote. A look-up-table is then established. Finally, an unknown pattern (pixel) is assigned as the nearest cluster and the class label is obtained from the look-up-table.

The look-up-table is only needs to be calculated once for each image, and this would be done in the training phase. There is no need to calculate the distance to each training pattern for every unknown pattern during the labelling process. Therefore, the requirement for the memory size and the computational load are significantly reduced. This makes the k-NN decision rule feasible for the classification of high-dimensional data, such as, the hyperspectral remote sensing image data sets.

Using the cluster-space classification, several clusters are associated with a single information class; i.e., each class is modeled by a number of sub classes. This multi-signature structure improves class separability and generates a flexible and adaptable representation of individual class data without the need for adopting any mathematical models. It also provides a means for class data separability estimation, both visually and quantitatively, regardless of the number of spectral bands used.

Experiments have been carried out based on computer-generated data and imaging spectrometer data. The advantages of the cluster-space classification are demonstrated, showing satisfactory classification performance with significant reduction in costs.