Research scholar, Department of Computer Science and Engineering
Sri Jayachamarajendra College of Engineering
Manasagangothri, Mysore 570 006 Karnataka, India
Email:-[email protected], [email protected]
T. N. Nagabhushana
Asst. Prof., Department of Computer Science and Engineering
Sri Jayachamarajendra College of Engineering
Manasagangothri, Mysore 570 006,Karnataka, India.
Symbolic objects are extensions of classical data types. In conventional data sets, the objects are individualized, whereas in symbolic data sets they are more unified by means of relationships (2-4,16-17).
Gowda and Diday(2-3) have presented an agglomerative clustering algorithm clustering algorithm for symbolic objects. They form composite symbolic objects (CSO) using a Cartesian join operator whenever a mutual pairs of symbolic objects are selected for agglomeration based on minimum dissimilarity and maximum similarity.
The combined usage of similarity and dissimilarity measures for agglomerative and divisive clustering of symbolic objects is presented by Gowda and Ravi(5-6). To manage time and memory requirements, the clustering method involves the use of data reduction techniques(11-12). Here the data reduction as proposed by K.C.Gowda and S.K.Prakash(15) is modified to accommodate the task without dimensionality reduction.
We have also incorporated the gravitational clustering techniques to cluster multispectral images of satellites(7,10). We have also compared our results using cluster indices(4). Both Agglomerative and Disaggregative Gravitational Symbolic approaches are considered. The concept of mutual pairs is used to merge or split the samples.
Data Reduction Techniques
Data reduction techniques are used before clustering the data. The data reduction technique uses bin arrays to store useful information of the reduced image. Here the m samples of d-dimensions are mapped on to bin arrays. Two cases are:
1) When dimension d ³2 and
2) d = 1.
The results of the reduced data must be mapped on to the original data after clustering the reduced data. This requires a set of labels of all the m samples, which identifies the location of each sample in the nonempty bins. This is used for making the classification-output map after clustering the reduced data.
1 When dimension d ³ 2
This works by assigning d-dimensional m number of samples to any one of the 2-dimensional arrays. The d 2-dimensional R ´ R bin arrays (R is a positive integer as required) are used to store the useful information of all the features. The data before reduction requires normalization between 1 to R. The normalized features are assigned to the R´R arrays.
The idea that the first feature is used to determine the column position of the bin to which the sample is to be assigned. The combination of the remaining feature values is used to determine the row position of the bin. If the first sample have the feature values f11, f12, f13,…………,f1d, which are normalized between 1 to R. This sample is assigned to a bin having a column value of f11. As this is the first sample, the row value of its bin is also 1.
If the second sample have the feature values f21, f22, f23, …………,f2d. Say if f11 = f21, then the bin to which this sample is assigned also has a column value of f11 and its bin has a row value of 1 if the below condition is satisfied:
Where T is a user-defined limit. If the above condition is not satisfied, then the row value of the bin is 2. Accordingly for a given column position, a new bin with a higher row value is considered only when the present sample cannot be assigned to any other bin with lower row value.
The steps are as follows: If S1, S2, S3,…………Sd be the R´R 2-dimensional arrays to store updated information of each feature and W is another such R´R 2-dimensional such bin-weight array to update number of samples assigned to each corresponding bin.
- Normalize the d dimensional m samples between 1 to R.
- The first feature gives the column position of the bin.
- The row value of the particular bin is determined as above.
- As each sample is assigned to a bin the arrays S1, S2, S3,…………Sd is updated.
- Also the number of nonempty bins is updated in W bin-weight array.
2 When dimension d=1
This works by assigning 1-dimensional m number of samples to any one of the 1-dimensional arrays. The one 1-dimensional R bin arrays are used to store the useful information of all the features. The data before reduction requires normalization between 1 to R. The normalized features are assigned to the R arrays. The idea that the first feature is used to determine the column position of the bin to which the sample is to be assigned. If the first sample has the feature values f11, which is normalized between 1 to R. This sample is assigned to a bin having a column value of f11.
If the second sample have the feature values f21. Say if f11 = f21, then the bin to which this sample is assigned also has a column value of f11.
The steps are as follows:
Let S1 be the R 1-dimensional arrays to store updated information of the feature and W is another such R 1-dimensional bin-weight array to update number of samples assigned to each corresponding bin. Also the number of nonempty bins is updated in W bin-weight array.
Cluster Coglometrate and Global Coglomerate Strengths
The cluster coglomerate strength (CCS) of cluster Cj is :
Where mj cluster weight of Cj
Xm Composite object of Cj
Dj max Xi Î Cj D ( Xi , Xm ) or maximum dissimilarity.
s: a suitable positive number and
r: a suitable negative number depends
on the input sample set.
The global coglomerate strength (GCS) of cluster at any level is defined as the sum of all the CCS of all the prevalent clusters at that level. Thus the GCS when cluster Cj is formed, is defined as:
Symbolic Gravitational Clustering Algorithm
1 Agglomerative Method
2 Divisive Method
- Reduce the data by choosing an appropriate bin size and threshold. Let N be the initial number of samples in a sample set S. And let the number of clusters be equal to one, with N number of samples.
- Set a threshold value TCCSm.
- Using the similarity measure, find all the mutual pairs present in the symbolic data set.
- For all the Mutual Pairs present find the CCS ‘CCSm between samples. Check if CCSm>TCCSm, if so then split the Mutual pairs into two separate clusters. Increment the number of clusters by 2.
- Step 3 and 4 is repeated for all the mutual pairs present at that stage.
- Determine the Cluster Indicator value for each Pth merging as in equation 6.:
- Decrement TCCSm in steps.Iterate steps 3 to 6 until a stage is reached when no replacement of mutual pair occurs.
Image of Moon taken by Galileo space craft on December 9 1990, with catalog number PIA00113 is taken for testing. Size of image is 535 * 535 with 3 features. Both Agglomerative and Divisive Gravitational Symbolic methods are considered, results are as shown in figure 1 and 2 respectively . Plot of Cluster Index Vs Number of Clusters is also shown in figure 3.
Fig 1. Classification Map and details of Classification Cover of Moon (Agglomerative method)
Fig 2. Classification Map and details of Classification Cover of Moon (Divisive method)
Fig 3. Variation for CI for Moon Image ( Series 1 Agglomerative Method Series 2 Divisive metod.)
We have proposed a clustering methodology to classify multispectral images. The algorithm is based on the physical phenomenon in which a system of particles in space converge due to the gravitational attraction between the particles. To accomplish space and time complexity requirements data reduction techniques are used. We have proposed both Agglomerative and Divisive methods of Clustering. Cluster indices for both the methods are shown in figure 3. Analyzing the plot (figure 3) it is evident that Divisive technique requires less number of iterations when compared to Agglomerative technique. The process ends at a stage when there are no more mutual pairs available for merging or splitting.
- A.K. Jain and Dubes, “Algorithms for clustering data”, PHI, 1988
- K.C. Gowda and E.Diday, “Symbolic clustering using new dissimilarity measure”, pattern recognition, vol.24, no.6,pp 567-578, 1991.
- K.C. Gowda and E.Diday, “Symbolic clustering using a new similarity measure”, IEEE trans. Systems, man Cybernetics, vol.22, pp.368-378,1992.
- K.C. Gowda and E.Diday, “Unsupervised learning through symbolic clustering”, Pattern Recognition Letters”, vol. 12 pp 259-264., 1991.
- K.C.Gowda and T.V. Ravi , “Divisive Clustering of symbolic clustering using the concept of both similarity and dissimilarity”, Pattern recognition, Vol.28,No. 8, PP.1277-1282,1995.
- K.C.Gowda and T.V. Ravi, “Divisive Clustering of symbolic clustering using the concept of both similarity and dissimilarity”, Pattern recognition letter, No.16, 1995 pp. 647-652.
- K.C.Gowda and T.V. Ravi, “Clustering of Symbolic objects Using Gravitational approach”, Vol. 29, No. 6, 1999
- K.C.Gowda and G.Krishna, “Agglomerative clustering using the concept of mutual nearest neighborhood”, pattern recognition, vol. 10 pp 105-112, 1977.
- K.C.Gowda and G.Krishna, “Disaggregative clustering using the concept of mutual nearest neighborhood”, IEEE transaction on systems man and cybernetics vol. SMC-8, pp.888-895, 1978.
- K.C.Gowda and G.Krishna, “Gravitational Clustering using the concept of Mutual Nearest Neighborhood”, Joint meeting of the classification and psychometric society, Montreal, Canada, May 31-June 2, 1982.
- K.C.Gowda, “A feature reduction and unsupervised classification algorithm for multispectral data”, Pattern recognition, vol. 17, no. 6, pp 667-676, 1984.
- K.C. Gowda, ” A new method for mapping multidimensional data to lower dimensions”, proceedings of int. Symposium. On Remote Sensing of the Environment, 2nd Thematic conference, 1982
- H.N Srikanta Prakash, P. Nagabhushan and K. Chidananda Gowda, “Symbolic clustering of remotely sensed data using a new dissimilarity measure”, International conference on cognitive Systems-ICCS’97, New Delhi, 13-15 Dec. 1997.
- H.N Srikanta Prakash, P. Nagabhushan and K. Chidananda Gowda, “Symbolic Agglomerative clustering for qualitative analysis of remotely sensed data”, International conference on remote sensing and GIS/GPS, India, March 1997.
- H.N Srikanta Prakash, P. Nagabhushan and K. Chidananda Gowda”, Symbolic ISODATA clustering procedure useful for land cover classification”, IGRASS-97, Singapore, 3-8 august-1997.
- E.Diday, “The Symbolic approach in clustering, Classification and Related Methods of Data Analysis”, H.H. Bock, ed., Elesvier, North Holland, Amsterdam, 1988.
- E.Diday, “Data Analysis, Learning Symbolic and Numeric Knowledge”, Nova Sc. Publishers,1988.
- D.S. Vinod and T.N. Nagabushana, “Classification of Remotely sensed data using Self -organizing Feature map”, International conference on remote sensing and GIS/GPS, India, February 2001.
- D.S. Vinod and T.N. Nagabushana “Classification of Multispectral Images Using Symbolic Gravitational Approach”, National conference on recent trends in information Technology, Coimbatore, pp 29-34, 22-24 Aug., 2002.