KARPAGAM Journal of Computer Science (ISSN : 0973-2926)

Semi supervised ensemble clustering algorithm (GA based) for high dimensional genomic data
Authors : P.Krishnakumari & K.Vivekanandan

Clustering in high-dimensional spaces is a difficult problem that recurs in many domains, for example in computational biology. Effective clustering methods for such domains are rare, and developing them is challenging. This paper presents an efficient algorithm designed for high-dimensional gene data. It combines Linear Discriminant Analysis (LDA), applied after PCA-based feature extraction, with the K-Means algorithm to select the most discriminative subspace, and it uses a genetic algorithm to perform local optimization starting from globally optimal points. The clustering process is thus integrated with the LDA-based subspace selection process, so that the data are clustered and the feature subspaces are selected simultaneously. The resulting clustering instances are then aggregated by agglomerative clustering to generate the final clusters. The genetic algorithm is also used to eliminate the problem of local optimality. Experiments on real datasets show that the proposed method outperforms existing methods for clustering high-dimensional genomic data in terms of accuracy and time.
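
The aggregation step can be illustrated with a small sketch. The abstract does not say how the clustering instances are combined beyond "agglomerative clustering", so the following assumes one common choice: a co-association matrix (the fraction of base clusterings that place two samples together) merged by average linkage. The function names `co_association` and `consensus_clusters`, and the toy label lists, are hypothetical and are not taken from the paper; the LDA, K-Means and genetic algorithm stages are not shown.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that put each pair of samples together."""
    n = len(labelings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(labelings)
    return co


def consensus_clusters(labelings, k):
    """Merge samples by average linkage on co-association until k clusters remain.

    Assumption: this is one standard way to aggregate an ensemble, not
    necessarily the procedure used by the authors.
    """
    co = co_association(labelings)
    clusters = [[i] for i in range(len(co))]  # start with singletons

    def similarity(pair):
        a, b = clusters[pair[0]], clusters[pair[1]]
        return sum(co[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Pick the two clusters whose members co-occur most often on average.
        p, q = max(combinations(range(len(clusters)), 2), key=similarity)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected

    final = [0] * len(co)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            final[i] = cluster_id
    return final


# Three hypothetical base clusterings of six samples; the label values
# themselves are arbitrary, only the groupings matter.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
print(consensus_clusters(base, k=2))  # [0, 0, 0, 1, 1, 1]
```

In this toy case the third base clustering disagrees about sample 2, but the consensus follows the majority. Working on co-occurrence rather than raw labels also makes the result independent of how each base run happens to number its clusters.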