ParaKMeans: Implementation of a parallelized K-means algorithm suitable for general laboratory use.
Abstract
BACKGROUND: During the last decade, the use of microarrays to assess the transcriptome of many biological systems has generated an enormous amount of data. A common technique for organizing and analyzing microarray data is cluster analysis. While many clustering algorithms have been developed, they all suffer a significant decrease in computational performance as the dataset being analyzed grows very large. For example, clustering 10,000 genes from an experiment containing 200 microarrays can be quite time-consuming and challenging on a desktop PC. One solution to the scalability problem of clustering algorithms is to distribute, or parallelize, the algorithm across multiple computers.
RESULTS: The software described in this paper is a high-performance multithreaded application that implements a parallelized version of the K-means clustering algorithm. Most parallel processing applications are not accessible to the general public and require specialized software libraries (e.g., MPI) and specialized hardware configurations. The parallel nature of this application comes from the use of a web service to perform the distance calculations and cluster assignments. Here we show that our parallel implementation provides significant performance gains over a wide range of datasets using as few as seven nodes. The software was written in C# and was designed in a modular fashion to provide flexibility in both deployment and the user interface.
CONCLUSION: ParaKMeans was designed to provide the general scientific community with an easy-to-use and manageable client-server application that can be installed on a wide variety of Windows operating systems.
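The abstract places the parallelism in the distance calculations and cluster assignments, which are farmed out to remote nodes while a client recomputes the centroids. The following is a minimal Python sketch of that division of labor under stated assumptions. It is not ParaKMeans itself, which is written in C# and calls a web service. Here a local thread pool stands in for the remote nodes, and Euclidean distance, randomly chosen initial centroids, and the names `assign_chunk` and `parallel_kmeans` are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import math
import random


def assign_chunk(rows, centroids):
    """Hypothetical worker step (a stand-in for one remote node): assign each
    row to its nearest centroid by squared Euclidean distance and return the
    labels plus per-cluster partial sums and counts for the coordinator."""
    k, dim = len(centroids), len(centroids[0])
    labels = []
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for row in rows:
        best = min(range(k),
                   key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
        labels.append(best)
        counts[best] += 1
        for j, x in enumerate(row):
            sums[best][j] += x
    return labels, sums, counts


def parallel_kmeans(data, k, n_workers=7, max_iter=100, seed=0):
    """Hypothetical coordinator loop: split the rows (e.g. genes) across
    workers, merge their partial results, and recompute the centroids until
    no row changes cluster or max_iter is reached."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(data, k)]
    size = math.ceil(len(data) / n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    dim = len(data[0])
    labels = None
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(max_iter):
            # Every chunk sees the same centroids for this iteration.
            results = list(pool.map(assign_chunk, chunks, [centroids] * len(chunks)))
            new_labels = [lab for res in results for lab in res[0]]
            if new_labels == labels:
                break  # converged: assignments did not change
            labels = new_labels
            for c in range(k):
                total = sum(res[2][c] for res in results)
                if total:  # keep the previous centroid if a cluster empties
                    centroids[c] = [sum(res[1][c][j] for res in results) / total
                                    for j in range(dim)]
    return labels, centroids
```

For example, `parallel_kmeans([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]], k=2, n_workers=3)` should place the first three rows in one cluster and the last three in the other. Workers return only partial sums and counts, not raw rows, so the coordinator's merge step stays small regardless of chunk size. Because of Python's global interpreter lock, the thread pool here only illustrates how the data are split; real speedups would need separate processes or machines, as in the multi-node setup the abstract describes.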
Citation: BMC Bioinformatics. 2008 Apr 16; 9:200
- TimeClust: a clustering tool for gene expression time series.
- Authors: Magni P, Ferrazzi F, Sacchi L, Bellazzi R
- Issue date: 2008 Feb 1
- ParaSAM: a parallelized version of the significance analysis of microarrays algorithm.
- Authors: Sharma A, Zhao J, Podolsky R, McIndoe RA
- Issue date: 2010 Jun 1
- Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.
- Authors: Bhattacharya A, De RK
- Issue date: 2008 Jun 1
- An improved algorithm for clustering gene expression data.
- Authors: Bandyopadhyay S, Mukhopadhyay A, Maulik U
- Issue date: 2007 Nov 1
- Maximum significance clustering of oligonucleotide microarrays.
- Authors: de Ridder D, Staal FJ, van Dongen JJ, Reinders MJ
- Issue date: 2006 Feb 1