Using two published gene expression data sets as test cases, the research team found that the KL clustering method, which uses a novel measure of similarity not previously used for gene expression analysis, was superior to the most popular method, hierarchical clustering, in separating the data into dense clusters with similar patterns.
In gene expression analysis, the identification of groups of genes with similar temporal patterns of expression is usually a critical step because it provides insights into gene-gene interactions and the underlying biological processes. Experiments suggest that genes with similar function may exhibit similar temporal patterns of co-regulation.
Dr. Raj Acharya, professor and head of the Department of Computer Science and Engineering at Penn State, says that, although the study was conducted with gene data, KL clustering could be applied to any large set of temporal data.
The team published their findings in a paper, "An information theoretic approach for analyzing temporal patterns of gene expression," in the March issue of the journal Bioinformatics. The authors are Jyotsna Kasturi, a Penn State doctoral candidate; Acharya; and Dr. Murali Ramanathan, Department of Pharmaceutical Sciences, University at Buffalo, The State University of New York.
Kasturi explains, "We wanted gene expression data with similar patterns to be put in the same cluster with as little variation as possible, which implies dense clusters."
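To make the idea of an information-theoretic similarity measure concrete, here is a minimal sketch, not the team's implementation. It assumes that "KL" refers to the Kullback-Leibler divergence, that each gene's time course is non-negative and can be normalized into a probability distribution, and that a symmetrized form of the divergence is used. The function names and the sample profiles are hypothetical.

```python
import math

def to_distribution(profile, eps=1e-9):
    """Normalize a non-negative expression time course so it sums to 1.

    eps is an assumed smoothing constant that keeps every entry positive,
    so the logarithm below is always defined.
    """
    smoothed = [x + eps for x in profile]
    total = sum(smoothed)
    return [x / total for x in smoothed]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def symmetric_kl(profile_a, profile_b):
    """Average of D(p || q) and D(q || p); smaller means more similar."""
    p = to_distribution(profile_a)
    q = to_distribution(profile_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Hypothetical time courses: gene_b has the same shape as gene_a at twice
# the level, while gene_c follows the opposite trend.
gene_a = [1.0, 2.0, 4.0, 8.0]
gene_b = [2.0, 4.0, 8.0, 16.0]
gene_c = [8.0, 4.0, 2.0, 1.0]

print(symmetric_kl(gene_a, gene_b))  # close to zero: same temporal shape
print(symmetric_kl(gene_a, gene_c))  # clearly larger: opposite trend
```

Under these assumptions, two genes whose expression rises and falls in the same proportions score as nearly identical even when their absolute levels differ, which is one way similar temporal patterns could end up in the same cluster.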
The team also used the Davies-Bouldin cluster validity index as the primary measure of clustering quality, along with a chi-square test to assess the similarity between the clusters obtained by the different methods.
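For readers unfamiliar with the Davies-Bouldin index, the following is a minimal sketch of its standard definition, in which lower values indicate denser, better-separated clusters. It assumes Euclidean distance between points and centroids, which may differ from the distance used in the paper, and the function names and toy clusters are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of the points in one cluster."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def scatter(points, center):
    """Average Euclidean distance from each point to the cluster centroid."""
    return sum(math.dist(p, center) for p in points) / len(points)

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points).

    For every cluster, take the worst-case ratio of combined scatter to
    centroid separation against any other cluster, then average the results.
    """
    centers = [centroid(c) for c in clusters]
    scatters = [scatter(c, m) for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k

# Hypothetical two-cluster examples: compact and far apart vs. spread and close.
tight = [[[0, 0], [0, 1]], [[10, 0], [10, 1]]]
loose = [[[0, 0], [0, 4]], [[3, 0], [3, 4]]]

print(davies_bouldin(tight))
print(davies_bouldin(loose))
```

The compact, well-separated grouping yields the smaller index, which matches the sense in which an index of this kind rewards dense clusters.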