HIERARCHICAL CLUSTERING OF LUNG CANCER MICROARRAY DATASET USING RANDOM FOREST ALGORITHM BASED ON GO ANNOTATION

Authors

  • Dr. R. Porkodi, N. Poomani. Department of Computer Science, School of Computer Science and Engineering, Bharathiar University, Coimbatore-46.

Abstract

Data mining refers to extracting knowledge from large amounts of data. It is used in various medical applications, such as tumor classification, protein structure prediction, gene selection, cancer classification based on microarray data, clustering of gene expression data, and statistical modeling of protein-protein interactions. An emerging research area in bioinformatics is gene enrichment analysis using the Gene Ontology (GO) terms annotated to the genes in a given dataset. GO terms are therefore an important feature considered in most research in this field. In this paper, the random forest algorithm is used to compute a score for each gene based on GO terms, which are downloaded using the BioMart package. The random forest analysis covers three tasks: extracting the most significant GO terms, extracting the top-ranked genes based on GO term enrichment, and extracting genes with their GO term mappings. Further, the gene expression profiles in the dataset are clustered based on the top-ranked GO terms.

Key Words: Data Mining, Random Forest, Microarray, Gene Ontology Terms, Hierarchical Clustering
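
The final clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the top-ranked GO terms per gene are already available (the random forest ranking is not reproduced here), and it assumes Jaccard distance with average linkage, neither of which is specified in the abstract. The gene names, GO identifiers, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy input: gene -> set of top-ranked GO terms annotated to it.
# Gene symbols and GO IDs are illustrative only, not taken from the paper.
GENE_GO = {
    "GENE_A": {"GO:0000001", "GO:0000002"},
    "GENE_B": {"GO:0000001", "GO:0000002", "GO:0000003"},
    "GENE_C": {"GO:0000004"},
    "GENE_D": {"GO:0000004", "GO:0000005"},
}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty annotation sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, annotations):
    """Mean pairwise Jaccard distance between the genes of two clusters."""
    total = sum(
        jaccard_distance(annotations[g1], annotations[g2])
        for g1 in c1
        for g2 in c2
    )
    return total / (len(c1) * len(c2))


def agglomerative_clusters(annotations, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters."""
    n_clusters = max(1, n_clusters)
    clusters = [frozenset([gene]) for gene in sorted(annotations)]
    merges = []  # merge history: (members of c1, members of c2, distance)
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], annotations),
        )
        dist = average_linkage(c1, c2, annotations)
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
        merges.append((sorted(c1), sorted(c2), dist))
    return clusters, merges


clusters, merges = agglomerative_clusters(GENE_GO, n_clusters=2)
for cluster in clusters:
    print(sorted(cluster))
```

The merge history plays the role of a dendrogram: each entry records which two groups were joined and at what distance. For real microarray data, a library routine would normally replace this quadratic toy loop.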

Published

2015-12-30

How to Cite

Porkodi, R., & Poomani, N. (2015). HIERARCHICAL CLUSTERING OF LUNG CANCER MICROARRAY DATASET USING RANDOM FOREST ALGORITHM BASED ON GO ANNOTATION. International Journal of Engineering Technology and Computer Research, 3(6). Retrieved from https://www.ijetcr.org/index.php/ijetcr/article/view/276

Section

Articles