HIERARCHICAL CLUSTERING OF LUNG CANCER MICROARRAY DATASET USING RANDOM FOREST ALGORITHM BASED ON GO ANNOTATION
Abstract
Data mining refers to extracting knowledge from large amounts of data. It is used in various medical applications such as tumor classification, protein structure prediction, gene selection, cancer classification based on microarray data, clustering of gene expression data, and statistical modeling of protein-protein interactions. An emerging research area in bioinformatics is gene enrichment analysis using the Gene Ontology (GO) terms annotated to the genes in a given dataset. GO terms are therefore an important feature considered in almost all research in this field. In this paper, the random forest algorithm is used to compute a score for each gene based on its GO terms, which are downloaded using the BioMart package. The analysis extracts the most significant GO terms, the top-ranked genes based on GO term enrichment, and the genes together with their GO term mappings. Further, the gene expression profiles in the dataset are clustered based on the top-ranked GO terms.
Key Words: Data Mining, Random Forest, Microarray, Gene Ontology Terms, Hierarchical Clustering
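The final clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the top-ranked GO terms have already been selected, represents each gene as a set of those terms, and groups genes by average-linkage agglomerative clustering on Jaccard distance. The gene names, term labels, distance measure, and linkage rule are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from itertools import combinations

# Hypothetical toy annotations: gene -> set of top-ranked GO terms.
# Names and pairings are placeholders, not data from the paper.
GENE_TO_GO = {
    "GENE_A": {"GO_TERM_1", "GO_TERM_2"},
    "GENE_B": {"GO_TERM_1", "GO_TERM_2", "GO_TERM_3"},
    "GENE_C": {"GO_TERM_4"},
    "GENE_D": {"GO_TERM_4", "GO_TERM_5"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, annotations):
    """Mean pairwise Jaccard distance between the genes of two clusters."""
    total = sum(
        jaccard_distance(annotations[g1], annotations[g2])
        for g1 in c1
        for g2 in c2
    )
    return total / (len(c1) * len(c2))


def agglomerative_clusters(annotations, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Start with every gene in its own cluster.
    clusters = [frozenset([gene]) for gene in sorted(annotations)]
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], annotations),
        )
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
    return sorted(sorted(c) for c in clusters)


clusters = agglomerative_clusters(GENE_TO_GO, n_clusters=2)
print(clusters)
```

On this toy input, genes that share most of their GO terms end up in the same cluster. For a real microarray dataset, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would likely be a more practical choice than this hand-written loop.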
License
International Journal of Engineering Technology and Computer Research (IJETCR) by Articles is licensed under a Creative Commons Attribution 4.0 International License.