Graduation Year


Document Type




Degree Granting Department

Biomedical Engineering

Major Professor

Rangachar Kasturi, Ph.D.

Co-Major Professor

Dmitry Goldgof, Ph.D.

Committee Member

Lawrence Hall, Ph.D.

Committee Member

Steven Eschrich, Ph.D.


Keywords

Microarray, Bioinformatics, Data mining, Feature selection, Classifiers


Cancer is a disease process that arises from a series of genetic mutations that cause seemingly uncontrolled multiplication of cells. The molecular genetics of cells indicates that different combinations of genetic events, or alternative pathways in cells, may lead to cancer. Studying gene expression in cancer cells, together with external influencing factors, can greatly aid cancer management: understanding the initiation and etiology of cancer, as well as detecting, assessing, and predicting its progression.

Gene expression analysis of cells yields a very large number of features that can be used to describe the condition of the cell. Feature selection methods are explored to choose the features most relevant to the problem at hand. Random subspace ensembles built on these selected features perform poorly in predicting 36-month survival for colon cancer patients. A modification to the random subspace scheme is proposed to improve prediction accuracy. The method first applies random subspace ensembles of decision trees to select predictive features. Support vector machines are then trained on the selected gene expression profiles of cancer tissue to predict the survival outcome for a patient.
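The two-stage idea described above (random subspace ensembles of decision trees for feature selection, followed by a support vector machine) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes scikit-learn and NumPy, ranks features by summed tree importance (the abstract does not state the ranking rule), and uses a linear kernel with balanced class weights. The function names, parameter values, and synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def select_features_random_subspace(X, y, n_trees=100, subspace_size=20,
                                    n_keep=10, seed=0):
    """Rank features by importance summed over trees grown on random subspaces.

    Assumption: summed impurity-based importance is used as the ranking
    criterion; the source does not specify how features are scored.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    scores = np.zeros(n_features)
    for _ in range(n_trees):
        # Each tree sees only a random subset of the features (a "subspace").
        subset = rng.choice(n_features, size=subspace_size, replace=False)
        tree = DecisionTreeClassifier(random_state=0).fit(X[:, subset], y)
        # Map the tree's per-column importances back to the original indices.
        scores[subset] += tree.feature_importances_
    # Indices of the n_keep highest-scoring features, best first.
    return np.argsort(scores)[::-1][:n_keep]


def fit_two_stage(X, y, **selection_kwargs):
    """Stage 1: select features. Stage 2: train an SVM on the selected columns."""
    selected = select_features_random_subspace(X, y, **selection_kwargs)
    # Assumption: linear kernel and balanced class weights are illustrative choices.
    svm = SVC(kernel="linear", class_weight="balanced").fit(X[:, selected], y)
    return selected, svm


# Synthetic stand-in for expression data: 120 samples, 200 "genes".
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(120, 200))
y = (X[:, 3] + X[:, 17] > 0).astype(int)  # outcome driven by two features

selected, svm = fit_two_stage(X, y)
predictions = svm.predict(X[:, selected])
print("selected feature indices:", selected)
```

In a real evaluation, both stages would be fitted on training folds only and applied to held-out patients, so that feature selection does not leak information from the test set.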

The proposed method achieves a weighted accuracy of 58.96%, with 40.54% sensitivity and 77.38% specificity, in predicting 36-month survival for previously unseen colon cancer patients. Its prediction accuracy is comparable to that of the baseline classifiers and significantly better than that of random subspace ensembles on gene expression profiles of colon cancer.
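The reported figures are consistent with weighted accuracy being the mean of sensitivity and specificity (often called balanced accuracy). The abstract does not give the definition, so the formula below is an assumption that the numbers happen to satisfy. The function name is hypothetical.

```python
def weighted_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (assumed definition)."""
    return (sensitivity + specificity) / 2.0


# Values taken from the abstract, in percent.
wa = weighted_accuracy(40.54, 77.38)
print(f"{wa:.2f}")  # prints 58.96
```

Under this definition a classifier that always predicts one class scores 50%, which gives a reference point for the 58.96% figure.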