Graduation Year
2020
Document Type
Thesis
Degree
M.A.
Degree Name
Master of Arts (M.A.)
Degree Granting Department
Mathematics and Statistics
Major Professor
Natasha Jonoska, Ph.D.
Committee Member
Theodore Molla, Ph.D.
Committee Member
Masahiko Saito, Ph.D.
Keywords
Ciliates, Gene ontology, Hierarchical, K-means, Topology
Abstract
Clustering is a data analysis method used across a wide variety of research fields. Many clustering algorithms exist, and none can be considered universally better than the others. We describe several clustering methods, including hierarchical clustering and k-means clustering. We also describe topological data analysis and show how topology can be used to infer structural information about a data set. We discuss how to assess the validity of clusters and how to choose an optimal clustering method, and we conclude with how we used various clustering methods to analyze transcriptome data from the ciliate Oxytricha trifallax. For this data set, we discuss its structure, how an optimal clustering was chosen, how the validity of the clusters was confirmed, and how biological information can be extracted using gene ontology.
Scholar Commons Citation
Houfek, Kyle, "Clustering methods for gene expression data of Oxytricha trifallax" (2020). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/8227