Graduation Year


Document Type




Degree Name

Master of Arts (M.A.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Natasha Jonoska, Ph.D.

Committee Member

Theodore Molla, Ph.D.

Committee Member

Masahiko Saito, Ph.D.

Keywords


Ciliates, Gene ontology, Hierarchical, K-means, Topology

Abstract


Clustering is a data analysis method used in a wide variety of research fields. Many clustering algorithms exist, and none can be considered universally better than the others. We describe several clustering methods, including hierarchical clustering and k-means clustering. We also describe topological data analysis and show how topology can be used to infer structural information about a data set. We discuss how to assess the validity of clusters and how to choose an optimal clustering method, and conclude with how we used various clustering methods to analyze transcriptome data from the ciliate Oxytricha trifallax. For this data set, we discuss its structure, how an optimal clustering was chosen, how the validity of the clusters was confirmed, and how biological information can be extracted using gene ontology.

