Graduation Year


Document Type




Degree Name

Master of Arts (M.A.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Masahiko Saito, Ph.D.

Committee Member

Nataša Jonoska, Ph.D.

Committee Member

Dmytro Savchuk, Ph.D.


Topological Data Analysis, Homology, Persistent Homology, Vietoris-Rips filtration, ciliates


Ciliates form a phylum of protozoa whose members have two types of nuclei, macronuclei and micronuclei. An organism may have more than one nucleus of each type [1]. The macronucleus is the structure where protein synthesis and cell metabolism occur [1]. The micronucleus stores genetic information and is mobilized during a sexual reproduction process called conjugation [1]. During conjugation, the somatic macronucleus (MAC) develops from the germ-line micronucleus (MIC) through genome rearrangement [6, 8]. Segments of the MIC that form the MAC are called macronuclear destined sequences (MDSs) [8]. During sequencing, each MDS is assigned coordinates marking where it begins and ends in the MIC. An MDS has positive orientation if its direction in the MIC agrees with its direction in the MAC, and negative orientation otherwise. In this thesis we analyze various aspects of gene assembly during the rearrangement process of the ciliate Oxytricha trifallax, which was recently sequenced [15]. The properties analyzed include overlapping MDSs, orientation, the starting and ending positions of MDSs in the MIC, and the gaps of overlapping MDS pairs. The gap of an overlapping MDS pair is the difference in order, within a particular MAC contig, between two MDSs that overlap in the MIC contig. We use 120 MAC contigs from [15] that have overlaps among their own MDSs. These 120 MAC contigs make up the data set we call D4.
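The definitions of overlap and gap above can be illustrated with a minimal Python sketch. The `MDS` record, its field names, and the toy coordinates are hypothetical and are not taken from the thesis data. The sketch assumes that each MDS is stored with start <= end and inclusive MIC coordinates, and that the gap of a pair is the difference of the two MDS order numbers within the MAC contig.

```python
from itertools import combinations
from typing import NamedTuple


class MDS(NamedTuple):
    """One MDS of a MAC contig (hypothetical record layout)."""
    order: int  # position of the MDS within the MAC contig (1, 2, 3, ...)
    start: int  # first MIC coordinate covered by the MDS (assumed <= end)
    end: int    # last MIC coordinate covered by the MDS (inclusive)


def overlap_length(a: MDS, b: MDS) -> int:
    """Number of MIC positions shared by two MDSs (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def overlapping_pairs(mdss):
    """List (a, b, gap, shared) for every pair of MDSs that overlap in the MIC."""
    pairs = []
    for a, b in combinations(sorted(mdss, key=lambda m: m.order), 2):
        shared = overlap_length(a, b)
        if shared > 0:
            # Gap = difference in MAC order between the two overlapping MDSs.
            pairs.append((a, b, b.order - a.order, shared))
    return pairs


# Toy MAC contig with four MDSs (made-up coordinates).
contig = [MDS(1, 100, 200), MDS(2, 190, 300), MDS(3, 400, 500), MDS(4, 150, 195)]
for a, b, gap, shared in overlapping_pairs(contig):
    print(f"MDS {a.order} and MDS {b.order}: gap {gap}, {shared} shared positions")
```

In this toy contig, MDS 3 overlaps nothing, while MDSs 1, 2 and 4 overlap pairwise with gaps 1, 3 and 2.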

We explore the patterns of overlapping MDSs in the MIC in D4. To quantify such patterns, we associate a vector $V(A_n)$ to each MAC contig $A_n$, where $V(A_n) = (v_1(A_n), v_2(A_n), v_3(A_n))$ is a vector in $\mathbb{R}^3$. The first entry is the number of overlapping MDS pairs divided by the number of MDSs. The second entry is the sum of the gaps of overlapping MDS pairs divided by the sum of all possible gaps. The third entry is the total number of overlapping base pairs divided by the total length of the MAC contig. We computed the distance matrix $M = (d_{ij})$, where $d_{ij}$ is the Euclidean distance between $V(A_i)$ and $V(A_j)$. The MAC contig vectors and $M$ were computed using Python.
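A minimal sketch of the three vector entries and the distance matrix follows; it is not the code used in the thesis. The function names and input layout are hypothetical, and two readings of the text are assumptions: "the sum of all possible gaps" is taken as the sum of j - i over all MDS pairs i < j of the contig, which equals (n - 1)n(n + 1)/6 for n MDSs, and the overlapping base pairs are summed pair by pair.

```python
import math


def contig_vector(n_mds, pairs, contig_length):
    """Return (v1, v2, v3) for one MAC contig.

    n_mds         -- number of MDSs in the contig
    pairs         -- list of (gap, overlap_bp), one per overlapping MDS pair
    contig_length -- total length of the MAC contig in base pairs
    """
    v1 = len(pairs) / n_mds
    # Assumed reading: sum of (j - i) over all pairs i < j of n_mds MDSs.
    all_gaps = (n_mds - 1) * n_mds * (n_mds + 1) // 6
    v2 = sum(gap for gap, _ in pairs) / all_gaps if all_gaps else 0.0
    # Assumed reading: overlap lengths are added up pair by pair.
    v3 = sum(bp for _, bp in pairs) / contig_length
    return (v1, v2, v3)


def distance_matrix(vectors):
    """Matrix of pairwise Euclidean distances between the contig vectors."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]


# Two toy contigs (made-up numbers, not taken from D4).
v_a1 = contig_vector(4, [(1, 11), (3, 46), (2, 6)], 630)
v_a2 = contig_vector(5, [(1, 20)], 400)
M = distance_matrix([v_a1, v_a2])
print(v_a1, v_a2, M[0][1])
```

The resulting matrix is symmetric with zeros on the diagonal, so only the entries above the diagonal carry information.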

To analyze D4 we applied topological data analysis (TDA). TDA uses topological constructs to assess shapes in data [3, 12]. To the entries of the distance matrix $M = (d_{ij})$ we applied a Vietoris-Rips filtration to generate the barcodes of the persistent homology in dimensions 0, 1 and 2. The persistence barcode of 0-dimensional homology illustrates clusters in the data, while that of 1-dimensional homology represents non-trivial loops in the simplicial complex [3, 13]. Applying TDA to the Oxytricha trifallax data identified ten MAC contig clusters at $\epsilon = 0.1$ in D4 and several loops that persisted for two or three values of $\epsilon$. Other TDA methods can be applied to the Vietoris-Rips filtration to further identify which MAC contigs appear in each cluster.
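The link between the 0-dimensional barcode and clusters can be sketched in pure Python without a TDA library; this is not the software used in the thesis. The sketch assumes the convention that two points are joined by an edge in the Vietoris-Rips complex once their distance is at most $\epsilon$ (some packages scale this differently). Under that convention every point starts a bar at 0, and a bar ends each time two connected components merge, so the number of clusters at $\epsilon$ is the number of points minus the number of merges at scales up to $\epsilon$. The function names and toy data are hypothetical.

```python
def h0_death_scales(dist):
    """Scales at which connected components merge in a Vietoris-Rips filtration.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    The returned list holds the finite endpoints of the 0-dimensional bars.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges in order of increasing length (Kruskal's algorithm).
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components
            parent[ri] = rj
            deaths.append(d)  # one 0-dimensional bar ends at scale d
    return deaths


def clusters_at(dist, epsilon):
    """Number of connected components of the complex at scale epsilon."""
    merged = sum(1 for d in h0_death_scales(dist) if d <= epsilon)
    return len(dist) - merged


# Toy data: five points on a line (made-up values, not from D4).
points = [0.0, 0.05, 0.5, 0.58, 2.0]
toy = [[abs(p - q) for q in points] for p in points]
print(clusters_at(toy, 0.1), clusters_at(toy, 0.5), clusters_at(toy, 2.0))
```

Loops (1-dimensional homology) need the full filtration including triangles and are not covered by this sketch.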

Included in

Mathematics Commons