Graduation Year
2022
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Mathematics and Statistics
Major Professor
Nataša Jonoska, Ph.D.
Co-Major Professor
Masahico Saito, Ph.D.
Committee Member
Dmytro Savchuk, Ph.D.
Committee Member
Theodore Molla, Ph.D.
Committee Member
Margaret Park, Ph.D.
Keywords
Digraph homology, Gene rearrangement, DNA scrambling
Abstract
In this work, we introduce novel tools to study DNA recombination pathways and measure their complexity. Genome rearrangement in some ciliate species can be modeled by subword pattern deletions in double-occurrence words (DOWs), words in which each symbol appears exactly twice. The iterated deletions can be represented by a graph whose vertices are DOWs, with two words connected by an edge if one can be obtained from the other through a pattern deletion. On this graph, called the “word graph”, we build a complex composed of cells given by Cartesian products of simplicial digraphs, define a boundary operator on it, and compute its homology groups. These topological invariants are used to analyze the word graphs of DOWs from data sets and to compare the genomes of three ciliate species.
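The word-graph construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: words are strings of single-character symbols, and the deletable patterns are assumed to be repeat words uu and return words uu^R, where u is a factor with pairwise distinct symbols (a hypothetical choice for this example). The helper names is_dow, pattern_deletions, and word_graph are also hypothetical.

```python
from collections import Counter


def is_dow(word: str) -> bool:
    """True if every symbol in `word` occurs exactly twice (the empty word counts)."""
    return all(count == 2 for count in Counter(word).values())


def pattern_deletions(word: str) -> set:
    """Words obtained from `word` by deleting one pattern occurrence.

    ASSUMPTION: the patterns are a repeat word uu or a return word u u^R,
    with u a nonempty factor whose symbols are pairwise distinct. Deleting
    such a factor from a DOW leaves a DOW, since each letter of u occurs
    once in each copy.
    """
    n = len(word)
    results = set()
    for i in range(n):
        for k in range(1, (n - i) // 2 + 1):
            u = word[i:i + k]
            if len(set(u)) != k:
                break  # every longer factor starting at i also repeats a symbol
            second = word[i + k:i + 2 * k]
            if second == u or second == u[::-1]:
                results.add(word[:i] + word[i + 2 * k:])
    return results


def word_graph(word: str):
    """Vertices and directed edges (w -> w') reachable from `word` by iterated deletions."""
    if not is_dow(word):
        raise ValueError("input must be a double-occurrence word")
    vertices, edges, frontier = {word}, set(), [word]
    while frontier:
        w = frontier.pop()
        for v in pattern_deletions(w):
            edges.add((w, v))
            if v not in vertices:
                vertices.add(v)
                frontier.append(v)
    return vertices, edges
```

For example, under these assumptions the word 1221 can lose the return word 1221 (giving the empty word) or the repeat word 22 (giving 11), and 11 in turn reduces to the empty word, so its word graph is a triangle on three vertices.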
In the general context of directed graphs, we identify generators for the homology groups of the prodsimplicial complex and use them to construct graphs whose associated complexes have arbitrarily large Betti numbers. Focusing on the word graphs associated to DOWs, we describe the effect of various word operations on the topological features of the complex of the resulting graph. Classes of DOWs whose graphs are contractible, have nontrivial 1-cycles, or have nontrivial 2-cycles are also characterized.
As a byproduct of genome rearrangement, circular DNA molecules have been experimentally observed. We study the possibility that the excision process is guided by repeat sequences, called cryptic pointers. Cryptic pointers are identified as the longest common substrings of DNA sequences near circularization cutting sites, and their properties are studied. To validate our findings, we generate random DNA sequences and show that they differ from the detected cryptic pointers in a statistically significant way, suggesting that these repeat sequences may be the underlying mechanism of circularization.
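The detection of cryptic pointers and the comparison with random sequences can be illustrated with a short sketch. It assumes a standard dynamic-programming longest-common-substring routine and a simple Monte Carlo null model of uniformly random sequences over {A, C, G, T}; the function names, sequence lengths, number of trials, and the add-one empirical p-value are hypothetical choices for illustration and need not match the statistical test used in the dissertation.

```python
import random


def longest_common_substring(a: str, b: str) -> str:
    """One longest common substring of a and b, by dynamic programming in O(len(a) * len(b))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: length of common suffix of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def random_dna(length: int, rng: random.Random) -> str:
    """Uniformly random DNA sequence (ASSUMED null model)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def empirical_p_value(observed_len: int, len_a: int, len_b: int,
                      trials: int = 1000, seed: int = 0) -> float:
    """Add-one estimate of P(random LCS length >= observed_len) for sequences of the given lengths."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = random_dna(len_a, rng), random_dna(len_b, rng)
        if len(longest_common_substring(a, b)) >= observed_len:
            hits += 1
    return (hits + 1) / (trials + 1)
```

For example, longest_common_substring("GATTACA", "TTACG") returns "TTAC". A small empirical p-value for an observed repeat length would indicate that repeats that long are unlikely to arise between random sequences of the same lengths.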
In the vicinity of cyclic molecules in the Oxytricha trifallax genome, there may be segments of up to 58 contigs. In over 90% of cases, segments of up to two contigs appear in regions bounded by circle junctions. To model this situation, we define legal strings as symbol sequences representing arrangements of multiple coding DNA segments on a DNA strand. These are enumerated, and correspondences between legal strings and lattice paths are used to define the curvature of a legal string as a novel measure of DNA scrambling. We also generalize the notion of separation, another measure of DNA scrambling, to the context of legal strings and prove that it is invariant under group actions of ℤ₂ × ℤ₂ and ℤ₂ × ℤ₂ × ℤ₂ on legal strings representing equivalent segment arrangements.
Scholar Commons Citation
Fajardo Gómez, Lina, "Methods in Discrete Mathematics to Study DNA Rearrangement Processes" (2022). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/10290