Graduation Year

2022

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Mathematics and Statistics

Major Professor

Nataša Jonoska, Ph.D.

Co-Major Professor

Masahico Saito, Ph.D.

Committee Member

Dmytro Savchuk, Ph.D.

Committee Member

Theodore Molla, Ph.D.

Committee Member

Margaret Park, Ph.D.

Keywords

Digraph homology, Gene rearrangement, DNA scrambling

Abstract

In this work, we introduce novel tools to study DNA recombination pathways and measure their complexity. Genome rearrangement in some ciliate species can be modeled by subword pattern deletions in double-occurrence words (DOWs), which are words where each symbol appears exactly twice. The iterated deletions can be represented by a graph whose vertices are DOWs, with an edge between two words if one can be obtained from the other through a pattern deletion. On this graph, called the “word graph”, we build a complex composed of cells defined by Cartesian products of simplicial digraphs, on which we define a boundary operator and compute homology groups. These topological invariants are used to analyze the word graphs of DOWs from data sets and to make comparisons between the genomes of three ciliate species.
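As a minimal illustration of the definition of a DOW given above, the following Python sketch checks whether a word is double-occurrence. The function names and the helper `delete_symbols` are hypothetical and introduced only for illustration. The helper removes both occurrences of chosen symbols, which always yields another DOW, but it does not implement the specific pattern-deletion rules of this work, which the abstract does not spell out.

```python
from collections import Counter


def is_double_occurrence_word(word):
    """Return True if every symbol in `word` appears exactly twice.

    The empty word is treated as a DOW (an assumption of this sketch).
    """
    return all(count == 2 for count in Counter(word).values())


def delete_symbols(word, symbols):
    """Hypothetical helper: remove every occurrence of the given symbols.

    Applied to a DOW, this removes both copies of each chosen symbol,
    so the result is again a DOW. Which sets of symbols may legally be
    deleted is decided by the pattern-deletion rules, not by this helper.
    """
    return "".join(ch for ch in word if ch not in symbols)


if __name__ == "__main__":
    w = "abcabc"
    print(is_double_occurrence_word(w))
    print(delete_symbols(w, {"b"}))
```

In a word graph, each such word would be a vertex, and an edge would join two words when an allowed deletion turns one into the other.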

In the general context of directed graphs, we identify generators for the homology groups of the prodsimplicial complex and use them to construct graphs whose associated complexes have arbitrarily large Betti numbers. Focusing on the word graphs associated to DOWs, we describe the effect of various word operations on the topological features of the complex of the resulting graph. We also characterize classes of DOWs whose graphs are contractible, and classes whose graphs have nontrivial 1-cycles and 2-cycles.

As a byproduct of genome rearrangement, circular DNA molecules have been experimentally observed. We study the possibility that the excision process is guided by repeat sequences, called cryptic pointers. Cryptic pointers are identified as the longest common substrings of DNA sequences near circularization cutting sites and their properties are studied. To validate our findings, we generate random DNA sequences and show that they differ from the detected cryptic pointers in a statistically significant way, suggesting these repeat sequences may be the underlying mechanism of circularization.
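The notion of a longest common substring used above can be sketched with a standard dynamic-programming routine. This is an illustrative sketch only, not the procedure used in the dissertation: the function name and the example sequences are assumptions, and the way flanking sequences near cutting sites are extracted is not described in the abstract.

```python
def longest_common_substring(s, t):
    """Return one longest contiguous substring shared by `s` and `t`.

    Standard dynamic programming: curr[j] holds the length of the longest
    common suffix of s[:i] and t[:j]. Runs in O(len(s) * len(t)) time.
    If several substrings tie, the one ending earliest in `s` is returned.
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in s
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return s[best_end - best_len:best_end]


if __name__ == "__main__":
    # Made-up sequences standing in for regions near two cutting sites.
    left_flank = "ACGTTGCA"
    right_flank = "TTGTTGCC"
    print(longest_common_substring(left_flank, right_flank))
```

Comparing the lengths of such matches against those obtained from randomly generated sequences is one way the statistical comparison described above could be set up.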

In the vicinity of cyclic molecules in the Oxytricha trifallax genome there may be segments of up to 58 contigs. In over 90% of cases, segments of up to two contigs appear in regions bounded by circle junctions. We define legal strings as symbol sequences representing arrangements of multiple coding DNA segments on a DNA strand to model this situation. These are enumerated, and correspondences between legal strings and lattice paths are used to define the curvature of a legal string as a novel measure of DNA scrambling. We also generalize the notion of separation, another measure of DNA scrambling, to the context of legal strings and prove that it is invariant under group actions of ℤ₂ × ℤ₂ and ℤ₂ × ℤ₂ × ℤ₂ on legal strings representing equivalent segment arrangements.

Included in

Mathematics Commons
