Graduation Year
2015
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Department
Computer Engineering
Degree Granting Department
Computer Science and Engineering
Major Professor
Yi-Cheng Tu, Ph.D.
Committee Member
Sagar Pandit, Ph.D.
Committee Member
Yao Liu, Ph.D.
Committee Member
Michael Weng, Ph.D.
Committee Member
Wen-Xiu Ma, Ph.D.
Keywords
Molecular Simulations, Streaming, Push-Based, SDH, Big Data
Abstract
Thanks to advances in modern computer simulation systems, many scientific applications generate, and require the manipulation of, large volumes of data. Scientific exploration relies substantially on effective and accurate data analysis. The sheer size of the generated data, however, poses major challenges to analyzing the simulated system. In this dissertation, we propose novel techniques, and apply some known designs in novel ways, to improve scientific data analysis.
We develop an efficient method to compute an analytical query called the spatial distance histogram (SDH). Special heuristics are exploited to process SDH efficiently and accurately. We further develop a mathematical model to analyze the mechanism that leads to errors. This gives rise to a new approximate algorithm with an improved time/accuracy tradeoff.
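For readers unfamiliar with the query, the sketch below shows an SDH in its simplest form: a histogram of all pairwise distances between points. It is the quadratic-time baseline, not the heuristic or approximate algorithm developed in the dissertation. The function name, the parameters `bucket_width` and `num_buckets`, and the choice to fold overflow distances into the last bucket are all assumptions made for illustration.

```python
import math
from itertools import combinations


def sdh_brute_force(points, bucket_width, num_buckets):
    """Baseline SDH: count every pairwise distance into fixed-width buckets.

    Illustrative only. This is the O(N^2) definition of the query, not the
    dissertation's method. Distances beyond the last bucket are folded into
    it (an assumption of this sketch).
    """
    hist = [0] * num_buckets
    for p, q in combinations(points, 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        idx = min(int(d // bucket_width), num_buckets - 1)
        hist[idx] += 1
    return hist
```

With N points, the bucket counts always sum to N(N-1)/2, which is why the cost of this baseline grows quadratically with the size of the simulated system.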
Known molecular simulation (MS) analysis systems follow a pull-based design, in which each executed query requests the data it needs. Such a design introduces redundant, high-volume I/O traffic as well as CPU/data latency. To remedy these issues, we design and implement a push-based system, which uses a sequential scan-based I/O framework that pushes the loaded data to a number of pre-programmed queries.
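The push-based idea can be sketched as a single sequential pass over the data in which every loaded frame is handed to all registered queries, so the data is read once regardless of how many queries run. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the class names, the `consume`/`result` interface, and the representation of a frame as a list of (x, y, z) tuples are all hypothetical.

```python
class AtomCount:
    """Hypothetical query: total number of atoms seen across all frames."""

    def __init__(self):
        self.count = 0

    def consume(self, frame):
        self.count += len(frame)

    def result(self):
        return self.count


class MeanX:
    """Hypothetical query: mean x coordinate over all atoms in all frames."""

    def __init__(self):
        self.total = 0.0
        self.n = 0

    def consume(self, frame):
        for x, _y, _z in frame:
            self.total += x
            self.n += 1

    def result(self):
        return self.total / self.n if self.n else None


class PushScanner:
    """Scans frames once, in order, and pushes each frame to every query."""

    def __init__(self):
        self.queries = []

    def register(self, query):
        self.queries.append(query)

    def run(self, frames):
        # `frames` may be a one-shot iterator (e.g. a file reader): each
        # frame is loaded exactly once and shared by all registered queries.
        for frame in frames:
            for query in self.queries:
                query.consume(frame)
        return [query.result() for query in self.queries]
```

In a pull-based design, each query would instead issue its own reads, so two queries over the same trajectory would load it twice.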
The efficiency of the proposed system and of the approximate SDH algorithms is supported by the results of extensive experiments on MS-generated data.
Scholar Commons Citation
Grupchev, Vladimir, "Improvements on Scientific System Analysis" (2015). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/5851