Graduation Year
2022
Document Type
Thesis
Degree
M.S.C.S.
Degree Name
MS in Computer Science (M.S.C.S.)
Degree Granting Department
Computer Science and Engineering
Major Professor
Yi-Cheng Tu, Ph.D.
Committee Member
Paul A. Rosen, Ph.D.
Committee Member
Srinivas Katkoori, Ph.D.
Committee Member
Yu Zhang, Ph.D.
Committee Member
Feng Cheng, Ph.D.
Keywords
CUDA, Data Visualization, DBMS, GPU, Query Processing
Abstract
The semiconductor industry has mainly pursued two routes for designing microprocessors. The multi-core route aims to speed up latency-oriented processing. In contrast, the many-thread route concentrates on throughput-oriented improvement of parallel processing. Many-thread microprocessors, such as Graphics Processing Units (GPUs), have led in computing capability for the past decade. In the current hardware market, at a similar price range, the ratio of peak computing power between many-thread GPUs and multi-core CPUs is up to 15X. This large performance gap in data processing has motivated many practitioners in the database community to offload the computation-intensive parts of query execution to GPUs. Group-By and aggregate operations are very often used together to summarize data, so that data scientists and domain experts can quickly gain analytical insights over possibly massive amounts of data. They play fundamental and critical roles in the data visualization community and account for a large part of the user experience in interactive visual analysis. In this research, we investigate the low-level computing features of GPUs and present an in-depth study of the design, implementation, and optimization of Group-By/Aggregate algorithms on GPUs. We primarily focus on the design and implementation of hash-based Group-By/Aggregate algorithms. We then introduce an adaptive and dynamic radix-hash algorithm that is insensitive to input cardinality (the number of distinct groups). In addition, we present a performance model that guides the choice of the set of bits used for radix hashing in each pass, so that the overall group-by operation achieves maximum throughput. We also reproduce state-of-the-art hash-based implementations on both modern CPUs and GPUs. Our experiments verify that our adaptive and dynamic algorithm chooses the optimal solution and delivers the highest throughput.
Scholar Commons Citation
Mou, Chengcheng, "Computing Group-By and Aggregate in Massively Parallel Systems" (2022). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/10331