Graduation Year

2022

Document Type

Thesis

Degree

M.S.C.S.

Degree Name

MS in Computer Science (M.S.C.S.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yi-Cheng Tu, Ph.D.

Committee Member

Paul A. Rosen, Ph.D.

Committee Member

Srinivas Katkoori, Ph.D.

Committee Member

Yu Zhang, Ph.D.

Committee Member

Feng Cheng, Ph.D.

Keywords

CUDA, Data Visualization, DBMS, GPU, Query Processing

Abstract

The semiconductor industry has mainly exploited two routes for designing microprocessors. The multi-core route aims to speed up latency-oriented processing. In contrast, the many-thread route concentrates on throughput-oriented improvement of parallel processing. Many-thread microprocessors, such as Graphics Processing Units (GPUs), have led in computing capability for the past decade. In the current hardware market, at a similar price range, many-thread GPUs offer up to 15X the peak computing power of multi-core CPUs. This large performance gap in data processing has motivated many practitioners in the database community to move the computation-intensive parts of query execution to GPUs. Group-By and aggregate operations are very often used together to summarize data, so that data scientists and domain experts can quickly gain analytical insights over possibly massive amounts of data. They play fundamental and critical roles in the data visualization community and account for a large part of the user experience in interactive visual analysis. In this research, we investigate the low-level computing features of GPUs and present an in-depth study of the design, implementation, and optimization of Group-By/Aggregate algorithms on GPUs. We primarily focus on the design and implementation of hash-based Group-By/Aggregate algorithms. We then introduce an adaptive and dynamic radix-hash algorithm that is insensitive to input cardinality (the number of distinct groups). In addition, we present a performance model that guides the choice of the set of bits used for radix hashing in each pass, so that the overall Group-By operation achieves maximum throughput. We also reproduce state-of-the-art hash-based implementations on both modern CPUs and GPUs. Our experiments verify that our adaptive and dynamic algorithm chooses the optimal solution and delivers the highest throughput.
