Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yicheng Tu, Ph.D.

Committee Member

Mehran Mozaffari Kermani, Ph.D.

Committee Member

Yu Sun, Ph.D.

Committee Member

Andres Tejada-Martinez, Ph.D.

Committee Member

Sameer Varma, Ph.D.


Algorithm Design, Algorithm Optimization, Emerging Hardware, Parallel Computing, Query Performance


Relational join processing is one of the core functionalities in database management systems. Implementing join algorithms on parallel platforms, especially modern GPUs, has gain a lot of momentum in the past decade. This dissertation addresses the following issues on GPU join algorithms. First, we present empirical evaluations of a state-of-the-art work on GPU-based join processing. Since 2008, the compute capabilities of GPUs have increased following a pace faster than that of the multi-core CPUs. We run a comprehensive set of experiments to study how join operations can benefit from such rapid expansion of GPU capabilities. We also present improved GPU programs that take advantage of new GPU hardware/software features such as read-only data cache, large L2 cache, and shuffle instructions. Second, we report new design and implementation of join algorithms with high performance under today's GPGPU environment. In particular, we overhaul the popular radix hash join and redesign sort-merge join algorithms on GPUs by applying a series of novel techniques to utilize the hardware capacity of latest Nvidia GPU architecture and new features of the CUDA programming framework. Our algorithms take advantage of revised hardware arrangement, larger register file and shared memory, native atomic operation, dynamic parallelism, and CUDA Streams. Lastly, we explore how join processing would benefit from the adaptation of multiple GPUs. We identify the low rate and complex patterns of data transfer among the CPU and GPUs as the main challenges in designing efficient algorithms for large table joins, and we propose three distinctive designs of multi-GPU join algorithms, namely, the nested loop, global sort-merge and hybrid joins to overcome such challenges. Extensive experiments running on multiple databases and two different hardware configurations demonstrate high scalability of our algorithms over data size and significant performance boost brought by the use of multiple GPUs. Furthermore, our algorithms achieve much better performance as compared to existing join algorithms, with a speedup up to 27.5X and 2.9X over best known code developed for multi-core CPUs and GPUs respectively.