Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yicheng Tu, Ph.D.

Committee Member

Adriana Iamnitchi, Ph.D.

Committee Member

Yao Liu, Ph.D.

Committee Member

Hadi Charkhgard, Ph.D.

Committee Member

Sagar Pandit, Ph.D.

Keywords

CUDA, Database System, GPGPU, GPU, Parallel Computing

Abstract

Various types of two-body statistics (2-BS) are regarded as essential components of low-level data analysis in scientific database systems. In relational algebraic terms, a 2-BS is essentially a Cartesian product between two datasets (or two instances of the same dataset) followed by a user-defined aggregate. The quadratic complexity of these computations hinders the timely processing of data, so modern parallel hardware has become a natural means of meeting this challenge. This dissertation presents our recent work in designing and optimizing parallel algorithms for 2-BS computation on Graphics Processing Units (GPUs). The unique architecture of GPUs provides abundant opportunities to optimize the algorithm for better performance and high utilization of hardware resources. While a typical 2-BS problem can be reduced to a straightforward parallel computing pattern, conventional knowledge from general parallel computing often falls short of delivering the best possible performance. We therefore present a suite of techniques for decomposing 2-BS problems and for using GPU computing resources effectively. We also developed analytical models that guide the selection of the best parameters for our GPU programs. As a result, we designed highly optimized 2-BS algorithms that significantly outperform the best-known GPU and CPU implementations. Although all 2-BS problems share the same core computation, each carries its own characteristics that call for different code-optimization strategies. To that end, we developed a software framework that automatically generates high-performance GPU code from a few parameters and short primer code inputs. We further present two case studies demonstrating that code generated by this framework reaches a very high level of efficiency.
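To make the relational definition above concrete, here is a minimal sequential C++ sketch, not the dissertation's GPU implementation: it visits every pair in the Cartesian product A × B and folds each pair into an accumulator through a user-supplied aggregate, instantiated here as a distance histogram with fixed-width buckets, one common example of such a statistic. The names `two_body_stat`, `distance_histogram`, `dist`, and `Point` are hypothetical, chosen for illustration. The nested loops make the quadratic O(|A|·|B|) cost explicit; the pair visits are independent of one another, which is what makes the problem a candidate for parallel hardware.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Generic 2-BS sketch: visit every pair (a, b) in A x B and fold it into
// an accumulator using a user-supplied aggregate function.
template <typename Acc, typename PairFn>
Acc two_body_stat(const std::vector<Point>& A, const std::vector<Point>& B,
                  Acc acc, PairFn fold) {
    for (const Point& a : A)
        for (const Point& b : B)
            fold(acc, a, b);  // |A| * |B| pair visits in total
    return acc;
}

// Euclidean distance between two points.
double dist(const Point& p, const Point& q) {
    double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Example aggregate: histogram of pairwise distances with fixed bucket width.
std::vector<long> distance_histogram(const std::vector<Point>& A,
                                     const std::vector<Point>& B,
                                     double bucket_width,
                                     std::size_t num_buckets) {
    return two_body_stat(A, B, std::vector<long>(num_buckets, 0),
        [&](std::vector<long>& h, const Point& a, const Point& b) {
            double q = dist(a, b) / bucket_width;
            if (q < static_cast<double>(num_buckets))
                ++h[static_cast<std::size_t>(q)];  // pairs past the last bucket are dropped
        });
}
```

When A and B are the same dataset, this version counts every ordered pair, including each point paired with itself; a self-join variant would typically visit only pairs with index i < j.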
In addition to the general problem, we also studied a particular class of 2-BS problems in which the computation can be reduced by using an index structure. Because traditional index tree designs cannot exploit the full performance of GPUs, we present a technique for optimizing index searching on GPUs. We validate this GPU index-searching technique on 2-BS applications, which demonstrate its very high performance.
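The idea of reducing a 2-BS with an index can be illustrated with a simple case: counting pairs of 2-D points that lie within distance r of each other. The sketch below assumes a uniform grid with cell side r as the index, stored in a hash map; this is a swapped-in illustration, not the tree-based index or the GPU index-searching technique described in the dissertation. Because any two points within distance r fall in the same or adjacent cells, each point only needs to examine its 3 × 3 cell neighborhood instead of the whole dataset. All names (`Point2`, `count_pairs_brute`, `count_pairs_grid`) are hypothetical, and the brute-force function is included only as a reference for checking the result.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point2 { double x, y; };

// Reference: check all unordered pairs i < j, O(n^2).
long count_pairs_brute(const std::vector<Point2>& P, double r) {
    long n = 0;
    for (std::size_t i = 0; i < P.size(); ++i)
        for (std::size_t j = i + 1; j < P.size(); ++j) {
            double dx = P[i].x - P[j].x, dy = P[i].y - P[j].y;
            if (dx * dx + dy * dy <= r * r) ++n;
        }
    return n;
}

// Grid index with cell side r: a pair within distance r lies in the same or
// an adjacent cell, so each point only examines its 3x3 cell neighborhood.
long count_pairs_grid(const std::vector<Point2>& P, double r) {
    // Pack two signed 32-bit cell coordinates into one 64-bit hash key.
    auto key = [](long cx, long cy) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
               static_cast<std::uint32_t>(cy);
    };
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> grid;
    std::vector<long> cx(P.size()), cy(P.size());
    for (std::size_t i = 0; i < P.size(); ++i) {
        cx[i] = static_cast<long>(std::floor(P[i].x / r));
        cy[i] = static_cast<long>(std::floor(P[i].y / r));
        grid[key(cx[i], cy[i])].push_back(i);
    }
    long n = 0;
    for (std::size_t i = 0; i < P.size(); ++i)
        for (long dx = -1; dx <= 1; ++dx)
            for (long dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(key(cx[i] + dx, cy[i] + dy));
                if (it == grid.end()) continue;  // empty neighbor cell
                for (std::size_t j : it->second) {
                    if (j <= i) continue;  // count each unordered pair once
                    double ddx = P[i].x - P[j].x, ddy = P[i].y - P[j].y;
                    if (ddx * ddx + ddy * ddy <= r * r) ++n;
                }
            }
    return n;
}
```

The savings depend on the data distribution: for clustered inputs or a large r relative to the point spacing, a neighborhood can still hold many points, and the grid approaches the brute-force cost.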