Graduation Year

2022

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Electrical Engineering

Major Professor

J. Morris Chang, Ph.D.

Committee Member

Zhuo Lu, Ph.D.

Committee Member

Xinming Ou, Ph.D.

Committee Member

Lu Lu, Ph.D.

Committee Member

Jamal Haque, Ph.D.

Keywords

Database Security, Big Data, Data Privacy, Oblivious Access, On-device Deep Learning, Resource Efficiency

Abstract

Our search queries, online purchase transactions, the videos we watch, and our movie preferences are a few of the types of information collected and stored daily. This private data is collected through our mobile devices and computers, on the streets, and in our homes, often without our consent. Meanwhile, advances in artificial intelligence (AI) in the big data era have increased the ability to capitalize on and benefit from this private data. Such data is used in machine learning (ML) applications across many domains, including marketing, insurance, financial services, mobility, social networks, and healthcare. Most of these applications leverage the vast amount of data collected from each individual (data owner) to offer valuable services to users.

Over the past decades, academia and industry have proposed many approaches for storing enormous amounts of data in the cloud. Although these evolving technologies focus primarily on performance guarantees, how they ensure the security and privacy of the information they handle remains a major concern. In particular, most of the newer non-relational (NoSQL) database systems have overlooked the security requirements of modern ML and big data applications. This research reviews the security implementations in today's leading database models, with emphasis on their security and privacy attributes. A set of standard security mechanisms is identified and evaluated against different security classifications. The study further provides a thorough evaluation and comprehensive analysis of the maturity of security and privacy implementations in these database models, along with future directions and enhancements, so that data owners can choose the most appropriate datastore for their data-driven applications.

To address these challenges, various privacy- and security-enhancing technologies for database systems have been proposed in recent years to protect the confidentiality of data from curious insiders and malicious outsiders. Some recent work addresses this issue by introducing additional layers (software and/or hardware) that provide encryption mechanisms to protect data at rest. While these encrypted database solutions demonstrate their suitability for data-driven big data applications, integrating security generally degrades database performance, and eliminating this degradation in practice is challenging. It is therefore crucial to evaluate the impact of security on performance to determine whether such systems can achieve high performance and scalability with large volumes of data in cloud-based production environments. In light of this, this study investigates a practical system design and implementation that provides Security-as-a-Service for NoSQL databases (SEC-NoSQL) while supporting the execution of queries over encrypted data with a guaranteed level of system performance. Several implementation models are proposed, and their performance is evaluated using the Yahoo! Cloud Serving Benchmark (YCSB) with a large number of clients operating simultaneously. Experimental results show that the proposed design works well on encrypted data while maintaining high performance and scalability. Moreover, a practical guide to establishing a Service Level Agreement (SLA) is included for deploying the solution as a cloud-based service.

While encryption mechanisms provide a much safer means of outsourcing private information to untrusted, distributed cloud platforms, encryption alone is well known to be inadequate for protecting data in outsourced, privacy-critical database applications: these techniques typically remain exposed to access pattern attacks, in which an adversary who observes which encrypted records are accessed can infer information about the queries and the data. Oblivious Random Access Machine (ORAM) is a well-known security primitive for mitigating such attacks; however, integrating ORAM directly into cloud-based database systems is challenging because of its high performance penalties and limited query functionality. This study further proposes a novel data processing framework for cloud database systems that uses distributed ORAM techniques and oblivious data structures to make database queries resilient to access pattern attacks. The framework was implemented on a practical database setup, and its performance was evaluated using several industry-standard metrics. The experimental results demonstrate that the proposed distributed approach offers significant benefits for cloud-based database systems compared with integrating ORAM primitives directly at the database level.

In a different context, this study also investigates the deployment of resource-efficient deep learning models on mobile platforms. Recent technological breakthroughs in mobile computing resources, such as low-cost mobile GPUs and cutting-edge open-source software architectures, have enabled high-performance deep learning on mobile platforms. These advancements have revolutionized the capabilities of today's mobile applications, allowing them to perform data-driven intelligence locally, particularly in smart health applications. However, energy resources in a mobile device are typically limited. When a complex Deep Neural Network (DNN) architecture is deployed in an on-device deep learning framework, it achieves high prediction accuracy (and performance) but also places heavy energy demands on the device at run-time. Managing these resources efficiently across the spectrum of performance and energy efficiency is therefore an emerging challenge for any mobile application that features data-driven intelligence beyond experimental evaluations. This study proposes a novel framework that dynamically chooses the best ML model at run-time, based on the computing and energy resources available on the mobile device, to make more robust and efficient inference decisions. Experimental evaluations demonstrate that the proposed approach yields significant benefits in the energy consumption of the underlying application.