Graduation Year

2023

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Srinivas Katkoori, Ph.D.

Committee Member

Hao Zheng, Ph.D.

Committee Member

Mehran Mozaffari Kermani, Ph.D.

Committee Member

Yasin Yilmaz, Ph.D.

Committee Member

Sandeep Miryala, Ph.D.

Keywords

ASIC, FPGA, Image Processing, Neural Network, Simulated Annealing

Abstract

The growing demand for fast and energy-efficient hardware in resource-constrained Internet of Things (IoT) edge devices has highlighted the limitations of conventional computing architectures. This research addresses the need for fast, optimized, and energy-efficient machine learning inference engines and image processing in IoT edge applications. We identify three challenging research problems and devise efficient solutions for each, contributing novel approaches that improve both efficiency and effectiveness.

First, we propose an efficient hardware architecture for edge detection in image processing applications: a novel complementary metal-oxide-semiconductor (CMOS) very-large-scale integration (VLSI) bit-sliced near-memory computing architecture for rapid edge detection with the Sobel, Prewitt, and Roberts cross operators. The architecture is highly modular, scaling seamlessly to images of any size, and processes an image in constant time irrespective of that size. The hardware implementations are optimized using techniques such as operator strength reduction, common-term sharing, and bit manipulation. The resulting custom architectures enable rapid image processing on IoT edge devices while minimizing power consumption, maximizing performance, and optimizing hardware area.
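For reference, the sketch below is a plain NumPy/SciPy software model of the three gradient operators named above; it is not the dissertation's bit-sliced near-memory hardware. The |Gx| + |Gy| magnitude is a common hardware-friendly approximation assumed here, and the function and parameter names are illustrative.

# Software reference model of the Sobel, Prewitt, and Roberts cross operators.
# This only illustrates the arithmetic; the hardware architecture is not reproduced.
import numpy as np
from scipy.ndimage import convolve

KERNELS = {
    "sobel":   (np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),
                np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])),
    "prewitt": (np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]),
                np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]])),
    "roberts": (np.array([[1, 0], [0, -1]]),
                np.array([[0, 1], [-1, 0]])),
}

def edge_magnitude(image, operator="sobel"):
    """Approximate gradient magnitude |Gx| + |Gy| (avoids a square root)."""
    kx, ky = KERNELS[operator]
    gx = convolve(image.astype(np.int32), kx)
    gy = convolve(image.astype(np.int32), ky)
    return np.abs(gx) + np.abs(gy)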

Next, we propose an efficient and optimized hardware inference model for neural networks (NNs). We use simulated annealing (SA)-based optimization to generate hardware-optimized multilayer perceptron (MLP) inference models. In the SA loop, hidden-layer weights are driven toward integer values in two steps: (i) a random subset of the hidden weights is perturbed by an amount proportional to the SA temperature; (ii) perturbed weights that land close to integers are rounded, which reduces hardware cost through operator strength reduction. At high temperatures, negative (accuracy-reducing) moves are accepted with high probability. We further apply hardware optimization techniques such as register resizing, weight clustering, and weight sharing to compress the MLP into a lightweight, energy-efficient inference model. To validate the approach, we perform estimation and implementation experiments on field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) platforms. The empirical results demonstrate that the methodology generates area-efficient hardware architectures for MLP models while maintaining their accuracy.
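A minimal sketch of the SA weight-integerization loop described above is given below. The knob names (perturbation fraction, rounding tolerance, cooling rate) and the accuracy() scorer are assumptions for illustration; this is not the dissertation's implementation.

# Simulated-annealing sketch: perturb a random subset of weights by an amount
# proportional to the temperature, snap near-integer weights, and accept
# accuracy-reducing moves with a temperature-dependent probability.
import numpy as np

def sa_integerize(weights, accuracy, t_start=1.0, t_end=0.01, cooling=0.95,
                  frac=0.1, round_tol=0.2, seed=0):
    """Drive a random subset of hidden-layer weights toward integer values."""
    rng = np.random.default_rng(seed)
    cur, cur_acc, t = weights.copy(), accuracy(weights), t_start
    while t > t_end:
        cand = cur.copy()
        # (i) perturb a random subset by an amount proportional to the temperature
        idx = rng.random(cand.shape) < frac
        cand[idx] += rng.normal(scale=t, size=int(idx.sum()))
        # (ii) round perturbed weights that landed close to an integer
        near = np.abs(cand - np.round(cand)) < round_tol
        cand[near] = np.round(cand[near])
        acc = accuracy(cand)
        # accept worse moves with probability exp(delta / t)
        if acc >= cur_acc or rng.random() < np.exp((acc - cur_acc) / t):
            cur, cur_acc = cand, acc
        t *= cooling
    return cur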

Finally, we investigate the feasibility of deploying a hybrid machine learning (ML) model at the edge. We propose a hybrid ML model that combines principal component analysis (PCA) with decision tree (DT) and support vector machine (SVM) classifiers. By using hardware-friendly techniques such as dimensionality reduction, optimized hyperparameters, and a combination of accurate and interpretable classifiers, the hybrid model enables intelligent decision-making at the edge while minimizing computational and energy costs. Extensive experimental evaluations assess the performance and resource utilization of the proposed model; the results demonstrate improved performance over traditional approaches and provide insight into the model's effectiveness for IoT edge applications.
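As a rough illustration of such a hybrid pipeline, the sketch below chains PCA with a decision tree and an SVM in scikit-learn. The abstract does not specify how the two classifiers are combined, so a simple soft-voting ensemble and all hyperparameter values are assumptions.

# Hypothetical PCA -> (DT, SVM) hybrid classifier for a small edge workload.
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

hybrid = make_pipeline(
    PCA(n_components=8),  # dimensionality reduction to cut compute and memory
    VotingClassifier(
        estimators=[("dt", DecisionTreeClassifier(max_depth=5)),
                    ("svm", SVC(kernel="linear", probability=True))],
        voting="soft",
    ),
)
# Usage: hybrid.fit(X_train, y_train); y_pred = hybrid.predict(X_test)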
