Graduation Year

2020

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Electrical Engineering

Major Professor

Ismail Uysal, Ph.D.

Co-Major Professor

Yasin Yilmaz, Ph.D.

Committee Member

Kwang-Cheng Chen, Ph.D.

Committee Member

Alfredo Weitzenfeld, Ph.D.

Committee Member

Samuel Mercier, Ph.D.

Keywords

probabilistic graphical models, machine learning, multimodal data fusion, sequential attack detection, variational inference, recommender systems

Abstract

Commercial platforms that use recommender systems can collect relevant information from multiple sources to produce useful recommendations for their users. However, these sources usually contain missing values, imbalanced and heterogeneous data, and noisy observations. Such characteristics make exploiting the information nontrivial, as they must be carefully addressed during the data fusion process. In addition to these degenerative characteristics, some entries can be fake, i.e., they can be the outcome of malicious attempts to manipulate the system. These entries should be eliminated before being incorporated into any recommendation task. Detecting such malicious attacks quickly and accurately, and then mitigating them, is vital to enhancing the trustworthiness and robustness of the system, and is itself a nontrivial process. Recent advances in probabilistic data fusion pave the way for addressing such issues. These problems can be handled in a principled way by developing latent variable models that account for the different nature of the observed data types, and by training those models with computationally efficient learning algorithms. This dissertation aims to develop such latent variable models to effectively fuse multiple heterogeneous information sources and thereby improve the accuracy, robustness, and safety of recommender systems. First, we propose a latent variable model that can fuse categorical and numerical information sources containing missing values, such as the rating matrix, user attributes, and item features. We demonstrate its superior performance over existing collaborative filtering algorithms that use only the rating matrix, and over more recent algorithms that incorporate the side information, i.e., the user attributes and item features, via different techniques. Subsequently, by exploiting the proposed latent variable model, we design a sequential attack detection framework. By extracting univariate statistics from the latent space of our model and using them in a sequential change detection algorithm, we obtain a framework that can use multiple diverse information sources to improve attack detection performance. The experimental results demonstrate improvements over baseline algorithms that use only the rating patterns of the profiles, in terms of both detection rate and sequential detection performance.
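
As a rough illustration of the sequential detection idea described above, the following minimal sketch feeds a stream of univariate statistics into a one-sided CUSUM detector. The abstract does not name the change detection algorithm used in the dissertation, so CUSUM is only a familiar stand-in here. The function name `cusum_alarm`, its parameters, and the toy values are all hypothetical and are not taken from the dissertation.

```python
from typing import Optional, Sequence


def cusum_alarm(
    stats: Sequence[float],
    baseline_mean: float,
    drift: float,
    threshold: float,
) -> Optional[int]:
    """Return the index of the first alarm, or None if no alarm is raised.

    Assumption: `stats` holds one univariate statistic per incoming profile,
    for example a distance computed in a model's latent space. CUSUM is a
    stand-in detector, not necessarily the one used in the dissertation.
    """
    g = 0.0  # accumulated evidence that the mean has shifted upward
    for t, x in enumerate(stats):
        # Add the excess over the baseline, minus a drift allowance,
        # and reset to zero whenever the evidence turns negative.
        g = max(0.0, g + (x - baseline_mean - drift))
        if g > threshold:
            return t
    return None


# Toy data (made up for illustration): five nominal values, then a shift.
nominal = [0.1, -0.2, 0.05, 0.0, -0.1]
with_attack = nominal + [1.0, 1.2, 0.9, 1.1]

no_alarm = cusum_alarm(nominal, baseline_mean=0.0, drift=0.25, threshold=2.0)
alarm_at = cusum_alarm(with_attack, baseline_mean=0.0, drift=0.25, threshold=2.0)
print(no_alarm, alarm_at)
```

In a sketch like this, the threshold trades detection delay against false alarms: a lower threshold raises the alarm after fewer suspicious observations but is more easily triggered by noise.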
