Graduation Year

2016

Document Type

Thesis

Degree

M.S.C.S.

Degree Name

MS in Computer Science (M.S.C.S.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Lawrence O. Hall, Ph.D.

Committee Member

Dmitry Goldgof, Ph.D.

Committee Member

Rangachar Kasturi, Ph.D.

Keywords

Evidential Clustering, Single Pass Fuzzy C-Means

Abstract

Clustering large data sets has become very important as the amount of available unlabeled data increases. Single Pass Fuzzy C-Means (SPFCM) is useful when memory is too limited to load the whole data set. The main idea is to divide dataset into several chunks and to apply fuzzy c-means (FCM) to each chunk. SPFCM uses the weighted cluster centers of the previous chunk in the next data chunks. However, when the number of chunks is increased, the algorithm shows sensitivity to the order in which the data is processed. Hence, we improved SPFCM by recognizing boundary and noisy data in each chunk and using it to influence clustering in the next chunks. The proposed approach transfers the boundary and noisy data as well as the weighted cluster centers to the next chunks. We show that our proposed approach is significantly less sensitive to the order in which the data is loaded in each chunk.

Share

COinS