Graduation Year

2020

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yicheng Tu, Ph.D.

Committee Member

John Licato, Ph.D.

Committee Member

John Murray-Bruce, Ph.D.

Committee Member

Ankit Shah, Ph.D.

Committee Member

Feng Cheng, Ph.D.

Keywords

Data Mining, Graph Mining, Hypergraph

Abstract

In recent years, the popularity of graph datasets has grown rapidly, and frequent subgraph mining (FSM) has become an important subject in computer science research. In this dissertation, we study the single graph as an effective model for representing information, together with its related graph mining techniques. Frequent pattern mining in the single-graph setting involves two main problems: the support measure and the search scheme. We study the development of support measures, which are essentially functions that map a pattern to its frequency count in a database. Our work is based on the hypergraph framework and uses the concepts of occurrence and instance hypergraphs. We present improved hardness and approximation theorems for the major support measures, as well as a general form for minimum-image-based measures. To guide the development of new support measures, we present general sufficient conditions for designing support measures in the hypergraph framework; these conditions apply to the minimum-image-based (MNI) measure and to other support measures that are not covered by the overlap graph framework. We use the sufficient conditions to generalize MNI and the minimum instance (MI) measure in order to design user-defined linear-time measures. Building on the sufficient conditions, we develop a new, efficient, polynomial-time support measure, the maximum independent subedge set (MISS) measure, which combines the advantages of existing measures. We also show that MISS can fill the gap between the maximum independent set (MIS) measure and MI in both computational complexity and support count. Finally, we present and review experimental evaluations of the major support measures.
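The abstract describes a support measure as a function from a pattern to a frequency count. As a minimal sketch (not the dissertation's own implementation), the Python below computes two of the measures named above on a toy input: MNI, and MIS as a maximum independent set of the overlap graph. The representation of an occurrence as a dict from pattern vertex to data vertex, the vertex-overlap rule, and the names `mni_support`, `mis_support`, and `star_occurrences` are hypothetical choices made for illustration; the occurrence/instance hypergraph framework itself is not modeled here.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical representation: an occurrence maps each pattern vertex
# to the data-graph vertex it is embedded onto.
Occurrence = Dict[str, int]


def mni_support(occurrences: List[Occurrence]) -> int:
    """MNI: for each pattern vertex, count its distinct images, then take the minimum."""
    if not occurrences:
        return 0
    pattern_vertices = occurrences[0].keys()
    return min(len({occ[v] for occ in occurrences}) for v in pattern_vertices)


def overlaps(a: Occurrence, b: Occurrence) -> bool:
    """Assumed vertex-overlap rule: two occurrences overlap if they share a data vertex."""
    return bool(set(a.values()) & set(b.values()))


def mis_support(occurrences: List[Occurrence]) -> int:
    """MIS: size of a maximum set of pairwise non-overlapping occurrences.

    Brute force over subsets, largest first; exponential time, so only
    suitable for toy inputs (MIS is NP-hard in general).
    """
    n = len(occurrences)
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(not overlaps(occurrences[i], occurrences[j])
                   for i, j in combinations(subset, 2)):
                return k
    return 0


# Toy data: a star with center 0 and leaves 1, 2, 3; the pattern is a
# single edge a-b, so each star edge yields two embeddings (a-b and b-a).
star_occurrences = [
    {"a": 0, "b": 1}, {"a": 0, "b": 2}, {"a": 0, "b": 3},
    {"a": 1, "b": 0}, {"a": 2, "b": 0}, {"a": 3, "b": 0},
]

print(mni_support(star_occurrences))  # 4
print(mis_support(star_occurrences))  # 1
```

Every occurrence in the star touches vertex 0, so no two are independent and MIS counts one, while each pattern vertex still has four distinct images and MNI counts four. This illustrates how different measures can assign very different counts to the same set of occurrences, with MIS not exceeding MNI.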
