Graduation Year
2017
Document Type
Dissertation
Degree
Ph.D.
Degree Name
Doctor of Philosophy (Ph.D.)
Degree Granting Department
Computer Science and Engineering
Major Professor
Sudeep Sarkar, Ph.D.
Co-Major Professor
Anuj Srivastava, Ph.D.
Committee Member
Dmitry Goldgof, Ph.D.
Committee Member
Rangachar Kasturi, Ph.D.
Committee Member
Rajiv Dubey, Ph.D.
Committee Member
Andrew Raij, Ph.D.
Keywords
Pattern Theory, Video Analysis, Activity Recognition, Graphical Models, Compositional Approach
Abstract
Describing human activities in videos involves not only detecting actions and objects but also identifying their active semantic relationships in the scene. Toward this broader goal, we present a combinatorial approach that assumes the availability of algorithms for detecting and labeling objects and actions, albeit with some errors. Given these uncertain labels and detected objects, we link them into interpretative structures using domain knowledge encoded with concepts from Grenander’s general pattern theory. Here a semantic video description is built from basic units, termed generators, that represent labels of objects or actions. Each generator has multiple out-bonds, each associated with a type of domain semantics, spatial constraints, temporal constraints, or image/video evidence. Generators combine with one another, according to a set of predefined combination rules that capture domain semantics, to form larger connected structures known as configurations, which here represent video descriptions. This framework offers a powerful representational scheme because of its flexibility in spanning a space of interpretative structures (configurations) of varying sizes and structural complexity. We impose a probability distribution on the configuration space and generate inferences using a Markov chain Monte Carlo-based simulated annealing algorithm. The primary advantage of the approach is that it handles known computer vision challenges (appearance variability, errors in object label annotation, object clutter, simultaneous events, temporal dependency encoding, etc.) without the need for an exponentially large labeled training data set.
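As an illustration only, the idea of linking uncertain labels into a configuration and searching the configuration space by simulated annealing might be sketched as follows. This is a toy sketch under stated assumptions, not the dissertation's implementation: the names Generator, SEMANTIC_AFFINITY, SLOTS, energy, and anneal, the example labels and scores, the energy form, and the cooling schedule are all hypothetical. Spatial and temporal bonds are omitted.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Generator:
    """A candidate label for one detected region (hypothetical structure)."""
    label: str
    kind: str        # "action" or "object"
    evidence: float  # assumed detector confidence in (0, 1]


# Assumed domain knowledge: strength of the semantic bond between an
# action label and an object label. Unlisted pairs do not bond.
SEMANTIC_AFFINITY = {
    ("pour", "cup"): 1.0,
    ("pour", "bottle"): 0.9,
    ("cut", "knife"): 1.0,
    ("cut", "bread"): 0.8,
}

# Each slot is one detected region with its competing candidate generators.
SLOTS = [
    [Generator("pour", "action", 0.4), Generator("cut", "action", 0.6)],
    [Generator("knife", "object", 0.3), Generator("cup", "object", 0.7)],
    [Generator("bottle", "object", 0.6), Generator("bread", "object", 0.4)],
]


def energy(config, slots):
    """Energy of a configuration (lower is better).

    config[i] is the index of the generator chosen for slots[i]. The energy
    sums an evidence term per generator and a semantic term per closed
    action-object bond.
    """
    chosen = [slot[i] for slot, i in zip(slots, config)]
    total = -sum(math.log(g.evidence) for g in chosen)
    for a in chosen:
        for o in chosen:
            if a.kind == "action" and o.kind == "object":
                total -= SEMANTIC_AFFINITY.get((a.label, o.label), 0.0)
    return total


def anneal(slots, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Metropolis-style simulated annealing over label assignments.

    Returns the lowest-energy configuration visited and its energy.
    """
    rng = random.Random(seed)
    # Start from the greedy choice: highest-evidence generator per slot.
    config = [max(range(len(s)), key=lambda i: s[i].evidence) for s in slots]
    current = energy(config, slots)
    best, best_energy = list(config), current
    for step in range(steps):
        temperature = t0 * cooling ** step
        # Propose relabeling one randomly chosen slot.
        proposal = list(config)
        k = rng.randrange(len(slots))
        proposal[k] = rng.randrange(len(slots[k]))
        delta = energy(proposal, slots) - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            config, current = proposal, current + delta
            if current < best_energy:
                best, best_energy = list(config), current
    return best, best_energy


best_config, best_energy = anneal(SLOTS)
best_labels = [slot[i].label for slot, i in zip(SLOTS, best_config)]
```

In this toy setup the greedy, evidence-only choice is cut/cup/bottle, which closes no semantic bonds. Exhaustive enumeration of the eight possible configurations gives pour/cup/bottle as the lowest-energy one, because two strong bonds outweigh the weaker detector score for "pour". This mirrors how domain semantics can correct an erroneous label.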
Scholar Commons Citation
Dias Moreira De Souza, Fillipe, "Semantic Description of Activities in Videos" (2017). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/6649