Graduation Year
2008
Document Type
Dissertation
Degree
Ph.D.
Degree Granting Department
Computer Science and Engineering
Major Professor
Sudeep Sarkar, Ph.D.
Committee Member
Arthur I. Karshmer, Ph.D.
Committee Member
Barbara Loeding, Ph.D.
Committee Member
Dmitry Goldgof, Ph.D.
Committee Member
Rangachar Kasturi, Ph.D.
Keywords
American Sign Language, gestures, human-human interaction, probabilistic distances, motion, learning common motion patterns
Abstract
While recognizing some kinds of human motion patterns requires detailed feature representation and tracking, many of them can be recognized using global features. The global configuration or structure of an object in a frame can be expressed as a probability density function constructed from relational attributes between low-level features, e.g., edge pixels extracted from the regions of interest. As the object moves, this probability density changes, tracing a trajectory in the latent space of distributions, which we call the configuration space. These trajectories can then be used for recognition with standard techniques such as dynamic time warping. Can these frame-wise probability functions, which usually have high dimensionality, be embedded into a low-dimensional space in which we can still estimate various meaningful probabilistic distances? Given these trajectory-based representations, can one learn models of signs in an unsupervised manner? We address these two fundamental questions in this dissertation.
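The pipeline described above (a per-frame density built from relational attributes of edge pixels, then trajectory matching with dynamic time warping) can be sketched as follows. This is a minimal illustration under assumptions of our own, not the dissertation's implementation: the relational attributes are assumed to be the distance and undirected orientation between every pair of edge pixels, the density is a normalized 2-D histogram (bin counts and `max_dist` are hypothetical), and frames are compared with the Bhattacharyya distance.

```python
import numpy as np

def relational_density(edge_xy, n_dist_bins=8, n_angle_bins=8, max_dist=100.0):
    """Hypothetical frame descriptor: normalized 2-D histogram of
    (pairwise distance, pairwise orientation) over all edge-pixel pairs."""
    pts = np.asarray(edge_xy, dtype=float)
    i, j = np.triu_indices(len(pts), k=1)           # every unordered pixel pair
    d = pts[j] - pts[i]
    dist = np.hypot(d[:, 0], d[:, 1])
    angle = np.arctan2(d[:, 1], d[:, 0]) % np.pi    # undirected orientation in [0, pi)
    hist, _, _ = np.histogram2d(dist, angle,
                                bins=[n_dist_bins, n_angle_bins],
                                range=[[0.0, max_dist], [0.0, np.pi]])
    hist = hist.ravel()
    return hist / max(hist.sum(), 1.0)              # discrete probability distribution

def bhattacharyya_distance(p, q, eps=1e-12):
    # -log of the Bhattacharyya coefficient; clipping keeps the result >= 0
    bc = np.sum(np.sqrt(p * q))
    return -np.log(np.clip(bc, eps, 1.0))

def dtw(traj_a, traj_b, frame_dist=bhattacharyya_distance):
    """Classic dynamic time warping between two sequences of frame densities."""
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for k in range(1, m + 1):
            c = frame_dist(traj_a[i - 1], traj_b[k - 1])
            cost[i, k] = c + min(cost[i - 1, k], cost[i, k - 1], cost[i - 1, k - 1])
    return cost[n, m]
```

Because the histogram uses only pixel differences, it is translation-invariant, and DTW absorbs differences in signing speed: a trajectory compared with a slowed-down copy of itself (every frame repeated) has a warping cost of essentially zero.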
Existing embedding approaches do not easily extend to preserving meaningful probabilistic distances between samples. We present an embedding framework that preserves probabilistic distances such as the Chernoff, Bhattacharyya, Matusita, KL, or symmetric-KL distance, computed from dot products between points in the embedded space. This yields computational savings. We experiment with these five probabilistic distance measures and show the usefulness of the representation in three different contexts: recognition of 147 different signs (with a large number of possible classes), recognition of 7 different gestures performed by 7 different persons (with variation across persons), and classification of 8 different kinds of human-human interaction sequences (with segmentation problems).
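To see why dot products suffice, note that each of the five distances can be written with inner products of simple transforms of the densities: the Chernoff coefficient is $\langle p^{\alpha}, q^{1-\alpha}\rangle$ (Bhattacharyya is $\alpha = 0.5$), the squared Matusita distance is $\lVert\sqrt{p}-\sqrt{q}\rVert^2$, $\mathrm{KL}(p\Vert q) = \langle p, \log p\rangle - \langle p, \log q\rangle$, and symmetric KL is $\langle p - q, \log p - \log q\rangle$. The sketch below is not the dissertation's embedding framework; it only illustrates the idea, using a truncated SVD (our choice) as a linear projection that approximately preserves dot products among the samples. The helper names are hypothetical.

```python
import numpy as np

EPS = 1e-12

def chernoff_coeff(p, q, alpha=0.5):
    # c_alpha(p, q) = sum_i p_i^alpha q_i^(1 - alpha) = <p^alpha, q^(1-alpha)>
    return np.dot(p ** alpha, q ** (1.0 - alpha))

def chernoff(p, q, alpha=0.5):
    return -np.log(max(chernoff_coeff(p, q, alpha), EPS))

def bhattacharyya(p, q):
    return chernoff(p, q, 0.5)

def matusita(p, q):
    # ||sqrt p - sqrt q||^2 = <sp, sp> + <sq, sq> - 2 <sp, sq>
    sp, sq = np.sqrt(p), np.sqrt(q)
    return np.sqrt(max(sp @ sp + sq @ sq - 2.0 * (sp @ sq), 0.0))

def kl(p, q):
    lp, lq = np.log(p + EPS), np.log(q + EPS)
    return p @ lp - p @ lq

def symmetric_kl(p, q):
    lp, lq = np.log(p + EPS), np.log(q + EPS)
    return (p - q) @ (lp - lq)

def fit_projection(features, k):
    # Top-k right singular vectors span the best rank-k subspace (least squares),
    # so projecting onto them approximately preserves dot products between samples.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:k].T                                  # shape (D, k)

def embed(features, basis):
    return features @ basis                          # shape (n_samples, k)

# Distances evaluated directly in the low-dimensional space (sqrt-density features)
def bhattacharyya_embedded(zp, zq):
    return -np.log(max(zp @ zq, EPS))

def matusita_embedded(zp, zq):
    return np.linalg.norm(zp - zq)
```

For example, embedding `np.sqrt(P)` for a stack of densities `P` lets the Bhattacharyya and Matusita distances be computed from `k`-dimensional vectors instead of the full histograms; when `k` equals the rank of the feature matrix the dot products, and hence these distances, are reproduced exactly, and smaller `k` trades accuracy for speed.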
Currently, researchers in continuous sign language recognition assume that training signs are already available; these are often manually extracted from continuous sentences, which is tedious and consumes a great deal of human time. We present an approach for automatically learning signs from multiple sentences, using a probabilistic framework to extract the parts of each sign that are present in most of its occurrences and are robust to variations produced by adjacent signs. We demonstrate the approach by learning 10 signs and 10 spoken words from 136 sign language sentences and 136 spoken sequences, respectively.
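The core intuition, finding a segment that recurs in most sentences containing a sign even when some occurrences are distorted, can be illustrated with a deliberately simplified, non-probabilistic stand-in. This is not the dissertation's probabilistic framework. It assumes (hypothetically) that the first sentence contains the sign, slides a window of fixed length over it, scores each window by its best subsequence-DTW match in every other sentence, and takes the median score so that only most sentences, not all, need a good match.

```python
import numpy as np

def subsequence_dtw(query, series, frame_dist):
    """Cost of the best match of `query` anywhere inside `series`
    (open begin and open end), normalized by the query length."""
    n, m = len(query), len(series)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, :] = 0.0                    # the match may start at any frame of `series`
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = frame_dist(query[i - 1], series[j - 1])
            cost[i, j] = c + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, 1:].min() / n        # ... and may end at any frame

def find_common_segment(sentences, win_len, frame_dist, step=1):
    """Hypothetical helper: return (median score, start index in sentences[0])
    of the window that best recurs across the remaining sentences."""
    reference, others = sentences[0], sentences[1:]
    best_score, best_start = np.inf, None
    for start in range(0, len(reference) - win_len + 1, step):
        window = reference[start:start + win_len]
        scores = [subsequence_dtw(window, s, frame_dist) for s in others]
        score = float(np.median(scores))  # robust: a few distorted occurrences are tolerated
        if score < best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

In practice the frames would be the per-frame densities and `frame_dist` a probabilistic distance; scalar frames with `lambda a, b: abs(a - b)` are enough to check the behavior, for example a motif planted at different positions in several noisy sequences, one of them perturbed, is still recovered at its position in the reference.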
Scholar Commons Citation
Nayak, Sunita, "Representation and Learning for Sign Language Recognition" (2008). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/425