Graduation Year

2008

Document Type

Dissertation

Degree

Ph.D.

Degree Granting Department

Computer Science and Engineering

Major Professor

Sudeep Sarkar, Ph.D.

Committee Member

Arthur I. Karshmer, Ph.D.

Committee Member

Barbara Loeding, Ph.D.

Committee Member

Dmitry Goldgof, Ph.D.

Committee Member

Rangachar Kasturi, Ph.D.

Keywords

American Sign Language, gestures, human-human interaction, probabilistic distances, motion, learning common motion patterns

Abstract

While recognizing some kinds of human motion patterns requires detailed feature representation and tracking, many of them can be recognized using global features. The global configuration or structure of an object in a frame can be expressed as a probability density function constructed using relational attributes between low-level features, e.g. edge pixels that are extracted from the regions of interest. The probability density changes with motion, tracing a trajectory in the latent space of distributions, which we call the configuration space. These trajectories can then be used for recognition using standard techniques such as dynamic time warping. Can these frame-wise probability functions, which usually have high dimensionality, be embedded into a low-dimensional space so that we can still estimate various meaningful probabilistic distances in the new space? Given these trajectory-based representations, can one learn models of signs in an unsupervised manner? We address these two fundamental questions in this dissertation.

Existing embedding approaches do not extend easily to preserve meaningful probabilistic distances between the samples. We present an embedding framework to preserve the probabilistic distances like Chernoff, Bhattacharya, Matusita, KL or symmetric-KL based on dot-products between points in this space. It results in computational savings. We experiment with the five different probabilistic distance measures and show the usefulness of the representation in three different contexts - sign recognition of 147 different signs (with large number of possible classes), gesture recognition with 7 different gestures performed by 7 different persons (with person variations) and classification of 8 different kinds of human-human interaction sequences (with segmentation problems).

Currently, researchers in continuous sign language recognition assume that the training signs are already available and often those are manually selected from continuous sentences. It consumes a lot of human time and is tedious. We present an approach for automatically learning signs from multiple sentences by using a probabilistic framework to extract the parts of signs that are present in most of its occurrences, and are robust to variations produced by adjacent signs. We show results by learning 10 signs and 10 spoken words from 136 sign language sentences and 136 spoken sequences respectively.

Share

COinS