Graduation Year


Document Type




Degree Name

MS in Computer Science (M.S.C.S.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yu Sun, Ph.D.

Committee Member

Dmitry B. Goldgof, Ph.D.

Committee Member

Sudeep Sarkar, Ph.D.


Keywords

Deep learning, Embedding, Motion code


In recent years, action recognition frameworks with deep architectures have achieved impressive results on large-scale activity datasets. All state-of-the-art models share one common attribute: a two-stream architecture. One deep model takes RGB frames, while the other is fed pre-computed optical flow vectors. The outputs of both models are combined into a final probability distribution over the action classes. When the individual models are compared with the fused model, the fused model commonly performs better. Researchers attribute this phenomenon to the fact that optical flow vectors serve as low-level motion features.
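The fusion step described above can be sketched as follows. This is only an illustration, not the method used in the thesis: the text says the two outputs are "combined" without giving the rule, so the weighted average of softmax outputs, the `flow_weight` parameter, and the example logits are all assumptions.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, flow_logits, flow_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    flow_weight is a hypothetical parameter; 0.5 gives both streams
    equal influence.
    """
    rgb_probs = softmax(rgb_logits)
    flow_probs = softmax(flow_logits)
    return [
        (1.0 - flow_weight) * r + flow_weight * f
        for r, f in zip(rgb_probs, flow_probs)
    ]


# Made-up scores for three action classes.
rgb_logits = [2.0, 1.0, 0.1]    # RGB stream favors class 0
flow_logits = [0.5, 2.5, 0.3]   # optical flow stream favors class 1

fused = late_fusion(rgb_logits, flow_logits)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because each stream outputs a valid distribution and the weights sum to one, the fused vector is itself a probability distribution over the action classes.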

To represent motion features in a more explainable way, we develop a motion prediction framework that extracts high-level motion features from videos and represents them as binary motion codes. We derive the motion codes from the motion taxonomy, a hierarchical structure that defines salient motion attributes. We also integrate the extracted motion features into a state-of-the-art action recognition model and achieve improved performance over the baseline model.
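To make the idea of a binary motion code concrete, here is a minimal sketch. The attribute names, their values, and the flat bit layout are hypothetical placeholders; the abstract does not list the attributes of the motion taxonomy or say how its hierarchy maps to bits.

```python
# Hypothetical attributes and values; not taken from the motion taxonomy.
TAXONOMY = [
    ("contact", ["no", "yes"]),
    ("trajectory", ["none", "linear", "rotational", "both"]),
    ("recurrence", ["no", "yes"]),
]


def bits_needed(n_values):
    """Number of bits required to index n_values distinct options."""
    return max(1, (n_values - 1).bit_length())


def encode_motion(attributes):
    """Concatenate one fixed-width bit field per attribute into a code."""
    code = ""
    for name, values in TAXONOMY:
        index = values.index(attributes[name])
        width = bits_needed(len(values))
        code += format(index, "0{}b".format(width))
    return code


def hamming_distance(a, b):
    """Count the bit positions in which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


rotate = encode_motion(
    {"contact": "yes", "trajectory": "rotational", "recurrence": "no"}
)
push = encode_motion(
    {"contact": "yes", "trajectory": "linear", "recurrence": "no"}
)
distance = hamming_distance(rotate, push)
```

Each bit field can be read back as a named attribute, which is what makes such a code more explainable than a dense optical flow vector. Under this layout, motions that share attributes also share the corresponding bit fields.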

In addition to the motion representation, we develop a framework based on cross-modal embedding to learn an action recognition model that does not encode its labels as one-hot vectors. More specifically, we represent the words of the narrated annotations as embedded word vectors and learn to embed visual and text data into a shared vector space. The resulting model avoids the shortcomings of one-hot vectors and achieves performance competitive with conventional baselines on the coarse-grained action classification task.
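The following sketch illustrates classification in a shared vector space. It is not the thesis's model: the three-dimensional label vectors are hand-picked toy values rather than trained word embeddings, `video_embedding` stands in for the output of a visual encoder, and nearest-neighbor lookup by cosine similarity is an assumed decision rule.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy word vectors (assumed values). Related verbs lie close together,
# which one-hot labels cannot express.
LABEL_VECTORS = {
    "cut": [0.9, 0.1, 0.0],
    "slice": [0.8, 0.2, 0.1],
    "pour": [0.0, 0.1, 0.9],
}


def classify(video_embedding):
    """Return the label whose word vector is most similar to the video."""
    return max(
        LABEL_VECTORS,
        key=lambda label: cosine(video_embedding, LABEL_VECTORS[label]),
    )


# Stand-in for a clip embedded into the shared space by a visual encoder.
video_embedding = [0.9, 0.05, 0.0]
prediction = classify(video_embedding)

# Semantic structure of the label space.
sim_cut_slice = cosine(LABEL_VECTORS["cut"], LABEL_VECTORS["slice"])
sim_cut_pour = cosine(LABEL_VECTORS["cut"], LABEL_VECTORS["pour"])
```

With one-hot labels every pair of distinct classes is equally dissimilar. Here "cut" is much closer to "slice" than to "pour", so confusing related actions places a prediction nearer the correct label than confusing unrelated ones.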