Graduation Year

2021

Document Type

Dissertation

Degree

Ph.D.

Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yu Y. Sun, Ph.D.

Committee Member

Shaun Canavan, Ph.D.

Committee Member

Heather Culbertson, Ph.D.

Committee Member

John Licato, Ph.D.

Committee Member

Kyle Reed, Ph.D.

Keywords

Calorie Estimation, Cooking Video Understanding, Ingredient Recognition, Knowledge Representation, Video Understanding, Deep Learning

Abstract

In this dissertation, we discuss our work on analyzing cooking content with the ultimate goal of automatic robotic manipulation. To perform a cooking task, a robot needs both an understanding of the scene and prior knowledge. We explore two main sub-problems in this dissertation: knowledge extraction and inference, and visual understanding of the scene. Visual understanding of a scene requires algorithms that can infer information from a single image or video. Many algorithms for image classification, object detection, or activity recognition can be applied in this area. Although deep learning has brought great advances, state-of-the-art algorithms in this area still have limitations. To overcome these limitations, we propose combining structured knowledge representations with state-of-the-art deep learning techniques for visual understanding of cooking videos. Besides objects and motions, the states of objects are also very important for interpreting the scene, so we extensively explore the problem of object states in visual cooking content. We introduce the state identification challenge for cooking applications and collect a dataset for research on ingredient state analysis. We further examine the problem of simultaneous knowledge extraction from a single image: extracting information about the ingredients, their states, the interconnections between different objects in the scene, and the interconnections between motions and objects. This problem requires an algorithm that can model the correlations among multiple concepts in a single image simultaneously. Deep models that accept multiple inputs and generate multiple outputs are well suited to this problem. Therefore, we propose incorporating autoregressive, self-attention-based mechanisms to extract knowledge from a single image.
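The abstract does not specify the architecture, so the following is only a minimal sketch of what "autoregressive self-attention" means in this setting. It assumes a single attention head, random (untrained) weights, and a hypothetical concept vocabulary in which ingredients, states, objects, and motions are all tokens. Each new concept is predicted from the image and from the concepts already emitted, which is how such a model can capture correlations among multiple outputs. The names `causal_self_attention` and `greedy_decode` are illustrative, and conditioning on the image by adding one feature vector to every token embedding is a simplification chosen for brevity, not the dissertation's model.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: (T, d) token embeddings. Position t may attend only to positions 0..t,
    so each output depends only on the current and earlier tokens.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])               # (T, T) similarity scores
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)            # block attention to later tokens
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ v

def greedy_decode(image_feat, embed, w_q, w_k, w_v, w_out, bos, eos, max_len=8):
    """Emit concept token ids one at a time (hypothetical vocabulary).

    Simplifying assumption: the image is injected by adding a single feature
    vector to every token embedding instead of using a visual encoder.
    """
    tokens = [bos]
    for _ in range(max_len):
        x = embed[tokens] + image_feat            # (T, d): prior tokens + image context
        h = causal_self_attention(x, w_q, w_k, w_v)
        logits = h[-1] @ w_out                    # next-token scores from the last position
        nxt = int(np.argmax(logits))              # greedy choice
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens[1:]                             # drop the start token
```

With trained weights, a decoded sequence could read like "onion, sliced, knife, cut"; with the random weights of a quick test the ids are arbitrary, but the causal mask still guarantees that earlier predictions never depend on later tokens.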
We show that the knowledge acquired from a single image can be used for calorie estimation. We further suggest that complete knowledge extraction from a single image can be used in future work for task graph inference.
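The abstract does not describe how extracted knowledge is turned into a calorie estimate. One simple possibility, sketched here under stated assumptions, is to sum energy over the recognized ingredients: it assumes the extraction step yields ingredient names with estimated masses in grams and that a per-100 g energy table is available. `Ingredient`, `estimate_calories`, and the table contents are hypothetical, and the numbers in the usage lines are placeholders rather than nutrition data.

```python
from dataclasses import dataclass

@dataclass
class Ingredient:
    name: str
    grams: float  # assumed: estimated mass produced by the visual extraction step

def estimate_calories(ingredients, kcal_per_100g):
    """Sum energy (kcal) over recognized ingredients.

    Ingredients missing from the table are returned separately
    instead of being silently guessed.
    """
    total = 0.0
    missing = []
    for ing in ingredients:
        rate = kcal_per_100g.get(ing.name)
        if rate is None:
            missing.append(ing.name)
            continue
        total += rate * ing.grams / 100.0  # table values are per 100 g
    return total, missing

# Placeholder names and energy densities, for illustration only.
table = {"ingredient_a": 40.0, "ingredient_b": 200.0}
dish = [
    Ingredient("ingredient_a", 150.0),
    Ingredient("ingredient_b", 50.0),
    Ingredient("ingredient_c", 10.0),
]
total, missing = estimate_calories(dish, table)
```

Reporting unknown ingredients keeps the estimate honest about coverage; a fuller system would also account for ingredient states (for example, fried versus raw), which this sketch ignores.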

Scholar Commons Citation

Babaeian Jelodar, Ahmad Babaeian, "Knowledge Extraction and Inference Based on Visual Understanding of Cooking Contents" (2021). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/9278

Included in

Computer Sciences Commons

USF Tampa Graduate Theses and Dissertations

Knowledge Extraction and Inference Based on Visual Understanding of Cooking Contents
