Graduation Year

2022

Document Type

Thesis

Degree

M.S.C.S.

Degree Name

MS in Computer Science (M.S.C.S.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Yu Sun, Ph.D.

Committee Member

Dmitry B. Goldgof, Ph.D.

Committee Member

Shaun Canavan, Ph.D.

Keywords

Contrastive Learning, Q-Learning, Temporal Cycle Consistency

Abstract

Robotic manipulation for cooking requires a thorough understanding of the cooking environment. The robot must understand the cooking objects and their states at each intermediate stage as the process continues. Understanding these states normally requires frame-level annotations. To overcome this dependency on frame-level annotations, we introduce a self-supervised learning method that obtains frame-level state representations using "temporal video alignment" and "contrastive learning." In this work, we use self-supervised learning to train a model on multiple videos of the same action performed in various settings. This model can extract a frame-level embedding space and align videos via simple distance-based matching. We show that this learned embedding space can be used to perform state and progress estimation and anomaly detection. Finally, we demonstrate how these embeddings can be used to perform robotic mixing by capturing state progression from offline videos. We use Q-learning and a UR5 robotic arm to perform the mixing.
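
The distance-based matching mentioned in the abstract can be illustrated with a minimal sketch. It assumes each frame has already been mapped to an embedding vector and aligns a query video to a reference video by nearest-neighbor search under Euclidean distance. The function names, the toy 2-D embeddings, and the progress formula are hypothetical choices made for illustration, not the thesis's actual implementation.

```python
import math

def nearest_frame_alignment(query_embs, reference_embs):
    """For each query frame embedding, return the index of the
    closest reference frame embedding (Euclidean distance)."""
    alignment = []
    for q in query_embs:
        distances = [math.dist(q, r) for r in reference_embs]
        alignment.append(distances.index(min(distances)))
    return alignment

def progress_estimate(alignment, reference_length):
    """Map each aligned reference index to a fraction in [0, 1].
    Assumption: progress is the relative position in the reference video."""
    last_index = max(reference_length - 1, 1)  # avoid division by zero
    return [idx / last_index for idx in alignment]

# Toy 2-D embeddings, made up for illustration only.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
query = [(0.1, 0.0), (0.9, 0.1), (1.2, 0.0), (2.8, -0.1)]

alignment = nearest_frame_alignment(query, reference)
progress = progress_estimate(alignment, len(reference))
print(alignment)
print(progress)
```

Under these assumptions, a query frame whose nearest reference frame is unusually far away could be flagged as a candidate anomaly, which is one way to read the anomaly-detection use described above.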
