Graduation Year
2025
Document Type
Thesis
Degree
M.S.Cp.
Degree Name
MS in Computer Engineering (M.S.C.P.)
Degree Granting Department
Computer Science and Engineering
Major Professor
Sudeep Sarkar, Ph.D.
Committee Member
Shaun Canavan, Ph.D.
Committee Member
Mauricio Pamplona Segundo, Ph.D.
Keywords
Transformers, Online Machine Learning, Edge Computing, Event Understanding, Representation Learning
Abstract
Event segmentation is the practice of autonomously detecting the boundaries of semantically connected sequences of actions within a video. It is connected to many areas of computer vision, including action recognition and event understanding. Nearly all current work requires the entire video to be stored in memory so that it can be processed in multiple passes. This consumes valuable resources, increases processing time, prevents the use of low-power, low-memory devices, and precludes event segmentation of long-form content and live video.
This thesis introduces EDGE-STREAMER, a real-time, lightweight, self-supervised transformer architecture capable of performing event segmentation in a single pass. Rather than storing the entire video, EDGE-STREAMER uses techniques from reinforcement learning models to store representative information about the past, which it uses to predict events. It uses weight sharing to reduce model size, further reducing its memory footprint and training requirements. During training, the model processes unlabeled video data in a single pass, producing event boundaries in real time.
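The single-pass, bounded-memory idea can be illustrated with a minimal sketch. This is not the EDGE-STREAMER architecture: the mean-of-memory predictor, the function name `stream_boundaries`, and the parameters `memory_size` and `threshold` are all assumptions chosen for illustration. The sketch only shows how a fixed-size summary of the past, instead of the whole video, can be used to flag a boundary when an incoming frame is poorly predicted.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def stream_boundaries(
    frames: Iterable[Sequence[float]],
    memory_size: int = 4,      # hypothetical: how many past frames are kept
    threshold: float = 1.0,    # hypothetical: prediction error that signals a boundary
) -> Iterator[int]:
    """Yield indices of frames that start a new event, reading each frame once."""
    memory: deque = deque(maxlen=memory_size)  # bounded summary of the past
    for index, frame in enumerate(frames):
        if memory:
            # Stand-in predictor: the mean of the remembered frames.
            predicted = [sum(dim) / len(memory) for dim in zip(*memory)]
            # Euclidean distance between prediction and the observed frame.
            error = sum((p - x) ** 2 for p, x in zip(predicted, frame)) ** 0.5
            if error > threshold:
                yield index
                memory.clear()  # begin a fresh summary for the new event
        memory.append(list(frame))


# Toy feature stream: three frames near (0, 0), two near (5, 5), then back to (0, 0).
toy_stream = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [0.0, 0.0]]
boundaries = list(stream_boundaries(toy_stream))
print(boundaries)  # [3, 5]
```

Because the function consumes an iterator and keeps at most `memory_size` frames, its memory use does not grow with video length, which is the property that makes live and long-form input feasible.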
On Epic-Kitchens, EDGE-STREAMER achieves MoF performance within 2% of STREAMER. On an NVIDIA Jetson Orin Nano, it sustains over 16 fps, showing that it is capable of real-time performance on an edge device.
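For readers unfamiliar with the metric, MoF (mean over frames) is the fraction of frames whose predicted label matches the reference label. The sketch below assumes that predicted segments have already been mapped to reference labels; the exact evaluation protocol used in the thesis is not stated here, and the function name and toy labels are hypothetical.

```python
from typing import Sequence


def mean_over_frames(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Return the fraction of frames whose predicted label equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of frames")
    if not reference:
        raise ValueError("at least one frame is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy per-frame event labels for an 8-frame clip (illustrative values only).
reference = [0, 0, 0, 1, 1, 1, 2, 2]
predicted = [0, 0, 1, 1, 1, 1, 2, 0]
score = mean_over_frames(predicted, reference)
print(f"MoF = {score:.2%}")  # MoF = 75.00%
```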
The contributions of this thesis are: (1) four low-memory event segmentation architectures; (2) a thorough analysis comparing the accuracy of EDGE-STREAMER with the state of the art; and (3) experimental data showing the viability of EDGE-STREAMER on edge hardware. These results open the door to event segmentation on cameras based on embedded devices, such as those used for surveillance, autonomous drones, and wearables, and suggest directions for future work, including multi-modal streaming and continual learning on embedded devices.
Scholar Commons Citation
Miller, Lucas, "EDGE-STREAMER: A Lightweight, Self-Supervised Event Segmentation Model" (2025). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/10887
