Graduation Year
2010
Document Type
Thesis
Degree
M.S.Cp.E.
Degree Granting Department
Computer Science and Engineering
Major Professor
Sudeep Sarkar, Ph.D.
Committee Member
Rangachar Kasturi, Ph.D.
Committee Member
Dmitry Goldgof, Ph.D.
Keywords
Conversation change, Temporal scales, Turn pattern, Multimedia analysis, Taxonomy
Abstract
Automatic analysis of conversations is important for extracting high-level descriptions of meetings. In this work, as an alternative to linguistic approaches, we develop a novel, purely bottom-up representation, constructed from both audio and video signals, that helps us characterize and build a rich description of the content at multiple temporal scales. Nonverbal communication plays an important role in conveying information about the communication and the nature of the conversation. We use simple audio and video features to detect changes in conversation. To detect these changes, we track the evolution of the change detected with the Bayesian Information Criterion (BIC) at multiple temporal scales, building an audio-visual change scale-space. Peaks detected in this representation yield group-turn-based conversational changes at different temporal scales. We use the NIST Meeting Room corpus to test our approach. Four eight-minute clips are extracted from this corpus at random, and the other ten are extracted starting 90 seconds after the start of the full video in the corpus. Only a single microphone and a single camera from the dataset are used. Evaluated across different thresholds with a fixed group-turn scale range, group-turn detection achieved an overall result of 82%, with a best result of 91% for a single video. Conversation overlaps, changes, and their inferred models offer an intermediate-level description of meeting videos that is useful in summarization and indexing of meetings. Since the proposed solutions are computationally efficient, require no training, and use little domain knowledge, they can easily be added as a feature to other multimedia analysis techniques.
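As a rough illustration of the idea in the abstract, the sketch below computes a BIC-based change score at several window sizes and stacks the results into a scale-space whose peaks mark candidate changes. It is not the thesis implementation. It assumes the common formulation in which each segment is modeled by a single full-covariance Gaussian, and the names `delta_bic`, `change_scale_space`, `penalty_weight`, and `half_windows` are hypothetical. The features here are a generic frames-by-dimensions array; the abstract does not specify the actual audio and video features.

```python
import numpy as np


def _logdet_cov(x):
    """Log-determinant of the sample covariance of x (frames x dims)."""
    d = x.shape[1]
    # A small ridge keeps the covariance invertible for short windows.
    cov = np.cov(x, rowvar=False).reshape(d, d) + 1e-6 * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(left, right, penalty_weight=1.0):
    """BIC gain of modeling two segments separately instead of jointly.

    Positive values favor a change between `left` and `right`.
    Assumption: one full-covariance Gaussian per segment.
    """
    both = np.vstack([left, right])
    n, d = both.shape
    n1, n2 = len(left), len(right)
    gain = 0.5 * (
        n * _logdet_cov(both)
        - n1 * _logdet_cov(left)
        - n2 * _logdet_cov(right)
    )
    # Penalty for the extra mean and covariance parameters of the second model.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def change_scale_space(features, half_windows, penalty_weight=1.0):
    """Stack delta-BIC curves, one row per temporal scale (half-window size).

    Positions too close to the signal edges for a given scale are -inf.
    """
    n_frames = len(features)
    space = np.full((len(half_windows), n_frames), -np.inf)
    for s, w in enumerate(half_windows):
        for t in range(w, n_frames - w + 1):
            space[s, t] = delta_bic(
                features[t - w:t], features[t:t + w], penalty_weight
            )
    return space
```

Each row of the returned array is a change curve at one temporal scale; a peak that persists across rows suggests a change visible at several scales. How peaks are selected and thresholded in the thesis is not described in the abstract, so that step is left out here.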
Scholar Commons Citation
Krishnan, Ravikiran, "Detecting Group Turns of Speaker Groups in Meeting Room Conversations Using Audio-Video Change Scale-Space" (2010). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/3644