Automated Audio-visual Activity Analysis

Author(s)

Stauffer, Chris

Download: MIT-CSAIL-TR-2005-057.ps (31.37Mb)

Abstract

Current computer vision techniques can effectively monitor gross activities in sparse environments. Unfortunately, visual stimulus is often not sufficient to discriminate reliably between many types of activity. In many cases where the visual information required for a particular task is extremely subtle or nonexistent, there is an audio stimulus that is highly salient for a particular classification or anomaly detection task. Unlike visual events, however, isolated sounds are often very ambiguous and are not sufficient by themselves to define useful events. Without an effective method of learning causally linked temporal sequences of sound events that are coupled to the visual events, these sound events are generally useful only for detecting isolated anomalous sounds, e.g., a gunshot or breaking glass. This paper outlines a method for automatically detecting a set of audio events and visual events in a particular environment, determining statistical anomalies, clustering the detected events into meaningful groups, and learning salient temporal relationships between the audio and visual events. The result is a compact description of the different types of compound audio-visual events in an environment.
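The idea of a temporal relationship between audio and visual events can be illustrated with a small sketch. This is not the method of the report, which the abstract does not specify; it is a simple windowed co-occurrence count with a crude chance baseline, offered only as an illustration. The function name `cooccurrence_lift`, the event labels, the window length, and the uniform-rate baseline are all assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def cooccurrence_lift(visual, audio, window, duration):
    """Score how much more often an audio label follows a visual label than chance.

    visual, audio: lists of (time_in_seconds, label) pairs (hypothetical format).
    window: seconds after a visual event in which an audio event counts as a hit.
    duration: total length of the recording in seconds.
    Returns {(visual_label, audio_label): lift}; lift > 1 suggests coupling.
    """
    # Group audio timestamps by label, sorted so we can binary-search them.
    audio_times = {}
    for t, label in sorted(audio):
        audio_times.setdefault(label, []).append(t)

    visual_counts = Counter(label for _, label in visual)
    hits = Counter()
    for t, v_label in visual:
        for a_label, times in audio_times.items():
            # Is there at least one audio event of this label in [t, t + window]?
            if bisect_right(times, t + window) > bisect_left(times, t):
                hits[(v_label, a_label)] += 1

    lifts = {}
    for (v_label, a_label), n in hits.items():
        observed = n / visual_counts[v_label]
        # Crude baseline (an assumption): audio events scattered uniformly in time.
        expected = min(1.0, len(audio_times[a_label]) / duration * window)
        lifts[(v_label, a_label)] = observed / expected
    return lifts


# Hypothetical toy data: a door opening is followed by a slam within two seconds.
visual_events = [(10.0, "door_open"), (50.0, "door_open"), (90.0, "person_walks")]
audio_events = [(11.0, "slam"), (51.0, "slam"), (70.0, "speech")]
scores = cooccurrence_lift(visual_events, audio_events, window=2.0, duration=100.0)
```

In this toy data every "door_open" is followed by a "slam", so that pair receives a lift well above 1, while pairs that never co-occur inside the window are absent from the result. A real system would also need the event detection and clustering stages the abstract mentions, which this sketch takes as given.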

Date issued

2005-09-20

URI

http://hdl.handle.net/1721.1/30568

Other identifiers

MIT-CSAIL-TR-2005-057

AIM-2005-026

Series/Report no.

Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory

Keywords

AI, Unsupervised, activity analysis, scene modeling, tracking, event detection

Collections

CSAIL Technical Reports (July 1, 2003 - present)