Label-Efficient and Compute-Efficient Video Analytics

Bastani, Favyen

Author(s)

Bastani, Favyen

Advisor

Madden, Samuel

Terms of use

In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/

Abstract

The ability to analyze large-scale video datasets is useful in an increasing range of applications. For example, a traffic planner may want to analyze traffic camera video to compare the frequency of hard braking at different junctions, while an ecology researcher may be interested in identifying instances of various behaviors between pairs of birds in video of a bird feeder. However, implementing machine learning (ML) pipelines for video analytics tasks remains challenging for two reasons. First, these tasks generally require applying expensive ML models to robustly detect and track objects such as cars and birds. These models are both label-intensive, often requiring thousands of labeled examples to achieve high accuracy, and compute-intensive, executing at only tens of frames per second even on datacenter GPUs. Second, in addition to applying ML models, these tasks often require several auxiliary operations to pre-process the input video and associated metadata, and to post-process model outputs to extract useful insights. For example, counting hard braking incidents necessitates post-processing object tracks of cars to identify sharp decelerations. In this thesis, we present SkyhookML, a platform for analytics tasks over large-scale video datasets. To reduce the cost of video analytics, we integrate approximate video query processing optimizations, efficient video pre-processing methods, and self-supervised learning techniques into SkyhookML. Approximate processing optimizations sacrifice a small amount of accuracy for large gains in throughput by avoiding applying the most accurate but also most expensive models on every video frame. Efficient pre-processing methods extract general-purpose insights from video that can be reused across several analytics tasks. Self-supervised learning techniques can substantially reduce the labeling effort needed to train robust models by deriving learning signals from unlabeled data.
By employing novel approaches in each of these three categories that are specialized for analyzing object detections and tracks that appear in video data, SkyhookML addresses the label- and compute-intensiveness of video analytics and enables users to efficiently develop and deploy ML pipelines.

Date issued

2021-09

URI

https://hdl.handle.net/1721.1/140178

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses