DSpace@MIT (MIT Libraries)


Label-Efficient and Compute-Efficient Video Analytics

Author(s)
Bastani, Favyen
Advisor
Madden, Samuel
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
The ability to analyze large-scale video datasets is useful in an increasing range of applications. For example, a traffic planner may want to analyze traffic camera video to compare the frequency of hard braking at different junctions, while an ecology researcher may be interested in identifying instances of various behaviors between pairs of birds in video of a bird feeder. However, implementing machine learning (ML) pipelines for video analytics tasks remains challenging for two reasons. First, these tasks generally require applying expensive ML models to robustly detect and track objects such as cars and birds. These models are both label-intensive, often requiring thousands of labeled examples to achieve high accuracy, and compute-intensive, executing at only tens of frames per second even on datacenter GPUs. Second, in addition to applying ML models, these tasks often require several auxiliary operations to pre-process the input video and associated metadata, and to post-process model outputs to extract useful insights. For example, counting hard braking incidents requires post-processing the object tracks of cars to identify sharp decelerations.

In this thesis, we present SkyhookML, a platform for analytics tasks over large-scale video datasets. To reduce the cost of video analytics, we integrate approximate video query processing optimizations, efficient video pre-processing methods, and self-supervised learning techniques into SkyhookML. Approximate processing optimizations sacrifice a small amount of accuracy for large gains in throughput by avoiding the application of the most accurate, and most expensive, models to every video frame. Efficient pre-processing methods extract general-purpose insights from video that can be reused across several analytics tasks. Self-supervised learning techniques can substantially reduce the labeling effort needed to train robust models by deriving learning signals from unlabeled data. By employing novel approaches in each of these three categories, specialized for analyzing the object detections and tracks that appear in video data, SkyhookML addresses the label- and compute-intensiveness of video analytics and enables users to efficiently develop and deploy ML pipelines.
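
The hard-braking example in the abstract can be illustrated with a minimal sketch of the post-processing step. This is not SkyhookML code: the function name `hard_braking_times`, the track layout of `(t_seconds, x_meters, y_meters)` tuples, and the 3.0 m/s^2 threshold are all assumptions chosen for illustration, and a real pipeline would first have to map pixel coordinates to ground-plane distances.

```python
from math import hypot


def hard_braking_times(track, decel_threshold=3.0):
    """Return timestamps at which a tracked object decelerates sharply.

    track: list of (t_seconds, x_meters, y_meters), sorted by time
           (hypothetical layout, not taken from the thesis).
    decel_threshold: deceleration in m/s^2 counted as hard braking
           (illustrative value, not taken from the thesis).
    """
    # Average speed over each interval, stamped with the interval's end time.
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order observations
        speeds.append((t1, hypot(x1 - x0, y1 - y0) / dt))

    # Deceleration between consecutive speed estimates.
    events = []
    for (t0, v0), (t1, v1) in zip(speeds, speeds[1:]):
        decel = (v0 - v1) / (t1 - t0)
        if decel > decel_threshold:
            events.append(t1)
    return events


# A car moving at 10 m/s that slows to 4 m/s within one second.
braking_track = [(0, 0.0, 0.0), (1, 10.0, 0.0), (2, 20.0, 0.0),
                 (3, 24.0, 0.0), (4, 28.0, 0.0)]
events = hard_braking_times(braking_track)
```

Counting the returned timestamps per junction would give the kind of frequency comparison the traffic-planner example describes, under the stated assumptions.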
Date issued
2021-09
URI
https://hdl.handle.net/1721.1/140178
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
