DSpace@MIT

Understanding the Robustness of Vision Models and Humans to Occlusion-Based Corruptions

Author(s)
Lu, David
Thesis PDF (11.01 MB)
Advisor
Katz, Boris
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Humans are excellent object recognizers. Not only can they identify fully visible objects, but they can also recognize objects that are partially blocked from view (i.e., occluded). Moreover, vision models have made substantial progress in object recognition over the past decade. However, their proficiency in identifying occluded objects has not been thoroughly investigated. In this work, we analyze the robustness of models and humans to occlusions by building artificial occlusion transforms that mask out parts of images. We design occlusion transforms to model a diverse range of occlusion scenarios, varying two key factors: (1) the percentage of the image that is occluded, and (2) the granularity of the occlusion pattern, from large chunks to fine-grained pepper noise. We then evaluate the performance of humans and models on these occluded images. Our experiments yield several key findings. Intriguingly, pretrained models exhibit a U-shaped accuracy curve, with medium-granularity occlusions posing the greatest challenge. This pattern closely aligns with the one observed in our human experiments, which is particularly surprising, considering the substantial disparities between human visual systems and machine-based perception. Additionally, we explore whether performance losses caused by occlusions can be mitigated through two approaches: finetuning using occluded images and inpainting occluded pixels before classification. We discover that finetuning leads to a considerable increase in accuracy, but we suspect that finetuned models are relying on a different set of features. Inpainting helps significantly for mid- and high-frequency occlusions, but has the disadvantage of misleading both models and humans at low frequencies. Lastly, we introduce a new adversarial occlusion task, and propose two attack methods based on differential evolution and Grad-CAM. We find that occluding fewer than 10% of pixels is enough to fool vision classifiers. 
This demonstrates that adversarial attacks can be executed by eliminating image content rather than introducing perturbations. Complementing our analysis of a variety of state-of-the-art models, we offer our occlusion benchmark as a resource for researchers to evaluate the performance of future models intended for real-world deployment.
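The occlusion transforms described in the abstract vary two factors: the fraction of the image that is masked and the granularity of the mask, from large chunks down to single-pixel pepper noise. A minimal sketch of such a transform is below; the function name, the random cell-selection scheme, and the choice to fill occluded pixels with black are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def occlude(image, fraction=0.3, patch_size=16, rng=None):
    """Mask out roughly `fraction` of an image with square patches of side
    `patch_size`. Large `patch_size` gives coarse chunk occlusions;
    `patch_size=1` gives fine-grained pepper noise."""
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    # Tile the image into a grid of patch_size x patch_size cells.
    gh, gw = -(-h // patch_size), -(-w // patch_size)  # ceil division
    n_cells = gh * gw
    n_masked = int(round(fraction * n_cells))
    # Choose which cells to occlude uniformly at random.
    chosen = rng.choice(n_cells, size=n_masked, replace=False)
    keep = np.ones((gh, gw), dtype=np.uint8)
    keep.flat[chosen] = 0  # 0 = occluded cell
    # Upsample the cell mask to pixel resolution and crop to the image size.
    pixel_mask = np.kron(keep, np.ones((patch_size, patch_size), dtype=np.uint8))
    pixel_mask = pixel_mask[:h, :w].astype(bool)
    out = image.copy()
    out[~pixel_mask] = 0  # fill occluded pixels with black
    return out, pixel_mask
```

Sweeping `fraction` and `patch_size` over a grid reproduces the two axes the thesis evaluates; the U-shaped accuracy curve reported above corresponds to varying `patch_size` at a fixed occlusion fraction.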
Date issued
2023-06
URI
https://hdl.handle.net/1721.1/151342
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.