Robust Computer Vision beyond Lₚ Adversaries
Author(s)
Leclerc, Guillaume
Advisor
Madry, Aleksander
Abstract
Deep learning computer vision systems, integral to technologies such as self-driving cars, facial recognition, and content moderation, require robustness against diverse perturbations to ensure reliability and safety. Examples of such perturbations include variations in lighting conditions, occlusions, and changes in object orientation or background, all of which can significantly degrade the performance of these systems. This thesis investigates the effects of Lₚ perturbations on neural networks, focusing on models trained to be robust against these disturbances. We demonstrate that, compared to models trained with other methods, such robust models exhibit behavior more akin to human vision, which is widely regarded as the benchmark for robust vision. Our measurements indicate, however, that these models already perform on par with humans from the perspective of Lₚ perturbations, while humans still significantly outperform them on practical vision tasks. This finding suggests that although Lₚ perturbations provide a useful measure of robustness, more sophisticated perturbations may be needed to reach a level of robustness closer to that of human vision.

In response, the thesis introduces 3DB, a novel simulation-based framework for experimenting with arbitrary, semantics-preserving perturbations applied directly to physically accurate 3D models rather than to 2D images. The utility of this framework is demonstrated through its ability to diagnose and improve our understanding of the behavior of off-the-shelf models. These insights suggest that the framework could be useful not only for debugging models but also for generating training data. The potential of this application, however, is currently constrained by the scarcity of publicly available, open datasets of 3D objects, and the absence of scalable methods for capturing such detailed representations limits the research community's ability to collect them.

To address this limitation, a cost-effective 3D scanning process is proposed that captures geometry and material properties simultaneously. Both are essential for simulating light behavior and, in turn, for modeling real-world perturbations within 3DB and other rendering engines. The availability of large-scale, physically accurate datasets of 3D objects, when combined with differentiable rendering engines, could enable adversarial training with application-specific perturbations, an approach with the potential to significantly narrow the gap between the robustness of neural networks and that of their biological counterparts.
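As a concrete illustration of the Lₚ perturbations and adversarial training discussed in the abstract, the following is a minimal sketch (not taken from the thesis) of a projected gradient descent attack under an L∞ budget in PyTorch; the model, epsilon, and step-size values are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity bounded adversarial examples with PGD (illustrative sketch).

    model: any classifier mapping images to logits (assumed)
    x, y:  a batch of images in [0, 1] and their integer class labels
    eps:   maximum per-pixel perturbation, i.e. the L-infinity budget
    alpha: step size per iteration
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a step that increases the loss, following the sign of the gradient ...
        x_adv = x_adv.detach() + alpha * grad.sign()
        # ... then project back into the L-infinity ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```

Adversarial training against this threat model amounts to substituting such x_adv for x in the training loop; the thesis argues that richer, semantics-preserving perturbations (for example, rendered through 3DB) may be needed beyond this Lₚ setting.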
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology