Towards Automated Design of Machine Perception Systems
Author(s)
Klinghoffer, Tzofi
Advisor
Raskar, Ramesh
Abstract
Animals' visual perception systems have evolved to their environments over billions of years, enabling them to navigate, avoid predators, and hunt prey. In contrast, machine perception systems designed by humans require significant engineering and often use standard cameras that may not be well suited to their task or environment. Consider building a robot to pick up trash: the choice of sensors determines which types of trash it can detect, e.g., an infrared sensor may be needed to detect plastic bottles. In addition, animals are able to understand their environment from different viewpoints and under variable lighting, while machine perception systems often fail to generalize beyond the distribution of their training data. Inspired by the evolution of animals' visual perception systems, this thesis explores two distinct but related problems: (1) automated design of machine perception systems, and (2) robustness of machine perception systems to physical phenomena such as lighting and camera viewpoint.

Machine perception systems -- also referred to as imaging systems in this thesis -- consist of cameras and perception models. Cameras sense the environment and capture observations, while perception models analyze the captured observations. Cameras comprise (1) illumination sources, (2) optical elements, and (3) sensors, while perception models use (4) algorithms. Directly searching over all combinations of these four building blocks to design a machine perception system is challenging due to the size of the search space. In Part I of this thesis, we introduce DISeR: Designing Imaging Systems with Reinforcement Learning, a method that allows task-specific imaging systems to be created and optimized in simulation.

In Part II of this thesis, we study the robustness of machine perception systems to physical phenomena. We introduce two methods that mitigate the susceptibility of deep learning models to failure when exposed to out-of-distribution lighting and camera viewpoints: the first improves robustness through feature disentanglement, while the second does so by modifying pixels. We evaluate our work on standard benchmarks and through peer-reviewed publication.
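To make the search formulation in the abstract concrete, the sketch below frames camera and perception design as a search over combinations of illumination sources, optical elements, sensors, and algorithms, driven by a simple reinforcement-learning (bandit-style) loop. It is purely illustrative and not taken from the thesis: the component options, the evaluate() reward, and the epsilon-greedy strategy are stand-in assumptions, whereas DISeR's actual design space, simulator, and learning algorithm are described in Part I.

```python
# Hypothetical sketch: epsilon-greedy search over a discretized imaging-system
# design space. All component names and the reward function are placeholders.
import itertools
import random
from collections import defaultdict

# The four building blocks named in the abstract:
# (1) illumination sources, (2) optical elements, (3) sensors, (4) algorithms.
DESIGN_SPACE = {
    "illumination": ["ambient", "active_ir", "structured_light"],
    "optics": ["wide_fov", "narrow_fov", "coded_aperture"],
    "sensor": ["rgb", "infrared", "event"],
    "algorithm": ["cnn_detector", "transformer_detector"],
}

def evaluate(design):
    """Placeholder for task performance measured in simulation.
    In practice this would render observations with the chosen camera
    and score the perception model on the downstream task."""
    rng = random.Random(hash(design))
    return rng.random()

def epsilon_greedy_search(steps=200, epsilon=0.2):
    designs = list(itertools.product(*DESIGN_SPACE.values()))
    value = defaultdict(float)   # running mean reward per design
    counts = defaultdict(int)
    for _ in range(steps):
        if random.random() < epsilon:
            design = random.choice(designs)                 # explore
        else:
            design = max(designs, key=lambda d: value[d])   # exploit
        reward = evaluate(design)
        counts[design] += 1
        value[design] += (reward - value[design]) / counts[design]
    return max(designs, key=lambda d: value[d])

if __name__ == "__main__":
    best = epsilon_greedy_search()
    print(dict(zip(DESIGN_SPACE.keys(), best)))
```

Even in this toy version, the design space already contains 54 combinations; with realistic, finer-grained component choices the space grows combinatorially, which is why a learned search strategy is preferred over exhaustive enumeration.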
Date issued
2023-06
Department
Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Publisher
Massachusetts Institute of Technology