Interpretable neural networks via alignment and distribution propagation
Author(s): Malalur, Paresh (Paresh G.)
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
In this thesis, we aim to develop methodologies to better understand and improve the performance of Deep Neural Networks in settings where data is limited or missing. On data-rich tasks, neural networks have achieved human-level performance; on naturally data-limited problems, these models still fall short of human-level performance and there is abundant room for improvement. We focus on three types of problems where data is limited: one-shot learning and open-set recognition in the one-shot setting, unsupervised learning, and classification with missing data. The first limited-data setting that we tackle is when there are only a few examples per object type. During object classification, an attention mechanism can be used to highlight the area of the image that the model focuses on, thus offering a narrow view into the mechanism of classification. We expand on this idea by forcing the method to explicitly align images to be classified to reference images representing the classes. The mechanism of alignment is learned and therefore does not require that the reference objects are anything like those being classified. Beyond explanation, our exemplar-based cross-alignment method enables classification with only a single example per category (one-shot) or in the absence of any labels about new classes (open-set). While one-shot and open-set recognition operate in cases where complete data is available for few examples, the unsupervised and missing-data settings focus on cases where the labels are missing or where only partial input is available, respectively.
Variational Auto-Encoders are a popular class of unsupervised learning models that learn to map the input distribution into a simple latent distribution. We introduce a mechanism for approximately propagating Gaussian densities through neural networks, using the Hellinger distance to find the best approximation, and demonstrate how to use this framework to improve the latent code efficiency of Variational Auto-Encoders. Expanding on this idea, we introduce a novel method to learn the mapping between the input space and the latent space, which further improves the efficiency of the latent code by overcoming the variational bound. The final limited-data setting we explore is when the input data is incomplete or very noisy. Neural networks are inherently feed-forward, and hence inference methods developed for probabilistic models cannot be applied directly. We introduce two different methods to handle missing data. We first introduce a simple feed-forward model that redefines the linear operator as an ensemble to reweight the activations when portions of its receptive field are missing. We then use some of the insights gained to develop deep networks that propagate distributions of activations instead of point activations, allowing us to use message-passing methods to compensate for missing data while maintaining the feed-forward approach when data is not missing.
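For readers unfamiliar with the Hellinger distance mentioned in the abstract, the following is a minimal sketch of its standard closed form for two univariate Gaussians. It is not taken from the thesis: the function name `hellinger_gaussians` and its parameters are hypothetical, and the thesis's own approximation procedure is not described here and may differ.

```python
import math


def hellinger_gaussians(mu1, sigma1, mu2, sigma2):
    """Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Illustrative helper (hypothetical name, not from the thesis).
    Uses H^2 = 1 - BC, where BC is the Bhattacharyya coefficient of
    the two densities.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Bhattacharyya coefficient for two univariate Gaussians.
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Clamp at zero to guard against tiny negative rounding errors.
    return math.sqrt(max(0.0, 1.0 - bc))


if __name__ == "__main__":
    # Identical densities are at distance 0; the distance approaches 1
    # as the densities stop overlapping.
    print(hellinger_gaussians(0.0, 1.0, 0.0, 1.0))
    print(hellinger_gaussians(0.0, 1.0, 3.0, 1.0))
```

The distance is symmetric and bounded in [0, 1], which makes it a convenient score for comparing a candidate Gaussian approximation against a target density.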
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 145-150).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.