Dissection of Deep Neural Networks

Author(s)
Bau, David
Download: Thesis PDF (41.93 MB)
Advisor
Torralba, Antonio
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
We investigate the role of neurons within the internal computations of deep neural networks for computer vision. We introduce network dissection, a method for quantifying the alignment between human-interpretable visual concepts and individual neurons in a deep network. We apply network dissection to examine and compare the internal computations of several networks trained to classify and represent images, and we ask how well human-understandable concepts align with neurons at different layers, in different architectures, and under various training objectives; we also compare neurons to random linear combinations of neurons and examine the emergence of concepts as training proceeds.

We then adapt network dissection to analyze generative adversarial networks. In GAN dissection, human-understandable neurons are identified by applying a semantic segmentation model to generated output. We find that small sets of neurons control the presence of specific objects within synthesized scenes. We also find that activating neurons reveals modeled rules and interactions between objects and their context.

We then ask how to dissect and understand the omissions of a generative network. Omissions of human-understandable objects can be quantified by comparing semantic segmentation statistics between the training distribution and the generated distribution. We develop a method that can invert and reconstruct generated images in a progressive GAN, and show that this reconstruction can visualize specific cases in which the GAN omits identified object classes.

Finally, we ask how rules within a generative model are represented. We hypothesize that the layers of a generative model serve as a memory that stores associations from representations of concepts at the input of a layer to patterns of concepts at the output of the layer, and we develop a method for rewriting the weights of a model by directly editing one memorized association. We show that our method can be used to rewrite several individual associative memories in a Progressive GAN or StyleGAN, altering learned rules that govern the appearance of specific object parts in the model.
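The core scoring step of network dissection described above compares a unit's thresholded activation map against a labeled concept mask. The sketch below illustrates that idea in Python with NumPy; it is not the thesis code, and the array names, the helper dissect_unit, and the quantile parameter are illustrative assumptions.

```python
# Minimal sketch of the network-dissection scoring idea (illustrative, not the thesis code).
# Assumes that, for one convolutional unit, we already have its activation maps over a
# probe dataset (upsampled to image resolution) and binary segmentation masks for one
# human-labeled visual concept.

import numpy as np

def dissect_unit(activations, concept_masks, quantile=0.995):
    """Score how well one unit aligns with one concept via intersection-over-union.

    activations   : float array [num_images, H, W], the unit's upsampled activation maps
    concept_masks : bool array  [num_images, H, W], concept segmentation labels
    quantile      : top-quantile threshold over the unit's activation distribution
    """
    # Threshold the unit at a fixed top quantile of its activations across the
    # whole probe set, so the binary "unit on" region is a small fraction of pixels.
    threshold = np.quantile(activations, quantile)
    unit_masks = activations > threshold

    # IoU between the unit's active region and the concept's region,
    # accumulated over all probe images.
    intersection = np.logical_and(unit_masks, concept_masks).sum()
    union = np.logical_or(unit_masks, concept_masks).sum()
    return intersection / union if union > 0 else 0.0
```

In this framing, a unit is reported as a detector for the concept with which its IoU is highest, provided the score clears a small cutoff; the same overlap measure, applied to the segmentation of generated images, underlies the GAN dissection analysis summarized in the abstract.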
Date issued
2021-09
URI
https://hdl.handle.net/1721.1/140048
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
