
dc.contributor.advisor: Shavit, Nir
dc.contributor.author: Lim, Yong Hui
dc.date.accessioned: 2022-01-14T15:19:02Z
dc.date.available: 2022-01-14T15:19:02Z
dc.date.issued: 2021-06
dc.date.submitted: 2021-06-17T20:13:36.140Z
dc.identifier.uri: https://hdl.handle.net/1721.1/139547
dc.description.abstract: In this thesis, a method of initializing neural networks with weights transferred from smaller trained neural networks was investigated. We name this process augmentation and present several versions of it, some of which involve pruning. First, the pruning relation of testing loss against density was found for the GPT-2 transformer network on a causal language modeling task; an interesting double plateau of testing loss was observed whenever the attention weights were pruned. Next, augmentation was investigated on low-dimensional datasets and shallow networks. We found that performing a step of zeroing final layer initializations (ZFLI) results in better augmentation. With this insight, we proceeded to investigate a variety of datasets and networks. Two forms of augmentation were investigated: basic augmentation and pruned augmentation. However, neither form of augmentation was found to produce a consistent improvement in testing accuracy/loss.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Transformer Pruning Relation and General Neural Network Augmentation
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
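
To make the augmentation step described in the abstract concrete, here is a minimal sketch, not the thesis implementation: it assumes a simple fully connected ReLU network and a block-copy embedding of the smaller trained network's weights into a wider network, with ZFLI modeled as zeroing the wider network's final layer after the copy. The layer widths and the `make_mlp`/`augment` helpers are illustrative assumptions.

```python
# Illustrative sketch of "augmentation" with a ZFLI step, under the
# assumptions stated above (not the author's code).
import torch
import torch.nn as nn

def make_mlp(widths):
    """Plain fully connected ReLU network, e.g. widths = [in, h1, ..., out]."""
    layers = []
    for i in range(len(widths) - 2):
        layers += [nn.Linear(widths[i], widths[i + 1]), nn.ReLU()]
    layers.append(nn.Linear(widths[-2], widths[-1]))
    return nn.Sequential(*layers)

@torch.no_grad()
def augment(small, large, zfli=True):
    """Copy each trained Linear layer of `small` into the top-left block of the
    corresponding (wider) Linear layer of `large`; optionally zero the final
    layer of `large` (the ZFLI step)."""
    small_linears = [m for m in small if isinstance(m, nn.Linear)]
    large_linears = [m for m in large if isinstance(m, nn.Linear)]
    assert len(small_linears) == len(large_linears)
    for s, l in zip(small_linears, large_linears):
        out_s, in_s = s.weight.shape
        l.weight[:out_s, :in_s] = s.weight
        l.bias[:out_s] = s.bias
    if zfli:
        large_linears[-1].weight.zero_()
        large_linears[-1].bias.zero_()
    return large

# Example: a small net (assumed already trained) seeds a wider net on the same task.
small = make_mlp([10, 16, 16, 2])
large = augment(small, make_mlp([10, 64, 64, 2]))
```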

