Show simple item record

dc.contributor.advisor: Poggio, Tomaso
dc.contributor.author: Nasimov, Umarbek
dc.date.accessioned: 2023-07-31T19:57:08Z
dc.date.available: 2023-07-31T19:57:08Z
dc.date.issued: 2023-06
dc.date.submitted: 2023-06-06T16:35:02.790Z
dc.identifier.uri: https://hdl.handle.net/1721.1/151660
dc.description.abstract: There is a recurring observation in deep learning that neural networks can be combined simply with arithmetic averages over their parameters. This observation has led to many new research directions in model ensembling, meta-learning, federated learning, and optimization. We investigate the evolution of this phenomenon during the training trajectory of neural network models initialized from a common set of parameters (parent). Surprisingly, the benefit of averaging the parameters persists over long child trajectories from parent parameters with minimal training. Furthermore, we find that the parent can be merged with a single child with significant improvement in both training and test loss. Through analysis of the loss landscape, we find that the loss becomes sufficiently convex early on in training, and, as a consequence, models obtained by averaging multiple children often outperform any individual child.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: How early can we average Neural Networks?
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
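The abstract above describes combining networks by taking simple arithmetic averages of their parameters, with children fine-tuned from a shared parent checkpoint. The following is a minimal PyTorch-style sketch of that averaging operation, illustrative only: it assumes all models share one architecture, and the names used (average_state_dicts, child_a, child_b, parent) are hypothetical, not taken from the thesis.

import copy

import torch


def average_state_dicts(state_dicts, weights=None):
    # Arithmetic (optionally weighted) average of parameter tensors from
    # models sharing the same architecture; non-floating-point buffers
    # (e.g. BatchNorm's num_batches_tracked) are kept from the first model.
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = copy.deepcopy(state_dicts[0])
    for key in merged:
        if torch.is_floating_point(merged[key]):
            merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged


# Hypothetical usage: child_a and child_b were trained from a common parent
# checkpoint; their averaged parameters are loaded into a model of the same
# architecture.
#   model.load_state_dict(
#       average_state_dicts([child_a.state_dict(), child_b.state_dict()]))
# Merging the parent with a single child, as mentioned in the abstract, is
# the two-model special case:
#   model.load_state_dict(
#       average_state_dicts([parent.state_dict(), child.state_dict()]))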

