
dc.contributor.advisor: Andreas, Jacob
dc.contributor.author: Akyurek, Ekin
dc.date.accessioned: 2022-01-14T15:15:19Z
dc.date.available: 2022-01-14T15:15:19Z
dc.date.issued: 2021-06
dc.date.submitted: 2021-06-24T19:07:45.398Z
dc.identifier.uri: https://hdl.handle.net/1721.1/139493
dc.description.abstract: Flexible neural sequence models outperform grammar- and automaton-based counterparts on a variety of tasks. However, neural models perform poorly in settings requiring compositional generalization beyond the training data, particularly to rare or unseen subsequences. Past work has found symbolic scaffolding (e.g. grammars or automata) essential in these settings. We describe two simpler and more general modeling approaches that enable a large category of compositional generalizations without appeal to latent symbolic structure. The first is a data augmentation scheme called R&R, built from two components: recombination of original training examples via a prototype-based generative model, and resampling of generated examples to encourage extrapolation. Training an ordinary neural sequence model on a dataset augmented with recombined and resampled examples significantly improves generalization in two language processing problems, instruction following (SCAN) and morphological analysis (SIGMORPHON 2018), where R&R enables learning of new constructions and tenses from as few as eight initial examples. The second is a lexical translation mechanism for neural sequence modeling. Previous work shows that many failures of systematic generalization arise from neural models' inability to disentangle lexical phenomena from syntactic ones. To address this, we augment neural decoders with a lexical translation mechanism that generalizes existing copy mechanisms to incorporate learned, decontextualized, token-level translation rules. We describe how to initialize this mechanism using a variety of lexicon learning algorithms, and show that it improves systematic generalization on a diverse set of sequence modeling tasks drawn from cognitive science, logical semantics, and machine translation.
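To make the second approach concrete, below is a minimal sketch, not the thesis implementation: the function name mix_output_distribution, the tensor shapes, and the single scalar gate are assumptions for illustration. It shows how a decoder's output distribution can mix ordinary generation with a lexical route that generalizes copying: attention mass on a source token x is routed to lexicon[x], a learned, decontextualized token-level translation distribution, rather than to x itself.

import torch
import torch.nn.functional as F

def mix_output_distribution(gen_logits, attn, src_ids, lexicon, gate):
    # gen_logits: (batch, vocab)   decoder's ordinary vocabulary logits
    # attn:       (batch, src_len) attention weights over source positions
    # src_ids:    (batch, src_len) source-side token ids
    # lexicon:    (vocab, vocab)   row-stochastic table; lexicon[x] is a
    #                              distribution over translations of token x
    # gate:       (batch, 1)      probability of taking the lexical route
    p_gen = F.softmax(gen_logits, dim=-1)
    # Route the attention mass on each source position i to the learned
    # translation distribution for the token there, lexicon[src_ids[:, i]].
    p_lex = torch.einsum("bs,bsv->bv", attn, lexicon[src_ids])
    return (1.0 - gate) * p_gen + gate * p_lex

With lexicon fixed to the identity matrix, p_lex collapses onto the attended source tokens themselves and the layer reduces to a standard copy mechanism; per the abstract, the table is instead initialized from the output of a lexicon learning algorithm.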
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Compositional Models for Few Shot Sequence Learning
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

