Show simple item record

dc.contributor.advisor: Jacob Andreas and Hendrik Strobelt. (en_US)
dc.contributor.author: Bensaid, Eden. (en_US)
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. (en_US)
dc.date.accessioned: 2021-05-24T19:40:13Z
dc.date.available: 2021-05-24T19:40:13Z
dc.date.copyright: 2021 (en_US)
dc.date.issued: 2021 (en_US)
dc.identifier.uri: https://hdl.handle.net/1721.1/130680
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021 (en_US)
dc.description: Cataloged from the official PDF of thesis. (en_US)
dc.description: Includes bibliographical references (pages 41-45). (en_US)
dc.description.abstract: Storytelling is an open-ended task that entails creative thinking and requires a constant flow of ideas. Generative models have recently gained momentum thanks to their ability to identify complex data's inner structure and learn efficiently from unlabeled data [34]. Natural language generation (NLG) for storytelling is especially challenging because it requires the generated text to follow an overall theme while remaining creative and diverse to engage the reader [26]. Competitive story generation models still suffer from repetition [19], are unable to consistently condition on a theme [51] and struggle to produce a grounded, evolving storyboard [43]. Published story visualization architectures that generate images require a descriptive text to depict the scene to illustrate [30]. Therefore, it seems promising to evaluate an interactive multimodal generative platform that collaborates with writers to face the complex story-generation task. With co-creation, writers contribute their creative thinking, while generative models contribute to their constant workflow. In this work, we introduce a system and a web-based demo, FairyTailor¹, for machine-in-the-loop visual story co-creation. Users can create a cohesive children's story by weaving generated texts and retrieved images with their input. FairyTailor adds another modality and modifies the text generation process to produce a coherent and creative sequence of text and images. To our knowledge, this is the first dynamic tool for multimodal story generation that allows interactive co-creation of both texts and images. It allows users to give feedback on co-created stories and share their results. We release the demo source code² for other researchers' use. (en_US)
dc.description.statementofresponsibility: by Eden Bensaid. (en_US)
dc.format.extent: 45 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Electrical Engineering and Computer Science. (en_US)
dc.title: Multimodal generative models for storytelling (en_US)
dc.type: Thesis (en_US)
dc.description.degree: M. Eng. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (en_US)
dc.identifier.oclc: 1251773235 (en_US)
dc.description.collection: M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science (en_US)
dspace.imported: 2021-05-24T19:40:13Z (en_US)
mit.thesis.degree: Master (en_US)
mit.thesis.department: EECS (en_US)

