dc.contributor.advisor | Jacob Andreas and Hendrik Strobelt. | en_US |
dc.contributor.author | Bensaid, Eden. | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2021-05-24T19:40:13Z | |
dc.date.available | 2021-05-24T19:40:13Z | |
dc.date.copyright | 2021 | en_US |
dc.date.issued | 2021 | en_US |
dc.identifier.uri | https://hdl.handle.net/1721.1/130680 | |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021 | en_US |
dc.description | Cataloged from the official PDF of thesis. | en_US |
dc.description | Includes bibliographical references (pages 41-45). | en_US |
dc.description.abstract | Storytelling is an open-ended task that entails creative thinking and requires a constant flow of ideas. Generative models have recently gained momentum thanks to their ability to identify complex data's inner structure and learn efficiently from unlabeled data [34]. Natural language generation (NLG) for storytelling is especially challenging because it requires the generated text to follow an overall theme while remaining creative and diverse enough to engage the reader [26]. Competitive story generation models still suffer from repetition [19], are unable to consistently condition on a theme [51], and struggle to produce a grounded, evolving storyboard [43]. Published story visualization architectures that generate images require descriptive text that depicts the scene to be illustrated [30]. It therefore seems promising to evaluate an interactive multimodal generative platform that collaborates with writers to tackle the complex story-generation task. With co-creation, writers contribute their creative thinking, while generative models help sustain a constant workflow. In this work, we introduce a system and a web-based demo, FairyTailor¹, for machine-in-the-loop visual story co-creation. Users can create a cohesive children's story by weaving generated texts and retrieved images with their own input. FairyTailor adds another modality and modifies the text generation process to produce a coherent and creative sequence of text and images. To our knowledge, this is the first dynamic tool for multimodal story generation that allows interactive co-creation of both texts and images. It allows users to give feedback on co-created stories and share their results. We release the demo source code² for other researchers' use. | en_US |
dc.description.statementofresponsibility | by Eden Bensaid. | en_US |
dc.format.extent | 45 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Multimodal generative models for storytelling | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.identifier.oclc | 1251773235 | en_US |
dc.description.collection | M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US |
dspace.imported | 2021-05-24T19:40:13Z | en_US |
mit.thesis.degree | Master | en_US |
mit.thesis.department | EECS | en_US |