dc.contributor.advisor	Fredo Durand and Aude Oliva.	en_US
dc.contributor.author	Bylinskii, Zoya	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2019-02-14T15:22:24Z
dc.date.available	2019-02-14T15:22:24Z
dc.date.copyright	2018	en_US
dc.date.issued	2018	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/120375
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.	en_US
dc.description	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.	en_US
dc.description	Cataloged from student-submitted PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 171-192).	en_US
dc.description.abstract	Multimodal documents occur in a variety of forms: as graphs in technical reports, diagrams in textbooks, and graphic designs in bulletins. Humans can efficiently process the visual and textual information contained within to make decisions on topics including business, healthcare, and science. Building the computational tools to understand multimodal documents can have important applications for web search, information retrieval, captioning and summarization, and automated design. This thesis makes contributions on two fronts: (i) to the development of data collection methods for measuring how humans perceive multimodal documents (i.e., where they look, what they find important), and (ii) to the development of computer vision tools for automatically parsing and making predictions about multimodal documents (i.e., the subject matter they are about). Specifically, the crowdsourced attention data captured from our novel user interfaces is used to train neural network models to predict where people look in graphic designs and information visualizations, with demonstrated applications to thumbnailing, design retargeting, and interactive feedback within graphic design tools. Separately, our models for detecting visual elements and parsing text elements in infographics (information graphics) are used for topic prediction and to present a system for automatic summarization. This thesis makes contributions at the interface of human and computer vision, with applications to human-computer interfaces and design.	en_US
dc.description.statementofresponsibility	by Zoya Bylinskii.	en_US
dc.format.extent	192 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source, but further reproduction or distribution in any format is prohibited without written permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Computational perception for multi-modal document understanding	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	1084273965	en_US