Computational perception for multi-modal document understanding
Author(s)
Bylinskii, Zoya
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Fredo Durand and Aude Oliva.
Abstract
Multimodal documents occur in a variety of forms, as graphs in technical reports, diagrams in textbooks, and graphic designs in bulletins. Humans can efficiently process the visual and textual information contained within to make decisions on topics including business, healthcare, and science. Building the computational tools to understand multimodal documents can have important applications for web search, information retrieval, captioning and summarization, and automated design. This thesis makes contributions on two fronts: (i) to the development of data collection methods for measuring how humans perceive multimodal documents (i.e., where they look, what they find important), and (ii) to the development of computer vision tools for automatically parsing and making predictions about multimodal documents (i.e., the subject matter they are about). Specifically, the crowdsourced attention data captured from our novel user interfaces is used to train neural network models to predict where people look in graphic designs and information visualizations, with demonstrated applications to thumbnailing, design retargeting, and interactive feedback within graphic design tools. Separately, our models for detecting visual elements and parsing text elements in infographics (information graphics) are used for topic prediction and to present a system for automatic summarization. This thesis makes contributions at the interface of human and computer vision, with applications to human-computer interfaces and design.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 171-192).
Date issued
2018
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.