dc.contributor.advisor: César Hidalgo. (en_US)
dc.contributor.author: Hu, Kevin Zeng. (en_US)
dc.contributor.other: Program in Media Arts and Sciences (Massachusetts Institute of Technology) (en_US)
dc.date.accessioned: 2020-01-23T17:00:57Z
dc.date.available: 2020-01-23T17:00:57Z
dc.date.copyright: 2019 (en_US)
dc.date.issued: 2019 (en_US)
dc.identifier.uri: https://hdl.handle.net/1721.1/123624
dc.description: Thesis: Ph. D., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2019 (en_US)
dc.description: Cataloged from PDF version of thesis. (en_US)
dc.description: Includes bibliographical references (pages 162-180). (en_US)
dc.description.abstract: Demand for data visualization has exploded in recent years with the increasing availability and use of data across domains. Traditional visualization techniques require users to manually specify visual encodings of data through code or clicks. While manual specification is necessary to create bespoke visualizations, it renders visualization inaccessible to those without technical backgrounds. As a result, visualization recommender systems, which automatically generate results for users to search and select, have gained popularity. Here, I present systems, methods, and data repositories to contextualize and improve visualization recommender systems. The first contribution is DIVE, a publicly available and open source system that combines rule-based recommender systems with manual specification. DIVE integrates state-of-the-art data model inference, visualization, statistical analysis, and storytelling capabilities into a unified workflow. (en_US)
dc.description.abstract: In a controlled experiment, we show that DIVE significantly improves task performance among a group of 67 professional data scientists. Over 15K users have uploaded 7.5K datasets to DIVE since its release. In response to the limitations of rule-based recommender systems, VizML is a machine learning-based method for visualization recommendation. VizML uses neural networks trained on a large corpus of dataset-visualization pairs to predict visualization design choices, such as visualization type and axis encoding, with an accuracy of over 85%, exceeding that of base rates and baseline models. Benchmarking with a crowdsourced test set, we show that our model achieves human-level performance when predicting consensus visualization type. To support learned visualization systems, VizNet is a large-scale visualization learning and benchmarking repository consisting of over 31M real-world datasets. (en_US)
dc.description.abstract: To demonstrate VizNet's utility as a platform for conducting crowdsourced experiments with ecologically valid data, we replicate a prior perceptual effectiveness study, and demonstrate how a metric of visualization effectiveness can be learned from experimental results. Our results suggest a promising method for efficiently crowdsourcing the annotations necessary to train and evaluate machine learning-based visualization recommendation at scale. Enabled by the availability of real-world data, Sherlock is a deep learning approach to semantic type detection. We train Sherlock on 686K data columns retrieved from the VizNet corpus by matching 78 semantic types from DBpedia to column headers. We characterize each matched column with 1,588 features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. (en_US)
dc.description.abstract: A multi-input neural network achieves a support-weighted F1 score of 0.89, exceeding that of a decision tree baseline, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations. I conclude by discussing three opportunities for future research. The first describes design considerations for mixed-initiative interactions in AI-infused visualization systems such as DIVE. The second reviews recent work on the statistical validity of insights derived from visualization recommenders, which is an especially important consideration with learned systems such as VizML. Lastly, I assess the benefits of learning visualization design from non-experts, then present experimental evidence towards measuring the gaps between expert and non-expert judgment. (en_US)
dc.description.statementofresponsibility: by Kevin Zeng Hu. (en_US)
dc.format.extent: 180 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Program in Media Arts and Sciences (en_US)
dc.title: Automating data visualization through recommendation (en_US)
dc.type: Thesis (en_US)
dc.description.degree: Ph. D. (en_US)
dc.contributor.department: Program in Media Arts and Sciences (Massachusetts Institute of Technology) (en_US)
dc.identifier.oclc: 1136131313 (en_US)
dc.description.collection: Ph.D. Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences (en_US)
dspace.imported: 2020-01-23T17:00:56Z (en_US)
mit.thesis.degree: Doctoral (en_US)
mit.thesis.department: Media (en_US)

