Towards an automatic predictive question formulation

Schreck, Benjamin J

dc.contributor.advisor	Kalyan Veeramachaneni.	en_US
dc.contributor.author	Schreck, Benjamin J	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2016-12-22T15:16:38Z
dc.date.available	2016-12-22T15:16:38Z
dc.date.copyright	2016	en_US
dc.date.issued	2016	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/105963
dc.description	Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.	en_US
dc.description	This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.	en_US
dc.description	Cataloged from student-submitted PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 119-121).	en_US
dc.description.abstract	In this thesis, we designed a formal language, called Trane, for describing prediction problems over relational datasets, and implemented a system that allows humans to specify problems in that language and to build models that solve them using real data. We show that this language is able to describe all 54 prediction problems on the Kaggle data science competition website [14] and so is comprehensive. The implemented system consists of a web application connected to a server-side interpreter, which translates input from the web application into a series of transformation and aggregation operations to apply to a dataset in order to generate labels that can be used to train a supervised machine learning classifier. Using a smaller subset of this language, we developed software that automatically enumerated 1077 prediction problems for the Walmart Store Sales Forecasting dataset found on Kaggle [16], and built models that attempted to solve them, for which we produced 235 AUC scores. The web application also allowed us to collect 157 ratings from humans on the meaningfulness of randomly generated prediction problems. We used these ratings, along with an enumeration of 6105 prediction problems and 7 datasets, to train a collaborative-filtering-based recommendation system to propose meaningful prediction problems on new, unseen datasets.	en_US
dc.description.statementofresponsibility	by Benjamin J. Schreck.	en_US
dc.format.extent	121 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Towards an automatic predictive question formulation	en_US
dc.type	Thesis	en_US
dc.description.degree	M. Eng.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	965551096	en_US

Files in this item

Name: 965551096-MIT.pdf
Size: 3.615Mb
Format: PDF
Description: Full printable version


This item appears in the following Collection(s)

Graduate Theses
