
dc.contributor.advisor: Kalyan Veeramachaneni. [en_US]
dc.contributor.author: Wu, Michael (Michael Q.) [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2016-01-04T20:52:55Z
dc.date.available: 2016-01-04T20:52:55Z
dc.date.copyright: 2015 [en_US]
dc.date.issued: 2015 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/100681
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (page 103). [en_US]
dc.description.abstract: It's now possible to take all of your favorite courses online. With growing popularity, Massive Open Online Courses (MOOCs) offer a learning opportunity to anyone with a computer - as well as an opportunity for researchers to investigate student learning through the accumulation of data about student-course interactions. Unfortunately, efforts to mine student data for information are currently limited by privacy concerns over how the data can be distributed. In this thesis, we present a generative model that learns from student data at the click-by-click level. When fully trained, this model is able to generate synthetic student data at the click-by-click level that can be released to the public. To develop a model at such granularity, we had to learn problem submission tendencies, characterize time spent viewing webpages and problem submission grades, and analyze how student activity transitions from week to week. We further developed a novel multi-level time-series model that goes beyond the classic Markov model and HMM methods used by most state-of-the-art ML methods for weblogs, and showed that our model performs better than these methods. After training our model on a 6.002x course on edX, we generated synthetic data and found that a classifier that predicts student dropout is 93% as effective (by AUC) when trained on the simulated data as when trained on the real data. Lastly, we found that using features learned by our model improves dropout prediction performance by 9.5%. [en_US]
dc.description.statementofresponsibility: by Michael Wu. [en_US]
dc.format.extent: 103 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: The synthetic student : a machine learning model to simulate MOOC data [en_US]
dc.title.alternative: Machine learning model to simulate Massive Open Online Course data [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M. Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 932633198 [en_US]



