dc.contributor.advisor | Kalyan Veeramachaneni. | en_US |
dc.contributor.author | Wu, Michael (Michael Q.) | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2016-01-04T20:52:55Z | |
dc.date.available | 2016-01-04T20:52:55Z | |
dc.date.copyright | 2015 | en_US |
dc.date.issued | 2015 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/100681 | |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. | en_US |
dc.description | Cataloged from PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (page 103). | en_US |
dc.description.abstract | It's now possible to take all of your favorite courses online. With growing popularity, Massive Open Online Courses (MOOCs) offer a learning opportunity to anyone with a computer - as well as an opportunity for researchers to investigate student learning through the accumulation of data about student-course interactions. Unfortunately, efforts to mine student data for information are currently limited by privacy concerns over how the data can be distributed. In this thesis, we present a generative model that learns from student data at the click-by-click level. When fully trained, this model is able to generate synthetic student data at the click-by-click level that can be released to the public. To develop a model at such granularity, we had to learn problem submission tendencies, characterize time spent viewing webpages and problem submission grades, and analyze how student activity transitions from week to week. We further developed a novel multi-level time-series model that goes beyond the classic Markov model and HMM methods used by most state-of-the-art ML methods for weblogs, and showed that our model performs better than these methods. After training our model on a 6.002x course on edX, we generated synthetic data and found that a classifier that predicts student dropout is 93% as effective (by AUC) when trained on the simulated data as when trained on the real data. Lastly, we found that using features learned by our model improves dropout prediction performance by 9.5%. | en_US |
dc.description.statementofresponsibility | by Michael Wu. | en_US |
dc.format.extent | 103 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | The synthetic student : a machine learning model to simulate MOOC data | en_US |
dc.title.alternative | Machine learning model to simulate Massive Open Online Course data | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M. Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 932633198 | en_US |