| dc.contributor.advisor | Andrew Sliwinski. | en_US |
| dc.contributor.author | Abdalla, Lena(Lena A.) | en_US |
| dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
| dc.date.accessioned | 2021-02-19T20:26:08Z | |
| dc.date.available | 2021-02-19T20:26:08Z | |
| dc.date.copyright | 2020 | en_US |
| dc.date.issued | 2020 | en_US |
| dc.identifier.uri | https://hdl.handle.net/1721.1/129862 | |
| dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2020 | en_US |
| dc.description | Cataloged from student-submitted PDF of thesis. | en_US |
| dc.description | Includes bibliographical references (pages 133-136). | en_US |
| dc.description.abstract | Scratch is a graphical programming platform that empowers children to create computer programs and realize their ideas. Although the Scratch online community is filled with a variety of diverse projects, many of these projects also share similarities. For example, they tend to fall into certain categories, including games, animations, stories, and more. Throughout this thesis, I describe the application of Natural Language Processing (NLP) techniques to vectorize and classify Scratch projects by type. This effort included constructing a labeled dataset of 873 Scratch projects and their corresponding types, to be used for training a supervised classifier model. This dataset was constructed through a collective process of consensus-based annotation by experts. To realize the goal of classifying Scratch projects by type, I first train an unsupervised model of meaningful vector representations for Scratch blocks based on the composition of 500,000 projects. Using the unsupervised model as a basis for representing Scratch blocks, I then train a supervised classifier model that categorizes Scratch projects by type into one of: "animation", "game", and "other". After an extensive hyperparameter tuning process, I am able to train a classifier model with an F1 Score of 0.737. I include in this paper an in-depth analysis of the unsupervised and supervised models, and explore the different elements that were learned during training. Overall, I demonstrate that NLP techniques can be used in the classification of computer programs to a reasonable level of accuracy. | en_US |
| dc.description.statementofresponsibility | by Lena Abdalla. | en_US |
| dc.format.extent | 136 pages | en_US |
| dc.language.iso | eng | en_US |
| dc.publisher | Massachusetts Institute of Technology | en_US |
| dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US |
| dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
| dc.subject | Electrical Engineering and Computer Science. | en_US |
| dc.title | Classification of computer programs in the Scratch online community | en_US |
| dc.type | Thesis | en_US |
| dc.description.degree | M. Eng. | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
| dc.identifier.oclc | 1237279491 | en_US |
| dc.description.collection | M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US |
| dspace.imported | 2021-02-19T20:25:38Z | en_US |
| mit.thesis.degree | Master | en_US |
| mit.thesis.department | EECS | en_US |