dc.contributor.advisor: Andrew Sliwinski. [en_US]
dc.contributor.author: Abdalla, Lena (Lena A.) [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2021-02-19T20:26:08Z
dc.date.available: 2021-02-19T20:26:08Z
dc.date.copyright: 2020 [en_US]
dc.date.issued: 2020 [en_US]
dc.identifier.uri: https://hdl.handle.net/1721.1/129862
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2020 [en_US]
dc.description: Cataloged from student-submitted PDF of thesis. [en_US]
dc.description: Includes bibliographical references (pages 133-136). [en_US]
dc.description.abstract: Scratch is a graphical programming platform that empowers children to create computer programs and realize their ideas. Although the Scratch online community is filled with a variety of diverse projects, many of these projects also share similarities. For example, they tend to fall into certain categories, including games, animations, stories, and more. Throughout this thesis, I describe the application of Natural Language Processing (NLP) techniques to vectorize and classify Scratch projects by type. This effort included constructing a labeled dataset of 873 Scratch projects and their corresponding types, to be used for training a supervised classifier model. This dataset was constructed through a collective process of consensus-based annotation by experts. To realize the goal of classifying Scratch projects by type, I first train an unsupervised model of meaningful vector representations for Scratch blocks based on the composition of 500,000 projects. Using the unsupervised model as a basis for representing Scratch blocks, I then train a supervised classifier model that categorizes Scratch projects by type into one of: "animation", "game", and "other". After an extensive hyperparameter tuning process, I am able to train a classifier model with an F1 Score of 0.737. I include in this paper an in-depth analysis of the unsupervised and supervised models, and explore the different elements that were learned during training. Overall, I demonstrate that NLP techniques can be used in the classification of computer programs to a reasonable level of accuracy. [en_US]
dc.description.statementofresponsibility: by Lena Abdalla. [en_US]
dc.format.extent: 136 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Classification of computer programs in the Scratch online community [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M. Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.identifier.oclc: 1237279491 [en_US]
dc.description.collection: M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science [en_US]
dspace.imported: 2021-02-19T20:25:38Z [en_US]
mit.thesis.degree: Master [en_US]
mit.thesis.department: EECS [en_US]

