Classification of computer programs in the Scratch online community
Author(s)
Abdalla, Lena (Lena A.)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Andrew Sliwinski.
Abstract
Scratch is a graphical programming platform that empowers children to create computer programs and realize their ideas. Although the Scratch online community is filled with a variety of diverse projects, many of these projects also share similarities. For example, they tend to fall into certain categories, including games, animations, stories, and more. Throughout this thesis, I describe the application of Natural Language Processing (NLP) techniques to vectorize and classify Scratch projects by type. This effort included constructing a labeled dataset of 873 Scratch projects and their corresponding types, to be used for training a supervised classifier model. This dataset was constructed through a collective process of consensus-based annotation by experts. To realize the goal of classifying Scratch projects by type, I first train an unsupervised model of meaningful vector representations for Scratch blocks based on the composition of 500,000 projects. Using the unsupervised model as a basis for representing Scratch blocks, I then train a supervised classifier model that categorizes Scratch projects by type into one of: "animation", "game", and "other". After an extensive hyperparameter tuning process, I am able to train a classifier model with an F1 Score of 0.737. I include in this paper an in-depth analysis of the unsupervised and supervised models, and explore the different elements that were learned during training. Overall, I demonstrate that NLP techniques can be used in the classification of computer programs to a reasonable level of accuracy.
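As a rough illustration of the evaluation metric mentioned in the abstract, the following sketch computes a per-class F1 score and an unweighted (macro) average over the three project types. The abstract does not state which averaging scheme produced the reported 0.737, so macro averaging here is an assumption, and the function names and toy labels are hypothetical rather than taken from the thesis.

```python
# Minimal sketch of F1 scoring for a three-way project-type classifier.
# Assumption: macro (unweighted) averaging; the abstract does not specify this.

LABELS = ("animation", "game", "other")


def per_class_f1(y_true, y_pred, label):
    """F1 for one label: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the thesis.
    truth = ["animation", "game", "game", "other"]
    predicted = ["animation", "game", "other", "other"]
    print(round(macro_f1(truth, predicted), 3))
```

A weighted or micro average would generally give a different number on an imbalanced dataset, so the choice matters when comparing against a reported score.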
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2020. Cataloged from student-submitted PDF of thesis. Includes bibliographical references (pages 133-136).
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.