Feature engineering and evaluation in lightweight systems
Author(s)
Lu, Kelvin, M. Eng. Massachusetts Institute of Technology.
Download
1129235741-MIT.pdf (2.073Mb)
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Kalyan Veeramachaneni.
Abstract
This thesis presents Ballet, a lightweight feature engineering framework that allows users to contribute to an open-source data science project. Specifically, it provides a framework for users to easily write flexible, high-quality features from a raw dataset. In addition, it provides a series of tests to ensure that all features in a project follow a consistent API and that each provides some level of predictive power towards a target column. For the latter task, we modified and implemented GFSSF, a feature selection algorithm designed specifically for grouped, streaming features. This included building a performant entropy and mutual information estimator for datasets, as well as integrating this algorithm into Travis CI, a popular continuous integration tool. We then evaluated our framework on a popular test dataset for data scientists, measuring performance and ease of development.
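To illustrate the idea of testing whether a feature carries predictive power towards a target column, the following is a minimal sketch that assumes discrete-valued data and a simple plug-in (frequency-count) estimator. It is not Ballet's API, not the estimator built in the thesis, and not GFSSF itself; the function names and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in entropy estimate, in bits, for a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """Estimate I(X; Y) = H(X) + H(Y) - H(X, Y) from paired samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    joint = list(zip(x, y))  # each pair is one joint outcome
    return entropy(x) + entropy(y) - entropy(joint)


def accept_feature(feature, target, threshold=0.05):
    """Hypothetical acceptance test: keep a feature only if its estimated
    mutual information with the target exceeds an assumed threshold."""
    return mutual_information(feature, target) > threshold


target = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # identical to the target
uninformative = [0, 1, 0, 1]  # independent of the target in this sample

print(accept_feature(informative, target))
print(accept_feature(uninformative, target))
```

A check of this kind could run in a continuous integration job so that a proposed feature is rejected automatically when it adds no information about the target. Continuous-valued features would need a different estimator, which this sketch does not attempt.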
Description
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 69-70).
Date issued
2019
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.