Author(s): Veeramachaneni, Kalyan; Arnaldo, Ignacio; Derby, Owen; O’Reilly, Una-May
We describe FlexGP, the first Genetic Programming system to perform symbolic regression on large-scale datasets on the cloud via massive data-parallel ensemble learning. FlexGP provides a decentralized, fault-tolerant parallelization framework that runs many copies of Multiple Regression Genetic Programming, a sophisticated symbolic regression algorithm, on the cloud. Each copy executes with a different sample of the data and different parameters. The framework can create a fused model or ensemble on demand while the individual GP learners are still evolving. We demonstrate the framework by deploying 100 independent GP instances in a massive data-parallel manner to learn from a dataset of 515K exemplars and 90 features, and by generating a competitive fused model in less than 10 minutes.
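The data-parallel ensemble idea in the abstract can be illustrated with a minimal sketch. This is not FlexGP and not Multiple Regression Genetic Programming: a closed-form least-squares line fit stands in for each GP learner, a plain average of predictions stands in for the fusion step, and the function names (`fit_line`, `train_learners`, `fused_predict`), the sampling fractions, and the synthetic data are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_line(points):
    """Least-squares fit of y = a*x + b; a stand-in for one GP learner."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    a = sxy / sxx
    return a, y_bar - a * x_bar


def train_learners(data, n_learners, rng):
    """Train independent learners, each on its own random sample of the data."""
    models = []
    for _ in range(n_learners):
        # Hypothetical per-learner parameter: the fraction of data it sees.
        fraction = rng.uniform(0.1, 0.3)
        k = max(2, int(fraction * len(data)))
        sample = rng.sample(data, k)
        models.append(fit_line(sample))
    return models


def fused_predict(models, x):
    """Fuse whatever models exist so far by averaging their predictions (assumed fusion rule)."""
    return mean(a * x + b for a, b in models)


rng = random.Random(0)
# Synthetic data drawn from y = 2x + 1 with small Gaussian noise.
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1))
        for x in (i / 100 for i in range(500))]
models = train_learners(data, n_learners=10, rng=rng)
print(fused_predict(models, 3.0))
```

Because `fused_predict` only needs the list of models available at call time, it can be invoked at any point during training, which loosely mirrors the on-demand fusion described above. In a real deployment the learners would run on separate machines rather than in a loop.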
Department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
Journal of Grid Computing
Veeramachaneni, Kalyan et al. “FlexGP: Cloud-Based Ensemble Learning with Genetic Programming for Large Regression Problems.” Journal of Grid Computing 13.3 (2015): 391–407.
Author's final manuscript