Tuplex: Robust, Efficient Analytics When Python Rules
Author(s)
Spiegelberg, Leonhard F; Kraska, Tim
DownloadPublished version (444.3Kb)
Publisher with Creative Commons License
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Metadata
Show full item recordAbstract
© 2019 VLDB Endowment. Spark became the defacto industry standard as an execution engine for data preparation, cleaning, distributed machine learning, streaming and, warehousing over raw data. However, with the success of Python the landscape is shifting again; there is a strong demand for tools which better integrate with the Python landscape and do not have the impedance mismatch like Spark. In this paper, we demonstrate Tuplex (short for tuples and exceptions), a Pythonnative data preparation framework that allows users to develop and deploy pipelines faster and more robustly while providing bare-metal execution times through code compilation whenever possible.
Date issued
2019Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer ScienceJournal
Proceedings of the VLDB Endowment
Publisher
VLDB Endowment