Tuplex: robust, efficient analytics when Python rules

Spiegelberg, Leonhard F; Kraska, Tim

Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/132284.2

Publisher with Creative Commons License

Terms of use

Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/

Abstract

© 2019 VLDB Endowment. Spark became the de facto industry standard as an execution engine for data preparation, cleaning, distributed machine learning, streaming, and warehousing over raw data. However, with the success of Python, the landscape is shifting again: there is strong demand for tools that integrate better with the Python landscape and do not suffer from the impedance mismatch that Spark does. In this paper, we demonstrate Tuplex (short for tuples and exceptions), a Python-native data preparation framework that allows users to develop and deploy pipelines faster and more robustly while providing bare-metal execution times through code compilation whenever possible.

URI

https://hdl.handle.net/1721.1/132284

Journal

Proceedings of the VLDB Endowment

Publisher

VLDB Endowment

Collections

MIT Open Access Articles

Version	Item	Date	Summary
2	1721.1/132284.2	2021-12-17T16:20:33Z	Verified or entered authority metadata.
1	1721.1/132284*	2021-09-20T18:21:39Z
