Notice
This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/132276.2
Choosing a cloud DBMS: architectures and tradeoffs
dc.contributor.author | Tan, Junjay | |
dc.contributor.author | Ghanem, Thanaa | |
dc.contributor.author | Perron, Matthew | |
dc.contributor.author | Yu, Xiangyao | |
dc.contributor.author | Stonebraker, Michael | |
dc.contributor.author | DeWitt, David | |
dc.contributor.author | Serafini, Marco | |
dc.contributor.author | Aboulnaga, Ashraf | |
dc.contributor.author | Kraska, Tim | |
dc.date.accessioned | 2021-09-20T18:21:37Z | |
dc.date.available | 2021-09-20T18:21:37Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/132276 | |
dc.description.abstract | © 2019 VLDB Endowment. As analytic (OLAP) applications move to the cloud, DBMSs have shifted from employing a pure shared-nothing design with locally attached storage to a hybrid design that combines the use of shared storage (e.g., AWS S3) with the use of shared-nothing query execution mechanisms. This paper sheds light on the resulting tradeoffs, which have not been properly identified in previous work. To this end, it evaluates the TPC-H benchmark across a variety of DBMS offerings running in a cloud environment (AWS) on fast 10Gb+ networks, specifically database-as-a-service offerings (Redshift, Athena), query engines (Presto, Hive), and a traditional cloud-agnostic OLAP database (Vertica). While these comparisons cannot be apples-to-apples in all cases due to cloud configuration restrictions, we nonetheless identify patterns and design choices that are advantageous. These include prioritizing low-cost object stores like S3 for data storage, using system-agnostic yet still performant columnar formats like ORC that allow easy switching to other systems for different workloads, and making features that benefit subsequent runs like query precompilation and caching remote data to faster storage optional rather than required because they disadvantage ad hoc queries. | en_US |
dc.language.iso | en | |
dc.publisher | VLDB Endowment | en_US |
dc.relation.isversionof | 10.14778/3352063.3352133 | en_US |
dc.rights | Creative Commons Attribution-NonCommercial-NoDerivs License | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en_US |
dc.source | VLDB Endowment | en_US |
dc.title | Choosing a cloud DBMS: architectures and tradeoffs | en_US |
dc.type | Article | en_US |
dc.relation.journal | Proceedings of the VLDB Endowment | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2021-01-11T15:10:36Z | |
dspace.orderedauthors | Tan, J; Ghanem, T; Perron, M; Yu, X; Stonebraker, M; DeWitt, D; Serafini, M; Aboulnaga, A; Kraska, T | en_US |
dspace.date.submission | 2021-01-11T15:10:40Z | |
mit.journal.volume | 12 | en_US |
mit.journal.issue | 12 | en_US |
mit.license | PUBLISHER_CC | |
mit.metadata.status | Authority Work and Publication Information Needed |