Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/132276.2

dc.contributor.author: Tan, Junjay
dc.contributor.author: Ghanem, Thanaa
dc.contributor.author: Perron, Matthew
dc.contributor.author: Yu, Xiangyao
dc.contributor.author: Stonebraker, Michael
dc.contributor.author: DeWitt, David
dc.contributor.author: Serafini, Marco
dc.contributor.author: Aboulnaga, Ashraf
dc.contributor.author: Kraska, Tim
dc.date.accessioned: 2021-09-20T18:21:37Z
dc.date.available: 2021-09-20T18:21:37Z
dc.identifier.uri: https://hdl.handle.net/1721.1/132276
dc.description.abstract: © 2019 VLDB Endowment. As analytic (OLAP) applications move to the cloud, DBMSs have shifted from a pure shared-nothing design with locally attached storage to a hybrid design that combines shared storage (e.g., AWS S3) with shared-nothing query execution. This paper sheds light on the resulting tradeoffs, which previous work has not properly identified. To this end, it evaluates the TPC-H benchmark across a variety of DBMS offerings running in a cloud environment (AWS) on fast 10Gb+ networks: database-as-a-service offerings (Redshift, Athena), query engines (Presto, Hive), and a traditional cloud-agnostic OLAP database (Vertica). While these comparisons cannot be apples-to-apples in all cases due to cloud configuration restrictions, we nonetheless identify advantageous patterns and design choices. These include prioritizing low-cost object stores like S3 for data storage; using system-agnostic yet still performant columnar formats like ORC, which allow easy switching to other systems for different workloads; and making features that benefit subsequent runs, such as query precompilation and caching remote data on faster storage, optional rather than required, because they disadvantage ad hoc queries. [en_US]
dc.language.iso: en
dc.publisher: VLDB Endowment [en_US]
dc.relation.isversionof: 10.14778/3352063.3352133 [en_US]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs License [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [en_US]
dc.source: VLDB Endowment [en_US]
dc.title: Choosing a cloud DBMS: architectures and tradeoffs [en_US]
dc.type: Article [en_US]
dc.relation.journal: Proceedings of the VLDB Endowment [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-01-11T15:10:36Z
dspace.orderedauthors: Tan, J; Ghanem, T; Perron, M; Yu, X; Stonebraker, M; DeWitt, D; Serafini, M; Aboulnaga, A; Kraska, T [en_US]
dspace.date.submission: 2021-01-11T15:10:40Z
mit.journal.volume: 12 [en_US]
mit.journal.issue: 12 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed
