Show simple item record

dc.contributor.author: Liu, Chunwei
dc.contributor.author: Pavlenko, Anna
dc.contributor.author: Interlandi, Matteo
dc.contributor.author: Haynes, Brandon
dc.date.accessioned: 2025-08-12T18:17:21Z
dc.date.available: 2025-08-12T18:17:21Z
dc.date.issued: 2025-03-19
dc.identifier.uri: https://hdl.handle.net/1721.1/162354
dc.description.abstract: This paper evaluates the suitability of Apache Arrow, Parquet, and ORC as formats for subsumption in an analytical DBMS. We systematically identify and explore the high-level features that are important to support efficient querying in modern OLAP DBMSs and evaluate the ability of each format to support these features. We find that each format has trade-offs that make it more or less suitable for use as a format in a DBMS and identify opportunities to more holistically co-design a unified in-memory and on-disk data representation. Notably, for certain popular machine learning tasks, none of these formats perform optimally, highlighting significant opportunities for advancing format design. Our hope is that this study can be used as a guide for system developers designing and using these formats, as well as provide the community with directions to pursue for improving these common open formats. [en_US]
dc.publisher: Springer Berlin Heidelberg [en_US]
dc.relation.isversionof: https://doi.org/10.1007/s00778-025-00911-1 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: Springer Berlin Heidelberg [en_US]
dc.title: Data formats in analytical DBMSs: performance trade-offs and future directions [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Liu, C., Pavlenko, A., Interlandi, M. et al. Data formats in analytical DBMSs: performance trade-offs and future directions. The VLDB Journal 34, 30 (2025). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.relation.journal: The VLDB Journal [en_US]
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2025-07-18T15:30:28Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s)
dspace.embargo.terms: N
dspace.date.submission: 2025-07-18T15:30:28Z
mit.journal.volume: 34 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

