dc.contributor.author: Elmore, Aaron J.
dc.contributor.author: Parameswaran, Aditya
dc.contributor.author: Deshpande, Amol
dc.contributor.author: Goehring, David G.
dc.contributor.author: Maddox, Michael A
dc.contributor.author: Madden, Samuel R
dc.date.accessioned: 2017-12-01T21:17:17Z
dc.date.available: 2017-12-01T21:17:17Z
dc.date.issued: 2016-05
dc.identifier.issn: 2150-8097
dc.identifier.uri: http://hdl.handle.net/1721.1/112346
dc.description.abstract: As scientific endeavors and data analysis become increasingly collaborative, there is a need for data management systems that natively support the versioning or branching of datasets to enable concurrent analysis, cleaning, integration, manipulation, or curation of data across teams of individuals. Common practice for sharing and collaborating on datasets involves creating or storing multiple copies of the dataset, one for each stage of analysis, with no provenance information tracking the relationships between these datasets. This not only wastes storage but also makes it challenging to track and integrate modifications made by different users to the same dataset. In this paper, we introduce the Relational Dataset Branching System, Decibel, a new relational storage system with built-in version control designed to address these shortcomings. We present our initial design for Decibel and provide a thorough evaluation of three versioned storage engine designs that focus on efficient query processing with minimal storage overhead. We also develop an exhaustive benchmark to enable the rigorous testing of these and future versioned storage engine designs. [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (1513972) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (1513407) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (1513443) [en_US]
dc.description.sponsorship: Intel Science and Technology Center for Big Data [en_US]
dc.language.iso: en_US
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.14778/2947618.2947619 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT Web Domain [en_US]
dc.title: Decibel: the relational dataset branching system [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Maddox, Michael, David Goehring, Aaron J. Elmore, Samuel Madden, Aditya Parameswaran, and Amol Deshpande. “Decibel.” Proceedings of the VLDB Endowment 9, no. 9 (May 1, 2016): 624–635. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.contributor.mitauthor: Goehring, David G.
dc.contributor.mitauthor: Maddox, Michael A
dc.contributor.mitauthor: Madden, Samuel R
dc.relation.journal: Proceedings of the VLDB Endowment [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Maddox, Michael; Goehring, David; Elmore, Aaron J.; Madden, Samuel; Parameswaran, Aditya; Deshpande, Amol [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0003-4703-6281
dc.identifier.orcid: https://orcid.org/0000-0002-5775-8571
dc.identifier.orcid: https://orcid.org/0000-0002-7470-3265
mit.license: OPEN_ACCESS_POLICY [en_US]

