
dc.contributor.author: Seering, Adam
dc.contributor.author: Cudre-Mauroux, Philippe
dc.contributor.author: Stonebraker, Michael
dc.contributor.author: Madden, Samuel R.
dc.date.accessioned: 2014-09-26T12:50:15Z
dc.date.available: 2014-09-26T12:50:15Z
dc.date.issued: 2012-04
dc.identifier.isbn: 978-0-7695-4747-3
dc.identifier.isbn: 978-1-4673-0042-1
dc.identifier.issn: 1063-6382
dc.identifier.uri: http://hdl.handle.net/1721.1/90380
dc.description.abstract: In this paper, we describe a versioned database storage manager we are developing for the SciDB scientific database. The system is designed to efficiently store and retrieve array-oriented data, exposing a "no-overwrite" storage model in which each update creates a new "version" of an array. This makes it possible to perform comparisons of versions produced at different times or by different algorithms, and to create complex chains and trees of versions. We present algorithms to efficiently encode these versions, minimizing storage space while still providing efficient access to the data. Additionally, we present an optimal algorithm that, given a long sequence of versions, determines which versions to encode in terms of each other (using delta compression) to minimize total storage space or query execution cost. We compare the performance of these algorithms on real world data sets from the National Oceanic and Atmospheric Administration (NOAA), Open Street Maps, and several other sources. We show that our algorithms provide better performance than existing version control systems not optimized for array data, both in terms of storage size and access time, and that our delta-compression algorithms are able to substantially reduce the total storage space when versions exist with a high degree of similarity. [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant IIS/III-1111371) [en_US]
dc.description.sponsorship: National Science Foundation (U.S.) (Grant SI2-1047955) [en_US]
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/ICDE.2012.102 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Efficient Versioning for Scientific Array Databases [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Seering, Adam, Philippe Cudre-Mauroux, Samuel Madden, and Michael Stonebraker. “Efficient Versioning for Scientific Array Databases.” 2012 IEEE 28th International Conference on Data Engineering (April 2012). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Seering, Adam [en_US]
dc.contributor.mitauthor: Cudre-Mauroux, Philippe [en_US]
dc.contributor.mitauthor: Madden, Samuel R. [en_US]
dc.contributor.mitauthor: Stonebraker, Michael [en_US]
dc.relation.journal: Proceedings of the 2012 IEEE 28th International Conference on Data Engineering [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Seering, Adam; Cudre-Mauroux, Philippe; Madden, Samuel; Stonebraker, Michael [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-7470-3265
dc.identifier.orcid: https://orcid.org/0000-0001-9184-9058
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete

