dc.contributor.author: Wu, Eugene
dc.contributor.author: Madden, Samuel R.
dc.contributor.author: Stonebraker, Michael
dc.date.accessioned: 2014-10-09T19:06:46Z
dc.date.available: 2014-10-09T19:06:46Z
dc.date.issued: 2013-04
dc.identifier.isbn: 978-1-4673-4910-9
dc.identifier.isbn: 978-1-4673-4909-3
dc.identifier.isbn: 978-1-4673-4908-6
dc.identifier.issn: 1063-6382
dc.identifier.other: INSPEC Accession Number: 13598422
dc.identifier.uri: http://hdl.handle.net/1721.1/90854
dc.description.abstract: Data lineage is a key component of provenance that helps scientists track and query relationships between input and output data. While current systems readily support lineage relationships at the file or data array level, finer-grained support at an array-cell level is impractical due to the lack of support for user-defined operators and the high runtime and storage overhead to store such lineage. We interviewed scientists in several domains to identify a set of common semantics that can be leveraged to efficiently store fine-grained lineage. We use the insights to define lineage representations that efficiently capture common locality properties in the lineage data, and a set of APIs so operator developers can easily export lineage information from user-defined operators. Finally, we introduce two benchmarks derived from astronomy and genomics, and show that our techniques can reduce lineage query costs by up to 10× while incurring substantially less impact on workflow runtime and storage. [en_US]
dc.language.iso: en_US
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/ICDE.2013.6544881 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: SubZero: A fine-grained lineage system for scientific databases [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Wu, Eugene, Samuel Madden, and Michael Stonebraker. “SubZero: A Fine-Grained Lineage System for Scientific Databases.” 2013 IEEE 29th International Conference on Data Engineering (ICDE) (April 8-12, 2013). Brisbane, QLD. IEEE. p.865-876. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Wu, Eugene [en_US]
dc.contributor.mitauthor: Madden, Samuel R. [en_US]
dc.contributor.mitauthor: Stonebraker, Michael [en_US]
dc.relation.journal: Proceedings of the 2013 IEEE 29th International Conference on Data Engineering (ICDE) [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Wu, E.; Madden, S.; Stonebraker, M. [en_US]
dc.identifier.orcid: https://orcid.org/0000-0001-9184-9058
dc.identifier.orcid: https://orcid.org/0000-0002-7470-3265
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete