
SubZero: A fine-grained lineage system for scientific databases

Author(s)
Wu, Eugene; Madden, Samuel R.; Stonebraker, Michael
Terms of use
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract
Data lineage is a key component of provenance that helps scientists track and query relationships between input and output data. While current systems readily support lineage relationships at the file or data array level, finer-grained support at an array-cell level is impractical due to the lack of support for user-defined operators and the high runtime and storage overhead to store such lineage. We interviewed scientists in several domains to identify a set of common semantics that can be leveraged to efficiently store fine-grained lineage. We use the insights to define lineage representations that efficiently capture common locality properties in the lineage data, and a set of APIs so operator developers can easily export lineage information from user-defined operators. Finally, we introduce two benchmarks derived from astronomy and genomics, and show that our techniques can reduce lineage query costs by up to 10× while incurring substantially less impact on workflow runtime and storage.
Date issued
2013-04
URI
http://hdl.handle.net/1721.1/90854
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the 2013 IEEE 29th International Conference on Data Engineering (ICDE)
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Wu, Eugene, Samuel Madden, and Michael Stonebraker. “SubZero: A Fine-Grained Lineage System for Scientific Databases.” 2013 IEEE 29th International Conference on Data Engineering (ICDE) (April 8-12, 2013). Brisbane, QLD. IEEE. p.865-876.
Version: Author's final manuscript
Other identifiers
INSPEC Accession Number: 13598422
ISBN
978-1-4673-4910-9
978-1-4673-4909-3
978-1-4673-4908-6
ISSN
1063-6382

Collections
  • MIT Open Access Articles
