dc.contributor.author: Curino, Carlo
dc.contributor.author: Jones, Evan Philip Charles
dc.contributor.author: Zhang, Yang
dc.contributor.author: Madden, Samuel R.
dc.date.accessioned: 2012-09-27T15:23:55Z
dc.date.available: 2012-09-27T15:23:55Z
dc.date.issued: 2010-09
dc.identifier.issn: 2150-8097
dc.identifier.uri: http://hdl.handle.net/1721.1/73347
dc.description.abstract: We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner). The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E), to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions up to 30%. [en_US]
dc.description.sponsorship: Quanta Computer (Firm) (T-Party Project) [en_US]
dc.language.iso: en_US
dc.publisher: Very Large Data Base Endowment Inc. (VLDB Endowment) [en_US]
dc.relation.isversionof: http://dl.acm.org/citation.cfm?id=1920853 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Schism: a Workload-Driven Approach to Database Replication and Partitioning [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism: a workload-driven approach to database replication and partitioning. Proc. VLDB Endow. 3, 1-2 (September 2010), 48-57. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Curino, Carlo
dc.contributor.mitauthor: Jones, Evan Philip Charles
dc.contributor.mitauthor: Zhang, Yang
dc.contributor.mitauthor: Madden, Samuel R.
dc.relation.journal: Proceedings of the VLDB Endowment [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-7470-3265
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete
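
Illustrative note on the abstract: the first phase it describes (one node per tuple, an edge between tuples accessed by the same transaction, then a split that minimizes cross-partition transactions) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy transaction trace, and the two hand-written partition assignments are all hypothetical, and no actual graph partitioner is invoked. The code only builds the co-access graph and scores a given assignment.

```python
from collections import defaultdict
from itertools import combinations


def build_coaccess_graph(transactions):
    """Return {(u, v): weight}; weight = number of transactions touching both tuples."""
    edges = defaultdict(int)
    for accessed in transactions:
        # One edge per unordered pair of tuples accessed by the same transaction.
        for u, v in combinations(sorted(set(accessed)), 2):
            edges[(u, v)] += 1
    return dict(edges)


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints sit in different partitions."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


def count_distributed(transactions, assignment):
    """Number of transactions whose tuples span more than one partition."""
    return sum(
        1 for accessed in transactions
        if len({assignment[t] for t in accessed}) > 1
    )


# Hypothetical trace: each set holds the tuple ids one transaction accessed.
transactions = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"b", "c"}]
edges = build_coaccess_graph(transactions)

# Two hypothetical assignments of tuples to k = 2 balanced partitions.
good = {"a": 0, "b": 0, "c": 1, "d": 1}
bad = {"a": 0, "b": 1, "c": 0, "d": 1}

print(cut_weight(edges, good), count_distributed(transactions, good))
print(cut_weight(edges, bad), count_distributed(transactions, bad))
```

Both assignments are balanced, but the first keeps frequently co-accessed tuples together and so leaves fewer distributed transactions; a graph partitioner would search for such an assignment automatically.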

