DSpace@MIT

GenBase: A Complex Analytics Genomics Benchmark


dc.contributor.advisor Sam Madden
dc.contributor.author Taft, Rebecca en_US
dc.contributor.author Vartak, Manasi en_US
dc.contributor.author Satish, Nadathur Rajagopalan en_US
dc.contributor.author Sundaram, Narayanan en_US
dc.contributor.author Madden, Samuel en_US
dc.contributor.author Stonebraker, Michael en_US
dc.contributor.other Database en
dc.date.accessioned 2013-11-20T17:00:05Z
dc.date.available 2013-11-20T17:00:05Z
dc.date.issued 2013-11-19
dc.identifier.uri http://hdl.handle.net/1721.1/82517
dc.description.abstract This paper introduces a new benchmark, designed to test database management system (DBMS) performance on a mix of data management tasks (joins, filters, etc.) and complex analytics (regression, singular value decomposition, etc.). Such mixed workloads are prevalent in a number of application areas, including most science workloads and web analytics. As a specific use case, we have chosen genomics data for our benchmark, and have constructed a collection of typical tasks in this area. In addition to being representative of a mixed data management and analytics workload, this benchmark is also meant to scale to large dataset sizes and multiple nodes across a cluster. Besides presenting this benchmark, we have run it on a variety of storage systems including traditional row stores, newer column stores, Hadoop, and an array DBMS. We present performance numbers on all systems on single and multiple nodes, and show that performance differs by orders of magnitude between the various solutions. In addition, we demonstrate that most platforms have scalability issues. We also test offloading the analytics onto a coprocessor. The intent of this benchmark is to focus research interest in this area; to this end, all of our data, data generators, and scripts are available on our web site. en_US
dc.format.extent 12 p. en_US
dc.relation.ispartofseries MIT-CSAIL-TR-2013-028
dc.rights Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported *
dc.rights.uri http://creativecommons.org/licenses/by-nc-sa/3.0/ en_US
dc.title GenBase: A Complex Analytics Genomics Benchmark en_US
dc.date.updated 2013-11-20T17:00:05Z


Files in this item

Name Size Format Description
MIT-CSAIL-TR-2013 ... 2.014Mb PDF

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported.