dc.contributor.author: Deshpande, Amol
dc.contributor.author: Elmore, Aaron J.
dc.contributor.author: Parameswaran, Aditya
dc.contributor.author: Wu, Eugene
dc.contributor.author: Zhang, Rebecca
dc.contributor.author: Bhardwaj, Anant P.
dc.contributor.author: Karger, David R.
dc.contributor.author: Madden, Samuel R.
dc.contributor.author: Subramanyam, Harihar G.
dc.date.accessioned: 2016-01-20T01:34:22Z
dc.date.available: 2016-01-20T01:34:22Z
dc.date.issued: 2015-08
dc.identifier.issn: 21508097
dc.identifier.uri: http://hdl.handle.net/1721.1/100937
dc.description.abstract: While there have been many solutions proposed for storing and analyzing large volumes of data, all of these solutions have limited support for collaborative data analytics, especially given that many individuals and teams are simultaneously analyzing, modifying and exchanging datasets, employing a number of heterogeneous tools or languages for data analysis, and writing scripts to clean, preprocess, or query data. We demonstrate DataHub, a unified platform with the ability to load, store, query, collaboratively analyze, interactively visualize, interface with external applications, and share datasets. We will demonstrate the following aspects of the DataHub platform: (a) flexible data storage, sharing, and native versioning capabilities: multiple conference attendees can concurrently update the database, browse the different versions, and inspect conflicts; (b) an app ecosystem that hosts apps for various data-processing activities: conference attendees will be able to effortlessly ingest, query, and visualize data using our existing apps; (c) Thrift-based data serialization that permits data analysis in any combination of 20+ languages, with DataHub as the common data store: conference attendees will be able to analyze datasets in R, Python, and Matlab, while the inputs and the results are still stored in DataHub. In particular, conference attendees will be able to use the DataHub notebook, an IPython-based notebook for analyzing data and storing the results of data analysis. [en_US]
dc.language.iso: en_US
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.14778/2824032.2824100 [en_US]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Collaborative data analytics with DataHub [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Bhardwaj, Anant, Amol Deshpande, Aaron J. Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, and Rebecca Zhang. “Collaborative Data Analytics with DataHub.” Proceedings of the VLDB Endowment 8, no. 12 (August 1, 2015): 1916–1919. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.mitauthor: Bhardwaj, Anant P. [en_US]
dc.contributor.mitauthor: Karger, David R. [en_US]
dc.contributor.mitauthor: Madden, Samuel R. [en_US]
dc.contributor.mitauthor: Subramanyam, Harihar G. [en_US]
dc.contributor.mitauthor: Zhang, Rebecca [en_US]
dc.relation.journal: Proceedings of the VLDB Endowment [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dspace.orderedauthors: Bhardwaj, Anant; Deshpande, Amol; Elmore, Aaron J.; Karger, David; Madden, Sam; Parameswaran, Aditya; Subramanyam, Harihar; Wu, Eugene; Zhang, Rebecca [en_US]
dc.identifier.orcid: https://orcid.org/0000-0002-7470-3265
dc.identifier.orcid: https://orcid.org/0000-0001-8720-7458
dc.identifier.orcid: https://orcid.org/0000-0002-0024-5847
dc.identifier.orcid: https://orcid.org/0000-0002-4642-1869
mit.license: PUBLISHER_CC [en_US]

