DSpace@MIT

Collaborative data analytics with DataHub

Author(s)
Deshpande, Amol; Elmore, Aaron J.; Parameswaran, Aditya; Wu, Eugene; Zhang, Rebecca; Bhardwaj, Anant P.; Karger, David R.; Madden, Samuel R.; Subramanyam, Harihar G.; ...
Download: Madden_Collaborative data.pdf (689.3Kb)
Publisher with Creative Commons License

Terms of use
Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License http://creativecommons.org/licenses/by-nc-nd/3.0/
Abstract
While many solutions have been proposed for storing and analyzing large volumes of data, all of them offer limited support for collaborative data analytics, especially given that many individuals and teams are simultaneously analyzing, modifying, and exchanging datasets, employing a number of heterogeneous tools or languages for data analysis, and writing scripts to clean, preprocess, or query data. We demonstrate DataHub, a unified platform that can load, store, query, collaboratively analyze, interactively visualize, and share datasets, as well as interface with external applications. We will demonstrate the following aspects of the DataHub platform: (a) flexible data storage, sharing, and native versioning capabilities: multiple conference attendees can concurrently update the database, browse the different versions, and inspect conflicts; (b) an app ecosystem that hosts apps for various data-processing activities: conference attendees will be able to effortlessly ingest, query, and visualize data using our existing apps; (c) Thrift-based data serialization that permits data analysis in any combination of 20+ languages, with DataHub as the common data store: conference attendees will be able to analyze datasets in R, Python, and Matlab, while the inputs and the results are still stored in DataHub. In particular, conference attendees will be able to use the DataHub notebook, an IPython-based notebook for analyzing data and storing the results of data analysis.
Date issued
2015-08
URI
http://hdl.handle.net/1721.1/100937
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the VLDB Endowment
Publisher
Association for Computing Machinery (ACM)
Citation
Bhardwaj, Anant, Amol Deshpande, Aaron J. Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, and Rebecca Zhang. “Collaborative Data Analytics with DataHub.” Proceedings of the VLDB Endowment 8, no. 12 (August 1, 2015): 1916–1919.
Version: Author's final manuscript
ISSN
2150-8097

Collections
  • MIT Open Access Articles
