dc.contributor.author | Bhardwaj, Anant P. | |
dc.contributor.author | Bhattacherjee, Souvik | |
dc.contributor.author | Chavan, Amit | |
dc.contributor.author | Deshpande, Amol | |
dc.contributor.author | Elmore, Aaron J. | |
dc.contributor.author | Madden, Samuel R. | |
dc.contributor.author | Parameswaran, Aditya | |
dc.date.accessioned | 2016-01-19T03:17:20Z | |
dc.date.available | 2016-01-19T03:17:20Z | |
dc.date.issued | 2015-01 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/100919 | |
dc.description.abstract | Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DATA HUB, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale. | en_US |
dc.language.iso | en_US | |
dc.relation.isversionof | http://cidrdb.org/cidr2015/program.html | en_US |
dc.rights | Creative Commons Attribution | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/ | en_US |
dc.source | MIT web domain | en_US |
dc.title | DataHub: Collaborative Data Science & Dataset Version Management at Scale | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Bhardwaj, Anant, Souvik Bhattacherjee, Amit Chavan, Amol Deshpande, Aaron J. Elmore, Samuel Madden, Aditya Parameswaran. "DataHub: Collaborative Data Science & Dataset Version Management at Scale." 7th Biennial Conference on Innovative Data Systems Research (CIDR ’15) (January 2015). | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.contributor.mitauthor | Bhardwaj, Anant P. | en_US |
dc.contributor.mitauthor | Elmore, Aaron J. | en_US |
dc.contributor.mitauthor | Madden, Samuel R. | en_US |
dc.contributor.mitauthor | Parameswaran, Aditya | en_US |
dc.relation.journal | Proceedings of the 7th Biennial Conference on Innovative Data Systems Research (CIDR ’15) | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dspace.orderedauthors | Bhardwaj, Anant; Bhattacherjee, Souvik; Chavan, Amit; Deshpande, Amol; Elmore, Aaron J.; Madden, Samuel; Parameswaran, Aditya | en_US |
dc.identifier.orcid | https://orcid.org/0000-0002-7470-3265 | |
dc.identifier.orcid | https://orcid.org/0000-0002-4642-1869 | |
mit.license | PUBLISHER_CC | en_US |