Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/137529.2

dc.contributor.author: Rezig, El Kindi
dc.contributor.author: Cao, Lei
dc.contributor.author: Stonebraker, Michael
dc.contributor.author: Simonini, Giovanni
dc.contributor.author: Tao, Wenbo
dc.contributor.author: Madden, Samuel
dc.contributor.author: Ouzzani, Mourad
dc.contributor.author: Tang, Nan
dc.contributor.author: Elmagarmid, Ahmed K
dc.date.accessioned: 2021-11-05T15:28:28Z
dc.date.available: 2021-11-05T15:28:28Z
dc.date.issued: 2019
dc.identifier.uri: https://hdl.handle.net/1721.1/137529
dc.description.abstract [en_US]: © 2019 VLDB Endowment. Data scientists spend over 80% of their time (1) parameter-tuning machine learning models and (2) iterating between data cleaning and machine learning model execution. While there are existing efforts to support the first requirement, there is currently no integrated workflow system that couples data cleaning and machine learning development. The previous version of Data Civilizer was geared towards data cleaning and discovery using a set of pre-defined tools. In this paper, we introduce Data Civilizer 2.0, an end-to-end workflow system satisfying both requirements. In addition, this system also supports a sophisticated data debugger and a workflow visualization system. In this demo, we will show how we used Data Civilizer 2.0 to help scientists at the Massachusetts General Hospital build their cleaning and machine learning pipeline on their 30TB brain activity dataset.
dc.language.iso: en
dc.publisher [en_US]: VLDB Endowment
dc.relation.isversionof [en_US]: 10.14778/3352063.3352108
dc.rights [en_US]: Creative Commons Attribution-NonCommercial-NoDerivs License
dc.rights.uri [en_US]: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source [en_US]: VLDB Endowment
dc.title [en_US]: Data Civilizer 2.0: a holistic framework for data preparation and analytics
dc.type [en_US]: Article
dc.identifier.citation: Rezig, El Kindi, Cao, Lei, Stonebraker, Michael, Simonini, Giovanni, Tao, Wenbo et al. 2019. "Data Civilizer 2.0: a holistic framework for data preparation and analytics." Proceedings of the VLDB Endowment, 12 (12).
dc.relation.journal [en_US]: Proceedings of the VLDB Endowment
dc.eprint.version [en_US]: Final published version
dc.type.uri [en_US]: http://purl.org/eprint/type/ConferencePaper
eprint.status [en_US]: http://purl.org/eprint/status/NonPeerReviewed
dc.date.updated: 2021-01-29T18:18:03Z
dspace.orderedauthors [en_US]: Rezig, EK; Cao, L; Stonebraker, M; Simonini, G; Tao, W; Madden, S; Ouzzani, M; Tang, N; Elmagarmid, AK
dspace.date.submission: 2021-01-29T18:18:11Z
mit.journal.volume [en_US]: 12
mit.journal.issue [en_US]: 12
mit.license: PUBLISHER_CC
mit.metadata.status [en_US]: Authority Work and Publication Information Needed