Show simple item record

dc.contributor.author: Mansour, Essam
dc.contributor.author: Deng, Dong
dc.contributor.author: Castro Fernandez, Raul
dc.contributor.author: Qahtan, Abdulhakim A.
dc.contributor.author: Tao, Wenbo
dc.contributor.author: Abedjan, Ziawasch
dc.contributor.author: Elmagarmid, Ahmed
dc.contributor.author: Ilyas, Ihab F.
dc.contributor.author: Madden, Samuel R
dc.contributor.author: Ouzzani, Mourad
dc.contributor.author: Stonebraker, Michael
dc.contributor.author: Tang, Nan
dc.date.accessioned: 2022-01-07T16:09:25Z
dc.date.available: 2021-11-09T13:26:50Z
dc.date.available: 2022-01-07T16:09:25Z
dc.date.issued: 2018-04
dc.identifier.uri: https://hdl.handle.net/1721.1/137857.2
dc.description.abstract: © 2018 IEEE. In order for an enterprise to gain insight into its internal business and the changing outside environment, it is essential to provide the relevant data for in-depth analysis. Enterprise data is usually scattered across departments and geographic regions and is often inconsistent. Data scientists spend the majority of their time finding, preparing, integrating, and cleaning relevant data sets. Data Civilizer is an end-to-end data preparation system. In this paper, we present the complete system, focusing on our new workflow engine, a superior system for entity matching and consolidation, and new cleaning tools. Our workflow engine allows data scientists to author, execute, and retrofit data preparation pipelines of different data discovery and cleaning services. Our end-to-end demo scenario is based on data from the MIT data warehouse and e-commerce data sets. (en_US)
dc.language.iso: en
dc.publisher: IEEE (en_US)
dc.relation.isversionof: 10.1109/icde.2018.00184 (en_US)
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike (en_US)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ (en_US)
dc.source: website (en_US)
dc.title: Building Data Civilizer Pipelines with an Advanced Workflow Engine (en_US)
dc.type: Article (en_US)
dc.identifier.citation: Mansour, Essam, Deng, Dong, Fernandez, Raul Castro, Qahtan, Abdulhakim A., Tao, Wenbo et al. 2018. "Building Data Civilizer Pipelines with an Advanced Workflow Engine." (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (en_US)
dc.eprint.version: Author's final manuscript (en_US)
dc.type.uri: http://purl.org/eprint/type/ConferencePaper (en_US)
eprint.status: http://purl.org/eprint/status/NonPeerReviewed (en_US)
dc.date.updated: 2019-06-18T16:54:43Z
dspace.date.submission: 2019-06-18T16:54:44Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Publication Information Needed (en_US)

