Profiling relational data: a survey

Abedjan, Ziawasch; Golab, Lukasz; Naumann, Felix

Terms of use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Abstract

Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases.
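To make the simpler profiling results named in the abstract concrete, the following is a minimal Python sketch, not code from the survey. It computes null counts, distinct counts, and the most frequent value patterns for one column, and it checks one multi-column property (a functional dependency) by a naive scan. The function names (`value_pattern`, `profile_column`, `holds_fd`), the pattern alphabet (`9` for digits, `A` for letters), and the sample rows are all assumptions made for illustration.

```python
from collections import Counter
import re


def value_pattern(value):
    """Generalize a string: letters become 'A', digits become '9', the rest is kept."""
    generalized = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", generalized)


def profile_column(values):
    """Single-column statistics: null count, distinct count, frequent patterns."""
    # Assumption: None and the empty string both count as null.
    non_null = [v for v in values if v is not None and v != ""]
    patterns = Counter(value_pattern(v) for v in non_null)
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_patterns": patterns.most_common(2),
    }


def holds_fd(rows, lhs, rhs):
    """Naive check of the functional dependency lhs -> rhs over a list of dict rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # The dependency is violated if one lhs value maps to two different rhs values.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Made-up sample data, for illustration only.
rows = [
    {"zip": "10115", "city": "Berlin", "phone": "030-1234"},
    {"zip": "10115", "city": "Berlin", "phone": None},
    {"zip": "14482", "city": "Potsdam", "phone": "0331-5509"},
    {"zip": "N2L3G1", "city": "Waterloo", "phone": "519-8884"},
]

phone_profile = profile_column([r["phone"] for r in rows])
zip_determines_city = holds_fd(rows, ["zip"], ["city"])
city_determines_phone = holds_fd(rows, ["city"], ["phone"])
```

On the sample rows, `zip -> city` holds while `city -> phone` does not. The scan only validates a dependency that is already given. Discovering all dependencies that hold in a dataset is the harder multi-column task the abstract refers to, and this sketch does not attempt it.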

Date issued

2015-06

URI

http://hdl.handle.net/1721.1/106176

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory

Journal

The VLDB Journal

Publisher

Springer Berlin Heidelberg

Citation

Abedjan, Ziawasch, Lukasz Golab, and Felix Naumann. “Profiling Relational Data: A Survey.” The VLDB Journal 24.4 (2015): 557–581.

Version: Author's final manuscript

ISSN

1066-8888

0949-877X

Collections

MIT Open Access Articles

DSpace@MIT