Cost-effective conceptual design using taxonomies
Author(s): Chodpathumwan, Yodsawalai; Vakilian, Ali; Termehchy, Arash; Nayyeri, Amir
It is known that annotating entities in unstructured and semi-structured datasets by their concepts improves the effectiveness of answering queries over these datasets. Ideally, one would like to annotate entities of all relevant concepts in a dataset. However, it takes substantial time and computational resources to annotate concepts in large datasets, and an organization may have sufficient resources to annotate only a subset of relevant concepts. Clearly, the organization would like to annotate a subset of concepts that provides the most effective answers to queries over the dataset. We propose a formal framework that quantifies the amount by which annotating entities of concepts from a taxonomy in a dataset improves the effectiveness of answering queries over the dataset. Because the problem is NP-hard, we propose efficient approximation and pseudo-polynomial time algorithms for several cases of the problem. Our extensive empirical studies validate our framework and show the accuracy and efficiency of our algorithms.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher: Springer Science and Business Media LLC
Chodpathumwan, Yodsawalai et al. "Cost-effective conceptual design using taxonomies." VLDB Journal 27, 3 (March 2018): 369–394 © 2018 Springer-Verlag
Author's final manuscript