Top-K Queries on Uncertain Data: On Score Distribution and Typical Answers

Ge, Tingjian; Zdonik, Stan; Madden, Samuel R.

dc.contributor.author	Ge, Tingjian
dc.contributor.author	Zdonik, Stan
dc.contributor.author	Madden, Samuel R.
dc.date.accessioned	2012-08-17T17:38:41Z
dc.date.available	2012-08-17T17:38:41Z
dc.date.issued	2009
dc.identifier.isbn	978-1-60558-551-2
dc.identifier.uri	http://hdl.handle.net/1721.1/72189
dc.description.abstract	Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along these score-probability dimensions. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which previous work in the area does not address. Our work includes a systematic empirical study on both real and synthetic datasets.	en_US
dc.description.sponsorship	National Natural Science Foundation (Grant number IIS-0086057)	en_US
dc.description.sponsorship	National Natural Science Foundation (Grant number IIS-0325838)	en_US
dc.description.sponsorship	National Natural Science Foundation (Grant number IIS-0448124)	en_US
dc.language.iso	en_US
dc.publisher	Association for Computing Machinery (ACM)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1145/1559845.1559886	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike 3.0	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Top-K Queries on Uncertain Data: On Score Distribution and Typical Answers	en_US
dc.type	Article	en_US
dc.identifier.citation	Tingjian Ge, Stan Zdonik, and Samuel Madden. 2009. Top-k queries on uncertain data: on score distribution and typical answers. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 375-388.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.approver	Madden, Samuel R.
dc.contributor.mitauthor	Madden, Samuel R.
dc.relation.journal	SIGMOD '09 Proceedings of the 2009 ACM SIGMOD International Conference on Management of data	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
dspace.orderedauthors	Ge, Tingjian; Zdonik, Stan; Madden, Samuel	en
dc.identifier.orcid	https://orcid.org/0000-0002-7470-3265
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: Madden_Top-k queries.pdf
Size: 330.9Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles