dc.contributor.author: Ge, Tingjian
dc.contributor.author: Zdonik, Stan
dc.contributor.author: Madden, Samuel R.
dc.date.accessioned: 2012-08-17T17:38:41Z
dc.date.available: 2012-08-17T17:38:41Z
dc.date.issued: 2009
dc.identifier.isbn: 978-1-60558-551-2
dc.identifier.uri: http://hdl.handle.net/1721.1/72189
dc.description.abstract: Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along this score-probability dimension. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which is not dealt with in previous work in the area. Our work includes a systematic empirical study on both real and synthetic datasets. [en_US]
dc.description.sponsorship: National Natural Science Foundation (Grant number IIS-0086057) [en_US]
dc.description.sponsorship: National Natural Science Foundation (Grant number IIS-0325838) [en_US]
dc.description.sponsorship: National Natural Science Foundation (Grant number IIS-0448124) [en_US]
dc.language.iso: en_US
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1145/1559845.1559886 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike 3.0 [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/ [en_US]
dc.source: MIT web domain [en_US]
dc.title: Top-K Queries on Uncertain Data: On Score Distribution and Typical Answers [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Tingjian Ge, Stan Zdonik, and Samuel Madden. 2009. Top-k queries on uncertain data: on score distribution and typical answers. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 375-388. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.approver: Madden, Samuel R.
dc.contributor.mitauthor: Madden, Samuel R.
dc.relation.journal: SIGMOD '09 Proceedings of the 2009 ACM SIGMOD International Conference on Management of data [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
dspace.orderedauthors: Ge, Tingjian; Zdonik, Stan; Madden, Samuel [en]
dc.identifier.orcid: https://orcid.org/0000-0002-7470-3265
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete