The expected metric principle for probabilistic information retrieval

Chen, Harr

dc.contributor.advisor	David R. Karger.	en_US
dc.contributor.author	Chen, Harr	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2007-08-29T20:42:13Z
dc.date.available	2007-08-29T20:42:13Z
dc.date.copyright	2007	en_US
dc.date.issued	2007	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/38672
dc.description	Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.	en_US
dc.description	Includes bibliographical references (leaves 125-128).	en_US
dc.description.abstract	Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the Probability Ranking Principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user's information need. One example is when the user would be satisfied with some limited number of relevant documents, rather than needing all relevant documents. We show that in such a scenario, an attempt to return many relevant documents can actually reduce the chances of finding any relevant documents. In this thesis, we introduce the Expected Metric Principle, which generalizes the Probability Ranking Principle in a way that intimately connects the evaluation metric and the retrieval model. We observe that given a probabilistic model of relevance, it is appropriate to rank so as to directly optimize these metrics in expectation.	en_US
dc.description.abstract	We consider a number of metrics from the literature, such as the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations, and we introduce new metrics of our own. While direct optimization of a metric's expected value may be computationally intractable, we explore heuristic search approaches, and show that a simple approximate greedy optimization algorithm produces rankings for TREC queries that outperform the standard approach based on the Probability Ranking Principle.	en_US
dc.description.statementofresponsibility	by Harr Chen.	en_US
dc.format.extent	128 leaves	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	The expected metric principle for probabilistic information retrieval	en_US
dc.type	Thesis	en_US
dc.description.degree	S.M.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	163943285	en_US

Files in this item

Name: 163943285-MIT.pdf
Size: 5.582Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Graduate Theses
