Rare Probability Estimation under Regularly Varying Heavy Tails

Ohannessian, Mesrob I.; Dahleh, Munther A.

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

This paper studies the problem of estimating the probability of symbols that have occurred very rarely, in samples drawn independently from an unknown, possibly infinite, discrete distribution. In particular, we study the multiplicative consistency of estimators, defined as the ratio of the estimate to the true quantity converging to one. We first show that the classical Good-Turing estimator is not universally consistent in this sense, despite enjoying favorable additive properties. We then use Karamata's theory of regular variation to prove that regularly varying heavy tails are sufficient for consistency. At the core of this result is a multiplicative concentration that we establish both by extending the McAllester-Ortiz additive concentration for the missing mass to all rare probabilities and by exploiting regular variation. We also derive a family of estimators which, in addition to being consistent, address some of the shortcomings of the Good-Turing estimator. For example, they perform smoothing implicitly and have the absolute discounting structure of many heuristic algorithms. This also establishes a discrete parallel to extreme value theory, and many of the techniques therein can be adapted to the framework that we set forth.
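As a rough illustration of the quantities the abstract refers to, the sketch below computes the textbook form of the Good-Turing estimate for the total probability of symbols seen exactly r times, namely (r + 1) K_{r+1} / n, where K_{r+1} is the number of distinct symbols seen exactly r + 1 times in a sample of size n. The case r = 0 gives the missing mass. The function names and the toy distribution are assumptions made for this example; the code does not reproduce the paper's own family of estimators. Multiplicative consistency would mean that the ratio of the estimate to the true value tends to one as n grows.

```python
from collections import Counter


def good_turing_rare_mass(sample, r):
    """Good-Turing estimate of the total probability of symbols seen exactly r times.

    Uses the textbook form (r + 1) * K_{r+1} / n, where K_{r+1} is the number of
    distinct symbols that occur exactly r + 1 times in the sample.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)                    # symbol -> occurrences
    count_of_counts = Counter(counts.values())  # r -> number of symbols seen r times
    return (r + 1) * count_of_counts[r + 1] / n


def true_rare_mass(sample, r, probs):
    """True total probability of symbols seen exactly r times.

    `probs` is a hypothetical dict mapping each symbol to its true probability;
    it is only available in simulations, never in practice.
    """
    counts = Counter(sample)
    return sum(p for symbol, p in probs.items() if counts[symbol] == r)


if __name__ == "__main__":
    # Toy example (assumed data, not from the paper).
    probs = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.1, "e": 0.3}
    sample = list("aabbbcd")
    for r in range(3):
        estimate = good_turing_rare_mass(sample, r)
        truth = true_rare_mass(sample, r, probs)
        print(r, estimate, truth)
```

On a sample this small the estimate and the true value can differ widely; the ratio between them is the quantity whose convergence to one the abstract calls multiplicative consistency.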

Date issued

2012

URI

http://hdl.handle.net/1721.1/99945

Department

Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Massachusetts Institute of Technology. Laboratory for Information and Decision Systems

Journal

Journal of Machine Learning Research: Workshop and Conference Proceedings

Publisher

Journal of Machine Learning Research

Citation

Ohannessian, Mesrob I., and Munther A. Dahleh. "Rare Probability Estimation under Regularly Varying Heavy Tails." Journal of Machine Learning Research: Workshop and Conference Proceedings 23 (2012), 21.1-21.24.

Version: Author's final manuscript

ISSN

1938-7228

Collections

MIT Open Access Articles
