Optimal stopping rules for sequential hypothesis testing

Daskalakis, C; Kawase, Y

Publisher with Creative Commons License

Terms of use

Creative Commons Attribution 3.0 unported license https://creativecommons.org/licenses/by/3.0/

Abstract

Suppose that we are given sample access to an unknown distribution p over n elements and an explicit distribution q over the same n elements. We would like to reject the null hypothesis "p = q" after seeing as few samples as possible when p ≠ q, while we never want to reject the null when p = q. Well-known results show that $\Theta(\sqrt{n}/\epsilon^2)$ samples are necessary and sufficient for distinguishing whether p equals q versus p is $\epsilon$-far from q in total variation distance. However, this requires the distinguishing radius $\epsilon$ to be fixed prior to deciding how many samples to request. Our goal is instead to design sequential hypothesis testers, i.e. online algorithms that request i.i.d. samples from p and stop as soon as they can confidently reject the hypothesis p = q, without being given a lower bound on the distance between p and q when p ≠ q. In particular, we want to minimize the number of samples requested by our tests as a function of the distance between p and q, and if p = q we want the algorithm, with high probability, to never reject the null. Our work is motivated by and addresses the practical challenge of sequential A/B testing in Statistics. We show that, when n = 2, any sequential hypothesis test must see $\Omega\left(\frac{1}{d_{\mathrm{tv}}(p,q)^2} \log\log\frac{1}{d_{\mathrm{tv}}(p,q)}\right)$ samples, with high (constant) probability, before it rejects p = q, where $d_{\mathrm{tv}}(p,q)$ is the total variation distance between p and q, which is unknown to the tester. We match the dependence of this lower bound on $d_{\mathrm{tv}}(p,q)$ by proposing a sequential tester that rejects p = q from at most $O\left(\frac{\sqrt{n}}{d_{\mathrm{tv}}(p,q)^2} \log\log\frac{1}{d_{\mathrm{tv}}(p,q)}\right)$ samples with high (constant) probability. The $\Omega(\sqrt{n})$ dependence on the support size n is also known to be necessary. We similarly provide two-sample sequential hypothesis testers, when sample access is given to both p and q, and discuss applications to sequential A/B testing.
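
For intuition about the n = 2 case, the following is a minimal sketch of a sequential tester for a coin with known reference bias q. It is not the tester proposed in the paper. It swaps in a simpler technique: Hoeffding's inequality at every step t with confidence delta / (t (t + 1)), so that a union bound over all t keeps the total false-rejection probability at most delta. Under that assumption the stopping time grows roughly like (1/d^2) log(1/d), where d = |p - q|, rather than with the log log factor stated in the abstract. The names `sequential_coin_test`, `sample`, `delta` and `max_samples` are hypothetical and chosen for illustration only.

```python
import math
import random


def sequential_coin_test(sample, q, delta=0.05, max_samples=1_000_000):
    """Illustrative sequential test of the null 'p = q' for a 0/1 coin.

    sample      -- zero-argument callable returning 0 or 1 (a draw from p)
    q           -- known reference probability of drawing 1
    delta       -- bound on the probability of ever rejecting when p = q
    max_samples -- safety cap so the loop terminates (an assumption of this
                   sketch; an idealized tester would run indefinitely)

    Returns the number of samples used to reject, or None if no rejection
    happened within max_samples draws.
    """
    ones = 0
    for t in range(1, max_samples + 1):
        ones += sample()
        # Per-step confidence; the series sum of delta / (t * (t + 1)) is delta.
        delta_t = delta / (t * (t + 1))
        # Two-sided Hoeffding radius for the empirical mean of t draws.
        radius = math.sqrt(math.log(2.0 / delta_t) / (2.0 * t))
        if abs(ones / t - q) > radius:
            return t
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.9, 0.7, 0.6, 0.55):
        used = sequential_coin_test(lambda: int(rng.random() < p), q=0.5)
        print(f"p = {p}: rejected after {used} samples")
```

The tester is never told the gap |p - q|; smaller gaps simply take longer to detect, which is the behaviour the abstract asks of a sequential test.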

Date issued

2017-09-01

URI

https://hdl.handle.net/1721.1/143459

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory

Journal

Leibniz International Proceedings in Informatics, LIPIcs

Citation

Daskalakis, C and Kawase, Y. 2017. "Optimal stopping rules for sequential hypothesis testing." Leibniz International Proceedings in Informatics, LIPIcs, 87.

Version: Final published version

Collections

MIT Open Access Articles