Optimal stopping rules for sequential hypothesis testing

Daskalakis, C; Kawase, Y

dc.contributor.author	Daskalakis, C
dc.contributor.author	Kawase, Y
dc.date.accessioned	2022-06-17T14:20:50Z
dc.date.available	2022-06-17T14:20:50Z
dc.date.issued	2017-09-01
dc.identifier.uri	https://hdl.handle.net/1721.1/143459
dc.description.abstract	Suppose that we are given sample access to an unknown distribution $p$ over $n$ elements and an explicit distribution $q$ over the same $n$ elements. We would like to reject the null hypothesis "$p = q$" after seeing as few samples as possible when $p \neq q$, while never rejecting the null when $p = q$. Well-known results show that $\Theta(\sqrt{n}/\epsilon^2)$ samples are necessary and sufficient for distinguishing whether $p$ equals $q$ versus $p$ is $\epsilon$-far from $q$ in total variation distance. However, this requires the distinguishing radius $\epsilon$ to be fixed prior to deciding how many samples to request. Our goal is instead to design sequential hypothesis testers, i.e. online algorithms that request i.i.d. samples from $p$ and stop as soon as they can confidently reject the hypothesis $p = q$, without being given a lower bound on the distance between $p$ and $q$ when $p \neq q$. In particular, we want to minimize the number of samples requested by our tests as a function of the distance between $p$ and $q$, and if $p = q$ we want the algorithm, with high probability, to never reject the null. Our work is motivated by and addresses the practical challenge of sequential A/B testing in statistics. We show that, when $n = 2$, any sequential hypothesis test must see $\Omega\left(\frac{1}{d_{\mathrm{tv}}(p,q)^2} \log\log\frac{1}{d_{\mathrm{tv}}(p,q)}\right)$ samples, with high (constant) probability, before it rejects $p = q$, where $d_{\mathrm{tv}}(p,q)$ is the total variation distance between $p$ and $q$, which is unknown to the tester. We match the dependence of this lower bound on $d_{\mathrm{tv}}(p,q)$ by proposing a sequential tester that rejects $p = q$ from at most $O\left(\frac{\sqrt{n}}{d_{\mathrm{tv}}(p,q)^2} \log\log\frac{1}{d_{\mathrm{tv}}(p,q)}\right)$ samples with high (constant) probability. The $\Omega(\sqrt{n})$ dependence on the support size $n$ is also known to be necessary. We similarly provide two-sample sequential hypothesis testers, when sample access is given to both $p$ and $q$, and discuss applications to sequential A/B testing.	en_US
dc.language.iso	en
dc.relation.isversionof	10.4230/LIPIcs.ESA.2017.32	en_US
dc.rights	Creative Commons Attribution 3.0 unported license	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/3.0/	en_US
dc.source	DROPS	en_US
dc.title	Optimal stopping rules for sequential hypothesis testing	en_US
dc.type	Article	en_US
dc.identifier.citation	Daskalakis, C and Kawase, Y. 2017. "Optimal stopping rules for sequential hypothesis testing." Leibniz International Proceedings in Informatics, LIPIcs, 87.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal	Leibniz International Proceedings in Informatics, LIPIcs	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2022-06-17T14:15:30Z
dspace.orderedauthors	Daskalakis, C; Kawase, Y	en_US
dspace.date.submission	2022-06-17T14:15:31Z
mit.journal.volume	87	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: LIPIcs-ESA-2017-32.pdf
Size: 556.1Kb
Format: PDF
Description: Published version


This item appears in the following Collection(s)

MIT Open Access Articles
