dc.contributor.author | Daskalakis, C | |
dc.contributor.author | Kawase, Y | |
dc.date.accessioned | 2022-06-17T14:20:50Z | |
dc.date.available | 2022-06-17T14:20:50Z | |
dc.date.issued | 2017-09-01 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/143459 | |
dc.description.abstract | Suppose that we are given sample access to an unknown distribution p over n elements and an explicit distribution q over the same n elements. We would like to reject the null hypothesis "p = q" after seeing as few samples as possible when p ≠ q, while we never want to reject the null when p = q. Well-known results show that Θ(√n/ϵ²) samples are necessary and sufficient for distinguishing whether p equals q versus p is ϵ-far from q in total variation distance. However, this requires the distinguishing radius ϵ to be fixed prior to deciding how many samples to request. Our goal is instead to design sequential hypothesis testers, i.e. online algorithms that request i.i.d. samples from p and stop as soon as they can confidently reject the hypothesis p = q, without being given a lower bound on the distance between p and q, when p ≠ q. In particular, we want to minimize the number of samples requested by our tests as a function of the distance between p and q, and if p = q we want the algorithm, with high probability, to never reject the null. Our work is motivated by and addresses the practical challenge of sequential A/B testing in Statistics. We show that, when n = 2, any sequential hypothesis test must see Ω((1/d_tv(p, q)²) log log(1/d_tv(p, q))) samples, with high (constant) probability, before it rejects p = q, where d_tv(p, q) is the total variation distance between p and q, which is unknown to the tester. We match the dependence of this lower bound on d_tv(p, q) by proposing a sequential tester that rejects p = q from at most O((√n/d_tv(p, q)²) log log(1/d_tv(p, q))) samples with high (constant) probability. The Ω(√n) dependence on the support size n is also known to be necessary. We similarly provide two-sample sequential hypothesis testers, when sample access is given to both p and q, and discuss applications to sequential A/B testing. | en_US |
dc.language.iso | en | |
dc.relation.isversionof | 10.4230/LIPIcs.ESA.2017.32 | en_US |
dc.rights | Creative Commons Attribution 3.0 unported license | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/3.0/ | en_US |
dc.source | DROPS | en_US |
dc.title | Optimal stopping rules for sequential hypothesis testing | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Daskalakis, C and Kawase, Y. 2017. "Optimal stopping rules for sequential hypothesis testing." Leibniz International Proceedings in Informatics, LIPIcs, 87. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | |
dc.relation.journal | Leibniz International Proceedings in Informatics, LIPIcs | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2022-06-17T14:15:30Z | |
dspace.orderedauthors | Daskalakis, C; Kawase, Y | en_US |
dspace.date.submission | 2022-06-17T14:15:31Z | |
mit.journal.volume | 87 | en_US |
mit.license | PUBLISHER_CC | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |