
dc.contributor.author: Chen, Sitan
dc.contributor.author: Li, Jerry
dc.contributor.author: Moitra, Ankur
dc.date.accessioned: 2022-11-22T15:49:37Z
dc.date.available: 2021-11-05T15:09:02Z
dc.date.available: 2022-11-22T15:49:37Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/1721.1/137507.2
dc.description.abstract: © 2020 ACM. We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution over {1, ..., n}. Each user sends a batch of k i.i.d. samples from this distribution; however, an ε-fraction of users are untrustworthy and can send adversarially chosen responses. The goal of the algorithm is to learn this distribution in total variation distance. When k = 1 this is the standard robust univariate density estimation setting and it is well-understood that Ω(ε) error is unavoidable. Surprisingly, Qiao and Valiant gave an estimator which improves upon this rate when k is large. Unfortunately, their algorithms run in time which is exponential in either n or k. We first give a sequence of polynomial-time algorithms whose estimation error approaches the information-theoretically optimal bound for this problem. Our approach is based on recent algorithms derived from the sum-of-squares hierarchy, in the context of high-dimensional robust estimation. We show that algorithms for learning from untrusted batches can also be cast in this framework, but by working with a more complicated set of test functions. It turns out that this abstraction is quite powerful, and can be generalized to incorporate additional problem-specific constraints. Our second and main result is to show that this technology can be leveraged to build in prior knowledge about the shape of the distribution. Crucially, this allows us to reduce the sample complexity of learning from untrusted batches to polylogarithmic in n for most natural classes of distributions, which is important in many applications. To do so, we demonstrate that these sum-of-squares algorithms for robust mean estimation can be made to handle complex combinatorial constraints (e.g. those arising from VC theory), which may be of independent technical interest. [en_US]
dc.language.iso: en
dc.publisher: ACM [en_US]
dc.relation.isversionof: 10.1145/3357713.3384337 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: ACM [en_US]
dc.title: Efficiently learning structured distributions from untrusted batches [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Chen, Sitan, Li, Jerry and Moitra, Ankur. 2020. "Efficiently learning structured distributions from untrusted batches." Proceedings of the Annual ACM Symposium on Theory of Computing. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Media Laboratory
dc.relation.journal: Proceedings of the Annual ACM Symposium on Theory of Computing [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/ConferencePaper [en_US]
eprint.status: http://purl.org/eprint/status/NonPeerReviewed [en_US]
dc.date.updated: 2021-05-24T18:23:48Z
dspace.orderedauthors: Chen, S; Li, J; Moitra, A [en_US]
dspace.date.submission: 2021-05-24T18:23:50Z
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

