Distribution testing : classical and new paradigms
Author(s)
Aliakbarpour, Maryam.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Ronitt Rubinfeld.
Abstract
Hypothesis testing is a fundamental topic in statistics. Put simply, it is a framework for examining whether a hypothesized model is consistent with the observed data. Hypothesis testing has been widely used in experimental research in a variety of fields, such as biology, medical science, and the social sciences. Despite a century of constant use, much remains to be done to meet evolving practical needs. Some of the high-priority challenges are preserving privacy, working with high-dimensional distributions, handling noisy data, and dealing with data gathered from multiple sources. In this thesis, we focus on basic statistical problems in a more recently considered setting of hypothesis testing, referred to as property testing, in which we aim to address the challenges mentioned above. In particular, we make the following contributions:
1. We investigate the problem of testing whether a distribution has the shape property that it is monotone according to some (partial) order of the domain elements, or is far from every such distribution. Among other results, our main contribution is that testing monotonicity over a high-dimensional domain, the Boolean hypercube, requires a number of samples that is almost linear in the domain size.
2. We consider the well-studied identity and closeness testing problems in a new mixture-based noise model. We provide testers with optimal sample complexity for these problems under various scenarios that differ in how the tester can access the distribution or in what knowledge about the noise is available to the tester.
3. We develop differentially private testers for several fundamental problems in testing, such as testing uniformity, identity, closeness, and independence. The conceptual message of our work is that there exist private hypothesis testers that are nearly as sample-efficient as their non-private counterparts.
4. We consider a new model in distribution testing for multiple data sources when only a few samples are available from each source. This assumption is in contrast to the common distribution testing model, which views the data as i.i.d. samples from a single distribution. We generalize the uniformity, identity, and closeness testing problems to this setting and develop sample-optimal testers for these problems.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020. Cataloged from student-submitted PDF of thesis. Includes bibliographical references (pages 191-198).
Date issued
2020
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.