Distribution testing : classical and new paradigms
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Hypothesis testing is a fundamental topic in statistics. Put simply, it is a framework for examining whether a hypothesized model is consistent with the observed data. Hypothesis testing has been widely used in experimental research in a variety of fields, such as biology, medical science, and the social sciences. Despite a century of constant use, much remains to be done to meet the evolving needs of practice. Some of the high-priority challenges are preserving privacy, working with high-dimensional distributions, handling noisy data, and dealing with data gathered from multiple sources. In this thesis, we focus on basic statistical problems in a more recently considered setting of hypothesis testing, referred to as property testing, in which we aim to address the challenges mentioned above. In particular, we make the following contributions:
1. We investigate the problem of testing whether a distribution is monotone with respect to some (partial) order on the domain elements, or is far from every such distribution. Among other results, our main contribution is that testing monotonicity over a high-dimensional domain, the Boolean hypercube, requires almost linearly many samples in terms of the domain size.
2. We consider the well-studied identity and closeness testing problems in a new mixture-based noise model. We provide testers with optimal sample complexity for these problems under various scenarios that differ in how the tester can access the distribution, or in what knowledge about the noise is available to the tester.
3. We develop differentially private testers for several fundamental problems in testing, such as testing uniformity, identity, closeness, and independence. The conceptual message of our work is that there exist private hypothesis testers that are nearly as sample-efficient as their non-private counterparts.
4. We consider a new model in distribution testing for multiple data sources when only a few samples are available from each source. This assumption is in contrast to the common distribution testing model, which views the data as i.i.d. samples from a single distribution. We generalize the uniformity, identity, and closeness testing problems to this setting, and develop sample-optimal testers for these problems.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September, 2020. Cataloged from student-submitted PDF of thesis. Includes bibliographical references (pages 191-198).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.