DSpace@MIT


A Minimax Approach for Learning Gaussian Mixtures

Author(s)
Wang, William Wei
Download: Thesis PDF (1.461 MB)
Advisor
Jadbabaie, Ali
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Generative adversarial networks (GANs) learn the distribution of observed samples through a zero-sum game between two machine players, a generator and a discriminator. While GANs achieve great success in learning the complex distributions of image, sound, and text data, they perform suboptimally on multi-modal distribution-learning benchmarks, including Gaussian mixture models (GMMs). In this thesis, we propose Generative Adversarial Training for Gaussian Mixture Models (GAT-GMM), a minimax GAN framework for learning GMMs. Motivated by optimal transport theory, we design the zero-sum game in GAT-GMM using a random linear generator and a softmax-based quadratic discriminator architecture, which leads to a nonconvex-concave minimax optimization problem. We show that a Gradient Descent Ascent (GDA) method converges to an approximate stationary minimax point of the GAT-GMM optimization problem. In the benchmark case of a mixture of two symmetric, well-separated Gaussians, we further show that this stationary point recovers the true parameters of the underlying GMM. We then discuss the application of the proposed GAT-GMM framework to learning GMMs in the distributed, federated learning setting, where the widely used expectation-maximization (EM) algorithm can incur substantial computational and communication costs. In contrast, we show that GAT-GMM provides a scalable learning approach: a distributed GDA algorithm can solve the GAT-GMM minimax problem without incurring extra computation costs. We support our theoretical results with numerical experiments showing that our minimax framework succeeds in centralized learning tasks and can outperform standard EM-type algorithms in the federated setting.
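
To make the minimax setup concrete, the sketch below runs plain Gradient Descent Ascent on a toy two-player zero-sum game for the benchmark case from the abstract: a mixture of two symmetric Gaussians N(±μ*, I). This is a minimal sketch under stated assumptions, not the thesis's GAT-GMM implementation: a plain quadratic discriminator with Frobenius-norm regularization stands in for the softmax-based quadratic architecture, and all names, step sizes, and the regularization weight are illustrative choices.

import numpy as np

# Minimal GDA sketch for learning a mixture of two symmetric Gaussians
# 0.5*N(+mu*, I) + 0.5*N(-mu*, I). Illustrative only: the regularized
# quadratic discriminator below is an assumption standing in for the
# thesis's softmax-based quadratic architecture.

rng = np.random.default_rng(0)
d = 2                                    # data dimension
mu_true = np.array([3.0, 0.0])           # unknown component mean mu*

def sample_real(n):
    """Draw n samples from the symmetric two-component mixture."""
    s = rng.choice([-1.0, 1.0], size=(n, 1))
    return s * mu_true + rng.standard_normal((n, d))

mu = rng.standard_normal(d)              # generator parameter (minimizer)
W = np.zeros((d, d))                     # discriminator parameter (maximizer)
eta_mu, eta_W, lam, n = 0.01, 0.05, 0.5, 1024

for t in range(3000):
    x_real = sample_real(n)
    s = rng.choice([-1.0, 1.0], size=(n, 1))
    x_fake = s * mu + rng.standard_normal((n, d))   # linear generator

    # Zero-sum objective V(mu, W) = E[D(real)] - E[D(fake)] - lam*||W||_F^2,
    # with quadratic discriminator D(x) = x^T W x.
    grad_W = (x_real.T @ x_real - x_fake.T @ x_fake) / n - 2.0 * lam * W
    grad_mu = -(s * (x_fake @ (W + W.T))).mean(axis=0)

    W = W + eta_W * grad_W               # discriminator: one ascent step
    mu = mu - eta_mu * grad_mu           # generator: one descent step

print("estimated component mean (up to sign):", mu)  # approaches +/- mu_true

In this toy run the iterates settle near ±μ*, consistent with the recovery result stated above for the symmetric, well-separated case; taking a larger discriminator step than generator step is a common two-timescale heuristic for nonconvex-concave GDA, not a choice prescribed by the thesis.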
Date issued
2021-09
URI
https://hdl.handle.net/1721.1/139991
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
