Flexible Low-Rank Statistical Modeling with Missing Data and Side Information

Fithian, William; Mazumder, Rahul

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

We explore a general statistical framework for low-rank modeling of matrix-valued data, based on convex optimization with a generalized nuclear norm penalty. We study several related problems: the usual low-rank matrix completion problem with flexible loss functions arising from generalized linear models; reduced-rank regression and multi-task learning; and generalizations of both problems where side information about rows and columns is available, in the form of features or smoothing kernels. We show that our approach encompasses maximum a posteriori estimation arising from Bayesian hierarchical modeling with latent factors, and discuss ramifications of the missing-data mechanism in the context of matrix completion. While the above problems can be naturally posed as rank-constrained optimization problems, which are nonconvex and computationally difficult, we show how to relax them via generalized nuclear norm regularization to obtain convex optimization problems. We discuss algorithms, drawing inspiration from modern convex optimization methods, that address these large-scale computational tasks. Finally, we illustrate our flexible approach in problems arising in functional data reconstruction and ecological species distribution modeling.
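To make the convex relaxation mentioned in the abstract concrete, the following is a minimal sketch of the simplest special case only: matrix completion with squared-error loss and the ordinary (not generalized) nuclear norm penalty, with no side information. It uses a standard proximal gradient iteration with singular value soft-thresholding, which is a textbook technique and is not claimed to be the authors' algorithm. The names `svd_soft_threshold`, `complete_matrix`, `mask`, and `lam` are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def svd_soft_threshold(Z, lam):
    """Proximal operator of lam * (nuclear norm): shrink singular values by lam."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - lam, 0.0)
    return (U * s) @ Vt


def complete_matrix(Y, mask, lam, n_iter=200):
    """Proximal gradient for 0.5 * ||mask * (Y - M)||_F^2 + lam * ||M||_*.

    Y    : 2-D array of data (entries where mask is False are ignored)
    mask : boolean array, True where Y is observed
    lam  : nonnegative penalty weight (hypothetical tuning parameter)
    """
    M = np.zeros(Y.shape, dtype=float)
    for _ in range(n_iter):
        # Unit-step gradient update: observed entries take the data value,
        # unobserved entries keep the current estimate.
        Z = np.where(mask, Y, M)
        # Proximal step: soft-threshold the singular values.
        M = svd_soft_threshold(Z, lam)
    return M


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # rank-2 matrix
    mask = rng.random(truth.shape) < 0.6                          # ~60% observed
    estimate = complete_matrix(truth, mask, lam=0.5)
    print("rank of estimate:", np.linalg.matrix_rank(estimate, tol=1e-8))
```

Larger values of `lam` shrink more singular values to zero and so yield lower-rank estimates. The flexible losses and generalized penalties described in the abstract would change both the gradient step and the proximal step, so this sketch should be read as an illustration of the general idea under the stated assumptions.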

Date issued

2017-08

URI

http://hdl.handle.net/1721.1/120549

Department

Sloan School of Management

Journal

Statistical Science

Publisher

Institute of Mathematical Statistics

Citation

Fithian, William and Rahul Mazumder. “Flexible Low-Rank Statistical Modeling with Missing Data and Side Information.” Statistical Science 33, 2 (May 2018): 238–260 © 2018 Institute of Mathematical Statistics

Version: Original manuscript

ISSN

0883-4237

Collections

MIT Open Access Articles

DSpace@MIT