Sequential data inference via matrix estimation: causal inference, cricket and retail

Amjad, Muhammad Jehangir

Other Contributors

Massachusetts Institute of Technology. Operations Research Center.

Advisor

Devavrat Shah.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

This thesis proposes a unified framework for capturing the temporal and longitudinal variation across multiple instances of sequential data. Examples of such data include sales of a product over time across several retail locations, score trajectories across cricket games, and annual tobacco consumption across the United States over several decades. A key component of our work is the latent variable model (LVM), which views the sequential data as a matrix whose rows correspond to the individual sequences and whose columns correspond to the sequential (time) index. The goal is to use information both within each sequence and across sequences to address two inferential questions for a given sequence: (a) imputation, i.e., filling in missing values and de-noising observed ones, and (b) forecasting, i.e., predicting future values. Using this framework, we build on recent developments in "matrix estimation" to address these inferential goals in three applications. First, we develop a robust variant of the popular "synthetic control" method, which is used in observational studies to draw causal inferences. Second, we develop an algorithm that forecasts score trajectories in cricket from historical data. This leads to an unbiased target-resetting algorithm for shortened cricket games, which improves upon the biased incumbent approach (the Duckworth-Lewis-Stern method). Third, we develop an algorithm that yields a consistent estimator of the time- and location-varying demand for products from censored observations in the retail setting. As a final contribution, the algorithms presented are implemented and packaged as a scalable open-source library for the imputation and forecasting of sequential data, with applications beyond those presented in this work.
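The matrix view described above can be illustrated with a minimal Python/NumPy sketch. It arranges sequences as rows and time steps as columns, and it assumes hard singular value thresholding (keep only the top `rank` singular values) as the matrix-estimation step for imputation and de-noising. For the synthetic-control step it assumes ordinary least-squares weights fitted on the pre-intervention period. The function names `hsvt_estimate` and `robust_synthetic_control`, the `rank` parameter, and the synthetic data are hypothetical illustrations. They are not the thesis's exact algorithms or the API of its open-source library.

```python
import numpy as np

# Hypothetical helpers: a sketch of the idea, not the thesis's implementation.


def hsvt_estimate(Y, rank):
    """Impute and de-noise a partially observed matrix by hard singular value thresholding.

    Y    : 2-D float array; rows are sequences (e.g. stores), columns are time steps;
           missing entries are np.nan.
    rank : number of singular values to keep (an assumed, user-chosen parameter).
    """
    mask = ~np.isnan(Y)
    p_hat = max(mask.mean(), 1.0 / Y.size)  # estimated fraction of observed entries
    # Fill missing entries with 0 and rescale by 1/p_hat, so that E[X] equals the
    # underlying matrix when entries are missing uniformly at random.
    X = np.where(mask, Y, 0.0) / p_hat
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[rank:] = 0.0  # keep only the top `rank` singular values
    return (U * s) @ Vt


def robust_synthetic_control(treated, donors, t0, rank):
    """Estimate the treated unit's counterfactual (no-intervention) trajectory.

    treated : 1-D array of length T for the unit that received the intervention.
    donors  : (n_donors, T) array of untreated units; may contain np.nan.
    t0      : index of the first post-intervention period.
    """
    donors_hat = hsvt_estimate(donors, rank)  # de-noise / impute the donor pool
    # Least-squares weights so the weighted donors match the treated unit before t0.
    w, *_ = np.linalg.lstsq(donors_hat[:, :t0].T, treated[:t0], rcond=None)
    return donors_hat.T @ w  # counterfactual for all T periods


# Illustrative synthetic data (not from the thesis): a rank-2 "sequences x time" matrix.
rng = np.random.default_rng(0)
truth = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 120))
noisy = truth + 0.1 * rng.normal(size=truth.shape)
denoised = hsvt_estimate(noisy, rank=2)

# Counterfactual demo: the treated unit is a mix of 8 donors, plus an effect of +1 after t0 = 20.
donors = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 30))
treated = rng.uniform(size=8) @ donors
treated[20:] += 1.0
counterfactual = robust_synthetic_control(treated, donors, t0=20, rank=2)
effect = treated[20:] - counterfactual[20:]  # recovers the +1 effect on this noiseless data
```

Keeping only a few singular values is what lets each sequence borrow strength from the others: a row's imputed and de-noised values come from the latent structure shared across all rows. In practice `rank` would have to be chosen from the data, for example by inspecting the singular-value spectrum.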

Description

Thesis: Ph. D., Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2018.

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 185-193).

Date issued

2018

URI

http://hdl.handle.net/1721.1/120190

Department

Massachusetts Institute of Technology. Operations Research Center; Sloan School of Management

Publisher

Massachusetts Institute of Technology

Keywords

Operations Research Center.

Collections

Doctoral Theses