Ranked personalized recommendations using discrete choice models
Author(s)Ammar, Ammar (Ammar T.)
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
MetadataShow full item record
Personalized recommendation modules have become an integral part of most consumer information systems. Whether you are looking for a movie to watch, a restaurant to dine, or a news article to read, the number of available option has exploded significantly. Furthermore, the commensurate growth in data collection and processing has created a unique opportunity, where the successful identification of a relevant/desired item in a timely and efficient manner can have serious ramifications for the underlying business in terms of consumer satisfaction, operational efficiency, or both. Taken together, these developments create a need for a principled, scalable, and efficient approach for distilling the available consumer data into compact and accurate representations that can be utilized for making inference about future behavior and preference. In this work, we address the problem of providing such recommendations using ranked data, both as system input and output . In particular, we consider two concrete, and interrelated, scenarios, that capture a large number of applications in a variety of domains. In the first scenario, we consider a set-up where the desired goal is to identify a single global ranking, as we would in a tournament. This setup is analogous to the problem of rank aggregation, historically studied in political science and economics, and more recently in computer science and operations research. In the second scenario, we extend the setup to include multiple 'prominent' rankings. Taken together, these rankings reflect the intrinsic heterogeneity of the population, where each ranking can be viewed as a profile for a subset of said population. In both scenarios, the goal is to (i) devise a model to explain and compress the data, (ii) provide efficient algorithms to identify the relevant ranking for a given user, and (iii) provide a theoretical characterization of the difficulty of this task together with conditions under which this difficulty can be avoided. To that end, and drawing on ideas from econometrics and computer science, we propose a model for the single ranking problem where the data is assumed to be generated from a Multi-Nomial Logit (MNL) model, a parametric probability distribution over permutations used in applications ranging from the ranking of players in online gaming platforms to the pricing of airline tickets. We then devise a simple algorithm for learning the underlying ranking directly from data, and show that this algorithm is consistent for a large subset of the so called Random Utility Models (RUM). Building on the insight from the single ranking case, we handle the multiple ranking scenario using a mixture of Multi-Nomial Logit models. We then provide a theoretical illustration of the difficulty in learning models from this class, which is not surprising given the richness of the model class, and the notorious difficulties inherent in dealing with ranked data. Finally, we devise a simple algorithm for estimating the model under plausible realistic conditions, together with theoretical guarantees on the performance together with an experimental evaluation.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.Cataloged from PDF version of thesis.Includes bibliographical references (pages 79-82).
DepartmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.