DSpace@MIT

Finding patterns in features and observations : new machine learning models with applications in computational criminology, marketing, and medicine

Author(s)
Wang, Tong, Ph. D. Massachusetts Institute of Technology
Download: Full printable version (13.36 MB)
Alternative title
New machine learning models with applications in computational criminology, marketing, and medicine
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Cynthia Rudin.
Terms of use
MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
The "Big Data" revolution has reached fields such as marketing, healthcare, and criminology, where domain experts wish to find and understand interesting patterns in data. This thesis studies patterns defined by subsets of observations or by subsets of features. The first part of the thesis studies patterns defined by subsets of observations. We focus on a specific type of pattern, the crime series (a set of crimes committed by the same individual or group), and develop two pattern detection algorithms. The first method, Series Finder, is a sequential pattern-building algorithm that resembles how crime analysts instinctively process information: it grows a crime series starting from a couple of seed crimes. The second method is a subspace clustering method with cluster-specific feature selection, which uses supervision when learning similarity graphs in order to reduce computation. Both proposed methods achieve promising results on a decade's worth of crime pattern data collected by the Crime Analysis Unit of the Cambridge Police Department. The second part of the thesis studies patterns defined by subsets of features. We develop methods and theory for building Rule Set models, whose hallmark is interpretability: association rules explain each predicted result directly. We first design two methods for building rule sets for binary classification. The first, Bayesian Rule Set (BRS), uses a Bayesian framework with priors that favor small models. These priors also bring significant computational benefits to MAP inference by reducing the search space and keeping the sampling chain within appropriate regions. We apply BRS models to an in-vehicle recommender system data set, which we collected via Amazon Mechanical Turk, to study which customers and contexts encourage acceptance of coupons.
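A rule set of the kind described above predicts the positive class when an observation satisfies at least one rule, where each rule is a conjunction of conditions. The Python sketch below illustrates only that prediction step, assuming rules are represented as lists of predicate functions. The coupon feature names (`coupon`, `passenger`, `time`) and the rules themselves are hypothetical, and the sketch does not implement BRS learning, its priors, or MAP inference.

```python
from typing import Callable, Dict, List

# Hypothetical representation: an observation maps feature names to values,
# a condition is a predicate on one observation, and a rule is a list of
# conditions that must all hold (a conjunction).
Observation = Dict[str, object]
Condition = Callable[[Observation], bool]
Rule = List[Condition]


def rule_covers(rule: Rule, x: Observation) -> bool:
    """Return True if every condition in the rule holds for x (logical AND)."""
    return all(cond(x) for cond in rule)


def covering_rules(rule_set: List[Rule], x: Observation) -> List[int]:
    """Indices of the rules that cover x; these explain a positive prediction."""
    return [i for i, rule in enumerate(rule_set) if rule_covers(rule, x)]


def predict(rule_set: List[Rule], x: Observation) -> int:
    """Predict 1 if at least one rule covers x (logical OR over rules), else 0."""
    return int(len(covering_rules(rule_set, x)) > 0)


# Hypothetical coupon-acceptance rules; feature names and values are illustrative only.
rule_set: List[Rule] = [
    [lambda x: x["coupon"] == "coffee", lambda x: x["passenger"] == "friends"],
    [lambda x: x["coupon"] == "takeaway", lambda x: x["time"] == "evening"],
]

driver_a = {"coupon": "coffee", "passenger": "friends", "time": "morning"}
driver_b = {"coupon": "coffee", "passenger": "alone", "time": "evening"}
print(predict(rule_set, driver_a), covering_rules(rule_set, driver_a))  # 1 [0]
print(predict(rule_set, driver_b), covering_rules(rule_set, driver_b))  # 0 []
```

Because every positive prediction can be traced to the specific rule or rules that fired, the reasoning behind each prediction can be read off directly, which is the interpretability property that motivates rule set models.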
We develop another model, Optimized Rule Set (ORS), which uses optimization methods to construct rule sets directly from data, without pre-mining rules or discretizing continuous attributes. As the main application of ORS, we build a diagnostic screening tool for obstructive sleep apnea, trained on data provided by the Sleep Lab at Massachusetts General Hospital. Our models achieve high accuracy with a substantial gain in interpretability over other methods. Finally, we build a Causal Rule Set (CRS) model for causal analysis, which identifies subgroups that can benefit from a treatment. CRS combines BRS with Bayesian logistic regression, and its inference algorithm uses several strategies to speed up computation. Simulations and experiments show that assigning treatment according to CRS models increases the average treatment effect.
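One way to see what it means for a rule-defined subgroup to "benefit from a treatment" is to compare the mean outcomes of treated and untreated units inside that subgroup. The Python sketch below does this with a plain difference-in-means estimator on made-up toy data, assuming treatment was randomized. The subgroup rule (`age >= 50`), the function name `diff_in_means`, and all numbers are hypothetical; this is not the CRS model, which combines BRS with Bayesian logistic regression.

```python
from statistics import mean
from typing import Sequence


def diff_in_means(outcomes: Sequence[float], treated: Sequence[int],
                  in_group: Sequence[bool]) -> float:
    """Mean outcome of treated minus mean outcome of control units, restricted
    to units with in_group True. Interpretable as a treatment effect only
    under randomized treatment assignment (an assumption of this sketch)."""
    t = [y for y, z, g in zip(outcomes, treated, in_group) if g and z == 1]
    c = [y for y, z, g in zip(outcomes, treated, in_group) if g and z == 0]
    if not t or not c:
        raise ValueError("the group needs at least one treated and one control unit")
    return mean(t) - mean(c)


# Made-up toy data: one entry per unit.
ages = [30, 35, 40, 55, 60, 65, 32, 58]
treated = [1, 0, 1, 1, 0, 1, 0, 0]
outcomes = [1.0, 1.0, 1.2, 3.0, 1.0, 3.2, 1.1, 1.2]

everyone = [True] * len(ages)
subgroup = [a >= 50 for a in ages]  # hypothetical rule defining a subgroup

print(f"effect in whole population: {diff_in_means(outcomes, treated, everyone):.3f}")
print(f"effect in subgroup age >= 50: {diff_in_means(outcomes, treated, subgroup):.3f}")
```

On this toy data the estimated effect inside the subgroup is larger than in the whole population, which is the kind of heterogeneity a subgroup-finding model aims to uncover. With observational rather than randomized data, confounding would have to be addressed before such a difference could be read causally.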
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
 
Cataloged from PDF version of thesis.
 
Includes bibliographical references (pages 173-180).
 
Date issued
2016
URI
http://hdl.handle.net/1721.1/107357
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.