
dc.contributor.author: Goh, Siong Thye
dc.contributor.author: Rudin, Cynthia
dc.date.accessioned: 2015-10-05T16:19:53Z
dc.date.available: 2015-10-05T16:19:53Z
dc.date.issued: 2014-08
dc.identifier.isbn: 9781450329569
dc.identifier.uri: http://hdl.handle.net/1721.1/99143
dc.description.abstract: The vast majority of real world classification problems are imbalanced, meaning there are far fewer data from the class of interest (the positive class) than from other classes. We propose two machine learning algorithms to handle highly imbalanced classification problems. The classifiers are disjunctions of conjunctions, and are created as unions of parallel axis rectangles around the positive examples, and thus have the benefit of being interpretable. The first algorithm uses mixed integer programming to optimize a weighted balance between positive and negative class accuracies. Regularization is introduced to improve generalization performance. The second method uses an approximation in order to assist with scalability. Specifically, it follows a "characterize then discriminate" approach, where the positive class is characterized first by boxes, and then each box boundary becomes a separate discriminative classifier. This method has the computational advantages that it can be easily parallelized, and considers only the relevant regions of feature space. [en_US]
dc.description.sponsorship: Siemens Corporation [en_US]
dc.language.iso: en_US
dc.publisher: Association for Computing Machinery (ACM) [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1145/2623330.2623648 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: arXiv [en_US]
dc.title: Box drawings for learning with imbalanced data [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Siong Thye Goh and Cynthia Rudin. 2014. Box drawings for learning with imbalanced data. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '14). ACM, New York, NY, USA, 333-342. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center [en_US]
dc.contributor.department: Sloan School of Management [en_US]
dc.contributor.mitauthor: Goh, Siong Thye [en_US]
dc.contributor.mitauthor: Rudin, Cynthia [en_US]
dc.relation.journal: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '14) [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dspace.orderedauthors: Goh, Siong Thye; Rudin, Cynthia [en_US]
dc.identifier.orcid: https://orcid.org/0000-0001-7563-0961
mit.license: OPEN_ACCESS_POLICY [en_US]
mit.metadata.status: Complete

