Show simple item record

dc.contributor.author: Bertsimas, Dimitris
dc.contributor.author: Dunn, Jack William
dc.date.accessioned: 2017-06-27T18:54:24Z
dc.date.available: 2018-02-04T06:00:05Z
dc.date.issued: 2017-04
dc.identifier.issn: 0885-6125
dc.identifier.issn: 1573-0565
dc.identifier.uri: http://hdl.handle.net/1721.1/110328
dc.description.abstract: State-of-the-art decision tree methods apply heuristics recursively to create each split in isolation, which may not capture well the underlying characteristics of the dataset. The optimal decision tree problem attempts to resolve this by creating the entire decision tree at once to achieve global optimality. In the last 25 years, algorithmic advances in integer optimization coupled with hardware improvements have resulted in an astonishing 800 billion factor speedup in mixed-integer optimization (MIO). Motivated by this speedup, we present optimal classification trees, a novel formulation of the decision tree problem using modern MIO techniques that yields the optimal decision tree for axes-aligned splits. We also show the richness of this MIO formulation by adapting it to give optimal classification trees with hyperplanes, which generates optimal decision trees with multivariate splits. Synthetic tests demonstrate that these methods recover the true decision tree more closely than heuristics, refuting the notion that optimal methods overfit the training data. We comprehensively benchmark these methods on a sample of 53 datasets from the UCI machine learning repository. We establish that these MIO methods are practically solvable on real-world datasets with sizes in the 1000s, and give average absolute improvements in out-of-sample accuracy over CART of 1–2% and 3–5% for the univariate and multivariate cases, respectively. Furthermore, we identify that optimal classification trees are likely to outperform CART by 1.2–1.3% in situations where the CART accuracy is high and we have sufficient training data, while the multivariate version outperforms CART by 4–7% when the CART accuracy or dimension of the dataset is low. [en_US]
dc.publisher: Springer US [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1007/s10994-017-5633-9 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Springer US [en_US]
dc.title: Optimal classification trees [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Bertsimas, Dimitris, and Jack Dunn. “Optimal Classification Trees.” Machine Learning 106.7 (2017): 1039–1082. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Operations Research Center [en_US]
dc.contributor.mitauthor: Dunn, Jack William
dc.relation.journal: Machine Learning [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2017-06-23T03:51:27Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s)
dspace.orderedauthors: Bertsimas, Dimitris; Dunn, Jack [en_US]
dspace.embargo.terms: N [en]
dc.identifier.orcid: https://orcid.org/0000-0002-6936-4502
mit.license: OPEN_ACCESS_POLICY [en_US]
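
The abstract above casts tree construction as a single mixed-integer optimization solved globally, rather than a sequence of greedy splits. As a minimal illustrative sketch only (not the paper's actual OCT formulation), the following Python snippet uses the open-source PuLP modeling library with its bundled CBC solver to pick a globally optimal depth-1 axis-aligned split; the toy dataset, variable names, and candidate-split construction are all assumptions made for illustration.

# Illustrative sketch: optimal depth-1 axis-aligned split as a small MIO.
# Assumes PuLP (pip install pulp), which ships the CBC MIO solver.
import pulp

# Toy data: two features, binary labels.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
y = [0, 0, 1, 1, 1]
n = len(X)
classes = sorted(set(y))

# Candidate axis-aligned splits: midpoints between consecutive feature values.
splits = []
for j in range(2):
    vals = sorted({x[j] for x in X})
    for a, b in zip(vals, vals[1:]):
        splits.append((j, (a + b) / 2))

prob = pulp.LpProblem("optimal_stump", pulp.LpMinimize)
# d[s] = 1 if candidate split s is chosen.
d = pulp.LpVariable.dicts("d", range(len(splits)), cat=pulp.LpBinary)
# c[(leaf, k)] = 1 if the given leaf is labeled with class k.
c = pulp.LpVariable.dicts(
    "c", [(leaf, k) for leaf in ("L", "R") for k in classes], cat=pulp.LpBinary)
# m[i] = 1 if training point i is misclassified.
m = pulp.LpVariable.dicts("m", range(n), cat=pulp.LpBinary)

prob += pulp.lpSum(m[i] for i in range(n))  # minimize training errors
prob += pulp.lpSum(d.values()) == 1         # exactly one split is chosen
for leaf in ("L", "R"):
    prob += pulp.lpSum(c[(leaf, k)] for k in classes) == 1  # one label per leaf

# If split s is chosen and point i's leaf is not labeled y[i], force m[i] = 1;
# when d[s] = 0 the constraint is vacuous.
for s, (j, t) in enumerate(splits):
    for i in range(n):
        leaf = "L" if X[i][j] < t else "R"
        prob += m[i] >= d[s] - c[(leaf, y[i])]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
s_star = next(s for s in range(len(splits)) if pulp.value(d[s]) > 0.5)
print("chosen split:", splits[s_star], "errors:", pulp.value(prob.objective))

Because the leaf labels are decision variables rather than being fixed by majority vote after the split is picked, the solver evaluates the split and the labels jointly; the paper's formulation extends this joint treatment to every split of a depth-D tree, which is what distinguishes it from recursive heuristics such as CART.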