Pareto-optimal data compression for binary classification tasks

Tegmark, Max Erik; Wu, Tailin

dc.contributor.author	Tegmark, Max Erik
dc.contributor.author	Wu, Tailin
dc.date.accessioned	2020-05-28T15:16:20Z
dc.date.available	2020-05-28T15:16:20Z
dc.date.issued	2019-12-19
dc.date.submitted	2019-10
dc.identifier.issn	1099-4300
dc.identifier.uri	https://hdl.handle.net/1721.1/125546
dc.description.abstract	The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping X → Z ≡ f(X) that maximizes the mutual information I(Z,Y) while the entropy H(Z) is kept below some fixed threshold. We present a new method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y ∈ {1,…,n} can be distilled into a vector W = f(X) ∈ ℝ^(n−1) losslessly, so that I(W,Y) = I(X,Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n = 2 case of binary classification, we then show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1,…,m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem. We argue that the most interesting points on this frontier are “corners” maximizing I(Z,Y) for a fixed number of bins m = 2,3,… which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters. Keywords: information; bottleneck; compression; classification	en_US
dc.description.sponsorship	TWCF (grant no. 0322)	en_US
dc.publisher	Multidisciplinary Digital Publishing Institute	en_US
dc.relation.isversionof	10.3390/e22010007	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Multidisciplinary Digital Publishing Institute	en_US
dc.title	Pareto-optimal data compression for binary classification tasks	en_US
dc.type	Article	en_US
dc.identifier.citation	Tegmark, Max, and Tailin Wu. "Pareto-optimal data compression for binary classification tasks." Entropy 22, 1 (Dec. 2019): no. 7. doi: 10.3390/e22010007. ©2019 Author(s)	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Physics	en_US
dc.contributor.department	MIT Kavli Institute for Astrophysics and Space Research	en_US
dc.relation.journal	Entropy	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2020-03-02T13:00:09Z
dspace.date.submission	2020-03-02T13:00:09Z
mit.journal.volume	22	en_US
mit.journal.issue	1	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Complete
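
The binning step described in the abstract (compressing a scalar W into a discrete Z with m bins so that I(Z,Y) is as large as possible) can be illustrated with a small sketch. This is not the authors' algorithm: it is a brute-force search over cut points on toy data, and the function names (`mutual_information`, `bin_joint`, `best_binning`), the toy scores, and the choice of midpoints as candidate cuts are all assumptions made for illustration. It also assumes W is a one-dimensional score per sample and Y takes the values 0 and 1.

```python
import itertools
import math


def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(Z;Y) in bits from a joint probability table joint[z][y]."""
    pz = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for z, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (pz[z] * py[y]))
    return mi


def bin_joint(w_values, y_values, cuts):
    """Empirical joint distribution of (Z, Y), where Z is the bin index of W."""
    n = len(w_values)
    joint = [[0.0, 0.0] for _ in range(len(cuts) + 1)]
    for w, y in zip(w_values, y_values):
        z = sum(w > c for c in cuts)  # number of cut points below w
        joint[z][y] += 1.0 / n
    return joint


def best_binning(w_values, y_values, m):
    """Exhaustively try every set of m-1 cut points; return (max I(Z;Y), cuts)."""
    distinct = sorted(set(w_values))
    # Hypothetical candidate cuts: midpoints between neighbouring W values.
    mids = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = (-1.0, None)
    for cuts in itertools.combinations(mids, m - 1):
        mi = mutual_information(bin_joint(w_values, y_values, cuts))
        if mi > best[0]:
            best = (mi, cuts)
    return best


if __name__ == "__main__":
    # Toy data (assumed): four samples with scalar scores W and binary labels Y.
    w = [0.1, 0.2, 0.8, 0.9]
    y = [0, 0, 1, 1]
    mi, cuts = best_binning(w, y, m=2)
    h_z = entropy([sum(row) for row in bin_joint(w, y, cuts)])
    print("cuts:", cuts, "I(Z;Y):", mi, "H(Z):", h_z)
```

Running `best_binning` for m = 2, 3, … and recording the pair (H(Z), I(Z,Y)) for each result gives one point per bin count, which is the kind of "corner" the abstract refers to. The exhaustive search is only practical for small samples.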

Files in this item

Name: entropy-22-00007-v3.pdf
Size: 5.378Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
