Pareto-Optimal Data Compression for Binary Classification Tasks
Author(s)
Tegmark, Max Erik; Wu, Tailin
Publisher with Creative Commons License
Terms of use
Creative Commons Attribution
Abstract
The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping X → Z ≡ f(X) that maximizes the mutual information I(Z, Y) while the entropy H(Z) is kept below some fixed threshold. We present a new method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y ∈ {1, ..., n} can be distilled into a vector W = f(X) ∈ R^(n−1) losslessly, so that I(W, Y) = I(X, Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n = 2 case of binary classification, we then show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1, ..., m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem. We argue that the most interesting points on this frontier are "corners" maximizing I(Z, Y) for a fixed number of bins m = 2, 3, ..., which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters.
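The core idea of the abstract — compressing a class-informative scalar W into a discrete Z with m bins and measuring how much class information I(Z, Y) survives — can be illustrated with a toy sketch. This is not the authors' code: the Gaussian score W, the equal-width binning, and the plug-in mutual-information estimator are all simplifying assumptions for illustration.

```python
import numpy as np

def mutual_information(z, y):
    """Plug-in estimate of I(Z; Y) in bits from paired discrete samples."""
    z, y = np.asarray(z), np.asarray(y)
    mi = 0.0
    for zv in np.unique(z):
        pz = np.mean(z == zv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pzy = np.mean((z == zv) & (y == yv))
            if pzy > 0:
                mi += pzy * np.log2(pzy / (pz * py))
    return mi

# Toy binary task: Y ∈ {0, 1}; W is a 1-D class-informative score
# (standing in for the paper's W = f(X)), with the two classes offset.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
w = rng.normal(loc=2.0 * y, scale=1.0)

# Compress W into Z with m equal-width bins and see how much
# class information I(Z; Y) each bin count retains.
mi_by_m = {}
for m in (2, 3, 4):
    edges = np.linspace(w.min(), w.max(), m + 1)
    z = np.digitize(w, edges[1:-1])  # bin index in {0, ..., m-1}
    mi_by_m[m] = mutual_information(z, y)
    print(f"m={m} bins: I(Z;Y) ~ {mi_by_m[m]:.3f} bits")
```

For a binary Y, I(Z, Y) is bounded by H(Y) ≤ 1 bit, and coarsening the bins can only discard information — mirroring the paper's "corner" points that maximize I(Z, Y) at each fixed m.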
Date issued
2019-12
Department
Massachusetts Institute of Technology. Department of Physics
Journal
Entropy
Publisher
MDPI AG
Citation
Tegmark, Max and Tailin Wu. “Pareto-Optimal Data Compression for Binary Classification Tasks” Entropy, vol. 22, no. 1, 2020, pp. e22010007 © 2020 The Author(s)
Version: Final published version
ISSN
1524-2080