Pareto-Optimal Data Compression for Binary Classification Tasks
Author(s)
Tegmark, Max Erik; Wu, Tailin
Publisher with Creative Commons License
Terms of use
Creative Commons Attribution
Abstract
The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping X → Z ≡ f(X) that maximizes the mutual information I(Z, Y) while the entropy H(Z) is kept below some fixed threshold. We present a new method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y ∈ {1, ..., n} can be distilled into a vector W = f(X) ∈ R^(n−1) losslessly, so that I(W, Y) = I(X, Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n = 2 case of binary classification, we then show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1, ..., m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem. We argue that the most interesting points on this frontier are "corners" maximizing I(Z, Y) for a fixed number of bins m = 2, 3, ..., which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters.
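The core idea of the abstract — compressing a class-informative scalar W into a discrete Z with m bins and measuring how much class information I(Z, Y) survives — can be illustrated with a toy sketch. This is not the authors' code: the Gaussian score W, the equal-width binning, and the plug-in mutual-information estimator are all simplifying assumptions for illustration.

```python
import numpy as np

def mutual_information(z, y):
    """Plug-in estimate of I(Z; Y) in bits from paired discrete samples."""
    z, y = np.asarray(z), np.asarray(y)
    mi = 0.0
    for zv in np.unique(z):
        pz = np.mean(z == zv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pzy = np.mean((z == zv) & (y == yv))
            if pzy > 0:
                mi += pzy * np.log2(pzy / (pz * py))
    return mi

# Toy binary task: Y ∈ {0, 1}; W is a 1-D class-informative score
# (standing in for the paper's W = f(X)), with the two classes offset.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
w = rng.normal(loc=2.0 * y, scale=1.0)

# Compress W into Z with m equal-width bins and see how much
# class information I(Z; Y) each bin count retains.
mi_by_m = {}
for m in (2, 3, 4):
    edges = np.linspace(w.min(), w.max(), m + 1)
    z = np.digitize(w, edges[1:-1])  # bin index in {0, ..., m-1}
    mi_by_m[m] = mutual_information(z, y)
    print(f"m={m} bins: I(Z;Y) ~ {mi_by_m[m]:.3f} bits")
```

For a binary Y, I(Z, Y) is bounded by H(Y) ≤ 1 bit, and coarsening the bins can only discard information — mirroring the paper's "corner" points that maximize I(Z, Y) at each fixed m.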
Date issued
2019-12
Department
Massachusetts Institute of Technology. Department of Physics
Journal
Entropy
Publisher
MDPI AG
Citation
Tegmark, Max and Tailin Wu. “Pareto-Optimal Data Compression for Binary Classification Tasks” Entropy, vol. 22, no. 1, 2020, pp. e22010007 © 2020 The Author(s)
Version: Final published version
ISSN
1524-2080