dc.contributor.author | Tegmark, Max Erik | |
dc.contributor.author | Wu, Tailin | |
dc.date.accessioned | 2021-09-20T18:21:17Z | |
dc.date.available | 2021-09-20T18:21:17Z | |
dc.date.issued | 2019-12 | |
dc.identifier.issn | 1524-2080 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/132190 | |
dc.description.abstract | The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping X → Z ≡ f(X) that maximizes the mutual information I(Z,Y) while the entropy H(Z) is kept below some fixed threshold. We present a new method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y ∈ {1, ..., n} can be distilled into a vector W = f(X) ∈ R^(n−1) losslessly, so that I(W,Y) = I(X,Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n = 2 case of binary classification, we then show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1, ..., m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem. We argue that the most interesting points on this frontier are "corners" maximizing I(Z,Y) for a fixed number of bins m = 2, 3, ..., which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters. | en_US
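As a rough illustration of the binning step described in the abstract (this is not the authors' code), the Python sketch below takes a hypothetical distilled scalar W, for instance a classifier's estimate of P(Y=1|X), compresses it into Z with m bins, and estimates I(Z,Y) for each candidate corner m = 2, 3, .... Note one simplification: the paper optimizes the bin boundaries to maximize I(Z,Y), whereas this sketch just uses uniform quantile bins.

    import numpy as np

    def mutual_information(z, y):
        """Estimate I(Z;Y) in bits from paired discrete samples."""
        joint = np.zeros((z.max() + 1, y.max() + 1))
        for zi, yi in zip(z, y):
            joint[zi, yi] += 1
        joint /= joint.sum()
        pz = joint.sum(axis=1, keepdims=True)   # marginal P(Z)
        py = joint.sum(axis=0, keepdims=True)   # marginal P(Y)
        mask = joint > 0
        return (joint[mask] * np.log2(joint[mask] / (pz @ py)[mask])).sum()

    def bin_w(w, m):
        """Compress the scalar W into Z in {0, ..., m-1}.
        Uses quantile bin edges for simplicity; the paper instead
        optimizes the edges to maximize I(Z,Y)."""
        edges = np.quantile(w, np.linspace(0, 1, m + 1)[1:-1])
        return np.searchsorted(edges, w)

    # Hypothetical data: w stands in for a distilled score P(Y=1|X)
    # that some trained binary classifier might produce.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 10_000)
    w = np.clip(0.3 * y + rng.normal(0.35, 0.2, y.size), 0.0, 1.0)

    for m in (2, 3, 4, 5):  # one candidate "corner" per bin count m
        z = bin_w(w, m)
        print(f"m = {m}: I(Z,Y) ≈ {mutual_information(z, y):.3f} bits")

Each printed (m, I(Z,Y)) pair corresponds to one candidate corner point of the Pareto frontier, found here by sweeping m directly rather than via multiobjective optimization, in the spirit of the abstract's argument.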
dc.language.iso | en | |
dc.publisher | MDPI AG | en_US |
dc.relation.isversionof | https://dx.doi.org/10.3390/E22010007 | en_US |
dc.rights | Creative Commons Attribution 4.0 International license | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_US |
dc.source | MDPI | en_US |
dc.title | Pareto-Optimal Data Compression for Binary Classification Tasks | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Tegmark, Max and Tailin Wu. "Pareto-Optimal Data Compression for Binary Classification Tasks." Entropy, vol. 22, no. 1, 2020, e22010007. © 2020 The Author(s) | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Physics | en_US |
dc.relation.journal | Entropy | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2020-05-19T12:47:28Z | |
dspace.date.submission | 2020-05-19T12:47:32Z | |
mit.journal.volume | 22 | en_US |
mit.journal.issue | 1 | en_US |
mit.license | PUBLISHER_CC | |
mit.metadata.status | Authority Work and Publication Information Needed | |