Show simple item record

dc.contributor.author: Tegmark, Max Erik
dc.contributor.author: Wu, Tailin
dc.date.accessioned: 2021-09-20T18:21:17Z
dc.date.available: 2021-09-20T18:21:17Z
dc.date.issued: 2019-12
dc.identifier.issn: 1524-2080
dc.identifier.uri: https://hdl.handle.net/1721.1/132190
dc.description.abstract: The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping X → Z ≡ f(X) that maximizes the mutual information I(Z, Y) while the entropy H(Z) is kept below some fixed threshold. We present a new method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y ∈ {1, ..., n} can be distilled into a vector W = f(X) ∈ R^(n−1) losslessly, so that I(W, Y) = I(X, Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n = 2 case of binary classification, we then show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1, ..., m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem. We argue that the most interesting points on this frontier are “corners” maximizing I(Z, Y) for a fixed number of bins m = 2, 3, ..., which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters.
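The abstract's binning construction can be illustrated numerically. The following is a minimal Python sketch, not the authors' code: it quantizes a scalar W = f(X) into m bins to form Z, estimates H(Z) and I(Z; Y) from sample counts, and sweeps the number of bins m to list the "corner" candidates the abstract describes. The Gaussian toy data and the equal-probability (quantile) binning rule are assumptions made for illustration; the paper optimizes the bin boundaries themselves as β varies rather than fixing them at quantiles.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector p."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def binned_information(w, y, m):
    """Bin scalar w into m bins Z and return (H(Z), I(Z;Y)) in bits.

    w : 1-D array of real-valued compressed representations W = f(X)
    y : 1-D array of binary labels in {0, 1}
    m : number of bins
    Assumption: equal-probability (quantile) bin edges, a crude
    stand-in for the optimized bin boundaries in the paper.
    """
    edges = np.quantile(w, np.linspace(0, 1, m + 1)[1:-1])
    z = np.digitize(w, edges)  # Z takes values in {0, ..., m-1}

    # Estimate the joint distribution P(Z, Y) from counts.
    joint = np.zeros((m, 2))
    np.add.at(joint, (z, y), 1)
    joint /= joint.sum()

    h_z = entropy(joint.sum(axis=1))   # H(Z)
    h_y = entropy(joint.sum(axis=0))   # H(Y)
    h_zy = entropy(joint.ravel())      # H(Z, Y)
    return h_z, h_z + h_y - h_zy       # I(Z;Y) = H(Z) + H(Y) - H(Z,Y)

# Toy data (hypothetical): W tends to be larger when Y = 1.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
w = rng.normal(loc=y.astype(float), scale=1.0)

# Each bin count m yields one candidate "corner" point (H(Z), I(Z;Y))
# on the entropy/information tradeoff curve.
for m in range(2, 6):
    h_z, i_zy = binned_information(w, y, m)
    print(f"m={m}: H(Z)={h_z:.3f} bits, I(Z;Y)={i_zy:.3f} bits")
```

As m grows, H(Z) increases while the gains in I(Z;Y) shrink, which is the tradeoff the Pareto frontier traces; finding the corners by fixing m, as here, avoids multiobjective optimization.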
dc.language.iso: en
dc.publisher: MDPI AG
dc.relation.isversionof: https://dx.doi.org/10.3390/E22010007
dc.rights: Creative Commons Attribution 4.0 International license
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source: MDPI
dc.title: Pareto-Optimal Data Compression for Binary Classification Tasks
dc.type: Article
dc.identifier.citation: Tegmark, Max and Tailin Wu. “Pareto-Optimal Data Compression for Binary Classification Tasks.” Entropy, vol. 22, no. 1, 2020, e22010007. © 2020 The Author(s)
dc.contributor.department: Massachusetts Institute of Technology. Department of Physics
dc.relation.journal: Entropy
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2020-05-19T12:47:28Z
dspace.date.submission: 2020-05-19T12:47:32Z
mit.journal.volume: 22
mit.journal.issue: 1
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed

