Show simple item record

dc.contributor.author: Tan, Andrew K.
dc.contributor.author: Tegmark, Max
dc.contributor.author: Chuang, Isaac L.
dc.date.accessioned: 2022-06-10T13:07:43Z
dc.date.available: 2022-06-10T13:07:43Z
dc.date.issued: 2022-05-30
dc.identifier.uri: https://hdl.handle.net/1721.1/142922
dc.description.abstract: At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show yields a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and we propose a modification to the algorithm for use where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset. These experiments reveal interesting features of the frontier and demonstrate how its structure can be used for model selection, with a focus on points previously hidden by the cloak of the convex hull. [en_US]
dc.publisher: Multidisciplinary Digital Publishing Institute [en_US]
dc.relation.isversionof: http://dx.doi.org/10.3390/e24060771 [en_US]
dc.rights: Creative Commons Attribution [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0 [en_US]
dc.source: Multidisciplinary Digital Publishing Institute [en_US]
dc.title: Pareto-Optimal Clustering with the Primal Deterministic Information Bottleneck [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Entropy 24 (6): 771 (2022) [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Physics
dc.contributor.department: Massachusetts Institute of Technology. Research Laboratory of Electronics
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2022-06-09T13:40:40Z
dspace.date.submission: 2022-06-09T13:40:40Z
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

