Hierarchical Clustering: Objective Functions and Algorithms

Cohen-Addad, Vincent; Kanade, Varun; Mallmann-Trenn, Frederik; Mathieu, Claire

dc.contributor.author	Cohen-Addad, Vincent
dc.contributor.author	Kanade, Varun
dc.contributor.author	Mallmann-Trenn, Frederik
dc.contributor.author	Mathieu, Claire
dc.date.accessioned	2022-10-21T17:38:48Z
dc.date.available	2019-06-27T18:01:44Z
dc.date.available	2022-10-21T17:38:48Z
dc.date.issued	2019-06-05
dc.date.submitted	2019-03
dc.identifier.issn	0004-5411
dc.identifier.issn	1557-735X
dc.identifier.uri	https://hdl.handle.net/1721.1/121430.2
dc.description.abstract	Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, Dasgupta framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a “good” hierarchical clustering is one that minimizes a particular cost function [23]. He showed that this cost function has certain desirable properties: To achieve optimal cost, disconnected components (namely, dissimilar elements) must be separated at higher levels of the hierarchy, and when the similarity between data elements is identical, all clusterings achieve the same cost. We take an axiomatic approach to defining “good” objective functions for both similarity- and dissimilarity-based hierarchical clustering. We characterize a set of admissible objective functions having the property that when the input admits a “natural” ground-truth hierarchical clustering, the ground-truth clustering has an optimal value. We show that this set includes the objective function introduced by Dasgupta. Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better and faster algorithms for hierarchical clustering. We also initiate a beyond worst-case analysis of the complexity of the problem and design algorithms for this scenario.	en_US
dc.language.iso	en_US
dc.publisher	Association for Computing Machinery (ACM)	en_US
dc.relation.isversionof	http://dx.doi.org/10.1145/3321386	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	ACM	en_US
dc.title	Hierarchical Clustering: Objective Functions and Algorithms	en_US
dc.type	Article	en_US
dc.identifier.citation	Cohen-Addad, Vincent et al. "Hierarchical Clustering: Objective Functions and Algorithms." Journal of the ACM 66, 4 (June 2019): 26 © 2019 The Authors	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.approver	Mallmann-Trenn, Frederik	en_US
dc.relation.journal	Journal of the ACM	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.embargo.terms	N	en_US
dspace.date.submission	2019-04-04T10:37:40Z
mit.journal.volume	66	en_US
mit.journal.issue	4	en_US
mit.license	PUBLISHER_POLICY	en_US
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 3321386.pdf
Size: 937.4Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles

Version History

Version	Item	Date	Summary
2	1721.1/121430.2*	2022-10-21T17:36:55Z	File update: final published version
1	1721.1/121430	2019-06-27T18:01:44Z
