Scalable fair clustering

Indyk, Piotr; Onak, Krzysztof; Vakilian, Ali; Wagner, Tal

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

We study the fair variant of the classic k-median problem introduced by Chierichetti et al. (2017), in which the points are colored and the goal is to minimize the same average distance objective as in the standard k-median problem while ensuring that all clusters have an "approximately equal" number of points of each color. Chierichetti et al. proposed a two-phase algorithm for fair k-clustering. In the first phase, the point set is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the k-median objective. In the second phase, the fairlets are merged into k clusters by one of the existing k-median algorithms. The running time of this algorithm is dominated by the first phase, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time.
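The two-phase structure described in the abstract can be illustrated with a small sketch. This is not the paper's nearly-linear-time algorithm. It is a naive, quadratic-time stand-in that assumes two colors with equally many points of each, builds fairlets by greedily pairing each red point with its nearest unused blue point, and then merges whole fairlets into k clusters using farthest-first center selection in place of a real k-median solver. All function names (`greedy_fairlets`, `farthest_first_centers`, `fair_clusters`, `balance`) are hypothetical and chosen only for this illustration.

```python
import math


def greedy_fairlets(red, blue):
    """Phase 1 (naive stand-in): pair each red point with its nearest
    unused blue point, giving fairlets with one point of each color.
    Assumes len(red) == len(blue). Runs in quadratic time."""
    if len(red) != len(blue):
        raise ValueError("this sketch assumes equally many red and blue points")
    unused = list(range(len(blue)))
    fairlets = []
    for r in red:
        j = min(unused, key=lambda i: math.dist(r, blue[i]))
        unused.remove(j)
        fairlets.append((r, blue[j]))
    return fairlets


def farthest_first_centers(points, k):
    """Pick k centers by farthest-first traversal. This is a simple
    heuristic used here in place of an actual k-median algorithm."""
    if k < 1 or not points:
        raise ValueError("need k >= 1 and at least one point")
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def fair_clusters(red, blue, k):
    """Phase 2: assign each fairlet, as a unit, to the center nearest its
    red point, so every cluster inherits the fairlets' color balance."""
    fairlets = greedy_fairlets(red, blue)
    centers = farthest_first_centers([f[0] for f in fairlets], k)
    clusters = [[] for _ in centers]
    for r, b in fairlets:
        idx = min(range(len(centers)), key=lambda i: math.dist(r, centers[i]))
        clusters[idx].extend([("red", r), ("blue", b)])
    return centers, clusters


def balance(cluster):
    """Balance of a two-color cluster: min(#red/#blue, #blue/#red),
    or 0.0 if either color is missing."""
    n_red = sum(1 for color, _ in cluster if color == "red")
    n_blue = len(cluster) - n_red
    if n_red == 0 or n_blue == 0:
        return 0.0
    return min(n_red / n_blue, n_blue / n_red)
```

Because fairlets are never split in the second phase, every non-empty cluster produced by `fair_clusters` contains equally many red and blue points. The quality of the clustering objective, however, depends on how well the fairlets were chosen, which is why the decomposition step is the focus of the paper.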

Date issued

2019-06

URI

https://hdl.handle.net/1721.1/129421

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

36th International Conference on Machine Learning, ICML 2019

Publisher

International Machine Learning Society

Citation

Backurs, Arturs et al. “Scalable fair clustering.” Paper presented at the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, June 10, 2019 - June 15, 2019, International Machine Learning Society © 2019 The Author(s)

Version: Author's final manuscript

ISSN

2640-3498

Collections

MIT Open Access Articles