Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections

Cutting, Douglass R; Karger, David R; Pedersen, Jan O; Tukey, John W

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/

Abstract

Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval. We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs document clustering as its primary operation. We also present a fast (linear-time) clustering algorithm.

Date issued

2017

URI

https://hdl.handle.net/1721.1/134850

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory

Journal

ACM SIGIR Forum

Publisher

Association for Computing Machinery (ACM)

Collections

MIT Open Access Articles