Show simple item record

dc.contributor.advisorSamuel R. Madden.en_US
dc.contributor.authorTaratoris, Evangelos.en_US
dc.contributor.otherMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.en_US
dc.date.accessioned2018-01-12T21:15:11Z
dc.date.available2018-01-12T21:15:11Z
dc.date.copyright2017en_US
dc.date.issued2017en_US
dc.identifier.urihttp://hdl.handle.net/1721.1/113168en_US
dc.descriptionThesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017en_US
dc.descriptionCataloged from PDF version of thesis.en_US
dc.descriptionIncludes bibliographical references (pages 79-80).en_US
dc.description.abstractThe problem of clustering multi-dimensional data has been well researched in the scientific community. It is a problem with wide scope and applications. With the rapid growth of very large databases, traditional clustering algorithms become inefficient due to insufficient memory capacity. Grid-based algorithms try to solve this problem by dividing the space into cells and then performing clustering on the cells. However these algorithms also become inefficient when even the grid becomes too large to be saved in memory. This thesis presents a new algorithm, SingleClus, that is performing clustering on a 2-dimensional dataset with a single pass of the dataset. Moreover, it optimizes the amount of disk I/0 operations while making modest use of main memory. Therefore it is theoretically optimal in terms of performance. It modifies and improves on the Hoshen-Kopelman clustering algorithm while dealing with the algorithm's fundamental challenges when operating in a Big Data setting.en_US
dc.description.statementofresponsibilityby Evangelos Taratoris.en_US
dc.format.extent80 pagesen_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsMIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectElectrical Engineering and Computer Science.en_US
dc.titleA single-pass grid-based algorithm for clustering big data on spatial databasesen_US
dc.typeThesisen_US
dc.description.degreeM. Eng.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Scienceen_US
dc.identifier.oclc1017485602en_US
dc.description.collectionM.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Scienceen_US
dspace.imported2019-06-17T20:35:52Zen_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record