dc.contributor.advisor: Daniela Rus. (en_US)
dc.contributor.author: Xiang, Chongyuan (en_US)
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. (en_US)
dc.date.accessioned: 2017-01-12T18:19:00Z
dc.date.available: 2017-01-12T18:19:00Z
dc.date.copyright: 2016 (en_US)
dc.date.issued: 2016 (en_US)
dc.identifier.uri: http://hdl.handle.net/1721.1/106394
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. (en_US)
dc.description: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. (en_US)
dc.description: Cataloged from student-submitted PDF version of thesis. (en_US)
dc.description: Includes bibliographical references (pages 77-80). (en_US)
dc.description.abstract: We are in a new era of big data. We contribute our personal data for the common good simply by using our smartphones, searching the web, and making online transactions. Researchers, companies, and governments use the collected data to learn patterns of user behavior and to make impactful decisions based on them. Is it possible to publish and run queries on those databases without disclosing information about any specific individual? Differential privacy is a strong notion of privacy: it guarantees that very little will be learned about any individual record in the database, no matter what attackers already know or wish to learn. Still, there is no practical system that applies differentially private algorithms to cluster points in real databases. This thesis describes the construction of small coresets for computing a k-means clustering of a set of points while preserving differential privacy. As a result, it gives the first k-means clustering algorithm that is both differentially private and has an approximation error that depends sublinearly on the data’s dimension d. Previous results introduced errors that are exponential in d. This thesis implements this algorithm and uses it to create differentially private location data from GPS tracks. Specifically, the algorithm allows clustering GPS databases generated from mobile nodes, while letting the user control the noise introduced for privacy. This thesis also provides experimental results for the system and algorithms, and compares them to existing techniques. To the best of my knowledge, this is the first practical system that enables differentially private clustering on real data. (en_US)
dc.description.statementofresponsibility: by Chongyuan Xiang. (en_US)
dc.format.extent: 80 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Electrical Engineering and Computer Science. (en_US)
dc.title: Private k-means clustering : algorithms and applications (en_US)
dc.type: Thesis (en_US)
dc.description.degree: M. Eng. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 967666900 (en_US)
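
The abstract describes the thesis's method only at a high level (small private coresets for k-means), so the sketch below is NOT that algorithm. It is a minimal, generic Python illustration of how one Lloyd iteration of k-means can be made differentially private with Laplace noise, showing where calibrated noise enters and how a privacy parameter epsilon lets the user trade accuracy for privacy. Assumptions not taken from the record: every coordinate is scaled into [-1, 1]; neighboring datasets differ by adding or removing one point; the budget is split evenly between noisy cluster counts and noisy coordinate sums; and the names noisy_lloyd_step, laplace_noise, and sq_dist are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sq_dist(a, b):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def noisy_lloyd_step(points, centers, epsilon, rng):
    """One epsilon-DP Lloyd iteration (hypothetical helper, not the thesis's coreset method).

    Assumes every coordinate of every point lies in [-1, 1], that neighboring
    datasets differ by adding or removing one point, and that `centers` were
    already released (public), so only the noisy counts and sums touch the data.
    """
    d = len(centers[0])
    # Assign each point to its nearest current center.
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        clusters[j].append(p)

    new_centers = []
    for center, members in zip(centers, clusters):
        # Counts: one point changes exactly one count by 1 (L1 sensitivity 1);
        # this query gets half the budget, so the Laplace scale is 1 / (epsilon / 2).
        noisy_count = len(members) + laplace_noise(2.0 / epsilon, rng)
        # Sums: one point changes one cluster's sum by at most 1 per coordinate
        # (L1 sensitivity d); the other half of the budget gives scale d / (epsilon / 2).
        noisy_sum = [sum(p[i] for p in members) + laplace_noise(2.0 * d / epsilon, rng)
                     for i in range(d)]
        if noisy_count >= 1.0:
            # Dividing and clipping only post-process noisy outputs: no extra privacy cost.
            new_centers.append([max(-1.0, min(1.0, s / noisy_count)) for s in noisy_sum])
        else:
            # Too few (noisy) members: keep the previous, already-public center.
            new_centers.append(list(center))
    return new_centers


# Usage on synthetic data: two well-separated blobs in [-1, 1]^2.
rng = random.Random(0)
pts = [(rng.uniform(-1, -0.5), rng.uniform(-1, -0.5)) for _ in range(500)]
pts += [(rng.uniform(0.5, 1), rng.uniform(0.5, 1)) for _ in range(500)]
print(noisy_lloyd_step(pts, [[-0.2, 0.0], [0.2, 0.0]], epsilon=1.0, rng=rng))
```

Each call spends epsilon once, so running several iterations composes their budgets. In this sketch the Laplace scale for the sums grows linearly with d, which gives some intuition for why the dependence of the error on the dimension matters; that dependence is the quantity the abstract reports improving.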
