Show simple item record

dc.contributor.advisorDaniela Rus.en_US
dc.contributor.authorVolkov, Mikhail, Ph. D. Massachusetts Institute of Technologyen_US
dc.contributor.otherMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.en_US
dc.date.accessioned2017-04-05T16:00:43Z
dc.date.available2017-04-05T16:00:43Z
dc.date.copyright2016en_US
dc.date.issued2016en_US
dc.identifier.urihttp://hdl.handle.net/1721.1/107865
dc.descriptionThesis: Ph. D. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.en_US
dc.descriptionCataloged from PDF version of thesis.en_US
dc.descriptionIncludes bibliographical references (pages 160-174).en_US
dc.description.abstractIn this thesis, we develop a family of real-time data reduction algorithms for large data streams, by computing a compact and meaningful representation of the data called a coreset. This representation can then be used to enable efficient analysis such as segmentation, summarization, classification, and prediction. Our proposed algorithms support large streams and datasets that axe too large to store in memory, allow easy parallelization, and generalize to different data types and analyses. We discuss some of the challenges that arise when dealing with real Big Data systems. Such systems are designed to routinely process unseen, possibly unbounded, data streams; are expected to perform reliably, online, in real-time, in the presence of noise, and under many performance and bandwidth limitations; and are required to produce results that are provably close to optimal. We will motivate the need for new data reduction techniques, in the form of theoretical and practical open problems in computer science, robotics, and medicine, and show how coresets can help to overcome these challenges and enable us to build several practical systems that meet these specifications. We propose a theoretical framework for constructing several coreset algorithms that efficiently compress the data while preserving its semantic content. We provide an efficient construction of our algorithms and present several systems that are capable of handling unbounded, real-time data streams, and are easily scalable and parallelizable. Finally, we demonstrate the performance of our systems with numerous experimental results on a variety of data sources, from financial price data to laparoscopic surgery video.en_US
dc.description.statementofresponsibilityby Mikhail Volkov.en_US
dc.format.extent174 pagesen_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsMIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectElectrical Engineering and Computer Science.en_US
dc.titleMachine learning and coresets for automated real-time data segmentation and summarizationen_US
dc.typeThesisen_US
dc.description.degreePh. D. in Computer Science and Engineeringen_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc976168223en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record