MIT Libraries: DSpace@MIT

Sifter : a generalized, efficient, and scalable big data corpus generator

Author(s)
Wu, Sherwin Zhang
Download: Full printable version (5.735 MB)
Alternative title
Generalized, efficient, and scalable big data corpus generator
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Samuel Madden.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Big data has reached the point where the volume, velocity, and variety of data place significant limitations on the computer systems that process and analyze them. Working with very large datasets has become increasingly unwieldy. Our goal was therefore to create a system that supports efficient extraction of data subsets small enough to be manipulated on a single machine. Sifter was developed as a big data corpus generator that lets scientists generate these smaller datasets from an original, larger one. Sifter's three-layer architecture allows client users to easily create their own custom data corpus jobs, while allowing administrative users to easily integrate additional core datasets into Sifter. This thesis presents the implemented Sifter system deployed on an initial Twitter dataset. We further show how we added support for a secondary dataset, the MIMIC medical dataset, and demonstrate the scalability of Sifter with very large datasets.
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
 
Cataloged from PDF version of thesis.
 
Includes bibliographical references (page 61).
 
Date issued
2015
URI
http://hdl.handle.net/1721.1/100684
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.