dc.contributor.advisor: Alan Edelman. [en_US]
dc.contributor.author: Chen, Alexander Y [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2017-12-20T17:24:39Z
dc.date.available: 2017-12-20T17:24:39Z
dc.date.copyright: 2017 [en_US]
dc.date.issued: 2017 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/112835
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. [en_US]
dc.description: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. [en_US]
dc.description: Cataloged from student-submitted PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (pages 109-111). [en_US]
dc.description.abstract: As data science is impacting more and more fields and proving to be effective in a wide variety of applications, the importance of easy-to-understand, high-performance data science tools is growing. Tools tend to exhibit one of two general forms: composable or template-based. We have researched and developed examples of each of these forms. The first project is an implementation of the D4M schema in the Julia language. This implementation has been tested to be faster than the optimized versions in both Matlab and Octave. With this combination of technology, we hope to provide an effective means to represent data and compute on them. This implementation enables an interface with the common DataFrame representation used in data science. We also implemented a D4M.jl interface with an emerging database technology, TileDB. The second project is the MEDIC framework, which aims to map the process of taking a common machine learning engine into a streaming context. We implemented two versions of our solution to the Twitter Trend Prediction problem: one in Julia and one in Spark. We have verified that our solution is valid by comparing the Julia version with a previous result in Mr. Stanislav Nikolov's master's thesis, Trend or No Trend. We have also verified our solution with the Spark Streaming engine. [en_US]
dc.description.statementofresponsibility: by Alexander Y Chen. [en_US]
dc.format.extent: 111 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Tools and frameworks for data abstraction in a performance context [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M. Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 1015200950 [en_US]

