
dc.contributor.advisor: Aude Oliva (en_US)
dc.contributor.author: Yan, Tom, M. Eng. Massachusetts Institute of Technology (en_US)
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (en_US)
dc.date.accessioned: 2018-12-11T20:39:24Z
dc.date.available: 2018-12-11T20:39:24Z
dc.date.copyright: 2017 (en_US)
dc.date.issued: 2017 (en_US)
dc.identifier.uri: http://hdl.handle.net/1721.1/119541
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. (en_US)
dc.description: This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. (en_US)
dc.description: Cataloged from the student-submitted PDF version of the thesis. (en_US)
dc.description: Includes bibliographical references (pages 37-39). (en_US)
dc.description.abstract: The goal of this project is to build a large-scale video dataset called Moments and to train existing and novel models for action recognition. To help automate video collection and annotation selection, I trained convolutional neural network models to estimate the likelihood that a desired action appears in a video clip. Selecting for annotation only the clips that are highly likely to contain the desired action makes the overall process more efficient and raises its yield. Once a sizable dataset had been amassed, I investigated new multi-modal models that exploit the different signals in video: spatial, temporal, and auditory. I also conducted preliminary experiments in several promising directions that Moments opens up, including multi-label training. Lastly, I trained baseline models on Moments to calibrate the performance of existing techniques. After training, I diagnosed the models' shortcomings and visualized the videos they found particularly difficult. The difficulty largely arises from the great variety in quality, perspective, and subjects found in Moments videos, which highlights the challenging nature of the dataset and its value to the research community. (en_US)
dc.description.statementofresponsibility: by Tom Yan (en_US)
dc.format.extent: 39 pages (en_US)
dc.language.iso: eng (en_US)
dc.publisher: Massachusetts Institute of Technology (en_US)
dc.rights: MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source, but further reproduction or distribution in any format is prohibited without written permission. (en_US)
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 (en_US)
dc.subject: Electrical Engineering and Computer Science (en_US)
dc.title: Large scale video action understanding (en_US)
dc.type: Thesis (en_US)
dc.description.degree: M. Eng. (en_US)
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 1076271908 (en_US)
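
The annotation-selection step described in the abstract (score candidate clips with a CNN, keep only those likely to show the desired action) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used in the thesis: an ImageNet-pretrained torchvision backbone stands in for the action classifier the author trained, and `clip_score`, `select_for_annotation`, the clip format, and the 0.5 threshold are all hypothetical.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Hypothetical sketch: an ImageNet-pretrained ResNet stands in for the
# action-likelihood CNN actually trained in the thesis.
device = "cuda" if torch.cuda.is_available() else "cpu"
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval().to(device)

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def clip_score(frames, action_class):
    """Estimate the likelihood that `action_class` appears in a clip by
    averaging the CNN's softmax probability over frames sampled from it
    (here, a list of PIL images)."""
    batch = torch.stack([preprocess(f) for f in frames]).to(device)
    return cnn(batch).softmax(dim=1)[:, action_class].mean().item()

def select_for_annotation(clips, action_class, threshold=0.5):
    """Keep only clips whose estimated likelihood exceeds `threshold`,
    so annotators mostly see clips that contain the desired action."""
    return [c for c in clips if clip_score(c["frames"], action_class) > threshold]
```

Clips passing the filter go to human annotators, while the rest are discarded; this kind of pre-filtering trades some recall for a much more efficient, higher-yield annotation pass, which is the efficiency gain the abstract refers to.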

