| dc.contributor.advisor | Aude Oliva. | en_US |
| dc.contributor.author | Yan, Tom, M. Eng. Massachusetts Institute of Technology | en_US |
| dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
| dc.date.accessioned | 2018-12-11T20:39:24Z | |
| dc.date.available | 2018-12-11T20:39:24Z | |
| dc.date.copyright | 2017 | en_US |
| dc.date.issued | 2017 | en_US |
| dc.identifier.uri | http://hdl.handle.net/1721.1/119541 | |
| dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. | en_US |
| dc.description | This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. | en_US |
| dc.description | Cataloged from student-submitted PDF version of thesis. | en_US |
| dc.description | Includes bibliographical references (pages 37-39). | en_US |
| dc.description.abstract | The goal of the project is to build a large-scale video dataset called Moments and to train existing and novel models for action recognition. To help automate video collection and annotation selection, I trained convolutional neural network (CNN) models to estimate the likelihood that a desired action appears in a video clip. Selecting clips that are highly likely to contain the desired action for annotation leads to a more efficient process overall, with higher yield. Once a sizable dataset had been amassed, I investigated new multi-modal models that make use of the different signals (spatial, temporal, auditory) in a video. I also conducted preliminary experiments in several promising directions that Moments opens up, including multi-label training. Lastly, I trained baseline models on Moments to calibrate the performance of existing techniques. After training, I diagnosed the shortcomings of the models and visualized the videos they found particularly difficult. I discovered that the difficulty largely arises from the great variety in quality, perspective, and subject matter across Moments videos. This highlights the challenging nature of the dataset and its value to the research community. | en_US |
| dc.description.statementofresponsibility | by Tom Yan. | en_US |
| dc.format.extent | 39 pages | en_US |
| dc.language.iso | eng | en_US |
| dc.publisher | Massachusetts Institute of Technology | en_US |
| dc.rights | MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. | en_US |
| dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
| dc.subject | Electrical Engineering and Computer Science. | en_US |
| dc.title | Large scale video action understanding | en_US |
| dc.type | Thesis | en_US |
| dc.description.degree | M. Eng. | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| dc.identifier.oclc | 1076271908 | en_US |
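
The abstract describes scoring candidate clips with a CNN so that only clips likely to contain the target action are sent for human annotation. The sketch below illustrates that filtering idea in PyTorch under stated assumptions; it is not the thesis's actual pipeline. The backbone choice, the `score_clip` and `select_for_annotation` helpers, `ACTION_INDEX`, and `THRESHOLD` are all illustrative assumptions.

```python
# Hypothetical sketch of the clip-scoring step described in the abstract:
# a CNN estimates the probability that the target action appears in a clip,
# and only high-scoring clips are forwarded for annotation.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

ACTION_INDEX = 1   # assumption: output unit 1 means "action present"
THRESHOLD = 0.8    # illustrative cutoff for "highly likely"

# Assumption: a binary classifier (action present / absent); in practice
# this would be fine-tuned on labeled examples of the desired action.
model = resnet18(num_classes=2)
model.eval()

def score_clip(frames: torch.Tensor) -> float:
    """Average per-frame probability that the action appears.

    frames: (T, 3, H, W) tensor of sampled, normalized video frames.
    """
    with torch.no_grad():
        logits = model(frames)                          # (T, 2)
        probs = F.softmax(logits, dim=1)[:, ACTION_INDEX]
    return probs.mean().item()

def select_for_annotation(clips):
    """Keep only clips whose estimated action probability clears the cutoff."""
    return [c for c in clips if score_clip(c) >= THRESHOLD]

# Usage with dummy data: 10 candidate clips of 8 frames each.
candidates = [torch.randn(8, 3, 224, 224) for _ in range(10)]
selected = select_for_annotation(candidates)
print(f"{len(selected)} of {len(candidates)} clips sent for annotation")
```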