Show simple item record

dc.contributor.advisor: Charles E. Leiserson and Vijay Gadepally. [en_US]
dc.contributor.author: Hutchinson, Matthew S., M. Eng. Massachusetts Institute of Technology. [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2020-09-15T21:56:33Z
dc.date.available: 2020-09-15T21:56:33Z
dc.date.copyright: 2020 [en_US]
dc.date.issued: 2020 [en_US]
dc.identifier.uri: https://hdl.handle.net/1721.1/127411
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020 [en_US]
dc.description: Cataloged from the official PDF of thesis. [en_US]
dc.description: Includes bibliographical references (pages 91-99). [en_US]
dc.description.abstract: Over the past few years, there has been significant interest in video action recognition systems and models. However, direct comparison of accuracy and computational performance results remains clouded by differing training environments, hardware specifications, hyperparameters, pipelines, and inference methods. Additionally, the literature demonstrates a fixedness on late fusion approaches to audio-video multimodal problems. This project provides a side-by-side comparison of several 2-Dimensional Convolutional Neural Network (2D-CNN) video action recognition approaches and investigates the effectiveness and efficiency of new audio-video early fusion, slicing, and sampling methods. Model accuracy is evaluated using standard Top-1 and Top-5 metrics in addition to novel p-ROC metrics, and this project demonstrates the usefulness of the latter. Computational performance is measured via total training time and training time per epoch on a variety of high-performance computing (HPC) training configurations. [en_US]
dc.description.statementofresponsibility: by Matthew S. Hutchinson. [en_US]
dc.format.extent: 99 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Applying high performance computing to early fusion video action recognition [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M. Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.identifier.oclc: 1192561136 [en_US]
dc.description.collection: M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science [en_US]
dspace.imported: 2020-09-15T21:56:32Z [en_US]
mit.thesis.degree: Master [en_US]
mit.thesis.department: EECS [en_US]

