dc.contributor.author | Song, Yale | |
dc.contributor.author | Morency, Louis-Philippe | |
dc.contributor.author | Davis, Randall | |
dc.date.accessioned | 2014-04-11T14:20:52Z | |
dc.date.available | 2014-04-11T14:20:52Z | |
dc.date.issued | 2012-10 | |
dc.identifier.isbn | 9781450314671 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/86099 | |
dc.description.abstract | Multimodal human behavior analysis is a challenging task due to the presence of complex nonlinear correlations and interactions across modalities. We present a novel approach to this problem based on Kernel Canonical Correlation Analysis (KCCA) and Multi-view Hidden Conditional Random Fields (MV-HCRF). Our approach uses a nonlinear kernel to map multimodal data to a high-dimensional feature space and finds a new projection of the data that maximizes the correlation across modalities. We use a multi-chain structured graphical model with disjoint sets of latent variables, one set per modality, to jointly learn both view-shared and view-specific sub-structures of the projected data, capturing interaction across modalities explicitly. We evaluate our approach on a task of agreement and disagreement recognition from nonverbal audio-visual cues using the Canal 9 dataset. Experimental results show that KCCA makes it easier to capture nonlinear hidden dynamics and that MV-HCRF helps learn interactions across modalities. | en_US |
dc.description.sponsorship | United States. Office of Naval Research (Grant N000140910625) | en_US |
dc.description.sponsorship | National Science Foundation (U.S.) (Grant IIS-1118018) | en_US |
dc.description.sponsorship | National Science Foundation (U.S.) (Grant IIS-1018055) | en_US |
dc.description.sponsorship | United States. Army Research, Development, and Engineering Command | en_US |
dc.language.iso | en_US | |
dc.relation.isversionof | http://dx.doi.org/10.1145/2388676.2388684 | en_US |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | MIT web domain | en_US |
dc.title | Multimodal human behavior analysis: Learning correlation and interaction across modalities | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Yale Song, Louis-Philippe Morency, and Randall Davis. 2012. Multimodal human behavior analysis: learning correlation and interaction across modalities. In Proceedings of the 14th ACM international conference on Multimodal interaction (ICMI '12). ACM, New York, NY, USA, 27-30. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.contributor.mitauthor | Song, Yale | en_US |
dc.contributor.mitauthor | Davis, Randall | en_US |
dc.relation.journal | Proceedings of the 14th ACM international conference on Multimodal interaction (ICMI '12) | en_US |
dc.eprint.version | Author's final manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dspace.orderedauthors | Song, Yale; Morency, Louis-Philippe; Davis, Randall | en_US |
dc.identifier.orcid | https://orcid.org/0000-0001-5232-7281 | |
mit.license | OPEN_ACCESS_POLICY | en_US |
mit.metadata.status | Complete | |