DSpace@MIT (MIT Libraries)


Multistream Articulatory Feature-Based Models for Visual Speech Recognition

Author(s)
Glass, James R.; Saenko, Ekaterina; Livescu, Karen; Darrell, Trevor J.
Download
Saenko-2009-Multistream Articulatory Feature-Based Models for Visual Speech Recognition.pdf (837.4Kb)
Publisher Policy

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Abstract
We study the problem of automatic visual speech recognition (VSR) using dynamic Bayesian network (DBN)-based models consisting of multiple sequences of hidden states, each corresponding to an articulatory feature (AF) such as lip opening (LO) or lip rounding (LR). A bank of discriminative articulatory feature classifiers provides input to the DBN, in the form of either virtual evidence (VE) (scaled likelihoods) or raw classifier margin outputs. We present experiments on two tasks, a medium-vocabulary word-ranking task and a small-vocabulary phrase recognition task. We show that articulatory feature-based models outperform baseline models, and we study several aspects of the models, such as the effects of allowing articulatory asynchrony, of using dictionary-based versus whole-word models, and of incorporating classifier outputs via virtual evidence versus alternative observation models.
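The abstract describes virtual evidence as scaled likelihoods derived from classifier outputs. A minimal sketch of one common way to form such values is shown here, assuming each articulatory feature classifier emits per-frame posteriors P(value | observation) and that a prior P(value) is available for each feature value. The function name, the four-valued lip-opening (LO) feature, and all numbers are hypothetical illustrations and are not taken from the paper.

```python
def scaled_likelihoods(posteriors, priors, eps=1e-12):
    """Divide per-frame posteriors P(value | obs) by priors P(value).

    By Bayes' rule the result equals P(obs | value) / P(obs), a likelihood
    scaled by a factor that is constant across values within a frame.
    `eps` guards against division by a zero prior (an assumption of this
    sketch, not something stated in the source).
    """
    return [
        [p / max(q, eps) for p, q in zip(frame, priors)]
        for frame in posteriors
    ]


# Hypothetical posteriors from an LO classifier: 3 frames, 4 feature values.
lo_posteriors = [
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.20, 0.30, 0.40],
]
# Hypothetical priors over the 4 LO values (e.g. estimated from training labels).
lo_priors = [0.4, 0.3, 0.2, 0.1]

virtual_evidence = scaled_likelihoods(lo_posteriors, lo_priors)
```

In a model of the kind the abstract outlines, values like these would be attached to the hidden articulatory feature state in each frame in place of a generative observation likelihood. How the authors actually compute and weight them is specified in the paper itself, not in this record.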
Date issued
2009-09
URI
http://hdl.handle.net/1721.1/60293
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
IEEE Transactions on Pattern Analysis and Machine Intelligence
Publisher
Institute of Electrical and Electronics Engineers
Citation
Saenko, K. et al. “Multistream Articulatory Feature-Based Models for Visual Speech Recognition.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 31.9 (2009): 1700-1707. ©2009 IEEE.
Version: Final published version
Other identifiers
INSPEC Accession Number: 10773214
ISSN
0162-8828

Collections
  • MIT Open Access Articles

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.