Visual classification of co-verbal gestures for gesture understanding
Author(s)
Campbell, Lee Winston
Other Contributors
Massachusetts Institute of Technology. Dept. of Architecture. Program in Media Arts and Sciences.
Advisor
Aaron F. Bobick.
Abstract
A person's communicative intent can be better understood by either a human or a machine if the person's gestures are understood. This thesis project demonstrates an expansion of both the range of co-verbal gestures a machine can identify and the range of communicative intents the machine can infer. We develop an automatic system that uses real-time video as sensory input and then segments, classifies, and responds to co-verbal gestures made by users in real time as they converse with a synthetic character known as REA, which is being developed in parallel by Justine Cassell and her students at the MIT Media Lab. A set of 670 natural gestures, videotaped and visually tracked in the course of conversational interviews and then hand-segmented and annotated according to a widely used gesture classification scheme, is used in an offline process that trains Hidden Markov Model (HMM) classifiers. A number of feature sets are extracted and tested in the offline training process, and the best performer is employed in an online HMM segmenter and classifier that requires no encumbering attachments to the user. Modifications made to the REA system enable REA to respond to the user's beat and deictic gestures as well as to turn-taking requests the user may convey in gesture. The recognition results obtained are far above chance, but too low for use in a production recognition system. The results provide a measure of validity for the gesture categories chosen, and they provide positive evidence for an appealing but difficult-to-prove proposition: to the extent that a machine can recognize and use these categories of gestures to infer information not present in the words spoken, there is exploitable complementary information in the gesture stream.
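As a rough illustration of the classification step described in the abstract, the sketch below scores an observation sequence against one HMM per gesture category using the forward algorithm and picks the most likely category. It is a minimal sketch under stated assumptions, not the thesis's implementation: the two-symbol observation alphabet (0 = hand still, 1 = hand moving), the two-state models, and every probability in `TOY_MODELS` are hypothetical values chosen for illustration, and the thesis's actual feature sets and trained parameters are not given in this record.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM via the forward algorithm."""
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][symbol])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return (best_label, per-label log-likelihoods) for one sequence."""
    scores = {
        label: forward_log_likelihood(obs, *params)
        for label, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models: (start, transition, emission) per category.
# Emission rows are indexed by state, columns by symbol (0 = still, 1 = moving).
TOY_MODELS = {
    # "beat": tends to alternate between a moving state and a still state.
    "beat": (
        [0.5, 0.5],
        [[0.2, 0.8], [0.8, 0.2]],
        [[0.1, 0.9], [0.9, 0.1]],
    ),
    # "deictic": left-to-right model, a stroke followed by a hold.
    "deictic": (
        [1.0, 0.0],
        [[0.6, 0.4], [0.0, 1.0]],
        [[0.1, 0.9], [0.9, 0.1]],
    ),
}

if __name__ == "__main__":
    for sequence in ([1, 0, 1, 0, 1, 0], [1, 1, 1, 0, 0, 0]):
        label, scores = classify(sequence, TOY_MODELS)
        print(sequence, "->", label)
```

With these toy parameters, a rapidly alternating sequence scores highest under the "beat" model, while a movement followed by a sustained hold scores highest under the "deictic" model. A real system would train the model parameters from annotated data (for example with Baum-Welch) and would likely use continuous feature vectors rather than two discrete symbols.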
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2001. Includes bibliographical references (leaves 86-92).
Date issued
2001
Department
Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Publisher
Massachusetts Institute of Technology
Keywords
Architecture. Program in Media Arts and Sciences.