Show simple item record

dc.contributor.advisor: Kenneth Haase. [en_US]
dc.contributor.author: Cahn, Janet E. (Janet Elizabeth) [en_US]
dc.date.accessioned: 2005-09-27T20:51:57Z
dc.date.available: 2005-09-27T20:51:57Z
dc.date.copyright: 1999 [en_US]
dc.date.issued: 1999 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/29142
dc.description: Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts & Sciences, 1999. [en_US]
dc.description: Includes bibliographical references (p. 209-226). [en_US]
dc.description.abstract: This thesis links processing in working memory to prosody in speech, and links different working memory capacities to different prosodic styles. It provides a causal account of prosodic differences and an architecture for reproducing them in synthesized speech. The implemented system mediates text-based information through a model of attention and working memory. The main simulation parameter of the memory model quantifies recall. Changing its value changes what counts as given and new information in a text, and therefore determines the intonation with which the text is uttered. Other aspects of search and storage in the memory model are mapped to the remainder of the continuous and categorical features of pitch and timing, producing prosody in three different styles: for small recall values, the exaggerated and sing-song melodies of children's speech; for mid-range values, an adult expressive style; for the largest values, the prosody of a speaker who is familiar with the text, and at times sounds bored or irritated. In addition, because the storage procedure is stochastic, the prosody from simulation to simulation varies, even for identical control parameters. As with human speech, no two renditions are alike. Informal feedback indicates that the stylistic differences are recognizable and that the prosody is improved over current offerings. A comparison with natural data shows clear and predictable trends although not at significance. However, a comparison within the natural data also did not produce results at significance. One practical contribution of this work is a text mark-up schema consisting of relational annotations to grammatical structures. Another is the product - varied and plausible prosody in synthesized speech.
The main theoretical contribution is to show that resource-bound cognitive activity has prosodic correlates, thus providing a rationale for the individual and stylistic differences in melody and rhythm that are ubiquitous in human speech. [en_US]
dc.description.statementofresponsibility: by Janet Elizabeth Cahn. [en_US]
dc.format.extent: 226 p. [en_US]
dc.format.extent: 20859955 bytes
dc.format.extent: 20859711 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: application/pdf
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582
dc.subject: Architecture. Program in Media Arts and Sciences [en_US]
dc.title: A computational memory and processing model for prosody [en_US]
dc.type: Thesis [en_US]
dc.description.degree: Ph.D. [en_US]
dc.contributor.department: Program in Media Arts and Sciences (Massachusetts Institute of Technology)
dc.identifier.oclc: 42638539 [en_US]

