MIT OpenCourseWare

Syllabus

Organization of the Course: Lectures and Labs

Two 1-hour lectures per week

Two labs per week

Students work in groups of 2 or 3. Laboratory sessions usually run 2-3 hours. Each lab has a set of readings, to be read before class. Some readings are in the text; others will be handed out.

Lectures will cover background material pertinent to the labs, in these areas:

  • The acoustics and acoustic analysis of speech
  • The physiology of speech production
  • Sentence-level phenomena
  • The perception of speech
  • Speech disorders
  • Speech synthesis and speech recognition

Written Requirements

Brief report on each lab: Summarize and interpret the results you got; details on method are not necessary. Hand in the report on the next class day after the lab.

Short term paper: Select a topic about the middle of the semester, proposing a limited experiment; give an oral report on results during the last few class meetings; submit a written version (in the format of the Journal of the Acoustical Society of America) at the end of term.

No exam: Grade is based about 40% on the term paper, with the rest on lab reports and participation in class.

Other Reference Books

  • Flanagan, J. L. Speech Analysis, Synthesis and Perception. 2nd ed. Berlin, NY: Springer-Verlag, 1972. ISBN: 000839925.
  • Chomsky, N., and M. Halle. The Sound Pattern of English. New York, NY: Harper and Row, 1968. ASIN: B000F5MWH4.
  • Beranek, L. Acoustics. New York, NY: McGraw-Hill, 1954. ISBN: 0070048355.
  • O'Shaughnessy, D. Speech Communications: Human and Machine. 2nd ed. Wiley-IEEE Press, 2000. ISBN: 0780334493.
  • Hardcastle, W. J., and J. Laver, eds. The Handbook of Phonetic Sciences. Oxford, UK: Blackwell Publishers, 1997. ISBN: 0631188487.
  • Kent, Raymond D., Bishnu S. Atal, and Joanne L. Miller, eds. Papers in Speech Communication: Speech Production. New York, NY: Acoustical Society of America, 1991. ISBN: 003185876.
  • Miller, Joanne L., Raymond D. Kent, and Bishnu S. Atal, eds. Papers in Speech Communication: Speech Perception. New York, NY: Acoustical Society of America, 1991. ISBN: 003185879.
  • Atal, Bishnu S., Joanne L. Miller, and Raymond D. Kent, eds. Papers in Speech Communication: Speech Processing. New York, NY: Acoustical Society of America, 1991. ISBN: 0883189607.

The Speech Chain

The study of speech is often summarized as the study of a chain of events, beginning with what goes on in a speaker's brain to plan an utterance, moving through the acoustics of speech, and ending with the steps in the listener's brain that result in comprehension of the utterance:

[Figure: the speech chain.]

This approach makes clear the diversity of topics one must understand in some depth in order to do basic research in speech.

Diversity of Topics That Relate to Speech Research

Linguistics

  • Semantics: The meanings of words, and relations among them.
  • Syntax: The order of words, role of function words.
  • Phonology: Individual phonemic segments, features, stressed and unstressed vowels.
  • For example,
    • What is the phonemic inventory of English?
    • How does it function? The concept of contrast (e.g., pat vs. bat).
    • Why do we believe that it is psychologically real?
    • Why does the same phoneme give rise to different acoustic realizations in different utterances? (E.g., in fluent speech, "Joe ate his soup" loses the /h/ of "his", and the /t/ of "ate" doesn't look like the /t/ in "Tom".)
    • What are the principles that lead to modifications of segments in different environments?
    • How are phonemes translated into phonetic representations, usually described in terms of features? (E.g., /z/ is + voiced, /s/ is - voiced; the same relation holds for many pairs, such as f-v; the patterning of sounds is beautifully captured by the feature concept.)
  • We want to tie topics in linguistics to data from acoustic analysis, the mechanics of human speech production, and human speech processing, but this is often not easy.
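
The feature idea mentioned above can be illustrated with a toy table. This is only a sketch: the three features and the eight consonants below are a hypothetical, heavily simplified inventory chosen for illustration (for instance, /p b/ and /f v/ are both labeled "labial"), not the feature system used in the course.

```python
# Hypothetical, simplified feature table (an assumption for illustration only).
FEATURES = {
    "p": {"voiced": False, "continuant": False, "place": "labial"},
    "b": {"voiced": True,  "continuant": False, "place": "labial"},
    "f": {"voiced": False, "continuant": True,  "place": "labial"},
    "v": {"voiced": True,  "continuant": True,  "place": "labial"},
    "t": {"voiced": False, "continuant": False, "place": "alveolar"},
    "d": {"voiced": True,  "continuant": False, "place": "alveolar"},
    "s": {"voiced": False, "continuant": True,  "place": "alveolar"},
    "z": {"voiced": True,  "continuant": True,  "place": "alveolar"},
}


def voicing_pairs(table):
    """Return (voiceless, voiced) pairs that differ only in the voicing feature."""
    pairs = []
    for a, fa in table.items():
        for b, fb in table.items():
            same_elsewhere = (fa["continuant"] == fb["continuant"]
                              and fa["place"] == fb["place"])
            if not fa["voiced"] and fb["voiced"] and same_elsewhere:
                pairs.append((a, b))
    return sorted(pairs)


PAIRS = voicing_pairs(FEATURES)
```

A single feature value separates each pair (s-z, f-v, p-b, t-d), which is the kind of patterning the feature concept is meant to capture.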

Physiology of Speech Production

Structures capable of generating and modifying speech sounds

Acoustics

  • General: sound sources (vowels are produced by vibration of the vocal folds; some sounds are produced with a turbulence noise source); differences among vowels arise from changes in the size and shape of the vocal tract.
  • Resonant properties of the vocal tract.
  • Sound propagation at the lips.
  • Acoustics of speech sounds traveling through air.
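
As a rough illustration of the resonant properties of the vocal tract, the sketch below treats the tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. The tube length (17 cm) and sound speed (35,400 cm/s) are assumed typical values, not figures given in this syllabus, and a real vocal tract is not uniform.

```python
def tube_resonances(length_cm=17.0, speed_cm_per_s=35400.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses the quarter-wavelength relation F_n = (2n - 1) * c / (4 * L).
    The default length and sound speed are assumed, illustrative values.
    """
    return [(2 * n - 1) * speed_cm_per_s / (4.0 * length_cm)
            for n in range(1, count + 1)]


NEUTRAL_FORMANTS = tube_resonances()
```

With these assumptions the first three resonances come out near 520, 1560, and 2600 Hz, and shortening the tube raises all of them in proportion.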

Acoustic Phonetics

Description of the important attributes of speech sounds, especially English sounds; also prosody (i.e., durations, fundamental frequency of vocal fold vibration, amplitude).

Auditory Nervous System

  • Peripheral: Middle and inner ear. Recordings of signals from the ear of the cat have brought recent progress, and we now know a good deal about the coding there; this has implications for which acoustic properties of speech sounds can be discriminated.
  • Central: We know little here and rely mostly on psychophysics and cognitive psychology.

Psychophysics and Cognitive Psychology

Studies of people's response to simple and complex sounds (tells a lot but also leaves much unknown).

Summary of Topics Important to Speech Research

  • Linguistics
  • Physiology of speech production and perception systems
  • General acoustics
  • Acoustic characteristics of speech sounds
  • Psychophysics of auditory system
  • Cognitive psychology
  • Computers and computer algorithms: In this course, we stress the use of computer algorithms and their individual strengths and weaknesses, rather than the mathematics of the algorithms themselves (references available).

Outline of Lectures and Labs

A. Introduction

  • 1: Organization; the speech chain. Recording speech in a sound-treated room; acoustic theory; SPL; dB. Digitizing, waveform editing and spectral analysis by computer; sampling theorem, waveform windowing.
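
Three of the ideas named in session 1 (SPL in dB, the sampling theorem, and waveform windowing) can be sketched in a few lines. This is a minimal illustration, not the software used in the labs; the function names are hypothetical, and the Hamming window is just one common choice of window.

```python
import math

P_REF_PA = 20e-6  # standard reference pressure for SPL in air, in pascals


def spl_db(p_rms_pa):
    """Sound pressure level in dB re 20 micropascals for an RMS pressure in Pa."""
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


def alias_frequency(f_hz, fs_hz):
    """Frequency at which a sinusoid of f_hz appears after sampling at fs_hz.

    Components above fs_hz / 2 fold back into the range 0..fs_hz / 2.
    """
    f = f_hz % fs_hz
    return min(f, fs_hz - f)


def hamming(n_points):
    """Hamming window of n_points samples (n_points >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]
```

For example, an RMS pressure of 1 Pa corresponds to about 94 dB SPL, and a 7 kHz component sampled at 10 kHz shows up at 3 kHz unless it is filtered out before digitizing.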

B. Acoustic Analysis of Vowels and Consonants

  • 2: Broadband spectral analysis of English vowels; vocal tract transfer functions for vowels; nasalization of vowels. Ch. 3, Ch. 6.
  • 3: Broadband spectral analysis of sonorant English consonants: nasals, liquids, glides; sound sources and transfer functions for consonants. Ch. 9.
  • 4: Sound generation from turbulence in the vocal tract; spectral analysis of fricative and stop consonants; frication noise and aspiration noise. Ch. 7, Ch. 8.
  • 5: Sound generation at the larynx; inverse filtering and spectrum of glottal source; effect of glottal source on spectra of vowels. Ch. 2.

C. Quantal Theory and Features

  • 6: Quantal nature of articulatory-to-acoustic relations; quantal theory for consonant place of articulation and for vowels.

D. Sentence-level Phenomena

  • 7: Vowel and consonant durations in sentences; duration rules for English.
  • 8: Some reduction and assimilation phenomena in fluent speech; effects of stress. Ch. 10.
  • 9: Sentence prosody; measurement and interpretation of fundamental frequency contours, respiratory constraints.
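
One simple way to measure fundamental frequency on a short voiced frame is to pick the lag with the largest autocorrelation. The sketch below is an assumption-laden illustration, not the method used in the lab: the search range of 60-400 Hz is an assumed default, and the test frame is a synthetic two-harmonic signal rather than real speech.

```python
import math


def estimate_f0(samples, fs_hz, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) as fs / lag, where lag maximizes the raw autocorrelation.

    Only lags corresponding to f0_min..f0_max are searched (assumed range).
    """
    lag_min = int(fs_hz / f0_max)
    lag_max = int(fs_hz / f0_min)
    if len(samples) <= lag_max:
        raise ValueError("frame too short for the requested F0 range")
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Raw (unnormalized) autocorrelation at this lag.
        val = sum(samples[i] * samples[i + lag]
                  for i in range(len(samples) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs_hz / best_lag


# Synthetic 200 ms frame: 125 Hz fundamental plus a weaker second harmonic.
FS_HZ = 8000.0
FRAME = [math.sin(2.0 * math.pi * 125.0 * i / FS_HZ)
         + 0.5 * math.sin(2.0 * math.pi * 250.0 * i / FS_HZ)
         for i in range(1600)]
F0_ESTIMATE = estimate_f0(FRAME, FS_HZ)
```

On this synthetic frame the estimate lands close to 125 Hz. Real speech needs more care (voicing decisions, octave errors, frame-to-frame smoothing) before the values can be read as an F0 contour.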

E. Speech Perception

  • 10: Evaluation of segmental intelligibility; intelligibility, comprehension, naturalness, cognitive load for words in sentences.

F. Speech Disorders

  • 11: Speech disorders. Analysis and interpretation of the speech of children with articulation disorders.
  • 12: Speech disorders. Analysis of consonants, vowels, and temporal characteristics of speakers with neuromotor disorders.

G. Speech Movements, Airflow

  • 13: Interpretation of cineradiographic motion pictures; anatomy/physiology of speech production apparatus. Ch. 1.
  • 14: Studies of speech movement; analysis of movements of jaw and lips for stop consonants. Ch. 1.
  • 15: Mouth pressure, flow, and respiration during speech; analysis and interpretation of flows and pressures during selected consonants. Ch. 1.

H. Speech Synthesis and Recognition

  • 16: Speech synthesis using a formant synthesizer; review acoustic theory of speech production, formant synthesis; synthesizing syllables.
  • 17: Higher-level synthesis with a formant synthesizer, using quasi-articulatory parameters.
  • 18: Continuation of speech synthesis labs.
  • 19: Topic selection for individual term project research; each student describes proposed research.
  • 20: Use of landmarks and features for speech recognition; labeling of sentences; rules for feature modification.
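
The idea behind the formant synthesis sessions can be sketched as a source passed through a cascade of second-order digital resonators, one per formant. This is a minimal illustration, not the synthesizer used in the labs: the resonator difference equation follows the commonly used form y[n] = A x[n] + B y[n-1] + C y[n-2], the impulse-train source is a crude stand-in for a glottal waveform, and the formant frequencies and bandwidths are assumed values for a roughly neutral vowel.

```python
import math


def resonator_coefficients(freq_hz, bw_hz, fs_hz):
    """Coefficients (A, B, C) of a two-pole resonator with unity gain at 0 Hz."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Filter signal through one resonator: y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    a, b, c = resonator_coefficients(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, fs_hz):
    """Crude voicing source: one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    n = int(round(duration_s * fs_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesize_vowel(formants, f0_hz=100.0, duration_s=0.3, fs_hz=10000.0):
    """Pass an impulse train through a cascade of formant resonators."""
    signal = impulse_train(f0_hz, duration_s, fs_hz)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


# Assumed (frequency, bandwidth) pairs in Hz for a roughly neutral vowel.
VOWEL = synthesize_vowel([(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)])
```

Changing the formant frequencies changes the vowel quality, while changing f0_hz changes the pitch, which mirrors the source-filter separation reviewed in session 16.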

I. Term Projects

  • 21: Individual term project research; no lecture/lab.
  • 22: Individual term project research; no lecture/lab.
  • 23: Individual term project research; no lecture/lab.
  • 24: Student oral reports on term project results.
  • 25: Student oral reports on term project results.
  • 26: Student oral reports on term project results.