MIT OpenCourseWare

Readings

Postscript viewer software, such as Ghostscript/Ghostview, can be used to view the .ps files in this section. File decompression software, such as Winzip® or StuffIt®, is required to open the .gz files in this section.
lec # TOPICS READINGS
1 Introduction and Overview Lee, Lillian. "I'm sorry Dave, I'm afraid I can't do that: Linguistics, Statistics, and Natural Language Processing circa 2001." To appear in The National Academies' study on the Fundamentals of Computer Science. (PDF)

Babel Fish Translation

Columbia Newsblaster
2 Basic Language Statistics; Zipf's Law Manning, Christopher, and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999, Section 1.4. ISBN: 0262133601.

Ando, Rie Kubota, and Lillian Lee. "Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji." First Conference of the NAACL, 2000, pp. 241-248. (PDF)

Banko, Michele, and Eric Brill. "Mitigating the Paucity-of-Data Problem: Exploring the Effect of Training Corpus Size on Classifier Performance for Natural Language Processing." First Conference of HLT, 2001. (PDF)

Li, Wentian. "Random texts exhibit Zipf's-law-like word frequency distribution." IEEE Transactions on Information Theory 38, no. 6 (1992): 1842-1845.

Keller, Frank, Maria Lapata, and Olga Ourioupina. "Using the Web to Overcome Data Sparseness." Conference on EMNLP, 2002. (PDF)

Zipf, George Kingsley. Human Behavior and the Principle of Least Effort. Cambridge, MA: Addison-Wesley Press, 1949.
3 Language Models; Smoothed Estimation Jurafsky, Daniel, and James Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall, 2000, Section 6. ISBN: 0130950696.

Brown, P. F., V. J. Della Pietra, P. V. deSouza, J. C. Lai, and R. L. Mercer. "Class-based n-gram models of natural language." Computational Linguistics 18, no. 4 (1992): 467-479. (PDF)

Brown, P. F., V. J. Della Pietra, S. A. Della Pietra, J. C. Lai, and R. L. Mercer. "An estimate for an upper bound for the entropy of English." Computational Linguistics 18, no. 1 (1992): 31-40. (PDF)

Chen, Stanley F., and Joshua Goodman. "An Empirical Study of Smoothing Techniques for Language Modeling." The ACL Conference, 1996, pp. 310-318. (PDF)

Chen, Stanley F., and Ronald Rosenfeld. "A Survey of Smoothing Techniques for ME Models." IEEE Transactions on Speech and Audio Processing 8, no. 1 (2000): 37-50.

Church, Kenneth W., and William A. Gale. "A Comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams." Computer Speech and Language 5 (1991): 19-55.

Jelinek, Frederick. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, 1998. ISBN: 0262100665.

Lee, Lillian. "Similarity-Based Approaches to Natural Language Processing." Ph.D. Thesis, Harvard, 1997.
4 Tagging; Transformation Based Learning; HMM Taggers Manning, Christopher, and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999, Section 10. ISBN: 0262133601.

Brill, Eric. "Transformation-based error-driven learning and natural language processing: A case study of part-of-speech tagging." Computational Linguistics 21, no. 4 (1995): 543-566. (PDF - 1.4 MB)

———. "Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging." Workshop on Very Large Corpora, 1995. (PDF)

Charniak, E., C. Hendrickson, N. Jacobson, and M. Perkowitz. "Equations for part-of-speech tagging." AAAI Conference, 1993, pp. 784-789.
5 Maximum Entropy Tagger Manning, Christopher, and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999, Section 16.2.1. ISBN: 0262133601.

Ratnaparkhi, Adwait. "A Maximum Entropy Part-Of-Speech Tagger." EMNLP Conference, 1996.

———. "Maximum Entropy Models for Natural Language Ambiguity Resolution." Ph.D. Dissertation. University of Pennsylvania, pp. 6-18.

Darroch, J. N., and D. Ratcliff. "Generalized Iterative Scaling for Log-linear Models." The Annals of Mathematical Statistics 43, no. 5 (1972): 1470-1480.
6 Introduction to Syntax; Probabilistic Context Free Grammars Slides of Mike Collins (PS)

Jurafsky, Daniel, and James Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall, 2000, Section 9. ISBN: 0130950696.

Booth, Taylor L., and Richard A. Thompson. "Applying probability measures to abstract languages." IEEE Transactions on Computers C-22 (1973): 442-450.

Gold, Mark E. "Language identification in the limit." Information and Control 10 (1967): 447-474.
7 Syntactic Parsing Slides of Mike Collins (PS)

Collins, Michael. "Three Generative, Lexicalised Models for Statistical Parsing." ACL Conference, 1997. (PDF)
8 Introduction to EM Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press, 1998. ISBN: 0521620414.
9 Unsupervised Grammar Induction Manning, Christopher, and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999, Sections 11.3.4, 11.4. ISBN: 0262133601.

Carroll, Glenn, and Eugene Charniak. Two Experiments on Learning Probabilistic Dependency Grammars from Corpora. Technical Report CS-92-16, Brown University, 1992.

Chomsky, Noam. Rules and Representations. Oxford, UK: Basil Blackwell, 1980, p. 34. ISBN: 0231048270.

Clark, Alexander. "Unsupervised induction of stochastic context-free grammars using distributional clustering." Conference on Natural Language Learning, 2001. (PDF)

Gold, Mark E. "Language identification in the limit." Information and Control 10 (1967): 447-474.

Horning, James J. "A study of grammatical inference." Ph.D. Thesis, Stanford, 1969.

Lari, K., and S. J. Young. "The estimation of stochastic context-free grammars using the Inside-Outside algorithm." Computer Speech and Language 4 (1990): 35-56.

Manning, C. "Grammar Induction: can one do unsupervised learning of linguistic structure? (And why is it hard)."

Pullum, Geoffrey K. "Learnability, Hyperlearning, and the Poverty of the Stimulus." Annual Meeting of the Berkeley Linguistics Society, 1996.

Pereira, Fernando, and Yves Schabes. "Inside-Outside reestimation from Partially Bracketed Corpora." ACL Conference, 1992, pp. 128-135. (PDF)

Stolcke, Andreas, and Stephen Omohundro. Best-first Model Merging for Hidden Markov Model Induction. Technical Report, Berkeley, 1994. (PS)

Zaanen, Menno van. "ABL: Alignment-Based Learning." COLING 2000, pp. 961-967. (PDF)
10 Distributional Similarity; Clustering Manning, Christopher, and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999, Section 14. ISBN: 0262133601.

Jurafsky, Daniel, and James Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall, 2000, Section 16.2. ISBN: 0130950696.

Brown, P. F., V. J. Della Pietra, P. V. deSouza, J. C. Lai, and R. L. Mercer. "Class-based n-gram models of natural language." Computational Linguistics 18, no. 4 (1992): 467-479. (PDF)

Fellbaum, C., ed. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998. ISBN: 026206197X.

Pereira, Fernando, Naftali Tishby, and Lillian Lee. "Distributional clustering of English words." ACL Conference, 1993, pp. 183-190. (PDF)
11 Distributional Similarity (cont.) Manning, Christopher, and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999, Sections 14, 15.4. ISBN: 0262133601.
12 Word Sense Disambiguation; Co-training Manning, Christopher, and Hinrich Schutze. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999, Section 7. ISBN: 0262133601.

Yarowsky, David. "Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French." ACL, 1994, pp. 88-95. (PDF)

———. "Unsupervised word sense disambiguation rivaling supervised methods." ACL, 1995, pp. 189-196. (PDF)

McCarthy, Diana, Rob Koeling, Julie Weeds, and John Carroll. "Finding Predominant Word Senses in Untagged Text." ACL, 2004, pp. 280-287.
13 Text Segmentation Hearst, Marti. "Multi-paragraph segmentation of expository text." Proceedings of the ACL, 1994, pp. 9-16. (PDF)

Pevzner, Lev, and Marti Hearst. "A Critique and Improvement of an Evaluation Metric for Text Segmentation." Computational Linguistics 28, no. 1 (2002): 19-36. (PDF)

Passonneau, Rebecca J., and Diane J. Litman. "Intention-based Segmentation: Human Reliability and correlation with linguistic cues." Proceedings of the ACL, 1993, pp. 148-155. (PDF)

Grosz, Barbara, and Julia Hirschberg. "Some intonational characteristics of discourse." Proceedings of the ICSLP, 1992. (PS)

Galley, Michel, Kathleen McKeown, Eric Fosler-Lussier, and Hongyan Jing. "Discourse Segmentation of Multi-Party Conversation." Proceedings of the ACL, 2003.
14 Learning Discourse Structure Teufel, Simone, and Marc Moens. "What's yours and what's mine: Determining Intellectual Attribution in Scientific Text." Proceedings of the EMNLP and VLC, 2000. (PDF)

Duboue, Pablo, and Kathleen R. McKeown. "Empirically Estimating Order Constraints for Content Planning in Generation." Proceedings of the ACL/EACL, 2001. (PDF)

Barzilay, Regina, and Lillian Lee. "Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization." Proceedings of the NAACL/HLT, 2004.
15 Rhetorical Parsing Mann, William, and Sandra Thompson. "Rhetorical Structure Theory: Toward a functional theory of text organization." Text 8, no. 3 (1988): 243-281.

Marcu, Daniel. "Building Up Rhetorical Structure Trees." Proceedings of the AAAI, 1996. (PS)

Marcu, Daniel, and Abdessamad Echihabi. "An Unsupervised Approach to Recognizing Discourse Relations." Proceedings of the ACL/NAACL, 2002. (PDF)
16 Text Summarization Kupiec, Julian, Jan O. Pedersen, and Francine Chen. "A Trainable Document Summarizer." Proceedings of the SIGIR, 1995.

Melamed, Dan. "A Portable Algorithm for Mapping Bitext Correspondence." Proceedings of ACL, 1997. (GZ)

Gale, William A., and Kenneth Ward Church. "A Program for Aligning Sentences in Bilingual Corpora." Proceedings of the ACL, 1991.

Marcu, Daniel. "The automatic construction of large-scale corpora for summarization research." Proceedings of the SIGIR, 1999. (PS)

Barzilay, Regina, and Noemie Elhadad. "Sentence Alignment for Monolingual Comparable Corpora." Proceedings of EMNLP, 2003. (PS)

Document Understanding Conference (2004)
17 Text Summarization (cont.)
18 Midterm
19 Information Retrieval
20-22 Machine Translation
23-24 Project Presentations