Syllabus

6.867 Machine Learning

Fall 2002

This introductory course on machine learning gives an overview of many techniques and algorithms, beginning with simple perceptrons and ending with more recent topics such as boosting, support vector machines, hidden Markov models, and Bayesian networks. The course gives students the basic ideas and intuition behind modern machine learning methods, as well as a somewhat more formal understanding of how and why they work. The underlying theme of the course is statistical inference, as this provides the foundation for most of the methods covered.

A partial list of the concepts, methods, and tools dealt with in this course appears below under Problems, Concepts, Methods, and Tools.

Recommended Prerequisites

The course assumes some basic knowledge of probability theory and linear algebra; for example, you should be somewhat familiar with the following (a short sketch illustrating these concepts appears after the list):

  • Joint and marginal probability distributions
  • Normal (Gaussian) distribution
  • Expectation and variance
  • Statistical correlation and statistical independence
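
To make the expected level of familiarity concrete, the short sketch below estimates these quantities from samples. It is written in Python with NumPy; neither is assumed by the course, and the sketch is purely illustrative.

    # Illustrative sketch (Python/NumPy assumed): expectation, variance,
    # marginals, and correlation, estimated from samples of a joint Gaussian.
    import numpy as np

    rng = np.random.default_rng(0)

    # Joint normal distribution over (x, y); the off-diagonal covariance
    # term makes the two components statistically dependent.
    mean = np.array([0.0, 1.0])
    cov = np.array([[1.0, 0.8],
                    [0.8, 2.0]])
    samples = rng.multivariate_normal(mean, cov, size=100_000)
    x, y = samples[:, 0], samples[:, 1]  # marginal of x: just ignore y

    print("E[x]   ~", x.mean())                 # expectation, near 0
    print("Var[x] ~", x.var())                  # variance, near 1
    print("corr   ~", np.corrcoef(x, y)[0, 1])  # near 0.8 / sqrt(1 * 2)

    # For jointly Gaussian variables, zero correlation would imply independence.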

Problems, Concepts, Methods, and Tools

Listed here are important topics we intend to cover in the course. The list is not all-inclusive. Short illustrative sketches follow some of the lists below.

Problems

  • Regression and classification
  • Active learning
  • Feature selection
  • Density estimation
  • Clustering
  • Model selection
  • Inference

Concepts

  • Estimation, bias, variance, loss, empirical risk, maximum likelihood
  • Generalization, overfitting
  • Regularization
  • Capacity, VC-dimension
  • Generative/discriminative models
  • Representation, model structure
  • Minimum description length
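
To give a flavor of a few of these concepts, the sketch below (again Python with NumPy, an assumption rather than a course requirement) ties three of them together: least squares is the maximum-likelihood fit under Gaussian noise, a high-degree polynomial fit to few points overfits, and an L2 (ridge) penalty regularizes the fit.

    # Illustrative sketch: maximum likelihood, overfitting, regularization.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 10
    x = np.linspace(0, 1, n)
    y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)  # noisy targets

    degree = 9
    Phi = np.vander(x, degree + 1)  # degree-9 polynomial: 10 parameters, 10 points

    # Maximum-likelihood (unregularized least-squares) fit: interpolates the noise.
    w_ml = np.linalg.lstsq(Phi, y, rcond=None)[0]

    # Ridge fit: penalizing lam * ||w||^2 trades training error for smoothness.
    lam = 1e-3
    w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ y)

    # Compare both fits against the noiseless function on a dense grid.
    x_test = np.linspace(0, 1, 200)
    Phi_test = np.vander(x_test, degree + 1)
    f_true = np.sin(2 * np.pi * x_test)
    print("ML test RMSE   :", np.sqrt(np.mean((Phi_test @ w_ml - f_true) ** 2)))
    print("ridge test RMSE:", np.sqrt(np.mean((Phi_test @ w_ridge - f_true) ** 2)))

On this toy problem the regularized fit typically generalizes much better, which is exactly the overfitting/regularization trade-off listed above.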

Models and Methods

  • Linear regression, additive models
  • Generalized Linear Models
  • Neural networks
  • Support Vector Machine (SVM)
  • Boosting
  • Mixture models, mixtures of experts
  • Kernel density estimation
  • Markov chains/processes
  • Hidden Markov Models (HMM)
  • Belief networks, Markov random fields
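
As a taste of the simplest model on this trajectory, here is a sketch of the classic perceptron algorithm (the course's starting point, per the overview above) on linearly separable data. Python with NumPy is assumed, and the details as covered in lecture may differ.

    # Illustrative sketch: the perceptron algorithm on separable 2-D data.
    import numpy as np

    rng = np.random.default_rng(2)

    # Two well-separated classes with labels in {-1, +1} (invented toy data).
    X = np.vstack([rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(50, 2)),
                   rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))])
    y = np.concatenate([np.ones(50), -np.ones(50)])

    w, b = np.zeros(2), 0.0
    for epoch in range(100):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the boundary)
                w += yi * xi            # perceptron update
                b += yi
                mistakes += 1
        if mistakes == 0:               # converged: every point classified correctly
            break

    print("weights:", w, " bias:", b, " epochs:", epoch + 1)

The support vector machine covered later can be viewed as choosing, among all the separating hyperplanes a loop like this might find, the one with maximum margin.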

Tools

  • Cross-validation
  • Gradient descent
  • Quadratic programming
  • EM algorithm
  • Forward-backward algorithm
  • Junction tree algorithm
  • Gibbs sampling
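
As an example of one of these tools, here is a sketch of batch gradient descent minimizing a least-squares cost J(w) = ||Xw - y||^2 / (2n). Python with NumPy is assumed, and the data and step size are invented for illustration.

    # Illustrative sketch: batch gradient descent on J(w) = ||Xw - y||^2 / (2n).
    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 100, 3
    X = rng.standard_normal((n, d))
    w_true = np.array([1.0, -2.0, 0.5])           # invented ground truth
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    w = np.zeros(d)
    step = 0.1                                    # fixed step size (learning rate)
    for _ in range(500):
        grad = X.T @ (X @ w - y) / n              # gradient of J at the current w
        w -= step * grad                          # move against the gradient

    print("estimated:", w)
    print("true     :", w_true)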

Lectures

Two lectures / week
1.5 hours / lecture

Recitations

Two recitations / week
1 hour / recitation

Problem Sets

There will be a total of six problem sets, due roughly every two weeks. The content of the problem sets will vary from theoretical questions to more applied problems. You are encouraged to collaborate with other students while solving the problems, but you must turn in your own solutions. Copying will not be tolerated. If you collaborate, you must indicate all of your collaborators.

Each problem set will be graded by a group of students under the guidance of your TA. Each problem set will be graded in a single grading session, usually starting at 5 PM on the first weekday after it is due. Every student is required to participate in one grading session. You should sign up for grading by contacting the TA, by email or in person; signing up early increases your chances of getting your preferred grading slot. Students who do not register for grading by the third week of the course will be assigned to a problem set by us.

If you drop the class after signing up for a grading session, please be sure to let us know so we can keep track of students available for grading. If you add the class during the term, please remember to sign up for grading.

Exams

  • Midterm, in class
  • Final exam, in class

Project

You are required to complete a class project. The choice of topic is up to you, so long as it clearly pertains to the course material. To ensure that you are on the right track, you will have to submit a one-paragraph description of your project a month before the project is due. As with the problem sets, you are encouraged to collaborate on the project. We expect a four-page write-up that clearly and succinctly describes the project's goal, methods, and results. Each group should submit only one copy of the write-up, listing the names of all group members. Projects will be graded on the basis of your understanding of the overall course material (not on, e.g., how brilliantly your method works). The scope of the project is about 1-2 problem sets.

The projects can be literature reviews, theoretical derivations or analyses, applications of machine learning methods to problems you are interested in, or something else.

Grading

Your overall grade will be determined roughly as follows: midterm 15%, problem sets 30%, final exam 25%, project 30%.

Text

There are three useful texts for this course; each covers some part of the class material, as well as material outside the scope of the class.

Jordan, M., and Bishop, C. Introduction to Graphical Models. Draft version, accessible within MIT only.

Bishop, C. Neural Networks for Pattern Recognition. New York, NY: Oxford University Press, 1996. ISBN: 0198538642.

Duda, R. O., Hart, P. E., and Stork, D. G. Pattern Classification. 2nd ed. New York, NY: Wiley, 2000. ISBN: 0471056693.

You will not be able to find all the course material in the texts, nor do we plan to go through the chapters in order or in full. You are responsible for the material covered in lectures and problem sets, as well as in the chapters/sections of the texts specifically indicated. The weekly recitations/tutorials will be helpful for understanding the material and solving the homework problems.