Lecture 8: Query Optimization

Overview

In this lecture, we will discuss the problem of query optimization, focusing on the algorithms proposed in the classic "Selinger" paper.

Read the following paper:

  • Selinger, Patricia, M. Astrahan, D. Chamberlin, Raymond Lorie, and T. Price. "Access Path Selection in a Relational Database Management System." In Proceedings of ACM SIGMOD, Boston, MA, 1979, pp. 22-34. Also in Readings in Database Systems. San Francisco, CA: Morgan Kaufmann, 1998. ISBN: 1558605231.
  • Optionally, you may also wish to look at: Mannino, Michael, Paichen Chu, and Thomas Sager. "Statistical Profile Estimation in Database Systems." ACM Computing Surveys 20, no. 3 (1988): 191-221. This paper discusses many of the techniques that are used to make query optimization and cost estimation practical in modern database systems. We will cover some of the ideas at a high level in class.

As you read, think about the following questions and come to class prepared to answer them:

  • The Selinger paper claims that its algorithm is 'optimal'. Under what assumptions does this optimality hold? Can you think of a situation in which the Selinger optimizer will definitely be non-optimal?
  • Query optimization is highly dependent on the effectiveness of cost estimation. The cost metrics that Selinger proposes are very simple; how would you make them more sophisticated? What is the impact of more sophisticated cost metrics on the performance of a database system?