
Lecture 14: Distributed Databases


Overview

Read:

  • DeWitt, David, and Jim Gray. "Parallel Database Systems: The Future of High Performance Database Processing." Communications of the ACM 35, no. 6 (1992): 85-98. Also in Readings in Database Systems. San Francisco, CA: Morgan Kaufmann, 1998. ISBN: 1558605231.

The DeWitt and Gray paper is a high-level summary of database architectures for parallelism. It illustrates some of the techniques a database system can use to exploit multiple processors.

Questions to consider:

  1. What's the difference between a parallel and a distributed database? What issues are different in one architecture versus the other? In what ways are the two architectures alike?
  2. Why do DeWitt and Gray advocate a shared-nothing architecture?
  3. In what ways must existing database architectures be modified to support multi-processor environments? What new data layout issues are introduced? What new query processing challenges must be addressed?
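To make the data layout question concrete: DeWitt and Gray discuss declustering a relation across the nodes of a shared-nothing system using round-robin, hash, or range partitioning. The Python sketch below illustrates those three schemes on in-memory lists. The function names, the tuple format, and the use of `zlib.crc32` as the hash function are illustrative assumptions, not details taken from the paper.

```python
from bisect import bisect_right
from zlib import crc32


def round_robin(tuples, n):
    """Send tuple i to partition i mod n (hypothetical helper)."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key):
    """Send each tuple to partition hash(key) mod n (hypothetical helper).

    crc32 is an assumption, chosen only because it is deterministic across
    runs; tuples with equal keys always land on the same partition.
    """
    parts = [[] for _ in range(n)]
    for t in tuples:
        h = crc32(str(key(t)).encode())
        parts[h % n].append(t)
    return parts


def range_partition(tuples, boundaries, key):
    """Split on sorted boundary values (hypothetical helper).

    Partition j holds keys k with boundaries[j-1] <= k < boundaries[j];
    the first partition is unbounded below, the last unbounded above.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect_right(boundaries, key(t))].append(t)
    return parts


if __name__ == "__main__":
    emps = [(i, f"emp{i}") for i in range(12)]  # made-up (id, name) tuples
    by_id = lambda t: t[0]
    print([len(p) for p in round_robin(emps, 3)])                   # [4, 4, 4]
    print([len(p) for p in range_partition(emps, [4, 8], by_id)])   # [4, 4, 4]
```

Round-robin balances load evenly but gives no key locality, so every query must visit every partition. Hash partitioning lets an exact-match query on the partitioning key go to a single node. Range partitioning additionally keeps range queries local to the overlapping partitions, at the cost of being sensitive to skewed key distributions.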