MIT OpenCourseWare

Readings

Most readings are from the "Red Book", otherwise known as Readings in Database Systems, 4th edition, edited by Michael Stonebraker and Joseph Hellerstein, Cambridge: MIT Press, January 2005, ISBN: 0262693143. This book is available from the MIT Press. Note that this edition is substantially different from the 3rd edition and includes a number of new readings (and articles that are not otherwise available).

Lec # TOPICS READING QUESTIONS READINGS
Fundamentals
1 Introduction
2 The Relational Model and SQL Stonebraker, Michael, and Joseph Hellerstein. "What Goes Around Comes Around." In Red Book.
Copies will be made available on the first day of class. Read Sections 1-4, 9, 10, and 11.

Codd, E. F. "A relational model of data for large shared data banks." Communications of the ACM (1970).
Focus on Sections 1.3 and all of Section 2.
3 Logical Design and Physical Database Fundamentals Chen, Peter. "The Entity-Relationship Model -- Toward a Unified View of Data." ACM TODS 1, no. 1 (1976).
We will not discuss this paper in class -- I have included it because ER modeling is widely used in database design, and the paper may be useful as a personal reference.

Hillyer, Mike. An Introduction to Database Normalization.
Note that this is an extremely informal discussion of the topic of database normalization, and does not cover all of the issues in depth. It is useful as a reference to the different types of normalization we discussed in class.

Vitter, J. "External Memory Algorithms and Data Structures: Dealing with massive data." ACM Computing Surveys 33, no. 2 (2001): 209-271.
Read sections 1-3, 5, 9, and 10 -- we will revisit R-Trees (Section 11) later in the course.
4 Introduction to Modern Relational Database Systems Hellerstein, Joseph, and Michael Stonebraker. "The Anatomy of a Database." In Red Book.
Read Sections 1-4; we will return to Section 5 when we discuss concurrency control and recovery.

Astrahan, M. M., et al. "System R: Relational Approach to Database Management." ACM TODS 1, no. 2 (1976): 97-137.

Query Processing
5 Optimization Fundamentals Selinger, Patricia, M. Astrahan, D. Chamberlin, Raymond Lorie, and T. Price. "Access Path Selection in a Relational Database Management System." In Red Book. Proceedings of ACM SIGMOD. 1979, pp. 22-34.
We will discuss most of this paper next time in class -- read the whole thing, paying careful attention to the cost estimation methods and the dynamic-programming-based optimization algorithm.

Mannino, Michael, Paichen Chu, and Thomas Sager. "Statistical Profile Estimation in Database Systems." ACM Computing Surveys 20, no. 3 (1988): 191-221.
This paper is optional -- it discusses many of the techniques that are used to make query optimization and cost estimation practical in modern database systems. We will cover some of the ideas at a high level in class.
6 Join Algorithms and Memory Management Shapiro, L. D. "Join Processing in Database Systems with Large Main Memories." In Red Book.
This paper presents a thorough discussion of different strategies for computing a join, including methods that take advantage of hash and B-tree indices. We will discuss the tradeoffs between join strategies in class, so think carefully about the pros and cons of each approach.

Sacco, G. M., and M. Schkolnick. "Buffer Management in Relational Database Systems." TODS 11, no. 4 (1986): 473-489.
We will not discuss this paper in detail in class and you do not need to read it front-to-back, but you should be familiar with the basic idea behind hot-set based buffer management. For the extra-curious, the classic paper in this area is from Chou and DeWitt in 1985, called "An Evaluation of Buffer Management Strategies for Relational Database Systems" -- it proposes a hot-set model and evaluates a buffer management strategy called DBMIN that is similar to the strategy proposed in this paper (as far as I can tell, the DBMIN paper is not available in digital format from ACM -- it can be found elsewhere on the Web).
7 Indexing Reading Questions 7 (PDF) Hellerstein, Joseph M., Jeffrey F. Naughton, and Avi Pfeffer. "Generalized Search Trees for Database Systems." In Red Book. VLDB. 1995.
Rather than focusing on the details of the different tree types discussed in this paper, try to understand the basic GiST abstraction.

Beckmann, Norbert, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. "The R*-tree: An efficient and robust access method for points and rectangles." In Red Book. SIGMOD. 1990.
R/R* trees are an important generalization of B-trees. Make sure you understand what types of queries they are useful for, and how their properties differ from those of B-trees.
8 Distributed Databases Reading Questions 8 (PDF) DeWitt, David, and Jim Gray. "Parallel Database Systems: The Future of High Performance Database Processing." In Red Book. Communications of the ACM. 1992.
This is a good general introduction to parallel database systems and the issues involved in their design.

Mackert, Lothar F., and Guy M. Lohman. "R* Optimizer Validation and Performance Evaluation for Distributed Queries." In Red Book. VLDB. 1986.
9 Data Warehouses Reading Questions 9 (PDF) Chaudhuri, Surajit, and Umeshwar Dayal. "An overview of data warehousing and OLAP technology." In Red Book. SIGMOD Record. Vol. 26, no. 1, 1997.

Gray, Jim, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, and Murali Venkatrao. "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals." In Red Book. Data Mining and Knowledge Discovery. 1997.

O'Neil, Patrick, and Dallan Quass. "Improved Query Performance with Variant Indices." In Red Book. ACM SIGMOD. 1997.
This paper is optional, but it's worth reading to at least get a sense of the unusual types of indices that are often used in warehouses. The overview paper mentions this type of index briefly.
10 Extensibility and Object Databases Reading Questions 10 (PDF) Stonebraker, Michael. "Inclusion of New Types in Relational Database Systems." In Red Book. Proceedings of ICDE. 1986.

Lamb, Charles, Gordon Landis, Jack Orenstein, and Dan Weinreb. "The ObjectStore Database System." Communications of the ACM 34, no. 10, 1991.
Transactions
11 Introduction to Transactions Franklin, Michael. "Concurrency Control and Recovery." In The Handbook of Computer Science and Engineering. Edited by A. Tucker. CRC Press, Boca Raton, 1997. (PDF)

Review 6.033 Chapter 8.

There are no reading questions for this paper -- it's a fairly accessible overview, which you should read and understand.
12 Transactions Part 2 Reading Questions 12 (PDF) Haerder, Theo, and Andreas Reuter. "Principles of Transaction Oriented Database Recovery." ACM Computing Surveys 15, no. 4, 1983.
13 Locking and Consistency Reading Questions 13 (PDF) Gray, Jim, et al. "Granularity of Locking and Degrees of Consistency in a Shared Data Base." In Red Book. Modeling in Data Base Management Systems. Edited by G. M. Nijssen. 1976.
14 Optimistic Concurrency Control Reading Questions 14 (PDF) Kung, H. T., and John T. Robinson. "On Optimistic Methods for Concurrency Control." In Red Book. TODS. June 1981.

Agrawal, Rakesh, Mike Carey, and Miron Livny. "Concurrency Control Performance Modeling: Alternatives and Implications." In Red Book. ACM Transactions On Database Systems. Vol. 12, no. 4, 1987.
(Note that the paper that appears in the Red Book does not contain pages 613-622 -- this is intentional and you shouldn't read them if you get the paper from elsewhere.) You also do not need to read Section 6.
Exam 1
15 Distributed Transactions and Replication Reading Questions 15 (PDF) Mohan, C., Bruce Lindsay, and R. Obermarck. "Transaction Management in the R* Distributed Database Management System." In Red Book. ACM Transactions On Database Systems. Vol. 11, no. 4, 1986.

Gray, Jim, et al. "The Dangers of Replication and a Solution." In Red Book. ACM SIGMOD. 1996.
Networked Data Management
16 Web Services Reading Questions 16 (PDF) Brewer, Eric. "Combining Systems and Databases: A Search Engine Retrospective." In Red Book.

Jacobs, Dean. "Data Management in Application Servers." In Red Book.
17 Semistructured Data Reading Questions 17 (PDF) Bergholz, Andre. "Extending your Markup: An XML Tutorial." IEEE Internet Computing (August 2003). (PDF)
This is a basic introduction to XML, DTDs, and XML schemas. If you are already familiar with XML, you do not need to read it.

McHugh, Jason, and Jennifer Widom. "Query Optimization for XML." VLDB (1999). (PDF)

Hunter, Jason. X is For XQuery. Oracle Technology Network.
This provides a very lightweight introduction to the XQuery language, an emerging standard for XML query processing.
18 Continuous Queries Reading Questions 18 (PDF) Chen, Jianjun, David DeWitt, Feng Tian, and Yuan Wang. "NiagaraCQ: A Scalable Continuous Query System for Internet Databases." In Red Book. ACM SIGMOD. 2000.

Hanson, Eric N., Chris Carnes, Lan Huang, Mohan Konyala, Lloyd Noronha, Sashi Parthasarathy, J. B. Park, and Albert Vernon. "Scalable Trigger Processing." In Red Book. ICDE. 1999.
19 Adaptive Query Processing Reading Questions 19 (PDF) Avnur, Ron, and Joseph M. Hellerstein. "Eddies: Continuously Adaptive Query Processing." In Red Book. SIGMOD Conference. 2000.

Deshpande, Amol. "An Initial Study of Overheads of Eddies." SIGMOD Record 33, no. 1, (March 2004). (PDF)
20 Stream Databases Reading Questions 20 (PDF) Abadi, Daniel J., Don Carney, Ugur Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. "Aurora: a New Model and Architecture for Data Stream Management." VLDB Journal 12, no. 2 (August 2003): 120-139. (PDF)
Read Sections 1-6. Note that this is not the Aurora paper in the Red Book.
Advanced Topics
21 Data Mining and Sequence Queries Reading Questions 21 (PDF) Lerner, Alberto, and Dennis Shasha. "AQuery: A Query Language for Ordered Data, Optimization Techniques, and Experiments." VLDB. 2003. (PDF)
22 Approximate Querying Reading Questions 22 (PDF) Hellerstein, Joseph, Ron Avnur, and Vijayshankar Raman. "Informix under CONTROL: Online Query Processing." In Red Book. Data Mining and Knowledge Discovery. Vol. 12, 2000, pp. 281-314.
Exam 2
23 Consistency and Availability in Data Streams Reading Questions 23 (PDF) Hwang, Jeong-Hyon, Magdalena Balazinska, Alexander Rasin, Ugur Cetintemel, Mike Stonebraker, and Stan Zdonik. "High-Availability Algorithms for Distributed Stream Processing." ICDE. 2005. (PDF)
24 Last day of Class Presentations