This is an archived course. A more recent version may be available at ocw.mit.edu.

Lecture 18: Stream Processing

Overview

Papers:

  • Abadi, Daniel J., Don Carney, Ugur Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. "Aurora: A New Model and Architecture for Data Stream Management." VLDB Journal 12, no. 2 (August 2003): 120-139. Read Sections 1-6. Note that this is not the Aurora paper in the Red Book.

Aurora is a "stream management system" for processing continuous queries over "streams", such as sequences of stock quotes, traces of network traffic, or runs of sensor data.
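To make the idea of a continuous query concrete, here is a minimal Python sketch of two chained operators ("boxes") running over a stream of stock quotes. It is an illustration only and does not reproduce Aurora's operators or syntax: the names filter_box, window_avg_box, and run_query, the (symbol, price) tuple layout, and the sample quotes are all assumptions made for this example.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical tuple layout for this sketch: (ticker symbol, price).
Quote = Tuple[str, float]


def filter_box(stream: Iterable[Quote],
               predicate: Callable[[Quote], bool]) -> Iterator[Quote]:
    """Pass along only the tuples that satisfy the predicate."""
    for quote in stream:
        if predicate(quote):
            yield quote


def window_avg_box(stream: Iterable[Quote], size: int) -> Iterator[Quote]:
    """Emit, for each input tuple, the average of the last `size` prices."""
    window = deque(maxlen=size)  # the oldest price drops out automatically
    for symbol, price in stream:
        window.append(price)
        yield symbol, sum(window) / len(window)


def run_query(quotes: Iterable[Quote]) -> Iterator[Quote]:
    """Connect the boxes: keep IBM quotes, then average over a sliding window."""
    ibm_only = filter_box(quotes, lambda q: q[0] == "IBM")
    return window_avg_box(ibm_only, size=3)


# Made-up sample data; a real stream would be unbounded.
quotes = [("IBM", 100.0), ("MSFT", 50.0), ("IBM", 102.0),
          ("IBM", 104.0), ("IBM", 106.0)]
results = list(run_query(quotes))
```

Because each box is a generator, results are produced incrementally as tuples arrive rather than after the whole input has been read, which is the property that distinguishes a continuous query from a one-shot relational query over a stored table.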

As you read the paper, consider the following questions:

  1. What language constructs does Aurora introduce that are not in the relational model? How are those new language features specially tailored to work with data streams?
  2. Do you think the Aurora idea of writing queries via "boxes and arrows" is a good one? How does Aurora share processing across multiple queries?