Representing and querying regression models in a relational database management system
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Samuel Madden and Hari Balakrishnan.
MetadataShow full item record
Curve fitting is a widely employed, useful modeling tool in several financial, scientific, engineering and data mining applications, and in applications like sensor networks that need to tolerate missing or noisy data. These applications need to both fit functions to their data using regression, and pose relational-style queries over regression models. Unfortunately, existing DBMSs are ill suited for this task because they do not include support for creating, representing and querying functional data, short of brute-force discretization of functions into a collection of tuples. This thesis describes FunctionDB, a novel DBMS that extends the state of the art. FunctionDB treats functions output by regression as first-class citizens that can be queried declaratively and manipulated like traditional database relations. The key contributions of FunctionDB are a compact, algebraic representation for regression models as piecewise functions, and an algebraic query processor that executes declarative queries directly on this representation as combinations of algebraic operations like function inversion, zero finding and symbolic integration. FunctionDB is evaluated on two real world data sets: measurements from a temperature sensor network, and traffic traces from cars driving on Boston roads. The results show that operating in the functional domain has substantial accuracy advantages (over 15% for some queries) and order of magnitude (10x-100x) performance gains over existing approaches that represent models as discrete collections of points. The thesis also describes an algorithm to maintain regression models online, as new raw data is inserted into the system. The algorithm supports a sustained insertion rate of the order of a million records per second, while generating models no less compact than a clairvoyant (offline) strategy.
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.Includes bibliographical references (p. 77-79).
DepartmentMassachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.