A statistical learning framework for data mining of large-scale systems : algorithms, implementation, and applications

Tsou, Ching-Huei, 1973-

dc.contributor.advisor	John R. Williams.	en_US
dc.contributor.author	Tsou, Ching-Huei, 1973-	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering.	en_US
dc.date.accessioned	2008-11-10T19:49:09Z
dc.date.available	2008-11-10T19:49:09Z
dc.date.copyright	2007	en_US
dc.date.issued	2007	en_US
dc.identifier.uri	http://dspace.mit.edu/handle/1721.1/38578	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/38578
dc.description	Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2007.	en_US
dc.description	Includes bibliographical references (p. 102-104).	en_US
dc.description.abstract	A machine learning framework is presented that supports data mining and statistical modeling of systems that are monitored by large-scale sensor networks. The proposed algorithm is novel in that it takes both observations and domain knowledge into consideration and provides a mechanism that combines analytical modeling and inductive learning. An efficient solver is presented that allows the algorithm to solve large-scale problems. The solver uses a randomized kernel that incorporates domain knowledge into support vector machine learning. It also takes advantage of the sparseness of support vectors, which allows for parallelization and online training to further speed up the computation. The solver can be integrated into existing systems, embedded into databases, or exposed as a web service. Understanding the data generated by large-scale systems presents several problems. First, statistical modeling approaches may either under-fit or over-fit the data and are sensitive to data quality. Second, learning is a computationally intensive process and often becomes intractable when the sample size exceeds several thousand.	en_US
dc.description.abstract	Third, learning algorithms need to be tuned to the specific problem in most engineering and business fields. Finally, a flexible learning framework is not available. This work addresses these problems by presenting a methodology that combines machine learning with domain knowledge, and an efficient framework that supports the algorithm. Benchmark and practical engineering problems are used to validate the methodology.	en_US
dc.description.statementofresponsibility	by Ching-Huei Tsou.	en_US
dc.format.extent	104 p.	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/38578	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Civil and Environmental Engineering.	en_US
dc.title	A statistical learning framework for data mining of large-scale systems : algorithms, implementation, and applications	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph.D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
dc.identifier.oclc	156281156	en_US

Files in this item

Name: 156281156-MIT.pdf
Size: 19.48Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses
