MIT Libraries DSpace@MIT


A statistical learning framework for data mining of large-scale systems : algorithms, implementation, and applications

Author(s)
Tsou, Ching-Huei, 1973-
Other Contributors
Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering.
Advisor
John R. Williams.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/38578 http://dspace.mit.edu/handle/1721.1/7582
Abstract
A machine learning framework is presented that supports data mining and statistical modeling of systems monitored by large-scale sensor networks. The proposed algorithm is novel in that it takes both observations and domain knowledge into consideration and provides a mechanism that combines analytical modeling and inductive learning. An efficient solver is presented that allows the algorithm to handle large-scale problems. The solver uses a randomized kernel that incorporates domain knowledge into support vector machine learning. It also takes advantage of the sparseness of support vectors, which allows for parallelization and online training to further speed up the computation. The solver can be integrated into existing systems, embedded into databases, or exposed as a web service.

Understanding the data generated by large-scale systems presents several problems. First, statistical modeling approaches may either under-fit or over-fit the data and are sensitive to data quality. Second, learning is a computationally intensive process and often becomes intractable when the sample size exceeds several thousand. Third, learning algorithms need to be tuned to the specific problem in most engineering and business fields. Finally, a flexible learning framework is not available. This work addresses these problems by presenting a methodology that combines machine learning with domain knowledge, and an efficient framework that supports the algorithm. Benchmark and practical engineering problems are used to validate the methodology.
 
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2007.
 
Includes bibliographical references (p. 102-104).
 
Date issued
2007
URI
http://dspace.mit.edu/handle/1721.1/38578
http://hdl.handle.net/1721.1/38578
Department
Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
Publisher
Massachusetts Institute of Technology
Keywords
Civil and Environmental Engineering.

Collections
  • Doctoral Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.