MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Scalable kernel density estimation-based local outlier detection over large data streams

Author(s)
Qin, X; Cao, L; Rundensteiner, EA; Madden, S
Thumbnail
DownloadPublished version (2.196Mb)
Publisher with Creative Commons License

Publisher with Creative Commons License

Creative Commons Attribution

Terms of use
Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/
Metadata
Show full item record
Abstract
© 2019 Copyright held by the owner/author(s). Local outlier techniques are known to be effective for detecting outliers in skewed data, where subsets of the data exhibit diverse distribution properties. However, existing methods are not well equipped to support modern high-velocity data streams due to the high complexity of the detection algorithms and their volatility to data updates. To tackle these shortcomings, we propose local outlier semantics that operate at an abstraction level by leveraging kernel density estimation (KDE) to effectively detect local outliers from streaming data. A strategy to continuously detect top-N KDE-based local outliers over streams is designed, called KELOS – the first linear time complexity streaming local outlier detection approach. The first innovation of KELOS is the abstract kernel center-based KDE (aKDE) strategy. aKDE accurately yet efficiently estimates the data density at each point – essential for local outlier detection. This is based on the observation that a cluster of points close to each other tend to have a similar influence on a target point’s density estimation when used as kernel centers. These points thus can be represented by one abstract kernel center. Next, the KELOS’s inlier pruning strategy early prunes points that have no chance to become top-N outliers. This empowers KELOS to skip the computation of their data density and of the outlier status for every data point. Together aKDE and the inlier pruning strategy eliminate the performance bottleneck of streaming local outlier detection. The experimental evaluation demonstrates that KELOS is up to 6 orders of magnitude faster than existing solutions, while being highly effective in detecting local outliers from streaming data.
Date issued
2019
URI
https://hdl.handle.net/1721.1/137525
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
Advances in Database Technology - EDBT
Publisher
OpenProceedings.org
Citation
Qin, X, Cao, L, Rundensteiner, EA and Madden, S. 2019. "Scalable kernel density estimation-based local outlier detection over large data streams." Advances in Database Technology - EDBT, 2019-March.
Version: Final published version

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.