Good paper on multidimensional outlier detection on time series

Abstract 

Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time-series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product-category, that form a multi-dimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time-series in multi-dimensional market segments. This is a topic that has been largely ignored in time series and data cube research.

In this study, we examine the issues of anomaly detection in multi-dimensional time-series data. We propose time-series data cube to capture the multi-dimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher level, "more general" time-series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm to iteratively select subspaces in the original high-dimensional space and detect anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.
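To make the idea of "expected values derived from higher level time-series" concrete, here is a minimal Python sketch. It is not the authors' algorithm: the segment data, the share-based scaling in `expected_from_parent`, and the median-absolute-deviation threshold in `flag_anomalies` are all assumptions chosen for illustration. Each market segment is compared with what the aggregated (more general) series would predict for it, and time steps with unusually large residuals are flagged.

```python
from statistics import median

# Hypothetical monthly sales per (gender, education) segment; not from the paper.
segments = {
    ("F", "college"):    [10, 12, 14, 16, 18, 20],
    ("M", "college"):    [20, 24, 28, 32, 36, 40],
    ("F", "highschool"): [10, 12, 14, 40, 18, 20],  # spike at index 3
}

# The "more general" series: all segments rolled up into one.
parent = [sum(values) for values in zip(*segments.values())]


def expected_from_parent(child, parent):
    """Scale the parent series by the child's overall share of the parent total."""
    share = sum(child) / sum(parent)
    return [share * p for p in parent]


def flag_anomalies(child, parent, threshold=3.0):
    """Return time indices whose residual is far from the typical residual.

    "Far" is measured in units of the median absolute deviation (MAD),
    an assumed, simple robust scale; the paper may use something else.
    """
    expected = expected_from_parent(child, parent)
    residuals = [c - e for c, e in zip(child, expected)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals) or 1e-9  # avoid division by zero
    return [t for t, r in enumerate(residuals) if abs(r - med) / mad > threshold]


print(flag_anomalies(segments[("F", "highschool")], parent))
```

One caveat with a sketch this simple: the spike also inflates the parent series, so sibling segments can show a residual of the opposite sign at the same time step. A fuller treatment would need to account for that, for example by checking the sign of the residual.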

Authors: Xiaolei Li, Jiawei Han, University of Illinois at Urbana-Champaign, Urbana, IL

Click here to read more (PDF).

Comment

Comment by William J McKibbin on February 26, 2014 at 6:59am

Thanks for sharing this useful paper regarding outliers in time-series data -- a constant problem -- especially in multi-dimensional models -- thanks again.

Comment by Kevin Kautz on February 15, 2014 at 8:29am

What I like about this paper, and what does appear new to me, is the way in which the SUITS approach closely follows a clustering technique but systematically distinguishes anomalies to exclude from Euclidean distance. That is, trend, magnitude and time-delayed anomalies are specifically called out. This is non-trivial. With near linear scalability as the number of dimensions increases, it is also useful. To my eyes and ears, an approach that is non-trivial and also useful is pretty attractive. Thanks for the post!

Comment by Hilda cerdeira on February 14, 2014 at 1:01pm

I do not see anything new here. If you are analyzing a system with N elements, you will have a vector space, and these cubes do not seem to me to be more than volume elements in these N dimensions. And, as the authors say, this is not a new problem. Maybe what is new is the way of presenting the math to a different audience ...

© 2014   Data Science Central
