Abstract
Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time-series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product category, that form a multi-dimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time series in multi-dimensional market segments. This is a topic that has been largely ignored in time-series and data cube research.
In this study, we examine the issues of anomaly detection in multi-dimensional time-series data. We propose a time-series data cube to capture the multi-dimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher-level, "more general" time series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm to iteratively select subspaces in the original high-dimensional space and detect anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.
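To make the idea of "expected values derived from higher-level time series" concrete, here is a minimal Python sketch. It is not the authors' algorithm: the segment names, the sample numbers, the median-share estimate, and the 0.3 relative-deviation threshold are all assumptions chosen for illustration. Each segment's expected series is the parent (all-segment) series scaled by that segment's typical share, and points that stray too far from this expectation are flagged.

```python
from statistics import median

def flag_anomalies(child, parent, rel_threshold=0.3):
    """Return time indices where `child` deviates from the value expected
    from `parent`. Assumes all parent values are positive."""
    # Typical share of the parent held by this segment; the median keeps
    # a single unusual point from distorting the estimate.
    typical_share = median(c / p for c, p in zip(child, parent))
    flagged = []
    for t, (c, p) in enumerate(zip(child, parent)):
        expected = typical_share * p
        # Relative deviation from the expectation derived from the parent.
        if abs(c - expected) / expected > rel_threshold:
            flagged.append(t)
    return flagged

# Hypothetical sales per age segment over eight time steps.
segments = {
    "18-25": [10, 11, 12, 13, 14, 15, 16, 17],
    "26-40": [20, 22, 24, 26, 28, 30, 32, 34],
    "41-60": [30, 33, 36, 39, 80, 45, 48, 51],  # spike at index 4
    "60+":   [140, 154, 168, 182, 196, 210, 224, 238],
}

# The higher-level, "more general" series: all age groups aggregated.
parent = [sum(values) for values in zip(*segments.values())]

anomalies = {name: flag_anomalies(series, parent)
             for name, series in segments.items()}
print(anomalies)
```

In this toy data only the "41-60" segment is flagged, at index 4. Note that the spike also inflates the parent, so sibling segments dip slightly below their expected values at the same step; the threshold has to be loose enough to tolerate that side effect.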
Authors: Xiaolei Li, Jiawei Han, University of Illinois at Urbana-Champaign, Urbana, IL
Comment
Thanks for sharing this useful paper regarding outliers in time-series data -- a constant problem -- especially in multi-dimensional models -- thanks again.
What I like about this paper, and what does appear new to me, is the way in which the SUITS approach closely follows a clustering technique but systematically distinguishes anomalies to exclude from Euclidean distance. That is, trend, magnitude and time-delayed anomalies are specifically called out. This is non-trivial. With near linear scalability as the number of dimensions increases, it is also useful. To my eyes and ears, an approach that is non-trivial and also useful is pretty attractive. Thanks for the post!
I do not see anything new here. If you are analyzing a system with N elements, you will have a vector space, and these cubes do not seem to me more than volume elements in these N dimensions. And as the authors say, this is not a new problem. Maybe what is new is the way of presenting the math to a different audience ...
© 2014 Data Science Central