
Large volumes of timestamp data are a reality, especially when we are dealing with networked devices. Typically, a network of devices generates a large number of alerts, and mining the alert dataset provides insights about the network.

Recently, I came across a situation where a business user was looking for a multidimensional visualization of timestamp data. The data covered a network of more than a thousand devices and the alarms those devices generated about the status of the network: a few million alarm records spread over six months, each with a timestamp. His primary questions were:

1. Do we see a burst of alarms during a specific time of the day?

2. Is this trend consistent across months, and/or does a pattern exist by day of the month?

The first step would be to create a few visualizations, and then probably a few measures around the questions, to accept or reject the hypotheses conjectured earlier. What he was looking for was a better description of the alarm patterns and how they link to hours, days and months. As usual, there could be many approaches: we could slice the data in multiple ways and generate a large number of visualizations.

The aggregated data in this case consists of 4,320 records (6 months × 30 days × 24 hours) across 3 dimensions: month, day and hour.

The dataset in question can be structured in two ways: (1) a series of 4,320 data points, or (2) a 2D array of 4,320 cells. Without going into much detail, we can say that a single bar or line chart makes little sense here, and if we make multiple bar or line charts we may miss the interrelation between months, days and hours.
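To make the two structures concrete, here is a minimal sketch of the aggregation step using only the Python standard library. The alarm timestamps are synthetic, and the names `synthetic_alarms` and `aggregate` are my own placeholders, not part of the original work. It also assumes every month is treated as exactly 30 days, which is what makes 6 × 30 × 24 come out to 4,320.

```python
import random
from collections import Counter
from datetime import datetime, timedelta

MONTHS, DAYS, HOURS = 6, 30, 24        # 6 * 30 * 24 = 4,320 cells
START = datetime(2016, 1, 1)           # arbitrary start date (assumption)


def synthetic_alarms(n, seed=0):
    """Generate n made-up alarm timestamps over a 180-day window."""
    rng = random.Random(seed)
    span_seconds = MONTHS * DAYS * 24 * 3600
    return [START + timedelta(seconds=rng.randrange(span_seconds))
            for _ in range(n)]


def aggregate(timestamps, start=START):
    """Count alarms per (month, day, hour), treating each month as 30 days."""
    counts = Counter()
    for ts in timestamps:
        day_index = (ts - start).days  # 0..179 within the window
        counts[(day_index // DAYS, day_index % DAYS, ts.hour)] += 1
    return counts


counts = aggregate(synthetic_alarms(50_000))

# Structure 1: a flat series of 4,320 values ordered by month, day, hour.
series = [counts[(m, d, h)]
          for m in range(MONTHS) for d in range(DAYS) for h in range(HOURS)]

# Structure 2: a 2D array with one row per month-day and one column per hour.
grid = [series[i:i + HOURS] for i in range(0, len(series), HOURS)]

print(len(series), len(grid), len(grid[0]))
```

The flat series suits line-style plots, while the 180 × 24 grid is the natural input for a matrix view such as a heat map.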

The first solution was a modified heat map, as below:

A matrix of month-days × hours, with cell values color coded by the percentile bucket of the value. This picture provides a very high-level view of the distribution of the events. However, from the picture we cannot make out the exact hour at which we see increased activity in the network. So we can conclude that we need a more granular view of the situation.
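A heat map of this kind could be sketched roughly as follows with NumPy and Matplotlib. The counts are random placeholder data, the percentile cut points (25, 50, 75, 90) are my assumption since the exact buckets used are not stated, and `percentile_buckets` is a hypothetical helper name.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt


def percentile_buckets(grid, percentiles=(25, 50, 75, 90)):
    """Replace each cell by the index of the percentile bucket it falls in."""
    # np.unique guards against duplicate edges when the data has many ties.
    edges = np.unique(np.percentile(grid, percentiles))
    return np.digitize(grid, edges), edges


# Placeholder data: 180 month-days x 24 hours of made-up alarm counts.
rng = np.random.default_rng(0)
grid = rng.poisson(lam=20, size=(180, 24))

buckets, edges = percentile_buckets(grid)

fig, ax = plt.subplots(figsize=(6, 10))
im = ax.imshow(buckets, aspect="auto", cmap="YlOrRd", interpolation="nearest")
ax.set_xlabel("hour of day")
ax.set_ylabel("month-day index")
fig.colorbar(im, ax=ax, label="percentile bucket")
```

Bucketing by percentile rather than by raw value keeps a handful of extreme hours from washing out the rest of the color scale, at the cost of hiding the absolute magnitudes.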

I was looking for a better solution. Finally, with some secondary research (Stack Overflow and other sources), I developed a solution that can be thought of as a visualization of the data in a circular coordinate system.

In the above diagram, a single (thirty-day) month is represented by a collection of thirty concentric circles. Each circle is divided into 24 arcs, with each arc representing a particular hour of the day; the signal values are color coded according to a specific scale. If we pick a specific disc and a specific arc, it tells us the distribution of signals over a 30-day period for that specific hour. In my opinion, this granular view is a really good visualization for the questions above.

Matplotlib has good facilities for plotting in a polar coordinate system; its demo page has a few examples. The plot above is a collection of line charts in a polar coordinate system. Each data tuple (day, hour, value) is converted into a sequence of polar points (r, θj), (r, θk), …, (r, θp) plus a color code, and a line is then plotted based on this conversion. This is repeated for all 4,320 data points.
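The conversion described above might look something like the sketch below for a single 30-day month. This is my reading of the approach, not the original code: the function name `polar_month_plot`, the continuous `YlOrRd` color scale, the clockwise layout with hour 0 at the top, and the random input data are all assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib import colors


def polar_month_plot(month_counts, points_per_arc=8):
    """Draw one month as concentric circles (days) split into hourly arcs."""
    n_days, n_hours = month_counts.shape
    norm = colors.Normalize(vmin=month_counts.min(), vmax=month_counts.max())
    cmap = plt.get_cmap("YlOrRd")

    fig, ax = plt.subplots(subplot_kw={"projection": "polar"}, figsize=(7, 7))
    ax.set_theta_zero_location("N")   # put hour 0 at the top, like a clock
    ax.set_theta_direction(-1)        # hours advance clockwise

    for day in range(n_days):
        r = day + 1                   # day 1 is the innermost circle
        for hour in range(n_hours):
            # Sample a few angles across this hour's slice of the circle.
            theta = np.linspace(2 * np.pi * hour / n_hours,
                                2 * np.pi * (hour + 1) / n_hours,
                                points_per_arc)
            ax.plot(theta, np.full_like(theta, r),
                    color=cmap(norm(month_counts[day, hour])), linewidth=3)

    ax.set_xticks(np.linspace(0, 2 * np.pi, n_hours, endpoint=False))
    ax.set_xticklabels([str(h) for h in range(n_hours)])
    ax.set_yticks([])                 # radial tick labels add little here
    ax.set_ylim(0, n_days + 1)
    return fig, ax


# Placeholder data: 30 days x 24 hours of made-up alarm counts.
rng = np.random.default_rng(1)
month_counts = rng.poisson(lam=20, size=(30, 24))
fig, ax = polar_month_plot(month_counts)
```

Calling the function once per month, for example on six subplots, would cover all 4,320 data points. Reading along a fixed angle from the center outward then shows how a given hour behaves across the days of the month.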

Finally, polar coordinate based visualization can be useful in specific situations, and for timestamp data it can be all the more useful. The original mathematical approach is here.


Comment by Rp on August 28, 2017 at 11:44pm
There is a complete reversal in Nov Dec visible in both the representations. But yes, the concentrics really give out a detailing!
Comment by Delasa Aghamirzaie on February 27, 2017 at 10:17am

Can you share the functions for creating circular coordinate plots in matplotlib? 

Comment by John Corradi on February 23, 2017 at 9:43am
These pages do not display properly in the browser (Chrome or IE). Graphics are missing.
