Challenge of the week: identifying patterns in complex time series

Test your data science skills!

Our new challenge is to identify periodicity (simple or multiple), and especially the periodic peaks that occur in each cycle, in the attached spreadsheet, after accounting for seasonality, outliers (e.g. Christmas Day), noise, and messy data.

Figure: Cyclic peaks (download the spreadsheet for detailed data)

We know what these cyclic peaks are because our own actions cause them; creating these peaks is precisely their purpose, assuming they work as expected. Here we ask you to:

  • Find the pattern in these peaks
  • Determine the smallest time window needed to reliably detect these peaks and their cycle
  • Measure how tall these peaks are above the baseline (or valley floor)
  • Determine whether these peaks are getting smaller over time relative to the growing baseline
  • Find outliers
  • Find when these regular peaks started
  • Determine which peak in each cycle is stronger, on average
  • Estimate how much the peaks contribute to overall traffic
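
Several of these questions (peak height above baseline, whether peaks shrink relative to a growing baseline, and the peaks' share of traffic) can be sketched in a few lines of Python once a cycle length is assumed. The sketch below is not the official solution. It assumes a 7-day cycle, uses the per-cycle median as a stand-in for the baseline, and runs on synthetic numbers invented for illustration rather than on the spreadsheet data. The names `cycle_summary`, `weekly_shape`, and `views` are hypothetical.

```python
from statistics import median

def cycle_summary(series, period):
    """Per-cycle baseline (median), peak height above it, and peak share of traffic."""
    rows = []
    # Walk over complete cycles only; a trailing partial cycle is ignored.
    for start in range(0, len(series) - period + 1, period):
        cycle = series[start:start + period]
        baseline = median(cycle)  # assumption: median approximates the baseline
        # Traffic above the baseline, summed over the days that exceed it.
        excess = sum(max(v - baseline, 0) for v in cycle)
        rows.append({
            "baseline": baseline,
            "peak_height": max(cycle) - baseline,
            "relative_height": (max(cycle) - baseline) / baseline,
            "peak_share": excess / sum(cycle),
        })
    return rows

# Synthetic daily page views (made up, NOT the challenge data): the baseline
# grows by 14 per week, with planted peaks on days 0 and 3 of each cycle.
weekly_shape = [300, 0, 0, 250, 0, -100, -50]
views = [1000 + 14 * (t // 7) + weekly_shape[t % 7] for t in range(20 * 7)]

rows = cycle_summary(views, period=7)
first, last = rows[0], rows[-1]
print(first["peak_height"], last["peak_height"])
print(first["relative_height"], last["relative_height"])
print(first["peak_share"])
```

On this synthetic series the absolute peak height is the same in every cycle, while `relative_height` falls as the baseline grows. That is the kind of contrast the fourth question asks you to check in the real data.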

To learn the cause of this phenomenon and what the data is about, and to download a much bigger, multi-dimensional data set related to the same time series, go to our members-only page, where the solution is provided. Previous challenges of the week can be found here.


Replies to This Discussion

I see "go to our members-only page", but I am unable to find the solution there. Where is it?

Dr. Vincent, can you please let me know where the solution for this one is? I see this as a typical problem in web data and am eager to understand the cyclical patterns.

Answer is in item #4 in the members-only page.



I am sorry, I couldn't locate that page. Could you send the link, please?

Dr. Vincent Granville said:

Answer is in item #4 in the members-only page.



Howdy All,

I created a couple of plots in R: geom_line with a geom_boxplot on top. You can see a peak on Monday and Thursday, corresponding to the DSC emails, and a dip on Saturday, when people mow their lawns and do laundry. The lines also show an upward shift in page counts as the year progresses. There are only two outlier points.
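
For anyone not using R, the per-weekday summary behind a box plot can be roughed out in Python by grouping the daily counts by weekday and comparing medians. This is only a sketch on synthetic data: the Monday and Thursday bumps and the weekend dips are planted to mirror the pattern described above, the start date is arbitrary, and `weekday_medians`, `bump`, and `views` are hypothetical names.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_medians(dates, values):
    """Median value per weekday, keyed by day name (Monday first)."""
    buckets = defaultdict(list)
    for d, v in zip(dates, values):
        buckets[DAY_NAMES[d.weekday()]].append(v)  # weekday(): Monday is 0
    return {name: median(buckets[name]) for name in DAY_NAMES if name in buckets}

# Synthetic stand-in for the spreadsheet (invented values, arbitrary start date).
bump = {"Mon": 300, "Thu": 250, "Sat": -100, "Sun": -50}
start = date(2014, 1, 1)
dates = [start + timedelta(days=i) for i in range(70)]
views = [1000 + bump.get(DAY_NAMES[d.weekday()], 0) for d in dates]

profile = weekday_medians(dates, views)
top_two = sorted(profile, key=profile.get, reverse=True)[:2]
lowest = min(profile, key=profile.get)
print(profile)
print("strongest days:", top_two, "weakest day:", lowest)
```

Medians are used rather than means so that a few outlier days do not distort the weekday profile.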

Also, plotting the page view deltas (today minus yesterday) will help reveal the patterns.

Finally, we also do a Friday blast (usually called Good Friday Reading), but it has a much smaller reach and is not sent every week. Since Friday is also a relatively low day, you don't see it in the data.

Thanks for this interesting challenge!

http://www.datasciencecentral.com/profiles/blogs/how-we-combined-di.... I implemented a regression model following the trend + seasonal component strategy.
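
The model itself is not shown in this thread, so the following is only a guess at the general shape of a trend + seasonal decomposition, not the approach from the linked post. It is a two-step approximation: fit a straight-line trend by least squares, then average the detrended values at each position in an assumed 7-day cycle. The data is synthetic, and `fit_line`, `seasonal_effects`, and `views` are hypothetical names.

```python
from statistics import mean

def fit_line(ys):
    """Ordinary least-squares line y = intercept + slope * t for t = 0, 1, 2, ..."""
    ts = range(len(ys))
    t_bar, y_bar = mean(ts), mean(ys)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
             / sum((t - t_bar) ** 2 for t in ts))
    return y_bar - slope * t_bar, slope

def seasonal_effects(ys, period):
    """Mean detrended value at each position in the cycle."""
    intercept, slope = fit_line(ys)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(ys)]
    return [mean(residuals[p::period]) for p in range(period)]

# Synthetic series (invented): linear growth of 2 per day plus a weekly shape
# with planted peaks at positions 0 and 3 of the cycle.
weekly_shape = [300, 0, 0, 250, 0, -100, -50]
views = [1000 + 2 * t + weekly_shape[t % 7] for t in range(20 * 7)]

intercept, slope = fit_line(views)
effects = seasonal_effects(views, period=7)
strongest = sorted(range(7), key=lambda p: effects[p], reverse=True)[:2]
print("estimated slope:", round(slope, 2))
print("strongest positions in the cycle:", strongest)
```

Because the trend is fitted before the seasonal part, the slope estimate is slightly biased by the weekly shape. Fitting trend and day-of-week terms jointly in one regression would avoid that, at the cost of needing a linear algebra library.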

It is interesting how differencing the variables removed the overall trend and made the seasonal patterns easier to identify.
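
This effect is easy to reproduce on synthetic data. The sketch below differences a made-up series (today minus yesterday) and then looks for the lag with the highest sample autocorrelation. Autocorrelation is my own choice for reading off the cycle length and is not something mentioned in this thread. The functions `autocorrelation` and `dominant_period` and the data are hypothetical.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    num = sum((series[i] - mu) * (series[i + lag] - mu)
              for i in range(len(series) - lag))
    return num / denom

def dominant_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))

# Synthetic daily page views (invented): linear trend plus a 7-day shape.
weekly_shape = [300, 0, 0, 250, 0, -100, -50]
views = [1000 + 2 * t + weekly_shape[t % 7] for t in range(20 * 7)]

# First difference: the linear trend collapses to a constant offset.
deltas = [today - yesterday for yesterday, today in zip(views, views[1:])]
period = dominant_period(deltas, max_lag=14)
print("dominant lag in the differenced series:", period)
```

On the raw `views`, the trend would dominate the autocorrelation at short lags; after differencing, the weekly cycle is what stands out.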

