
There are many ways to deal with time data. Sometimes it can be treated as a time series to take possible trends into account. Sometimes this is not possible because the time values cannot be arranged in a single sequence, for example if a dataset covering several months contains only weekdays (1 to 7). In this case one could use one-hot encoding. However, for minutes or seconds of a day, one-hot encoding produces a very large number of columns (1,440 for minutes, 86,400 for seconds). Another approach is to make time cyclical. This leads to lower dimensionality and possibly better predictive power.

But what does cyclical time mean?

Cyclical time is a concept that assumes that time is a circle. Sunday 24:00 (midnight) is the same as Monday 0:00. The end of the last second of one year is the same as the start of the first second of the next year, and so on. Whether this approach is appropriate can be checked by cross-validation or by the correlation with the target feature.

How to make data cyclical?

Many blog posts recommend a sine function. For example, one full cycle over seven weekdays (blue):

*value = sin(2π · time / 7)*

or half a cycle, from zero to zero, over one week (green):

*value = sin(2π · time / 14)*

or two cycles over the week (red):

*value = sin(2π · time / 3.5)*

Let’s take a look at the function “value”:

figure created by wzgrapher
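The three sine variants can be sketched in a few lines of Python. The helper name `sine_encode` and the choice to evaluate at time = 0 to 7 are assumptions made for illustration only:

```python
import math

def sine_encode(time, period):
    """Return sin(2*pi*time/period), the sine encoding used above."""
    return math.sin(2 * math.pi * time / period)

# Evaluate each variant at time = 0, 1, ..., 7 (illustrative grid).
times = list(range(0, 8))
one_cycle = [sine_encode(t, 7) for t in times]     # one full cycle per week
half_cycle = [sine_encode(t, 14) for t in times]   # zero to zero over one week
two_cycles = [sine_encode(t, 3.5) for t in times]  # two cycles per week

for t, a, b, c in zip(times, one_cycle, half_cycle, two_cycles):
    print(f"time={t}  period 7: {a:+.3f}  period 14: {b:+.3f}  period 3.5: {c:+.3f}")
```

Note that only the period-14 variant stays non-negative over the whole week; the other two take the same value for different days, which is one reason the number of cycles needs checking.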

But how many cycles are there over a specific time period?

It depends on the case and the data. If there is a solid theoretical reason why there must be 2, 4, or 8 cycles, perfect. Even then, one should check which number of cycles is appropriate.
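One possible way to run such a check is to compare how strongly each candidate encoding correlates with the target. The sketch below uses a synthetic target constructed purely for illustration (it is not real data), and the names `pearson` and `candidate_periods` are hypothetical. Cross-validating an actual model would be the more thorough option.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic example: four weeks of daily observations (time = 0..27).
times = list(range(28))
# Made-up target that, by construction, repeats twice per week.
target = [math.sin(2 * math.pi * t / 3.5) for t in times]

# Candidate periods taken from the three sine variants in the text.
candidate_periods = [14, 7, 3.5]
scores = {}
for period in candidate_periods:
    feature = [math.sin(2 * math.pi * t / period) for t in times]
    scores[period] = abs(pearson(feature, target))

# The period whose encoding correlates most strongly with the target.
best_period = max(scores, key=scores.get)
print(scores, best_period)
```

Linear correlation only captures part of the picture, so a low score here does not prove that an encoding is useless for a non-linear model.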

Furthermore, it is not clear where the cycle starts. Does a week start on Monday, Sunday, or Saturday?

A good idea is to shift the start (and therefore the end) of the period. For example, make Saturday the first day of the week. (Don’t forget to validate which shift works best.)
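A shift of this kind might be implemented as follows. This is a sketch under the assumption that weekdays are coded 1 to 7 with 1 = Monday; the function name `shifted_sine` and the `first_day` parameter are hypothetical:

```python
import math

def shifted_sine(weekday, first_day=1, period=7):
    """Sine encoding where `first_day` is treated as the start of the cycle.

    weekday and first_day are assumed to be integers 1..7 (1 = Monday).
    """
    # Position within the cycle: 0 for first_day, 1 for the next day, ...
    position = (weekday - first_day) % period
    return math.sin(2 * math.pi * position / period)

# Saturday (6) as the first day of the week:
saturday_start = {day: shifted_sine(day, first_day=6) for day in range(1, 8)}
print(saturday_start)
```

Each candidate value of `first_day` gives a different feature, so the candidates can be compared with the same validation procedure used for the number of cycles.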

Now, let me introduce an alternative approach which sometimes leads to better results than a sine function.

Start with an equation in three variables:

*cycle = time^2+value^2*

and solve for “value”:

*value = (cycle - time^2)^0.5*

Let’s look at “value” with cycle set to one (red), two (blue), and three (green):

figure created by wzgrapher

This function has two important features:

1) The diameter (the width at value = 0) is determined by “cycle”: it equals 2 · cycle^0.5

2) It is a circle (more precisely, the upper half of one), so the radius is diameter / 2
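Both features can be checked numerically. The following sketch assumes the helper names `semicircle` and `width_at_zero`, which are introduced here for illustration:

```python
import math

def semicircle(time, cycle):
    """value = (cycle - time^2)^0.5; NaN where no real value exists."""
    inside = cycle - time ** 2
    if inside < 0:
        return float("nan")  # time lies outside the circle
    return math.sqrt(inside)

def width_at_zero(cycle):
    """Diameter: distance between the two points where value = 0."""
    return 2 * math.sqrt(cycle)

for cycle in (1, 2, 3):
    print(f"cycle={cycle}  peak={semicircle(0, cycle):.3f}  "
          f"diameter={width_at_zero(cycle):.3f}")
```

The peak at time = 0 equals the radius, and the diameter is twice that, which is the relationship used in the next step.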

So, to fit the function to seven weekdays, one must find the “cycle” that gives a diameter of seven. Set value to zero:

*0 = (cycle - time^2)^0.5*

(because the diameter is the width at value = 0)

Now solve for “cycle”:

*cycle = time^2*

Because radius = diameter / 2, we must set time = max_time / 2. **Note: max_time / 2 = radius.** For weekdays this is 7 / 2 = 3.5:

*cycle = radius^2*

Additionally, one must shift the function to the right by the radius, so that it starts at time = 0:

*value_cyc = (radius^2 - (radius - time)^2)^0.5*
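As a minimal Python sketch of this formula (the `max_time` parameter name follows the text; the default of 7 and the guard against tiny negative rounding errors are assumptions added here):

```python
import math

def value_cyc(time, max_time=7):
    """Circle-based cyclical encoding: (radius^2 - (radius - time)^2)^0.5."""
    radius = max_time / 2
    inside = radius ** 2 - (radius - time) ** 2
    # Clamp at zero so floating-point rounding at the edges cannot
    # produce a slightly negative argument for the square root.
    return math.sqrt(max(0.0, inside))

for t in range(0, 8):
    print(f"time={t}  value_cyc={value_cyc(t):.3f}")
```

The encoding is zero at time = 0 and time = max_time, peaks at the radius in the middle of the period, and is symmetric around that midpoint.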

Now let’s take a look at the function “value_cyc” for seven weekdays 1 to 7 (red). To compare it with the sine approach mentioned above, the function (blue)

*value_sin = 3.5 · sin(2π · time / 14)*

is added:

figure created by wzgrapher

The difference is the slope. The function “value_cyc” rises more steeply at the start, which leads to higher values near the start and the end of the period. This makes the values around the center of the period more similar to each other and separates the center more sharply from the start and the end than “value_sin” does.
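The difference can also be seen in numbers. The sketch below evaluates both functions at time = 0 to 7 with radius 3.5; the constant name `RADIUS` and the `rows` list are assumptions for illustration:

```python
import math

RADIUS = 3.5  # max_time / 2 for seven weekdays

def value_cyc(time):
    """Circle-based encoding, clamped at zero against rounding errors."""
    return math.sqrt(max(0.0, RADIUS ** 2 - (RADIUS - time) ** 2))

def value_sin(time):
    """Half-cycle sine scaled to the same peak height."""
    return RADIUS * math.sin(2 * math.pi * time / 14)

# (time, value_cyc, value_sin) for each point of the week.
rows = [(t, value_cyc(t), value_sin(t)) for t in range(0, 8)]

for t, c, s in rows:
    print(f"time={t}  value_cyc={c:.3f}  value_sin={s:.3f}  difference={c - s:.3f}")
```

Both curves share the same zeros and the same peak, so any difference in predictive power comes from how the values in between are spaced.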

Note that making data cyclical is not always a good idea. Be careful if the periods do not all have the same length (e.g. days of the month)!


Posted 12 April 2021
