In the previous article, we tried to model the gold price per gram in Turkey. We will continue that work to find the best fit for our data. When we compared the KNN and ARIMA models, we saw that the traditional ARIMA model was much better than KNN, which is a machine learning algorithm. This time we will try a regression model as the machine learning model and also try to improve our ARIMA model with some mathematical operations.

A regression that has Fourier terms, with its errors modeled by an ARIMA process, is called **dynamic harmonic regression**. The harmonic structure is built from successive Fourier terms, which are pairs of sine and cosine terms that together form a periodic function. These terms can capture seasonal patterns smoothly.

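$x_{1,t} = \sin\left(\frac{2\pi t}{m}\right)$, $x_{2,t} = \cos\left(\frac{2\pi t}{m}\right)$, $x_{3,t} = \sin\left(\frac{4\pi t}{m}\right)$, $x_{4,t} = \cos\left(\frac{4\pi t}{m}\right)$,

$x_{5,t} = \sin\left(\frac{6\pi t}{m}\right)$, $x_{6,t} = \cos\left(\frac{6\pi t}{m}\right)$, …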

**m** is the seasonal period. As the number of terms increases, the sum of the terms can approximate more complex periodic shapes; with enough terms it can come close even to a square wave. While the Fourier terms capture the seasonal pattern, the ARIMA model processes the error term to capture the remaining short-term dynamics, which also shape the prediction intervals.
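To make the construction of the Fourier terms concrete, here is a minimal base R sketch. The helper `fourier_terms()` is hypothetical and written only for illustration; the models in this article use the `fourier` function of the forecast package instead.

```r
# Build K sine/cosine pairs for a series of length n with seasonal period m.
# fourier_terms() is a hypothetical helper, not a function from any package.
fourier_terms <- function(n, m, K) {
  stopifnot(K >= 1, 2 * K <= m)
  t <- seq_len(n)
  cols <- list()
  for (k in seq_len(K)) {
    cols[[paste0("S", k)]] <- sin(2 * pi * k * t / m)  # sine term of the k-th pair
    cols[[paste0("C", k)]] <- cos(2 * pi * k * t / m)  # cosine term of the k-th pair
  }
  do.call(cbind, cols)
}

# Two years of monthly data (m = 12) with K = 2 pairs
x <- fourier_terms(n = 24, m = 12, K = 2)
dim(x)       # 24 rows and 4 columns: S1, C1, S2, C2
head(x, 3)
```

Each column repeats itself every m observations, which is what lets a regression on these columns pick up a seasonal pattern.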

We will examine the regression models with K values (the number of sine and cosine pairs) from 1 to 6 and plot them to compare the corrected Akaike’s information criterion (**AICc**), which should be as small as possible. We will set the **seasonal parameter** to **FALSE**: because the Fourier terms will capture the seasonality, we don’t want the auto.arima function to waste time searching for seasonal patterns. We should also talk about the transformation concept to understand **the lambda parameter** we are going to use in the models.

**Transformation**, just like differencing, is a mathematical operation that simplifies the model and thus increases the prediction accuracy. It does so by stabilizing the variance, which makes the pattern more consistent. If the lambda parameter is set to “**auto**”, the **auto.arima function** applies the transformation automatically, based on the optimum value of the **lambda parameter** of the Box-Cox transformations, which are shown below.

$w_t = \log(y_t)$; if $\lambda = 0$

$w_t = \frac{y_t^{\lambda} - 1}{\lambda}$; **otherwise**
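As a rough illustration of the Box-Cox transformation and its inverse, the following base R sketch may help. The functions `box_cox()` and `inv_box_cox()` are hypothetical helpers and the input values are made up; the forecast package provides its own `BoxCox` and `InvBoxCox` functions for real use.

```r
# Box-Cox transformation for positive data (hypothetical helper)
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, used to bring forecasts back to the original scale
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(100, 200, 400)            # made-up positive values
w <- box_cox(y, lambda = 0.5)    # transformed values
inv_box_cox(w, lambda = 0.5)     # recovers the original values
```

Values of lambda below 1 compress large observations more than small ones, which is how the transformation evens out a variance that grows with the level of the series.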

`# Packages used below; train and test are the series from the previous article`
`library(forecast)`
`library(ggplot2)`
`library(magrittr)  # provides the %>% pipe`
`# Comparing with plots`
`plots <- list()`
`for (i in seq(6)) {`
`  fit <- train %>%`
`    auto.arima(xreg = fourier(train, K = i), seasonal = FALSE, lambda = "auto")`
`  plots[[i]] <- autoplot(forecast(fit, xreg = fourier(train, K = i, h = 18))) +`
`    xlab(paste("K=", i, " AICC=", round(fit[["aicc"]], 2))) +`
`    ylab("") +`
`    theme_light()`
`}`
`gridExtra::grid.arrange(`
`  plots[[1]], plots[[2]], plots[[3]],`
`  plots[[4]], plots[[5]], plots[[6]], nrow = 3)`

You can also see from the above plots that the higher the K value, the more jagged the point forecast line and the prediction intervals become. The AICc values increase significantly after K=3, and the minimum AICc is obtained at K=2. Hence, K should be equal to 2.
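The idea of picking K by the minimum AICc can be sketched without any packages. Note the assumptions: the series below is synthetic, and a plain `lm` regression stands in for the ARIMA-error model, so this is not the article's model, only the same selection rule.

```r
set.seed(1)
m <- 12                 # seasonal period (monthly data)
n <- 72                 # six years of observations
t <- seq_len(n)
# Synthetic series, made up for illustration: trend + one harmonic + noise
y <- 0.5 * t + 3 * sin(2 * pi * t / m) + rnorm(n)

# K = 6 is left out of this sketch because sin(pi * t) is zero at every
# integer t, so that column would need special handling.
aicc <- sapply(1:5, function(K) {
  # Fourier regressors: K sine/cosine pairs
  X <- do.call(cbind, lapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))))
  fit <- lm(y ~ t + X)
  k_par <- attr(logLik(fit), "df")   # estimated parameters, including the error variance
  AIC(fit) + 2 * k_par * (k_par + 1) / (n - k_par - 1)   # small-sample correction
})

best_K <- which.min(aicc)
round(aicc, 2)
best_K
```

Adding pairs always improves the in-sample fit, so the correction term is what penalizes the extra parameters and keeps K from growing without need.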

`# Modeling with Fourier Regression`
`fit_fourier <- train %>%`
`  auto.arima(xreg = fourier(train, K = 2), seasonal = FALSE, lambda = "auto")`

`# Accuracy`
`f_fourier <- fit_fourier %>%`
`  forecast(xreg = fourier(train, K = 2, h = 18)) %>%`
`  accuracy(test)`
`f_fourier[, c("RMSE", "MAPE")]`
`#                   RMSE      MAPE`
`# Training set  8.586783  4.045067`
`# Test set     74.129452 17.068118`

`# Accuracy plot of the Fourier Regression`
`fit_fourier %>%`
`  forecast(xreg = fourier(train, K = 2, h = 18)) %>%`
`  autoplot() +`
`  autolayer(test) +`
`  theme_light() +`
`  ylab("")`

In the previous article, we calculated AICc values for the non-seasonal ARIMA. We chose the non-seasonal process because our data had a very weak seasonal pattern, but the pairs of Fourier terms have captured even this weak pattern. The above results show that, on the test set, the Fourier regression is much better than the non-seasonal ARIMA in terms of both RMSE and MAPE.
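For readers who want to see what the two accuracy measures compute, here is a small base R sketch. The functions `rmse()` and `mape()` are hypothetical helpers and the numbers are made up; in the article these values come from the `accuracy` function.

```r
# Root mean squared error: penalizes large errors more heavily
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Mean absolute percentage error: scale-free, expressed in percent
mape <- function(actual, predicted) mean(abs((actual - predicted) / actual)) * 100

actual    <- c(100, 200, 400)   # made-up observed values
predicted <- c(110, 180, 400)   # made-up forecasts

rmse(actual, predicted)
mape(actual, predicted)
```

RMSE is in the units of the series, while MAPE is a percentage, which is why the two can rank models differently when the level of the series changes a lot.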

Since we are also taking the seasonal pattern into account, even if it is weak, we should also examine the **seasonal ARIMA** process. This model is built by adding seasonal terms to the non-seasonal ARIMA model we mentioned before.

In the notation $ARIMA\,(p,d,q)\,(P,D,Q)_m$:

$(p,d,q)$: non-seasonal part.

$(P,D,Q)_m$: seasonal part.

$m$: the number of observations per year; the **seasonal period**.

The seasonal part consists of terms similar to the non-seasonal components, but with backshifts of the seasonal period. For instance, for monthly data the seasonal period is m=12, so the seasonal terms act on lags that are multiples of 12.
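As an illustration only, assume an $ARIMA(1,1,1)(1,1,1)_{12}$ model for monthly data; these orders are an assumption chosen for the example, not a model fitted in this article. With the backshift operator $B$, where $B y_t = y_{t-1}$, such a process could be written as:

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\,\varepsilon_t
```

The factors in $B$ are the non-seasonal AR, differencing and MA parts, and the factors in $B^{12}$ are their seasonal counterparts.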

`# Modeling the Arima model with transformed data`
`fit_arima <- train %>%`
`  auto.arima(stepwise = FALSE, approximation = FALSE, lambda = "auto")`
`# Series: .`
`# ARIMA(3,1,2) with drift`
`# Box Cox transformation: lambda= -0.7378559`
`# Coefficients:`
`#          ar1      ar2      ar3      ma1     ma2  drift`
`#       0.8884  -0.8467  -0.1060  -1.1495  0.9597  3e-04`
`# s.e.  0.2463   0.1557   0.1885   0.2685  0.2330  2e-04`
`# sigma^2 estimated as 2.57e-06: log likelihood=368.02`
`# AIC=-722.04 AICc=-720.32 BIC=-706.01`

Although the seasonal parameter is set to **TRUE** by default, **the auto.arima function** couldn’t find a model with seasonality, because the time series has very weak seasonality, as we mentioned before. Unlike the ARIMA model in the previous article, we set the **lambda parameter** to “**auto**”, so the data is transformed with $\lambda = -0.7378559$.
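The information criteria printed in the model output can be cross-checked by hand. The sketch below is a rough consistency check under one assumption: the effective sample size is not printed in the output, so it is inferred here from the reported BIC.

```r
loglik <- 368.02    # log likelihood from the auto.arima output
bic    <- -706.01   # BIC from the auto.arima output
k      <- 7         # ar1, ar2, ar3, ma1, ma2, drift, plus sigma^2

aic <- -2 * loglik + 2 * k

# Assumption: effective sample size inferred from BIC = -2 * loglik + k * log(n)
n_eff <- round(exp((bic + 2 * loglik) / k))

# Small-sample correction added to the AIC
aicc <- aic + 2 * k * (k + 1) / (n_eff - k - 1)

round(aic, 2)    # -722.04, as reported
round(aicc, 2)   # -720.32, as reported
```

The gap between AIC and AICc is noticeable here because the number of parameters is not small relative to the number of observations.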

`f_arima <- fit_arima %>%`
`  forecast(h = 18) %>%`
`  accuracy(test)`
`f_arima[, c("RMSE", "MAPE")]`
`#                   RMSE     MAPE`
`# Training set  9.045056  3.81892`
`# Test set     67.794358 14.87034`

As you will remember, the RMSE and MAPE values of the ARIMA model without transformation were 94.788638 and 20.878096, respectively. We can easily confirm from the above results that the transformation improves the accuracy when the time series has an unstable variance.

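Time series data with weak seasonality, like ours, has been modeled with dynamic harmonic regression, but its test-set accuracy (RMSE 74.13, MAPE 17.07) was worse than that of the transformed ARIMA model without seasonality (RMSE 67.79, MAPE 14.87), although better than that of the untransformed one.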

In addition, the ARIMA model fitted the transformed data more accurately than the untransformed data, because the variance of our data changes with the level of the time series. Another important point is that, when we look at the accuracy plots of both the ARIMA model and the Fourier regression, we can clearly see that the prediction error increases as the forecast horizon increases.

*The original article can be found here.*

**References**

- Forecasting: Principles and Practice, *Rob J Hyndman and George Athanasopoulos*
- Statistics How To: Box-Cox Transformation
- Wikipedia: Fourier Series

Posted 10 May 2021
