Exponential Smoothing of Time Series Data in R

Guest blog post by Jeffrey Strickland. Originally posted here.
This article is not about smoothing ore into gems, though you may find a few gems herein.

Systematic Pattern and Random Noise

In “Components of Time Series Data”, I discussed the components of time series data. In time series analysis, we assume that the data consist of a systematic pattern (usually a set of identifiable components) and random noise (error), which often makes the pattern difficult to identify. Most time series analysis techniques involve some form of filtering out noise to make the pattern more noticeable.

Two General Aspects of Time Series Patterns

Though I have discussed other components of time series data, we can describe most time series patterns in terms of two basic classes of components: trend and seasonality. The former represents a general systematic linear or nonlinear component that changes over time and does not repeat, or at least does not repeat within the time range captured by our data (e.g., a plateau followed by a period of exponential growth). The latter may have a formally similar nature; however, it repeats itself in systematic intervals over time. These two general classes of time series components may coexist in real-life data. For example, sales of a garden supply company can rapidly grow over years but they still follow consistent seasonal patterns (e.g., as much as 55% of yearly sales each year are made in May, whereas only 5% in August).

This general pattern is well illustrated by the international passenger data series (G), as presented in the textbook Time Series Analysis: Forecasting and Control by Box, Jenkins and Reinsel (ISBN: 978-0470272848), representing monthly international airline passenger totals (measured in thousands) for twelve consecutive years from 1949 to 1960. If you plot the successive observations (months) of airline passenger totals, a clear, almost linear trend emerges, indicating that the airline industry enjoyed steady growth over the years (approximately four times as many passengers traveled in 1960 as in 1949). At the same time, the monthly figures follow an almost identical pattern each year (e.g., more people travel during holidays than during any other time of the year). This example data file also illustrates a very common general type of pattern in time series data, where the amplitude of the seasonal changes increases with the overall trend (i.e., the variance is correlated with the mean over the segments of the series). In this pattern, called multiplicative seasonality, the relative amplitude of the seasonal changes is constant over time, so their absolute amplitude scales with the trend.
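The claims above can be checked with a short sketch. It assumes that R's built-in AirPassengers dataset (the classic Box and Jenkins airline data) is the same Series G; the variable names are illustrative only.

```r
# AirPassengers ships with base R (datasets package): monthly totals, 1949-1960.
yearly_total <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)
yearly_mean  <- aggregate(AirPassengers, nfrequency = 1, FUN = mean)
yearly_sd    <- aggregate(AirPassengers, nfrequency = 1, FUN = sd)

# Growth over the period: passengers in 1960 relative to 1949.
growth <- as.numeric(yearly_total)[12] / as.numeric(yearly_total)[1]
print(growth)

# Within-year spread versus within-year mean: a strong positive correlation
# is the signature of multiplicative seasonality.
mean_sd_cor <- cor(as.numeric(yearly_mean), as.numeric(yearly_sd))
print(mean_sd_cor)

# A log transform turns multiplicative seasonality into (roughly) additive.
plot(AirPassengers)
plot(log(AirPassengers))
```

On the log scale the seasonal swings should look much more even from year to year, which is one common way to handle this kind of series before smoothing.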

Trend Analysis

There are no foolproof “automatic” techniques to identify trend components in time series data. However, as long as the trend is monotonic (consistently increasing or decreasing), that part of the data analysis is typically not very difficult. If the time series data contain considerable error, then the first step in the process of trend identification is smoothing. But smoothing alone may not be adequate for more complex data, for instance when the measurement error is enormous or when the data share the characteristics of the international passenger data series (G), where the seasonal amplitude grows with the trend.

Smoothing.

Smoothing involves some form of local averaging of data such that the nonsystematic components of individual observations cancel each other out. The most common technique is moving average smoothing, which replaces each element of the series by either the simple or weighted average of n surrounding elements, where n is the width of the smoothing "window" (see Box & Jenkins, 1976; Velleman & Hoaglin, 1981). Exponential smoothing refers to the use of an exponentially weighted moving average (EWMA) to “smooth” a time series. In a simple moving average the past observations in the window are weighted equally, whereas exponential smoothing assigns exponentially decreasing weights as the observations get older.
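To make the difference between the two weighting schemes concrete, here is a minimal sketch in base R. The function names (moving_average, ewma, ewma_weights), the toy data, and the choice of starting the EWMA at the first observation are assumptions made for illustration, not something taken from the references above.

```r
# Simple moving average: equal weights 1/n over a centred window of width n.
# The ends of the series, where the window does not fit, come back as NA.
moving_average <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 2))
}

# EWMA via the recursion s[t] = alpha * x[t] + (1 - alpha) * s[t - 1],
# started at s[1] = x[1] (one common choice of initial value).
ewma <- function(x, alpha) {
  s <- numeric(length(x))
  s[1] <- x[1]
  for (t in seq_along(x)[-1]) {
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}

# Weight the EWMA places on the observation k steps in the past.
ewma_weights <- function(alpha, k) alpha * (1 - alpha)^k

x <- c(10, 12, 11, 15, 14, 18, 17, 21)   # toy data
print(moving_average(x, n = 3))
print(ewma(x, alpha = 0.3))
print(ewma_weights(alpha = 0.3, k = 0:4))  # decays geometrically with age
```

A larger alpha puts more weight on recent observations and gives a less smooth result; a smaller alpha averages over a longer history.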

The Data

The data consist of 100 monthly observations on the consumer confidence index (cci) and seasonally adjusted civilian unemployment (unemp) in the US, covering the period June 1997 – September 2005. The third column is a "terrorism" indicator variable taking the value one from September 2001 onward. The dataset unemp.cci is part of the R package ‘expsmooth’.

Single Exponential Smoothing

We load the data from the R package ‘expsmooth’ and enter the following code for simple exponential smoothing. The HoltWinters() function itself is part of R's built-in stats package. Beta is the Holt-Winters smoothing parameter for the trend component. If set to FALSE, no trend is fitted and the function does plain exponential smoothing. Gamma is the parameter used for the seasonal component. If set to FALSE, a non-seasonal model is fitted. So, with both beta and gamma set to FALSE, we get single exponential smoothing.

 

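  • library(expsmooth)   # provides the unemp.cci dataset
  • data(unemp.cci)
  • # monthly series starting June 1997; frequency=12 keeps the dates on the x axis right
  • cci <- ts(unemp.cci[,"cci"], start=c(1997, 6), frequency=12)
  • plot.ts(cci)
  • # beta=FALSE, gamma=FALSE: no trend and no seasonal component
  • cci.smooth <- HoltWinters(cci, beta=FALSE, gamma=FALSE)
  • plot(cci.smooth$fitted)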
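As a self-contained illustration of what such a fit returns, the sketch below applies the same HoltWinters() call to Nile, a dataset that ships with base R. Nile is only a stand-in so the example runs without ‘expsmooth’ installed; it is not the data analyzed in this article.

```r
# Nile (annual river flow) ships with base R; it is only a stand-in for cci.
fit <- HoltWinters(Nile, beta = FALSE, gamma = FALSE)

# alpha is estimated by minimising the squared one-step prediction errors.
alpha <- unname(fit$alpha)
# "a" is the final smoothed level of the series.
level <- unname(fit$coefficients["a"])

# Forecasts from single exponential smoothing are flat at the final level.
fc <- predict(fit, n.ahead = 3)
print(alpha)
print(level)
print(as.numeric(fc))

plot(fit)   # observed series with the fitted values overlaid
```

The flat forecast is a useful reminder that single exponential smoothing models neither trend nor seasonality.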

Double Exponential Smoothing

The following R code performs double exponential smoothing. Leaving beta unset lets HoltWinters() estimate a trend component in addition to the level:

  • cci.smoother <- HoltWinters(cci, gamma=FALSE)
  • plot(cci.smoother$fitted)
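The sketch below shows what the extra trend term does to the forecasts. It uses a simulated trending series and hand-picked values of alpha and beta, both of which are assumptions made so the example is reproducible; the call above lets HoltWinters() estimate the parameters from the data instead.

```r
# Simulated monthly series with a linear trend plus noise (illustrative only).
set.seed(1)
x <- ts(100 + 2 * (1:60) + rnorm(60, sd = 5), start = c(1997, 6), frequency = 12)

# alpha and beta are fixed by hand here rather than estimated.
fit <- HoltWinters(x, alpha = 0.8, beta = 0.2, gamma = FALSE)

a <- unname(fit$coefficients["a"])   # final level
b <- unname(fit$coefficients["b"])   # final slope (trend per time step)

# The h-step-ahead forecast extends the final level along the final slope.
h <- 1:3
manual <- a + h * b
fc <- predict(fit, n.ahead = 3)
print(cbind(predict = as.numeric(fc), manual = manual))
```

Unlike the single exponential smoothing forecast, which stays flat, this forecast follows a straight line with slope b.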

Conclusion

Simple exponential smoothing methods are useful under certain conditions, such as those described here. The international passenger data series (G) requires more robust methods such as Moving Median, Kernel Smoothing, ARIMA, or UCM (see “Unobserved Component Models using R”). Nevertheless, R offers several useful functions for exponential smoothing, including some not discussed here, for instance in the qcc package.

Just so you know, here is the result of exponential smoothing on the international passenger data series (G).
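A comparable result can be produced with the short sketch below. It assumes that R's built-in AirPassengers dataset is the same Series G and that single exponential smoothing is the method meant here; the text does not say which variant was applied.

```r
# AirPassengers ships with base R; assumed here to be Series G.
ap.smooth <- HoltWinters(AirPassengers, beta = FALSE, gamma = FALSE)

# Estimated smoothing weight.
print(ap.smooth$alpha)

# Observed series with the fitted values overlaid.
plot(ap.smooth)
```

If the estimated alpha comes out close to 1, the fitted values mostly repeat the previous observation, which would show that this method smooths very little on a strongly seasonal, trending series.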

About the Author

Jeffrey Strickland, PhD, has over 20 years of subject matter expertise in predictive modeling and analysis, as an operations research analyst and analytics scientist. He is the author of "Predictive Analytics using R" and "Data Science and Analytics for Ordinary People".

Comment

Comment by Tom Reilly on March 11, 2016 at 6:53pm

Hi Jeff, I used Tsay's variance test, found here (http://onlinelibrary.wiley.com/doi/10.1002/for.3980070102/abstract), using Autobox.

The cci has a change (an increase) in the variance at period 42. The plot also supports the F test. This would suggest the need to use Weighted Least Squares.

I think the dates might be off on the x axis.

DIAGNOSTIC CHECK #5: THE TSAY VARIANCE CONSTANCY TEST SUMMARY

The critical value used for this test: .01
The minimum group or interval size was: 20

DIRECTION     TIME (T)   DATE    F VALUE    P VALUE
INCREASING    42         4/ 6    2.21894    .0044
