
Overview of Forecasting Models in Power BI

Time series forecasting in Power BI is based on a widely used family of smoothing techniques called Exponential Smoothing (ES). ES assigns exponentially decreasing weights to observations, from newest to oldest, so recent data influences the forecast the most. ES can also be used for time series with trend and seasonality. The technique is usually used to make short-term forecasts, as longer-term forecasts made this way can be quite unreliable. Collectively, these methods are sometimes referred to as ETS models, referring to the explicit modeling of Error, Trend and Seasonality.
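To make the weighting scheme concrete, here is a minimal sketch of the simple ES recursion in plain Python (the series values and the alpha below are made up for illustration):

# Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
# Unrolling the recursion shows the weight on an observation k steps old
# is alpha * (1 - alpha)**k, i.e. it decreases exponentially with age.
def simple_exp_smoothing(series, alpha):
    smoothed = [series[0]]  # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(simple_exp_smoothing([3, 5, 9, 20], alpha=0.5))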

Types of Exponential Smoothing models in Power BI

  • Simple exponential smoothing – uses a weighted moving average with exponentially decreasing weights
  • Holt’s trend-corrected double exponential smoothing – usually more reliable than the single procedure for data that shows a trend
  • Triple exponential smoothing – usually more reliable for parabolic trends or for data that shows both trend and seasonality (a sketch of this variant follows the reference code below)

Handling missing values

In some cases, your timeline might be missing some historical values. Does this pose a problem?

Not usually – the forecasting chart can automatically fill in some values to provide a forecast. If the total number of missing values is less than 40% of the total number of data points, the algorithm will perform linear interpolation prior to performing the forecast.

If more than 40% of your values are missing, try to fill in more data, or perhaps aggregate values into larger time units, to ensure that a more complete data series is available for analysis.
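Power BI performs this fill internally, but the same preprocessing is easy to reproduce in pandas if you want to inspect it yourself. A minimal sketch (the dates, values and gap pattern are made up):

import pandas as pd
import numpy as np

# A monthly series with two gaps (NaN) - well under the 40% threshold
idx = pd.date_range('2020-01-01', periods=8, freq='MS')
s = pd.Series([10.0, 12.0, np.nan, 15.0, np.nan, 18.0, 21.0, 24.0], index=idx)

# Straight-line fill between the known neighbouring points
filled = s.interpolate(method='linear')

# Or aggregate to a coarser grain (quarters) so fewer buckets are empty
quarterly = s.resample('QS').mean()
print(filled)
print(quarterly)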

Reference Code:

# Imports for data download, wrangling, modeling and plotting
import requests
import pandas as pd
import json
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt
import numpy as np
# %matplotlib inline is Jupyter-specific; drop it when running as a plain script
%matplotlib inline
plt.style.use('Solarize_Light2')

# Fetch the sample series from the DataMarket API (this endpoint may no
# longer be available); the response wraps the JSON payload, hence the slicing
r = requests.get('https://datamarket.com/api/v1/list.json?ds=22qx')
jobj = json.loads(r.text[18:-1])
data = jobj[0]['data']
df = pd.DataFrame(data, columns=['time', 'data']).set_index('time')

# Hold out the last 10 observations as a test set
train = df.iloc[100:-10, :]
test = df.iloc[-10:, :]
train.index = pd.to_datetime(train.index)
test.index = pd.to_datetime(test.index)
pred = test.copy()

# Simple exponential smoothing; _index restores the datetime index lost
# when passing a plain NumPy array
model = SimpleExpSmoothing(np.asarray(train['data']))
model._index = pd.to_datetime(train.index)

# One fit with an optimized alpha, two with fixed smoothing levels;
# forecast len(test) steps so the predictions line up with test.index
fit1 = model.fit()
pred1 = fit1.forecast(len(test))
fit2 = model.fit(smoothing_level=.2)
pred2 = fit2.forecast(len(test))
fit3 = model.fit(smoothing_level=.5)
pred3 = fit3.forecast(len(test))

# Plot the tail of the training data, the held-out values, and each fit
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(train.index[150:], train.values[150:])
ax.plot(test.index, test.values, color="gray")
for p, f, c in zip((pred1, pred2, pred3), (fit1, fit2, fit3), ('#ff7823', '#3c763d', 'c')):
    ax.plot(train.index[150:], f.fittedvalues[150:], color=c)
    ax.plot(test.index, p, label="alpha=" + str(f.params['smoothing_level'])[:3], color=c)
plt.title("Simple Exponential Smoothing")
plt.legend();

# Holt's (double) exponential smoothing adds a trend component;
# note that newer statsmodels releases rename smoothing_slope to smoothing_trend
model = Holt(np.asarray(train['data']))
model._index = pd.to_datetime(train.index)

fit1 = model.fit(smoothing_level=.3, smoothing_slope=.05)
pred1 = fit1.forecast(len(test))
fit2 = model.fit(optimized=True)
pred2 = fit2.forecast(len(test))
fit3 = model.fit(smoothing_level=.3, smoothing_slope=.2)
pred3 = fit3.forecast(len(test))

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(train.index[150:], train.values[150:])
ax.plot(test.index, test.values, color="gray")
for p, f, c in zip((pred1, pred2, pred3), (fit1, fit2, fit3), ('#ff7823', '#3c763d', 'c')):
    ax.plot(train.index[150:], f.fittedvalues[150:], color=c)
    ax.plot(test.index, p, label="alpha=" + str(f.params['smoothing_level'])[:4] + ", beta=" + str(f.params['smoothing_slope'])[:4], color=c)
plt.title("Holt's Exponential Smoothing")
plt.legend();
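The bullet list above also mentions triple exponential smoothing, which the reference code does not demonstrate. Here is a minimal sketch continuing from the same train/test split, using statsmodels' ExponentialSmoothing (Holt-Winters) class; the additive trend/seasonal components and seasonal_periods=12 are illustrative assumptions, not properties of this dataset:

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Triple exponential smoothing (Holt-Winters): level + trend + seasonality.
# trend='add', seasonal='add' and seasonal_periods=12 are assumed here;
# choose them to match the structure of your own series.
hw_model = ExponentialSmoothing(np.asarray(train['data']),
                                trend='add', seasonal='add',
                                seasonal_periods=12)
hw_fit = hw_model.fit()
hw_pred = hw_fit.forecast(len(test))

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(train.index[150:], train.values[150:])
ax.plot(test.index, test.values, color="gray")
ax.plot(train.index[150:], hw_fit.fittedvalues[150:], color='#ff7823')
ax.plot(test.index, hw_pred, label="Holt-Winters", color='#ff7823')
plt.title("Triple Exponential Smoothing (Holt-Winters)")
plt.legend();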

Evaluating the Forecast

Hindcasting and adjusting confidence intervals are two good ways to evaluate the quality of the forecast.

Hindcasting is one way to verify whether the model is doing a good job: hold back the most recent observed values, forecast over that period, and compare the forecast against what actually happened. If the observed value doesn’t exactly match the predicted value, that does not mean the forecast is wrong; instead, consider both the amount of variation and the direction of the trend line. Predictions are a matter of probability and estimation, so a predicted value that is close to, but not exactly the same as, the real value can be a better indicator of prediction quality than an exact match. In general, when a model mirrors the values and trends of the input dataset too closely, it may be overfitted, meaning it likely won’t provide good predictions on new data.
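Continuing from the reference code above, a simple hindcast can score the optimized Holt forecast (pred2) against the held-out test values. The MAE/RMSE metrics and the rough residual-based 95% band below are illustrative choices, not Power BI’s internal method:

# Hindcast check: compare held-out actuals with the optimized Holt forecast
actual = test['data'].to_numpy(dtype=float)
forecast = np.asarray(pred2)

mae = np.mean(np.abs(actual - forecast))           # average absolute miss
rmse = np.sqrt(np.mean((actual - forecast) ** 2))  # penalizes large misses

# A rough 95% band from the in-sample residual spread (approximate)
resid_sd = np.std(fit2.resid)
lower, upper = forecast - 1.96 * resid_sd, forecast + 1.96 * resid_sd
coverage = np.mean((actual >= lower) & (actual <= upper))

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  interval coverage={coverage:.0%}")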

You are the best judge of how reliable the input data is, and what the real range of possible predictions might be.
