Time Series Forecasting and Internet of Things (IoT) in Grain Storage
Authors: Vinay Mehendiratta, PhD, Director of Research and Analytics, Eka Software
Sishir Kumar Pagada, Senior Software Engineer, Eka Software
Created as part of the Data Science for IoT practitioners course – starting Nov 10 2015
The pdf version of this paper may be downloaded HERE
Abstract
Grain storage operators are always trying to minimize the cost of their supply chain. Understanding the relationships between receival, outturn, within-site movements, and between-site movements can provide insights that are useful in planning for the next harvest season, estimating the throughput capacity of the system, and understanding the relationship between throughput and inventory. This article explores the potential of scanner data in advanced analytics. Combining scanner data with advanced analytics has the potential to be useful for the grain storage business. The study describes grain storage scenarios in the Australian context.
----------------------------------------------------------------------------------------------------------------------------------------
Introduction
There is sufficient grain storage capacity across most of Australia to cater for a range of seasonal outcomes. There is about 55 million metric tons (MMT) of bulk handling storage capacity at 623 sites across Australia. Combined with an estimated 15 MMT of on-farm storage capacity, Australia has the capacity to store the equivalent of two years’ average grain production. As a result, grain storage fees are kept relatively low and are falling in real terms [1].
As grain facilities and port terminals investigate methods to increase throughput and grow revenues, it is becoming important to maximize resource utilization and understand throughput. A typical busy site could receive as many as 400 trucks a day during harvest season.
Data collected at individual storage facilities provides visibility of stock across all storage sites. An RFID scanner at the storage site entrance registers each truck's arrival time, which is mapped to tonnage, grower, grade, commodity type, and quality. This activity is known as ‘receival’ in grain storage operations. An RFID scanner at the storage site exit registers each truck's departure time, which is mapped to tonnage, grower, grade, commodity type, and quality. This activity is known as ‘outturn’ in grain storage operations. Storage facilities also have access to grain movement information from one site to another site (‘between-site-movements’) and within a site (‘within-site-movements’).
The question we address is: Could ‘Internet of Things’ and ‘Predictive Analytics’ help us understand receival and outturn behavior at grain storage sites? Could we make better use of this data?
Predictive Analytics Platform
The high-level flow of data at a grain storage site is described below.
Figure 1: Internet of Things specific Architecture
We consolidated the receival and outturn information captured by the sensors at month level for the last 5 years. An understanding of these patterns can be used in planning site operations for the harvest season and other seasons. The rest of the document is arranged as shown in Figure 2.
Figure 2: Document and Methodology Flow
Input Processing: Receival and outturn are frequent activities at some sites but not at all of them. Because data was not available at regular intervals at the less active sites, it was not possible to analyse daily operations at site level. We aggregate data across sites, commodities, and grades at month level to overcome this issue of missing values. We then determine the inventory snapshot from the receival and outturn data.
Inventory at the end of month = Initial Inventory at the beginning of month + Receival Quantity during this month – Outturn Quantity during this month.
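The aggregation and the inventory equation above can be sketched in base R as follows. This is a minimal illustration, not the authors' pipeline: the movements data frame, its column names, and the opening stock of 200 tonnes are all hypothetical values chosen for the example.

```r
# Hypothetical scanner records: one row per truck movement (illustrative values only).
movements <- data.frame(
  date   = as.Date(c("2014-01-05", "2014-01-20", "2014-02-03",
                     "2014-02-14", "2014-03-09", "2014-03-21")),
  type   = c("receival", "outturn", "receival",
             "outturn", "receival", "outturn"),
  tonnes = c(120, 40, 80, 100, 30, 50)
)

# Aggregate to month level, pooling sites, commodities, and grades.
movements$month <- format(movements$date, "%Y-%m")
monthly <- tapply(movements$tonnes,
                  list(movements$month, movements$type),
                  sum)
monthly[is.na(monthly)] <- 0  # months with no activity of a given type

# Roll the inventory forward month after month:
# closing = opening + receival - outturn.
initialInventory <- 200  # assumed opening stock, in tonnes
closingInventory <- initialInventory +
  cumsum(monthly[, "receival"] - monthly[, "outturn"])
print(closingInventory)
```

Because each month's closing inventory is the next month's opening inventory, a cumulative sum of the net monthly movement applies the equation to every month in one step.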
Time Series Modeling – Identify patterns in historical data
A time series is a sequence of observations (generally quantitative) taken at equally spaced time intervals. An inherent feature of a time series is that adjacent observations are dependent. Time series analysis is mainly concerned with analysing this dependence. Sensor data captured at discrete time periods is a time series and is suitable for time series modeling. There are various packages available in R to decompose a time series. A tutorial on time series in R can be found at [8]. Time series decomposition breaks the observations into three components: trend, seasonal, and random. We used the decompose function of R to determine these components in this study. We provide the code snippet below:
tsObject<-ts(QuantitySeries,start=c(fromYear,fromMonth),frequency=frequencyInt)
decomposedTs <- decompose (tsObject, type="additive", filter=NULL)
fromYear and fromMonth give the year and month of the earliest observation of receival and outturn data. frequencyInt is the number of observations per year in this dataset (12 for monthly data). More information on configuring these parameters is available at [7]. QuantitySeries is the numeric series of monthly quantities, read into R from a csv file.
The plot function is used to generate all the graphs in this report.
plot.ts(tsObject)
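Trend: This component shows whether the observations are increasing, decreasing, or constant over time. The decompose function determines the trend component using a moving average. The seasonal component is the pattern that repeats every year, and the random component is what remains after the trend and seasonal components are removed. Each component can be extracted from the decomposition result: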
trendComp <- decomposedTs$trend
seasonalComp <- decomposedTs$seasonal
randomComp <- decomposedTs$random
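To make the additive decomposition concrete, the sketch below checks that the three components add back up to the observed series. It uses a synthetic monthly series rather than the grain data; the series, its parameters, and the variable names are assumptions made for illustration only.

```r
# Synthetic monthly series (illustrative only): linear trend + yearly cycle + noise.
set.seed(42)
n <- 60  # five years of monthly observations
syntheticSeries <- 100 + 0.5 * seq_len(n) +
  20 * sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 3)
tsExample <- ts(syntheticSeries, start = c(2010, 1), frequency = 12)

parts <- decompose(tsExample, type = "additive")

# Additive model: observed = trend + seasonal + random.
# The trend is a centred moving average, so it is NA at both ends of the series.
rebuilt <- parts$trend + parts$seasonal + parts$random
maxGap <- max(abs(rebuilt - tsExample), na.rm = TRUE)
print(maxGap)                   # effectively zero
print(sum(is.na(parts$trend)))  # months lost at the two ends
```

The missing values at the ends of the trend are worth remembering when reading the trend panels: with monthly data the moving average cannot be computed for the first and last six months.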
Receival Volumes at sites: Historical receival data is plotted in Figure 3. We fed this data to the decompose function [2] of R to understand the patterns. The seasonal panel of Figure 4 shows that receivals increase dramatically during harvest season every year. The trend panel of Figure 4 (a moving average) shows receivals increasing every year until 2013 and then coming down during 2013-2014.
Figure 3 : Receival Time Series Figure 4: Patterns for Receival Time Series
Outturn Volumes at sites: Historical outturn data is plotted in Figure 5. We fed this data to the decompose function of R to understand the patterns. The trend panel of Figure 6 shows that outturns increase steadily every year until 2013. Outturns also show very strong seasonal behaviour, as shown in the ‘seasonal’ panel of Figure 6.
Inventory Volumes at sites: Historical inventory data is plotted in Figure 7. We fed this data to the decompose function of R to understand the patterns. Figure 8 shows that inventory volume increases during harvest time every year. The inventory build was higher when the harvest season had more receival tonnage. Inventory is the result of receivals and outturns, and it shows seasonal behaviour similar to the receival and outturn patterns, as shown in the ‘seasonal’ panel of Figure 8.
Time Series Forecasting
To forecast future periods, time series methods extrapolate the dependence relationships observed among the available observations, such as trend and seasonality. We divided the data into two parts: training data and test data. Training data (Jan 2010 to Dec 2013) was used to let our models learn from history. Test data (Jan 2014 to Oct 2014) was used to test the accuracy of the models built on the training data. The objective was to identify any pattern in receivals (client, grower), outturns, and the resultant inventory that can be used to understand growers’ operations, market behavior, and bottlenecks in warehouse efficiency.
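A train/test split of this kind can be sketched with the window function of base R. The example below is only an illustration under stated assumptions: the series is synthetic, the variable names are hypothetical, and a Holt-Winters model with root mean squared error stands in for whichever model and accuracy measure one prefers.

```r
# Synthetic monthly series standing in for the receival data (illustrative only).
set.seed(1)
n <- 58  # Jan 2010 to Oct 2014
monthIndex <- seq_len(n)
syntheticSeries <- 500 + 2 * monthIndex +
  150 * cos(2 * pi * monthIndex / 12) + rnorm(n, sd = 20)
fullSeries <- ts(syntheticSeries, start = c(2010, 1), frequency = 12)

# Split by calendar date rather than by row index.
trainSeries <- window(fullSeries, end = c(2013, 12))
testSeries  <- window(fullSeries, start = c(2014, 1))

# Fit on the training window only, then forecast the held-out months.
hwTrain <- HoltWinters(trainSeries)
hwForecast <- predict(hwTrain, n.ahead = length(testSeries))

# Out-of-sample error: root mean squared error on the test window.
testRmse <- sqrt(mean((as.numeric(testSeries) - as.numeric(hwForecast))^2))
print(length(trainSeries))
print(length(testSeries))
print(testRmse)
```

Splitting by date keeps the test months strictly after the training months, so the model is never evaluated on observations it has already seen.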
There are many packages available within R that can be used to forecast receivals and outturns with varying degrees of accuracy. One has to test and compare results to find the method that performs best. We used two such algorithms: ARIMA and Holt-Winters.
ARIMA: We use the auto.arima() function available in the forecast package of R [9]. This function automatically finds the best-fitting ARIMA [3] model for the data. Residuals are the actual values minus the fitted values.
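library(forecast)  # provides auto.arima() and forecast()
tsObject <- ts(QuantitySeries, start=c(fromYear,fromMonth), frequency=frequencyInt)
arimaFit <- auto.arima(tsObject)  # searches candidate ARIMA orders and keeps the best fit
arimaResiduals <- residuals(arimaFit)  # actual values minus fitted values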
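Holt-Winters: We use the Holt-Winters exponential smoothing model [4] with trend and seasonality.
# isTrend and isSeasonality are logical flags set by the caller
# NULL lets HoltWinters estimate the parameter; FALSE drops that component
betaArg <- if (isTrend) NULL else FALSE
gammaArg <- if (isSeasonality) NULL else FALSE
hwFit <- HoltWinters(tsObject, beta=betaArg, gamma=gammaArg)
beta is the parameter for the trend component and gamma is the parameter for the seasonal component. Leaving a parameter at NULL (the default) lets HoltWinters estimate it, so that component is included; setting beta to FALSE fits a model without trend, and setting gamma to FALSE fits a non-seasonal model. A literal TRUE is not a switch here: it is read as the fixed smoothing value 1. We calculate the sum of squared errors (SSE) for the in-sample forecast errors, that is, the forecast errors for the time period covered by our original time series.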
arimaErrors <- sum(arimaResiduals^2)
hwErrors <- hwFit$SSE
We compare the in-sample accuracy of ARIMA and Holt-Winters on the given data. The model with the smaller SSE is chosen to forecast the future periods. Code is provided below:
if (arimaErrors <= hwErrors) {
  print("Using ARIMA")
  forecastedValues <- forecast(arimaFit, h=10)
} else {
  print("Using Holt-Winters")
  forecastedValues <- forecast(hwFit, h=10)  # dispatches to the HoltWinters method
}
forecastedValue <- as.data.frame(forecastedValues)
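For readers who want to see what the in-sample SSE measures, the sketch below reproduces the SSE value reported by HoltWinters from the one-step-ahead fitted values. It uses the co2 dataset that ships with R purely as a stand-in for the grain series; the variable names are hypothetical.

```r
# Built-in monthly CO2 series used purely as a stand-in for the grain data.
hwExample <- HoltWinters(co2)

# One-step-ahead fitted values start later than the raw series,
# so the observed series is trimmed to the same time span before subtracting.
fittedValues <- hwExample$fitted[, "xhat"]
observed <- window(co2, start = start(fittedValues))
inSampleErrors <- observed - fittedValues

# Sum of squared in-sample errors, next to the value stored in the fit.
sseByHand <- sum(inSampleErrors^2)
print(c(byHand = sseByHand, reported = hwExample$SSE))
```

Since this error is measured on the same months the model was fitted to, it describes how closely each model tracks history rather than how well it predicts unseen months.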
Figure 11: Receival, Outturn, and Inventory Forecast
Results of time series forecasting are shown in Figure 11. It is important to consider the point forecast as well as the forecast with confidence intervals. People responsible for site operations have to consider news about the next harvest season, weather, and crop yield information to decide which confidence interval should be used for various purposes. Forecasts with confidence intervals can be used to devise strategies for various scenarios and to assess future uncertainty.
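One way to obtain interval forecasts at different levels is sketched below with the predict method for Holt-Winters fits in base R, which returns prediction intervals. The co2 dataset that ships with R is used as a stand-in for the grain series, and the 95% and 80% levels are arbitrary choices for illustration.

```r
# Built-in monthly CO2 series used purely as a stand-in for the grain data.
hwExample <- HoltWinters(co2)

# Ten-month forecast with a wide band and a narrower band.
wideBand <- predict(hwExample, n.ahead = 10,
                    prediction.interval = TRUE, level = 0.95)
narrowBand <- predict(hwExample, n.ahead = 10,
                      prediction.interval = TRUE, level = 0.80)

# Each result has a point forecast (fit) plus upper (upr) and lower (lwr) bounds.
print(colnames(wideBand))
wideWidth <- wideBand[, "upr"] - wideBand[, "lwr"]
narrowWidth <- narrowBand[, "upr"] - narrowBand[, "lwr"]
print(cbind(wideWidth, narrowWidth))
```

A planner preparing for a worst-case harvest might read the upper bound of the wider band, while routine scheduling might rely on the point forecast or the narrower band.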
Relation between scanner data and internal movement data
We also considered movements within the storage site from one bin to another. We found that the volume of movements within the storage site increased from 2012 to 2014 (Figures 12 and 13). We already know that outturns increased while receivals and inventory decreased from 2012 to 2014. Inventory volume peaked during 2012.
We also observed that as inventory volume went down, movement volume within the storage site increased while the volume moved between storage sites decreased. That bodes well for bulk storage handlers trying to reduce the cost of their supply chain.
Figure 12: Movements within-the-site time series Figure 13: Patterns for movement within-the-site series
Figure 12a: Movements among sites Figure 13a: Patterns for movement among sites
Figure 14: Client Receivals Figure 15: Patterns for Client Receivals
Grower Receivals: Figures 15 and 16 show that the peak volume reached during harvest receivals from growers is decreasing. Receival volume from clients is also decreasing steadily. The impact is clearly visible in the declining volume of inventory at sites from 2012 to 2013 and beyond. Are on-farm storage and the high cost of storage among the reasons for reduced receivals?
Figure 16: Grower Receivals Timeseries Figure 17: Patterns – Grower Receivals
Conclusion
Analysis of receival, outturn, and storage data can help an organization gain insights into the behavior of its storage sites. One could determine the relationship between throughput and inventory, and between throughput and internal movements, to measure the efficiency of operations. This analysis can be useful in planning for the next harvest season. It might be worthwhile to perform the time series modeling at daily level during harvest season for major or busy sites. Our objective in this article has been to promote the use of advanced analytics with scanner data (IoT), a combination that has the potential to be useful for the grain storage business.
References
Comment
A good pre-processing step in time-series analysis is to denoise the series first. The method depends on the kind of time-series data I have. I sometimes use the wavelet MODWT (maximal overlap discrete wavelet transform), a robust spline, or the Hodrick-Prescott filter. Doing so will reveal certain features that were not obvious in the original time series. Do the forecasting on the filtered time-series data rather than the original raw data. There are various denoising methods available today, but the 3 methods above are enough for me in any time-series preprocessing task. The 3 methods are parametric, so the user can choose the parameter values to pass as inputs to the models.