Missing data pose significant challenges to the trend analysis of time series. Straightforward approaches that fill missing data with constant or zero values, or with linear trends, can severely degrade the quality of the estimated trend and thus significantly reduce the reliability of the analysis.
We present a robust adaptive approach to discovering trends in fragmented time series. The approach proposed in this paper is based on the HASF (Hypothesis-testing-based Adaptive Spline Filtering) trend analysis algorithm, which can accommodate non-uniform sampling and is therefore inherently robust to missing data.
HASF adapts the nodes of the spline based on hypothesis testing and variance minimization, which adds to its robustness. A further improvement is obtained by filling gaps with data estimated in an earlier trend analysis performed by HASF itself. Three variants for filling the gaps of missing data are considered. The most promising one appears to be filling significantly large gaps with linear splines, matched for continuity and smoothness to the cubic splines covering data-dense regions. Small gaps are not filled explicitly; they are handled by the underlying cubic spline fit.
Trend Analysis Algorithm: HASF (Hypothesis-testing-based Adaptive Spline Filtering)
The basic concept of HASF is to break the time series into flexible sections, each of which is fitted with a cubic spline, and to impose appropriate constraints between the sections, such as continuity and smoothness, a minimum or maximum section length, etc. The number of sections and the nodes between them are adapted to the data using hypothesis tests applied to the second-order statistics (variances) of the residual noise. Essentially, a node is adapted only if the resulting change in the residual standard error is a statistically significant improvement, typically determined by an F-test.
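One way to make this test concrete is sketched below. It assumes (the exact statistic is not specified here) that HASF compares unbiased residual variances of least-squares cubic-spline fits. With n samples t_i, observations y_i, and a cubic spline \hat{s}_m with m interior nodes (hence m + 4 free coefficients), inserting one node is tested as:

```latex
\hat{\sigma}_m^2 = \frac{1}{n - m - 4} \sum_{i=1}^{n} \bigl( y_i - \hat{s}_m(t_i) \bigr)^2,
\qquad
F = \frac{\hat{\sigma}_m^2}{\hat{\sigma}_{m+1}^2},
\qquad
\text{insert the node if } F > F_{1-\alpha}(n - m - 4,\; n - m - 5).
```

Because both variance estimates come from the same data, they are not independent, so the F(n - m - 4, n - m - 5) reference distribution is only an approximation. The nested-model (extra-sum-of-squares) F-test is a common, more rigorous alternative.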
This involves three operations:
a) Inserting nodes: This operation is based on the null hypothesis that the variances of the two residuals (before and after inserting a node) are equal. If the null hypothesis is rejected (namely, the F-statistic is significantly larger than 1), the current section is divided into two smaller sections by inserting a node. In this way the heteroscedasticity of the residual error is reduced, and the estimated trend hopefully leaves a homoscedastic residual.
b) Shifting nodes: Shifting is decided simply by whether it improves the overall standard error. A shift is first tried in the positive direction; if it fails to improve the error, the node is shifted in the negative direction.
c) Removing nodes: This is similar to the operation of inserting nodes and is based on the null hypothesis that the variances of the two residuals (before and after removing a node) are equal. If the null hypothesis is not rejected (namely, the F-statistic is not significantly larger than 1), the node does not significantly improve the fit, and the two sections in question are merged by removing the node between them. This keeps only the nodes that the data justify.
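The three operations can be sketched as one adaptation loop. This is a minimal illustration under stated assumptions, not the HASF implementation. The helper names, the midpoint rule for candidate nodes, the significance level `ALPHA`, the shift `step`, and the minimum section length `min_len` are all hypothetical choices. The sections are represented by a single truncated-power cubic spline, which enforces continuity and smoothness at every node. A node is assumed to be removed when the F-test does not show that keeping it significantly reduces the residual variance.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # hypothetical significance level for the F-tests


def fit_var(x, y, knots):
    """Unbiased residual variance and dof of a least-squares cubic spline.

    Truncated-power basis 1, x, x^2, x^3, (x - k)_+^3: every fit is continuous
    with continuous first and second derivatives at each node.
    """
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    dof = len(x) - A.shape[1]
    return r @ r / dof, dof


def significant(x, y, coarse, fine):
    """One-sided variance-ratio F-test: does the finer node set leave a
    significantly smaller residual variance than the coarser one?"""
    v0, d0 = fit_var(x, y, coarse)
    v1, d1 = fit_var(x, y, fine)
    return stats.f.sf(v0 / v1, d0, d1) < ALPHA


def insert_nodes(x, y, knots, min_len):
    """a) Try to split each section at its midpoint (hypothetical rule)."""
    edges = [x[0], *knots, x[-1]]
    for lo, hi in zip(edges[:-1], edges[1:]):
        if hi - lo >= 2 * min_len:
            trial = sorted([*knots, 0.5 * (lo + hi)])
            if significant(x, y, knots, trial):
                knots = trial
    return knots


def shift_nodes(x, y, knots, step, min_len):
    """b) Move each node by +step, else by -step, if that lowers the variance."""
    knots = list(knots)
    for i in range(len(knots)):
        best, _ = fit_var(x, y, knots)
        for d in (step, -step):
            trial = knots.copy()
            trial[i] += d
            if np.min(np.diff([x[0], *trial, x[-1]])) < min_len:
                continue  # would violate the minimum section length
            if fit_var(x, y, trial)[0] < best:
                knots = trial
                break
    return knots


def remove_nodes(x, y, knots):
    """c) Drop a node (merge its two sections) when keeping it does not give
    a significantly smaller residual variance."""
    knots = list(knots)
    i = 0
    while i < len(knots):
        without = knots[:i] + knots[i + 1:]
        if significant(x, y, without, knots):
            i += 1  # the node is justified; keep it
        else:
            knots = without
    return knots


def adapt_nodes(x, y, min_len=1.0, step=0.1, max_iter=10):
    """Repeat insert -> shift -> remove until the node set stops changing."""
    knots = []
    for _ in range(max_iter):
        new = insert_nodes(x, y, knots, min_len)
        new = shift_nodes(x, y, new, step, min_len)
        new = remove_nodes(x, y, new)
        if len(new) == len(knots) and np.allclose(new, knots):
            break
        knots = new
    return knots
```

For a series with a change in behaviour near the middle, `adapt_nodes(x, y)` should place a node near that point, while a series that a single cubic already explains should end with no interior nodes.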
Filling Gaps (missing data) via HASF
The first step is to substitute plausible values for the missing observations to initialize a complete time series.
We present three variants:
1) fill gaps with straight lines
2) fill gaps with cubic spline filtering
3) fill gaps with cubic and linear spline filtering
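Variant 1 and the large-gap step of variant 3 both need to locate runs of missing samples and bridge them with straight lines. The following is a minimal sketch. It assumes missing observations are stored as NaN. The function names and the `min_gap` threshold separating "small" from "large" gaps are hypothetical. The line only matches the observed values at the gap edges; the slope matching to the neighbouring cubic splines described for variant 3 is not attempted.

```python
import numpy as np


def find_gaps(y):
    """Return (start, stop) index pairs of runs of NaN in y (stop exclusive)."""
    missing = np.isnan(y).astype(np.int8)
    edges = np.diff(np.concatenate(([0], missing, [0])))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))


def fill_linear(t, y, min_gap=1):
    """Fill every NaN run of length >= min_gap (hypothetical threshold) with a
    straight line through the observed neighbours; shorter runs stay NaN.

    Gaps at either end of the series get the nearest observed value, which is
    how np.interp extrapolates.
    """
    y = np.asarray(y, dtype=float).copy()
    obs = ~np.isnan(y)
    line = np.interp(t, t[obs], y[obs])
    for start, stop in find_gaps(y):
        if stop - start >= min_gap:
            y[start:stop] = line[start:stop]
    return y
```

With `min_gap=1` every gap is bridged (variant 1). A larger `min_gap` bridges only the large gaps and leaves the short ones as NaN, to be absorbed by the cubic spline fit as in variant 3.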
The second step is to carry out the trend analysis on the initialized (gap-filled) time series using the HASF method.
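The second step can be sketched as a least-squares cubic-spline fit to the initialized series. This sketch rests on stated assumptions. The knots are fixed and supplied by the caller, whereas HASF would adapt them. Any NaNs left by the first step are dropped rather than filled, which works because the fit does not require uniform sampling. `fit_trend` is a hypothetical name.

```python
import numpy as np


def fit_trend(t, y_init, knots):
    """Least-squares cubic spline through an initialized series (sketch).

    NaNs still present (e.g. small gaps left unfilled in variant 3) are
    dropped. The knots are FIXED here; HASF would adapt them with its
    insert/shift/remove tests. Returns a callable trend.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y_init, dtype=float)
    keep = ~np.isnan(y)

    def basis(tt):
        # Truncated-power cubic basis: C2-continuous at every knot.
        cols = [np.ones_like(tt), tt, tt**2, tt**3]
        cols += [np.clip(tt - k, 0.0, None) ** 3 for k in knots]
        return np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(basis(t[keep]), y[keep], rcond=None)
    return lambda tt: basis(np.atleast_1d(np.asarray(tt, dtype=float))) @ coef


# Usage: one small gap left as NaN, one large gap filled linearly in step 1.
t = np.arange(20.0)
y = 0.5 * t + 1.0
y[3] = np.nan
y[10:15] = np.nan
y[10:15] = np.interp(t[10:15], t[[9, 15]], y[[9, 15]])
trend = fit_trend(t, y, knots=[10.0])
```

Dropping the remaining NaNs instead of imputing them mirrors the idea that small gaps need no explicit filling when the trend model is fitted directly to non-uniformly sampled data.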