Missing data present significant challenges to trend analysis of time series. Straightforward approaches, such as replacing missing data with constant or zero values or with linear trends, can severely degrade the quality of the trend analysis and significantly reduce its reliability.

We present a robust adaptive approach for discovering trends in fragmented time series. The approach proposed in this paper is based on the HASF (Hypothesis-testing-based Adaptive Spline Filtering) trend analysis algorithm, which can accommodate non-uniform sampling and is therefore inherently robust to missing data.

HASF adapts the nodes of the spline based on hypothesis testing and variance minimization, which adds to its robustness. Further improvement is obtained by filling gaps with data estimated in an earlier trend analysis, provided by HASF itself. Three variants for filling the gaps are considered. The best of them appears to be filling significantly large gaps with linear splines that are matched for continuity and smoothness with the cubic splines covering data-dense regions. Small gaps are not filled; they are handled by the underlying cubic spline fitting.

**Trend Analysis Algorithm: HASF (Hypothesis-testing-based Adaptive Spline Filtering)**

The basic concept of HASF is to break the time series into flexible sections, each of which is fitted with a cubic spline, and to impose appropriate constraints between the sections, such as continuity and smoothness, or a minimum or maximum section length. The number of sections and the nodes between them are adapted from the data using hypothesis testing applied to the second-order statistics of the residual noise. Essentially, a node adaptation is accepted only if the resulting change in standard error is a statistically significant improvement, typically determined through an F-test.
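The section-wise cubic fit can be illustrated with a minimal sketch. This is not the HASF implementation: it assumes a plain least-squares fit on a truncated-power basis, which yields cubic pieces that are continuous and smooth at the nodes, and the names `fit_cubic_spline` and `evaluate_spline` are hypothetical. Because the fit only uses the sample times it is given, it works on non-uniformly sampled data without any resampling.

```python
import numpy as np

def _design(t, nodes):
    """Truncated-power basis: 1, t, t^2, t^3 and (t - k)_+^3 for each node k."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in nodes]
    return np.column_stack(cols)

def fit_cubic_spline(t, y, nodes):
    """Least-squares cubic spline with fixed interior nodes (hypothetical helper).

    The fitted curve and its first two derivatives are continuous at each node.
    Returns the coefficients and the residual at the sample times.
    """
    design = _design(t, nodes)
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef

def evaluate_spline(coef, nodes, t):
    """Evaluate the fitted spline at arbitrary times, including inside gaps."""
    return _design(t, nodes) @ coef

# Illustrative data (made up): irregular sampling with a stretch of missing data.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 200))
t = t[(t < 4.0) | (t > 5.0)]              # drop samples to mimic a gap
y = np.sin(t) + 0.1 * rng.standard_normal(t.size)

nodes = [2.5, 5.0, 7.5]                   # assumed node positions, not adapted here
coef, residual = fit_cubic_spline(t, y, nodes)
dof = t.size - coef.size
print("residual standard error:", np.sqrt(residual @ residual / dof))
```

In this sketch the nodes are fixed by hand; adapting them from the data is the part that the hypothesis tests are responsible for.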

**This involves three operations:**

**a) Inserting nodes:** This operation is based on the null hypothesis that the variances of the two residuals (before and after inserting a node) are equal. If the null hypothesis is rejected (namely, the F-statistic is significantly larger than 1), the current section is divided into smaller sections by inserting a node. Note that in this way the heteroscedasticity of the residual error is reduced, and the estimated trend hopefully leaves a homoscedastic residual.

**b) Shifting nodes:** A shift is accepted simply if it improves the overall standard error. The shift is first tested in the positive direction, and if it fails to improve the error, the current node is shifted in the negative direction instead.

**c) Removing nodes:** This is similar to the operation of inserting nodes, and is based on the null hypothesis that the variances of the two residuals (before and after removing a node) are equal. If the null hypothesis is rejected (namely, the F-statistic is significantly larger than 1), the two sections in question are merged by removing the node between them. Heteroscedasticity of the residual error is therefore hopefully reduced.
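The insertion test and the shifting rule can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the spline is a least-squares fit on a truncated-power basis, the two residuals are treated as independent so that their variance ratio can be compared against an F distribution, and the names `residual_variance`, `node_is_significant` and `shift_node`, the significance level `alpha` and the shift size `step` are all hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

def residual_variance(t, y, nodes):
    """Fit a cubic spline with the given nodes; return (residual variance, dof)."""
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in nodes]
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ coef
    dof = t.size - design.shape[1]
    return float(r @ r) / dof, dof

def node_is_significant(t, y, nodes, candidate, alpha=0.05):
    """One-sided F-test on the residual variances before and after inserting a node.

    Null hypothesis: the two variances are equal. Rejecting it (ratio
    significantly larger than 1) supports inserting `candidate`.
    """
    var_before, dof_before = residual_variance(t, y, list(nodes))
    var_after, dof_after = residual_variance(t, y, sorted(list(nodes) + [candidate]))
    f_stat = var_before / var_after
    p_value = f_dist.sf(f_stat, dof_before, dof_after)
    return bool(p_value < alpha)

def shift_node(t, y, nodes, index, step):
    """Try shifting one node by +step, then by -step; keep the first improvement.

    The number of nodes is unchanged, so comparing residual variances is
    equivalent to comparing standard errors.
    """
    base, _ = residual_variance(t, y, list(nodes))
    for direction in (1.0, -1.0):
        trial = list(nodes)
        trial[index] += direction * step
        # Skip shifts that reorder the nodes or leave the data range.
        if trial != sorted(trial) or not (t.min() < trial[index] < t.max()):
            continue
        if residual_variance(t, y, trial)[0] < base:
            return trial
    return list(nodes)

# Illustrative data (made up): a trend that changes character at t = 5.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 120)
y = np.clip(t - 5.0, 0.0, None) ** 3 + 0.1 * rng.standard_normal(t.size)
print("insert a node at 5.0?", node_is_significant(t, y, [], 5.0))
print("nodes after one shift:", shift_node(t, y, [4.0], 0, 0.5))
```

Fixing `alpha` and `step` up front keeps the sketch short; a fuller implementation would presumably iterate these operations over all sections until no further change is accepted.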

**Filling Gaps (missing data) via HASF**

**The first step** is to substitute plausible values for the missing observations to initialize a complete time series.

We present three variants:

1) fill gaps with straight lines

2) fill gaps with cubic filtering

3) fill gaps with cubic and linear filtering.
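The first variant can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes the straight line joins the two observed samples on either side of each gap, whereas the filling values could instead come from an earlier trend estimate. The function name `fill_gaps_linear` and the parameters `dt` (spacing of the filler points) and `min_gap` (smallest gap that gets filled) are hypothetical.

```python
import numpy as np

def fill_gaps_linear(t, y, dt, min_gap):
    """Fill gaps longer than `min_gap` with points on a straight line.

    `t` must be sorted in increasing order. Returns the merged times, the
    merged values, and a boolean mask marking which points are filler.
    Shorter gaps are left untouched.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    filler = []
    for left, right in zip(t[:-1], t[1:]):
        if right - left > min_gap:
            n_new = int(np.floor((right - left) / dt))
            pts = left + dt * np.arange(1, n_new + 1)
            filler.append(pts[pts < right])   # interior points only
    if not filler:
        return t.copy(), y.copy(), np.zeros(t.size, dtype=bool)
    filler_t = np.concatenate(filler)
    filler_y = np.interp(filler_t, t, y)      # linear between observed neighbours
    all_t = np.concatenate([t, filler_t])
    all_y = np.concatenate([y, filler_y])
    is_filler = np.concatenate([np.zeros(t.size, dtype=bool),
                                np.ones(filler_t.size, dtype=bool)])
    order = np.argsort(all_t, kind="stable")
    return all_t[order], all_y[order], is_filler[order]

# Illustrative data (made up): samples missing between t = 2 and t = 6.
t_obs = np.array([0.0, 1.0, 2.0, 6.0, 7.0])
y_obs = np.array([0.3, 1.1, 1.9, 6.2, 6.8])
t_full, y_full, mask = fill_gaps_linear(t_obs, y_obs, dt=1.0, min_gap=2.5)
print(t_full)
print(mask)
```

Returning the mask keeps the filler points distinguishable from real observations, which is useful if a later step should weight or replace them.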

**The second step** is to carry out the trend analysis on the initialized, gap-filled time series using the **HASF method**.

*The original paper is available for download from IEEE; a PDF version is also available.*
