Predictive maintenance for the Oxford Data Science for IoT Course

After my first post on Anomaly Detection for Time Series, I would like to continue presenting what I did for the Data Science for IoT Course at the Department for Continuing Education of the University of Oxford with Ajit Jaokar.
In line with what I wrote previously, this second post will be about predictive maintenance.
The post is divided into four parts which are:

  • The challenges in creating the course
  • The materials found and used for the creation of the course
  • Key elements from the course
  • Next steps and further topics to explore

The post will conclude the initial exploration of the topics I covered at Oxford.

Challenges

When researching materials for this course, I had a general idea of what to look for. Having already worked in industrial environments, I had a good idea of what predictive maintenance should be and how it could be used.
Just like for the previous part of the course (Anomaly Detection for Time Series), there is not much material on the topic though.
Or to be completely fair, I should say that there is not much in-depth material. We can find a lot of high-level material explaining what predictive maintenance is about or providing some generalities, but that’s it.

When digging deeper, I finally found some interesting articles and papers on the topic, but again with a twist. What I found focuses nearly exclusively on the algorithms (and with few examples) and not on the operational side of predictive maintenance. The architecture side of it is missing.

Materials

Still, the materials were enough to cover the short course that I needed to deliver.
The main material that I used during the course is the Cortana Playbook from Microsoft, which focuses on predictive maintenance for the aerospace industry.
It provides a very good overview of the topic and goes in depth into explaining how to work with the data. For this, it uses a dataset that I decided to use during the course to present some code: the NASA Turbofan Engine dataset.
Other sources that I used for the course include:

  • Machine Learning Approaches for Failure Type Detection and Predictive Maintenance, by Patrick Jahnke, Technische Universität Darmstadt, 2015
  • Machine learning methods for vehicle predictive maintenance using off-board and on-board data, by Rune Prytz, Halmstad University, 2014
  • Early Failure Detection for Predictive Maintenance of Sensor Parts, by Tomáš Kuzin and Tomáš Borovicka, Czech Technical University in Prague, 2016

These three papers cover different topics, and each goes in depth into areas that are relevant to predictive maintenance.
For instance, the last two papers have very different emphases: the second one looks at whole systems such as vehicles, while the last one looks more specifically at sensor data.
The first paper is broader and provides a good view of the different techniques and algorithms that can be used for predictive maintenance.

Key elements

During the course, I covered several elements such as:

  • The different types of maintenance;
  • The importance of predictive maintenance and the conditions for it to be implemented;
  • The algorithms that can be used for predictive maintenance;
  • The need to be careful about skew and criticality;
  • A simple architecture proposal and a methodology for running predictive maintenance;
  • Some code using the NASA Turbofan Engine dataset.

I will not go over the details of every category here, but I will focus on a few of them.

Algorithms for predictive maintenance

With respect to the types of algorithms that can be used for predictive maintenance, we can use the same classification that we use for all data science problems. Problems can be of supervised or unsupervised nature.

However, with regard to predictive maintenance, I must stress that unsupervised machine learning algorithms are far less used than supervised ones.

Indeed, given the nature of unsupervised algorithms, they are not going to help in predicting when a failure will occur. However, they can be very useful initially when analysing a dataset that contains data on different types of failure. For instance, we can run a clustering algorithm on the dataset to separate the observations showing a failure mode from those that do not. This can be useful when the dataset needs to be labelled.
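As a minimal sketch of this labelling aid (not the code used in the course), the snippet below runs a plain k-means on synthetic two-feature sensor readings. The data, the feature scaling, the choice of k = 2 and the farthest-first initialisation are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=20):
    """Plain k-means (Lloyd's algorithm) on tuples of floats."""
    # Farthest-first initialisation: deterministic, spreads the centroids out.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Synthetic, already-scaled sensor features (hypothetical data).
rng = random.Random(0)
healthy = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(40)]
degraded = [(rng.gauss(3, 0.3), rng.gauss(3, 0.3)) for _ in range(10)]

labels, centroids = kmeans(healthy + degraded, k=2)
print(labels.count(labels[0]), labels.count(labels[-1]))
```

The cluster indices are only candidate labels: which cluster corresponds to a failure mode still has to be decided by someone who knows the equipment.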

So the key algorithms for predictive maintenance are the supervised ones. In this category, a lot of algorithms can be used, though given the temporal nature of the data in predictive maintenance problems, it is worth focusing on algorithms that can handle it. For instance, time series analysis and anomaly detection will be useful here.

Within this category, we can again split the problems into two separate groups:

  • classification problems, which focus on determining whether a failure will occur over a given time horizon (binary classification with “will happen” / “won’t happen” classes), in which of several time horizons a failure will occur (multi-class classification with “won’t happen” / “will happen in 1 time period” / “will happen in 2 time periods” / etc. classes), or which type of failure will occur (multi-class classification with the failure modes as classes)
  • regression problems, which focus on determining the remaining time to failure (often called the Remaining Useful Life or RUL)
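To make the difference between these targets concrete, here is a small sketch that derives all three from run-to-failure histories. It assumes that each record is a (unit, cycle) pair and that the last observed cycle of a unit is the cycle at which it failed. The function name, the 30-cycle horizon and the 15/30-cycle windows are hypothetical choices.

```python
from collections import defaultdict


def add_labels(records, horizon=30, bins=(15, 30)):
    """Attach a RUL, a binary label and a multi-class label to each record.

    records: list of (unit_id, cycle) tuples from run-to-failure histories;
    the last observed cycle of each unit is assumed to be its failure.
    """
    last_cycle = defaultdict(int)
    for unit, cycle in records:
        last_cycle[unit] = max(last_cycle[unit], cycle)

    labelled = []
    for unit, cycle in records:
        rul = last_cycle[unit] - cycle      # regression target
        will_fail = int(rul <= horizon)     # binary classification target
        # Multi-class target: 2 = fails within bins[0] cycles,
        # 1 = fails within bins[1] cycles, 0 = fails later than that.
        if rul <= bins[0]:
            window = 2
        elif rul <= bins[1]:
            window = 1
        else:
            window = 0
        labelled.append({"unit": unit, "cycle": cycle, "rul": rul,
                         "will_fail": will_fail, "window": window})
    return labelled


# One hypothetical unit that fails at cycle 40.
records = [(1, c) for c in range(1, 41)]
rows = add_labels(records, horizon=30, bins=(15, 30))
print(rows[0])
print(rows[-1])
```

The same records can therefore feed a regression model (the "rul" column) or a classifier (the "will_fail" or "window" columns), depending on how the maintenance question is framed.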

Skew and criticality

Two elements need to be taken care of when working on predictive maintenance: skew and criticality.

Skew is inherent to the type of data we are working with in predictive maintenance problems. Indeed, failures are rare (hopefully) and data on failures is often not readily available. Two reasons for that are:

  • equipment manufacturers do not provide data on the failure of their equipment,
  • equipment is not run to failure by its users, and preventive maintenance is conducted before failure happens.

When data on failure is available, skew needs to be taken care of in the dataset, and there are three common techniques to do so (all related to classification):

  • Oversampling the failure classes by replicating their observations in the dataset according to some specific criterion. But this can lead to overfitting.
  • Undersampling the normal class (non-failure) by reducing the number of observations. But this can lead to information loss on the normal class.
  • Synthetic data generation, which consists of generating artificial observations in the dataset using a specific algorithm such as the Synthetic Minority Over-sampling Technique (SMOTE).
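The three techniques can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a production implementation: the data is synthetic, the function names are hypothetical, and smote_like only captures the core idea of SMOTE (interpolating between a minority point and one of its nearest minority neighbours).

```python
import random


def oversample(minority, target_size, rng):
    """Random oversampling: replicate minority rows, drawn with replacement."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra


def undersample(majority, target_size, rng):
    """Random undersampling: keep a random subset of the majority rows."""
    return rng.sample(majority, target_size)


def smote_like(minority, n_new, k, rng):
    """SMOTE-style synthesis: interpolate between a minority point and one
    of its k nearest minority neighbours (squared Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical skewed dataset: 95 normal observations, 5 failures.
rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
failure = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]

oversampled = oversample(failure, 95, rng)
undersampled = undersample(normal, 5, rng)
synthetic = smote_like(failure, 90, 3, rng)
print(len(oversampled), len(undersampled), len(failure + synthetic))
```

Whichever technique is chosen, it should only be applied to the training data, so that the evaluation still reflects the real proportion of failures.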

The other element that needs to be taken care of when dealing with predictive maintenance is the criticality of the problem.

The criticality of the problem can be expressed in terms of cost for the organization running predictive maintenance. For instance, when looking at classification problems, we will have False Positives and False Negatives. However, False Negatives might be more costly to the organization than False Positives.

For instance, in the aerospace industry, False Negatives for engine failure are more costly than False Positives (a False Positive amounts to the cost of servicing the engine plus the opportunity cost of not using it).

One simple way to express this criticality is to calculate the total cost of the predictions, which is the sum of the cost of the False Negatives and the cost of the False Positives. The overall objective of the algorithm will be to minimise this total cost.
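A minimal sketch of this total cost is given below, together with one possible use of it: choosing the decision threshold of a classifier. The labels, the scores and the unit costs (a missed failure costing ten times a false alarm) are made-up values, and the function names are hypothetical.

```python
def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Total cost = (#False Negatives * cost_fn) + (#False Positives * cost_fp)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp


def best_threshold(y_true, scores, thresholds, cost_fn, cost_fp):
    """Pick the decision threshold with the lowest total cost."""
    def cost_at(th):
        preds = [int(s >= th) for s in scores]
        return total_cost(y_true, preds, cost_fn, cost_fp)
    return min(thresholds, key=cost_at)


# Hypothetical ground truth (1 = failure) and predicted failure probabilities.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.6, 0.5, 0.9]

# Hypothetical costs: a missed failure costs 10, a false alarm costs 1.
for th in (0.3, 0.5, 0.7):
    preds = [int(s >= th) for s in scores]
    print(th, total_cost(y_true, preds, cost_fn=10, cost_fp=1))

best = best_threshold(y_true, scores, (0.3, 0.5, 0.7), cost_fn=10, cost_fp=1)
print(best)
```

With these numbers, the thresholds 0.3, 0.5 and 0.7 give total costs of 3, 1 and 10 respectively, so 0.5 is selected: the highest threshold raises no false alarm but misses a failure, which is what the cost weighting penalises.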

Next steps

Just like for the course on anomaly detection for time series, there are elements of the course that need improvement. These improvements will be included in the book Ajit and I are preparing and in later batches of the course.

For instance, the architecture side is only briefly presented and, having worked in industrial environments before, I know this is an aspect that is far from obvious.

Exploring more examples with different datasets (i.e. not only the turbofan example) would be useful, together with applying different algorithms. For instance, applying neural networks such as LSTMs (Long Short-Term Memory networks) might provide more interesting results.

Feel free to leave a comment or reach out to jjb at cantab dot net or jjb at datagraphy dot io if you want more information or if you would like to discuss some of the points above. This post is also available here.