In my recent blog, Marrying Kalman Filtering & Machine Learning, we saw the merger of Bayesian exact recursive estimation (whose algorithm, in the linear, Gaussian case, is the Kalman Filter/Smoother) and Machine Learning. We developed a solution called Kernel Projection Kalman Filter for business applications that require static or dynamical, time-invariant or time-varying, linear or non-linear Machine Learning, i.e., pretty much all applications; therefore, Kernel Projection Kalman Filter is a "universal" solution . . .
But who needs anything more than STATIC Machine Learning (ML)?
Indeed, university courses in ML largely teach static ML. Given a set of inputs and outputs, find a static map between the two during supervised “Training” and use this static map for business purposes during “Operation” (which is called “Testing” during pre-operation evaluation). In real life, static is hardly the case ...
Before we proceed further, it will be useful to review my blog, "Prediction – the other dismal science?", where we discussed "detection" and "prediction". Also, we know that ML learns a "map" that relates the input and output of a System – if the System does not change (remains static), static maps can be used for Detection and Prediction during the operational phase.
As a FIRST approach to practical ML, the static-system assumption may be okay. But ML has progressed rapidly, and today we can go beyond such grossly simplifying assumptions and develop more sophisticated solutions.
During the Operation phase, Detection involves noticing changes in the ML map output when the underlying System undergoes changes. Prediction, on the other hand, involves quantifying the changes in the output as the System evolves. Clearly, detection is an easier task than prediction.
Let me take an IoT example to make my case for dynamical ML. Say you are monitoring 1000's of machines on a manufacturing plant floor with 10's of 1000's of monitoring points. In the training phase, "normal" operation for all these machines would have been established by deciding on a range of acceptable ML output values. To be specific, let us say multiple inputs such as vibration, temperature and pressure from a machine are trained against Normal/Abnormal. As we saw in the "Double Moon" experiment in my book referred to in Marrying Kalman Filtering & Machine Learning, the Kalman Smoother (and Predictor) outputs will vary due to noise and other errors. We "threshold" this varying output to determine Normal or Abnormal.
When a bearing starts misbehaving on a machine, ML output moves beyond the threshold and an alarm is generated; appropriate action is taken on the problem machine.
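The thresholding step described above can be sketched in a few lines. This is only an illustrative assumption, not the actual solution: the map weights, sensor scaling and the threshold value are all made up, and `ml_map_output` is a hypothetical stand-in for whatever map was learned during Training.

```python
import random

def ml_map_output(vibration, temperature, pressure):
    """Hypothetical stand-in for a trained static ML map: returns a score."""
    # An illustrative fixed linear map; a real one comes from Training.
    return 0.5 * vibration + 0.3 * temperature + 0.2 * pressure

THRESHOLD = 0.8  # assumed upper bound of the "normal" score range

random.seed(0)
for t in range(5):
    v, temp, p = (random.random() for _ in range(3))
    score = ml_map_output(v, temp, p)
    # Threshold the varying output to decide Normal vs Abnormal
    status = "Abnormal" if score > THRESHOLD else "Normal"
    print(f"t={t}: score={score:.2f} -> {status}")
```

When the score for a problem machine stays beyond the threshold, an alarm would be raised and a technician dispatched.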
With a Static ML solution, we can do this. HOWEVER, what is normal for a machine (or the System under observation) is environment-dependent and ever-changing in real life!
Systems "wander" about in a normal zone, which we do try to capture by pre-determining the "threshold" or range for ML output values. However, such safe-range determination is ad hoc, since we do not have actual experience of this particular machine and its own range of "normal" behavior as it ages. This lifetime evolution is one example of "wandering" in the normal zone. The consequence of the System's evolution within the normal range is that Static ML may then generate an increased number of False Positives!
Also, with the Static ML solution, when "abnormal" is indicated, prediction of what the machine's condition will be (via the ML "map" output) will not be possible, since the System has changed (as the detection of Abnormal indicated). In summary, Static ML is adequate for one-off detection (and subsequent offline intervention) IF your business has a high tolerance for false positives.
In the dynamical ML case, the Kalman Smoother that we saw in "Marrying Kalman Filtering & Machine Learning" in the offline training phase does everything that a Static ML map can do; PLUS, our Bayesian solution is OPTIMAL in the mean squared error sense.
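For readers who have not seen it written out, the Bayesian exact recursive estimation referred to here reduces, in the linear, Gaussian case, to the familiar Kalman predict/update cycle. Below is a minimal scalar sketch under assumed, made-up model parameters (A, Q, H, R); it illustrates the recursion only and is not the Kernel Projection Kalman Filter itself.

```python
def kalman_step(x_est, P, z, A=1.0, Q=0.01, H=1.0, R=0.1):
    """One predict + update cycle for a scalar linear-Gaussian system."""
    # Predict: propagate the state estimate and its uncertainty
    x_pred = A * x_est
    P_pred = A * P * A + Q
    # Update: the Kalman gain trades off prediction vs measurement confidence
    K = P_pred * H / (H * P_pred * H + R)
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Repeated measurements of a constant level drive the estimate toward that
# level while the posterior variance P settles to a small steady-state value.
x, P = 0.0, 1.0
for z in [1.0] * 50:
    x, P = kalman_step(x, P, z)
print(round(x, 3), round(P, 3))
```

The same predict/update pattern, carried out exactly rather than approximately, is what makes the Bayesian solution optimal in the mean squared error sense for the linear-Gaussian model.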
What more does a Dynamical ML solution such as Kernel Projection Kalman Filter offer?
In any business application, more relevant information is gold, since it can be exploited to improve performance. Knowing that a machine has a problem is good, but knowing the nature of the problem in addition is BETTER!
Kalman Predictor in the “In-Stream” or operational phase provides the following:
At a simple level, if we move all the decision-making we did with Predictor output (and thresholding) to State trajectories, IoT solution performance will be better due to the less volatile nature of States, meaning fewer False Positives and False Negatives! For example, consider the Predictor output and State trajectory plots for the In-Stream phase experiments in the previous sections, and pick any one misclassification event (detected because the PREDICTION output exceeded the threshold). If you observe the State trajectories at the same instant of misclassification, however, there are hardly any changes, indicating that this event is most likely a False Positive and you do not have to send a technician to troubleshoot the machine!
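The volatility argument can be illustrated numerically. In this assumed sketch (not from the experiments in the text), the raw predictor output of a healthy machine is noisy, while a smoothed trajectory, used here as a simple stand-in for the less volatile State, crosses the alarm threshold far less often:

```python
import random

random.seed(1)
N, THRESHOLD = 200, 1.0

# Healthy machine: predictor output is zero-mean but volatile
predictor_output = [random.gauss(0.0, 0.5) for _ in range(N)]

# Exponential smoothing as a simple proxy for the less volatile State
state = [0.0] * N
for t in range(1, N):
    state[t] = 0.9 * state[t - 1] + 0.1 * predictor_output[t]

raw_alarms = sum(z > THRESHOLD for z in predictor_output)
state_alarms = sum(s > THRESHOLD for s in state)
print(raw_alarms, state_alarms)  # state-based crossings should be much rarer
```

Every raw-output threshold crossing here is a False Positive, since the simulated machine never leaves the normal condition; deciding on the smoother trajectory avoids them.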
Going beyond the use of System parameters for classification, . . .
New normal: Let us say that the machine usually processes aluminum but switched to titanium work pieces (the vibration and temperature signals will be very different). The Kalman Filter will adapt to the new normal instead of requiring us to retrain the static ML map. With Static ML, the options would have been (1) to train Static ML for multiple types of work pieces separately and switch the map when the work piece is switched – a tedious and error-prone solution – or (2) to train the Static ML map for all potential work piece types – which will "smear out" the map and make it less accurate overall; in probabilistic terms, this is caused by non-homogeneity of the data beyond heteroscedasticity.
Even though I have used IoT as an example, it must be obvious that other business applications are close analogs.
If you subscribe to the view that true learning is "generalization from past experience AND the results of new action", and therefore ML business solutions ought to be like flu shots (adjust the mix and apply on a regular basis), then every ML application is a case of Dynamical Machine Learning. In Kernel Projection Kalman Filter, we have a Bayesian exact recursive estimation solution for Dynamical Machine Learning that one can build on; the area is rich, and there are many related algorithms that can be put into play for even better results.
In summary, if your business problem space is "static", stay with Static Machine Learning. If not, it may be time to move on to Dynamical Machine Learning for the many practical benefits it brings; but you pay the price of increased complexity and an underlying theory that is challenging and largely unfamiliar to Data Scientists.
A framework to accomplish Dynamical Machine Learning using Bayesian exact recursive estimation is outlined in Marrying Kalman Filtering & Machine Learning and a summary of the theory and the details of a prototypical implementation are provided in my new book, “SYSTEMS Analytics: Adaptive Machine Learning workbook”.
PG Madhavan, Ph.D. - “Data Science Player+Coach with deep & balanced track record in Machine Learning algorithms, products & business”