The Coming Revolution in Recurrent Neural Nets (RNNs)

Summary: Recurrent Neural Nets (RNNs) are at the core of the most common AI applications in use today, but we are rapidly recognizing broad time series problem types where they don’t fit well. Several alternatives are already in use, and one that’s just been introduced, ODE net, is a radical departure from our way of thinking about the solution.

Recurrent Neural Nets (RNNs) and their cousins, LSTMs, are at the very core of the most common application of AI: natural language processing (NLP). There are far more real world applications of RNN-NLP than of any other form of AI, including image recognition and processing with Convolutional Neural Nets (CNNs).

In a sense, the army of data scientists has split off into two groups, each pursuing the separate applications that might be developed from these two techniques. In application there is essentially no overlap since image processing is about processing data that is static (even if only for a second) while RNN-NLP has always interpreted speech and text as time series data.

It turns out, though, that while RNN/LSTMs remain the go-to technique for most NLP, the more we try to expand time series applications the more trouble we run into. What’s on the horizon may be not so much a modification of RNNs as a hard fork to several other innovative new AI methods.

The First Fork

The first fork, which we wrote about last year, combines CNNs and RNNs in a single neural network. The problem to be solved involved images that occur in time series, that is, video, and the most common task for this odd mashup is video scene labeling. It turns out this technique is also good for recognizing and labeling emotion in a video and for some types of person recognition based on having seen that person in a video before.

RNN Not So Good for Massive Parallel Processing (MPP)

Also last year, both Google and Facebook addressed a second type of problem with RNNs. An RNN works through a sequence one step at a time, and each step depends on the result of the step before it, so you have to wait for the earlier steps to complete before calculating the next one. That also means that MPP isn’t really feasible. Yes, this all still happens very fast, but not fast enough for real time language translation apps to avoid noticeable latency.

This second fork caused both these leaders to abandon RNNs for real time translation and adopt a variant of CNNs they labeled Temporal Convolutional Neural Nets (TCNs). This looks a lot like a CNN with the addition of an ‘Attention’ function. Because TCNs are structured as CNNs, they can be easily adapted to MPP, which removes most of that latency.
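To make the contrast concrete, here is a minimal sketch in plain Python. It is not the Google or Facebook architecture: the scalar weights, the three-tap kernel, and the function names are all hypothetical, and attention is left out. It only shows why a recurrent update must run serially while a causal convolution can compute every output independently.

```python
import math

def rnn_states(xs, w_in=0.5, w_rec=0.9):
    """Toy scalar RNN: each hidden state needs the previous one, so the loop is serial."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)  # depends on h from the previous step
        states.append(h)
    return states

def causal_conv(xs, kernel=(0.5, 0.3, 0.2)):
    """Toy causal 1-D convolution: output t uses only inputs t, t-1 and t-2."""
    def output_at(t):
        return sum(k * xs[t - j] for j, k in enumerate(kernel) if t - j >= 0)
    # No output_at(t) call needs the result of another, so they could run in parallel.
    return [output_at(t) for t in range(len(xs))]

xs = [1.0, 0.0, -1.0, 2.0]
print(rnn_states(xs))
print(causal_conv(xs))
```

In the convolution, the last output depends only on the last three inputs, so it can be computed without touching anything earlier. In the RNN, changing the first input changes every later hidden state, which is exactly the dependency chain that blocks parallel execution.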

The Third Fork Problem – Irregular Time Series

There are several other classes of time series problems that are not well addressed by RNNs. Mostly these fall into two groups: systems with continuous values (think economic or financial variables used to forecast stock prices, or physical analog systems), and problems in which you want to combine time series data with different frequencies, durations, and start points.

If this last one seems mysterious it shouldn’t be. That describes what your medical history looks like as you visit different doctors, have appointments at different intervals, begin or stop medications at different dosages and intervals, have different physical responses (output variables) to these inputs, and simply grow older or stronger or better or worse in some measurable way.

This is at the core of why the vast majority of healthcare applications of AI have been in image recognition: our ability to use AI to predict an outcome from these separate, irregular data series is really deficient.

One solution might be to divide up your parallel medical records into discrete steps of weeks or days or even hours (adding layers to the DNN to increase granularity). In theory this would adapt to the discretization required by RNNs. But you begin to see the problem. To gain maximum benefit you would have to use very fine time buckets which would increase computation cost and complexity and rapidly reach a point of impossibility. Then there’s the issue that many of these time buckets would contain no data.
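The bucketing problem is easy to see with a small sketch. The patient record below is invented purely for illustration, and `bucketize` is a hypothetical helper, not part of any library. It places irregular observations into fixed-width time buckets and counts how many buckets end up empty as the buckets get finer.

```python
def bucketize(times, values, width, horizon):
    """Place irregular (time, value) observations into fixed-width buckets."""
    n_buckets = int(horizon / width)
    buckets = [None] * n_buckets          # None marks a bucket with no data
    for t, v in zip(times, values):
        idx = min(int(t / width), n_buckets - 1)
        buckets[idx] = v                  # last observation in a bucket wins
    return buckets

# Hypothetical record: day of each visit and a lab value measured that day.
visit_days = [2, 3, 40, 41, 200, 350]
lab_values = [5.1, 5.3, 6.0, 6.2, 7.4, 6.9]

for width in (30, 7, 1):                  # roughly monthly, weekly, daily
    b = bucketize(visit_days, lab_values, width, horizon=364)
    empty = sum(v is None for v in b)
    print(f"width={width:>2} days: {len(b)} buckets, {empty} empty")
```

With 30-day buckets, visits on consecutive days collapse into a single value, so information is lost. With 1-day buckets nothing is lost, but 358 of the 364 buckets hold no data at all, and the sequence the RNN must process is about thirty times longer.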

So both the forecasting community and the healthcare community need an AI solution superior to what RNNs can currently deliver.

ODE net

At the Neural Information Processing Systems (NIPS) conference held in Montreal last December, researchers from Canada’s Vector Institute presented an entirely new concept in AI time series modeling. Their paper was named one of four best papers from the conference.

The name of their system, “ODE net,” stands for Ordinary Differential Equation net. Don’t be misled. ODE net doesn’t look anything like a DNN: it has no discrete nodes, layers, or interconnects. It is a method of using a black box differential equation solver (DES) with backpropagation and several other clever adaptations to outperform RNNs in both continuous and discrete time series problems. In other words, this is more like a solid slab of computation than anything that might be visualized as a neural net.
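A rough sketch of the core idea, under heavy assumptions: instead of stacking layers, you define how the hidden state changes over time, dh/dt = f(h, t), and hand that function to an ODE solver. Everything named below is hypothetical. The `dynamics` function stands in for what would be a trained function, its weights are made up, and a fixed-step Runge-Kutta routine stands in for the black box solver. No training or backpropagation is shown.

```python
import math

def dynamics(h, t, w=-0.5, b=0.1):
    """Hypothetical stand-in for a learned function f(h, t) that returns dh/dt."""
    return math.tanh(w * h + b)

def odeint_rk4(f, h0, t0, t1, steps=100):
    """Stand-in for a black-box ODE solver: fixed-step RK4 from t0 to t1."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = f(h + dt * k3, t + dt)
        h += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return h

def hidden_at(times, h0=1.0):
    """Hidden state at arbitrary, irregularly spaced observation times."""
    states, h, t_prev = [], h0, 0.0
    for t in times:
        h = odeint_rk4(dynamics, h, t_prev, t)  # integrate across the gap
        states.append(h)
        t_prev = t
    return states

print(hidden_at([0.3, 0.35, 2.0, 7.5]))
```

Because the state is defined for every value of t, observations at 0.3, 0.35, 2.0, and 7.5 need no bucketing or padding. The solver simply integrates across whatever gap separates one observation from the next.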

There are several interesting changes in mindset that come along with this method. For example, with an RNN you would specify layers and other hyperparameters, run the experiment and see what accuracy you achieved.

With ODE net, there is a direct tradeoff between accuracy and time to train. You specify the desired level of accuracy, in practice the error tolerance of the differential equation solver, and the ODE net will find the best way to achieve that, allowing training time to vary. If the training time is unacceptably long, specify a lower accuracy and training is faster. One interesting option is to train at high accuracy but speed up evaluation at test time by specifying a lower accuracy.
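The tolerance knob can be illustrated with a generic adaptive solver. The sketch below is not the solver used in the paper. It is a simple, hypothetical adaptive Heun/Euler scheme applied to a test equation with a known answer, dy/dt = -2y, and it counts how many function evaluations each tolerance costs.

```python
import math

def solve_adaptive(f, y0, t0, t1, tol):
    """Adaptive Heun/Euler solver; returns (y at t1, number of f evaluations)."""
    t, y, dt, evals = t0, y0, (t1 - t0) / 10, 0
    while t1 - t > 1e-12:
        dt = min(dt, t1 - t)                # never step past the end point
        k1 = f(t, y)
        k2 = f(t + dt, y + dt * k1)
        evals += 2
        y_low = y + dt * k1                 # Euler estimate (1st order)
        y_high = y + dt * (k1 + k2) / 2     # Heun estimate (2nd order)
        err = abs(y_high - y_low)           # local error estimate
        if err <= tol:                      # accept the step
            t, y = t + dt, y_high
        # Grow or shrink the next step based on the error estimate.
        dt *= min(2.0, max(0.2, 0.9 * math.sqrt(tol / max(err, 1e-16))))
    return y, evals

exact = math.exp(-4.0)                      # true solution of dy/dt = -2y at t = 2
results = []
for tol in (1e-2, 1e-4, 1e-6):
    y, evals = solve_adaptive(lambda t, y: -2.0 * y, 1.0, 0.0, 2.0, tol)
    results.append((tol, abs(y - exact), evals))
    print(f"tol={tol:.0e}  error={abs(y - exact):.2e}  evaluations={evals}")
```

Tightening the tolerance forces smaller steps, so the error falls while the number of function evaluations, and therefore the compute time, rises. Loosening it does the reverse, which is the same dial described above for trading accuracy against speed.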

The paper, which is available on arXiv.org, is quite thorough and reports several experiments in which the results are clearly superior to RNNs. It’s still in its research phase, but as with most things in data science, that won’t necessarily be long.

A New Way Forward

It’s particularly interesting that solutions to some of the most intractable problems in our current deep dive into DNNs may not look anything like neural nets. It also makes me wonder, for example, whatever happened to that promising work in evolutionary algorithms that is no longer mainstream. We may be at the beginning of a very interesting hard fork into entirely new methods of AI.

Other articles by Bill Vorhies

About the author: Bill is Contributing Editor for Data Science Central. Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001. His articles have been read more than 2 million times.