As data scientists plan and evolve their big data programs, it is time to evaluate the value of adding data in motion to the data lake. What’s the difference between data in motion and data at rest? Data at rest is the data we are most accustomed to working with: persistent data that is stored for some period of time, either on disk or in memory, such as sales transaction records or account information. Data in motion, on the other hand, is transitory. It is the information about data as it traverses the network. Data in motion isn’t stored, yet it holds information that is very helpful for the data scientist.
Let us look at an example: a video-on-demand service like Netflix or Amazon Prime. The data at rest would include account information, movies watched, when they were watched, and even the movies themselves. A lot can be gleaned from this information alone, but it tells us nothing about the experience the user had with the service.
To understand the customer experience, we need to understand the data in motion. Isn’t the mere fact that we have a completed transaction enough to infer some quality of user experience? No. A completed transaction represents just that: a completed transaction. Data in motion allows us to understand the entirety of the experience.
Data in motion will allow us to understand whether users were able to authenticate to the service, whether the authentication process was slow, and whether it required multiple attempts. It will also tell us how many customers failed at the login or authentication process. Data in motion also allows us to understand where customers are coming from: device, location, network bandwidth, and so on. This data can be correlated with data at rest to understand whether our service is optimized for the various devices that are used to access services today.
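As a rough illustration of the authentication questions above, the following Python sketch summarizes login events per device type. The event fields (`user`, `device`, `ok`, `latency_ms`) and the function name `login_summary` are hypothetical, invented for this example; a real service would have its own event schema.

```python
from collections import defaultdict
from statistics import mean


def login_summary(events):
    """Summarize login attempts per device type.

    Each event is a dict with hypothetical keys:
    user, device, ok (bool), latency_ms (number).
    """
    by_device = defaultdict(list)
    for event in events:
        by_device[event["device"]].append(event)

    summary = {}
    for device, attempts in by_device.items():
        users = {e["user"] for e in attempts}
        succeeded = {e["user"] for e in attempts if e["ok"]}
        summary[device] = {
            "attempts": len(attempts),
            # Users who never managed to log in from this device type.
            "users_failed": len(users - succeeded),
            # Average attempts per user; above 1 means retries were needed.
            "attempts_per_user": len(attempts) / len(users),
            "mean_latency_ms": mean(e["latency_ms"] for e in attempts),
        }
    return summary


# Made-up sample data, for illustration only.
events = [
    {"user": "a", "device": "tv", "ok": False, "latency_ms": 900},
    {"user": "a", "device": "tv", "ok": True, "latency_ms": 700},
    {"user": "b", "device": "tv", "ok": False, "latency_ms": 1100},
    {"user": "c", "device": "phone", "ok": True, "latency_ms": 200},
]
print(login_summary(events))
```

Grouping by device is only one possible cut; the same shape would work for location or network type.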
The challenge with data in motion is that, unlike data at rest, it needs to be processed in real time. As a data scientist, you don’t want to keep all the data in motion; you are really interested in metadata about it. For example, if someone is watching a movie, you don’t care to store every bit of the movie. That’s of no value and will turn your data lake into a data ocean. What you are concerned with is the data about how the files traversed the network. Were there requests for retransmission because of dropped packets? Did authentication fail? Was there poor performance due to lack of network bandwidth?
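To make the idea of keeping metadata rather than content concrete, here is a minimal Python sketch that consumes events one at a time and retains only a few counters per session. The event keys (`session`, `kind`, `bandwidth_kbps`, `payload`) and the names `SessionStats` and `summarize_stream` are assumptions made for illustration, and a generator stands in for whatever network tap or stream processor would supply events in practice.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Per-session metadata kept after the payload is discarded."""
    events_seen: int = 0
    retransmit_requests: int = 0
    auth_failures: int = 0
    min_bandwidth_kbps: float = float("inf")


def summarize_stream(events):
    """Consume events one at a time and keep only small counters.

    `events` is any iterable of dicts with hypothetical keys:
    session, kind, and optionally bandwidth_kbps and payload.
    """
    stats = {}
    for event in events:
        s = stats.setdefault(event["session"], SessionStats())
        s.events_seen += 1
        if event["kind"] == "retransmit_request":
            s.retransmit_requests += 1
        elif event["kind"] == "auth_failure":
            s.auth_failures += 1
        if "bandwidth_kbps" in event:
            s.min_bandwidth_kbps = min(s.min_bandwidth_kbps, event["bandwidth_kbps"])
        # The payload (the movie bytes) is never copied or stored.
    return stats


def sample_events():
    """Made-up event stream, for illustration only."""
    yield {"session": "s1", "kind": "auth_failure"}
    yield {"session": "s1", "kind": "data", "bandwidth_kbps": 4200, "payload": b"\x00" * 1024}
    yield {"session": "s1", "kind": "retransmit_request", "bandwidth_kbps": 800}
    yield {"session": "s2", "kind": "data", "bandwidth_kbps": 5000, "payload": b"\x00" * 1024}


stream_stats = summarize_stream(sample_events())
print(stream_stats["s1"])
```

Because each event is folded into a counter and then dropped, memory grows with the number of sessions rather than with the volume of traffic.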
When we take data in motion and correlate it with data at rest, we can finally understand not only what customers are doing but also the intricacies of their experience during the interaction. The key to increasing customer revenue or customer retention can be hidden in the data in motion.
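One way to picture this correlation is a simple join between stored viewing records and per-session network metadata. The Python sketch below assumes both sides share a `session` key; the field names, the functions `correlate` and `troubled_accounts`, and the retransmission threshold are all hypothetical choices for this example.

```python
def correlate(at_rest, in_motion):
    """Attach in-motion session metadata to stored viewing records.

    at_rest: list of dicts with hypothetical keys session, account, title.
    in_motion: dict mapping session -> dict of metadata counters.
    """
    joined = []
    for record in at_rest:
        meta = in_motion.get(record["session"], {})
        joined.append({**record, **meta})
    return joined


def troubled_accounts(joined, max_retransmits=5):
    """Accounts whose completed views still had a rough delivery.

    The threshold of 5 is an arbitrary illustrative value.
    """
    return sorted({
        row["account"]
        for row in joined
        if row.get("retransmit_requests", 0) > max_retransmits
        or row.get("auth_failures", 0) > 0
    })


# Made-up sample data, for illustration only.
views = [
    {"session": "s1", "account": "alice", "title": "Movie A"},
    {"session": "s2", "account": "bob", "title": "Movie B"},
    {"session": "s3", "account": "carol", "title": "Movie C"},
]
network = {
    "s1": {"retransmit_requests": 12, "auth_failures": 0},
    "s2": {"retransmit_requests": 1, "auth_failures": 0},
    "s3": {"retransmit_requests": 0, "auth_failures": 2},
}
flagged = troubled_accounts(correlate(views, network))
print(flagged)
```

All three sample views are completed transactions in the data at rest, yet only the joined view shows that two of them came with a degraded experience.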