Updated from original posted on April 17, 2014
The importance of metadata continues to grow as organizations realize that fully exploiting the business and operational potential of machine learning, deep learning and artificial intelligence requires enhancing the raw data with metadata. And while the volume of actual data keeps growing, there is even more data, or metadata, about the usage and source of that data.
Metadata is defined as a set of data that describes and gives information about other data. The phone call is a good illustration of the insights that can be mined from metadata alone. Research from Stanford University has shown that the metadata of phone calls discloses a significant amount of personal information without anyone accessing the actual voice records. Graph analyses of phone call metadata can reveal the frequency, recency, strength and nature of relationships among people.
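To make the phone call example concrete, here is a minimal Python sketch of how frequency, recency and a rough measure of strength could be derived from call metadata alone. The records, names and the `summarize_relationships` helper are hypothetical, invented for illustration; this is not the method used in the Stanford research.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call records: metadata only (caller, callee, start time, seconds).
# No voice content is involved.
calls = [
    ("alice", "bob", "2014-04-01T09:00:00", 120),
    ("bob", "alice", "2014-04-03T18:30:00", 300),
    ("alice", "carol", "2014-04-02T12:15:00", 45),
    ("alice", "bob", "2014-04-10T08:05:00", 60),
]


def summarize_relationships(records):
    """Aggregate call metadata per unordered pair of people."""
    stats = defaultdict(
        lambda: {"frequency": 0, "total_seconds": 0, "last_contact": None}
    )
    for caller, callee, started_at, seconds in records:
        # Treat a call in either direction as the same relationship (graph edge).
        pair = tuple(sorted((caller, callee)))
        when = datetime.fromisoformat(started_at)
        entry = stats[pair]
        entry["frequency"] += 1            # how often the pair talks
        entry["total_seconds"] += seconds  # crude proxy for strength
        if entry["last_contact"] is None or when > entry["last_contact"]:
            entry["last_contact"] = when   # recency
    return dict(stats)


relationships = summarize_relationships(calls)
for pair, entry in relationships.items():
    print(pair, entry)
```

Each pair of people is an edge in a graph, and the aggregated values are edge attributes. A real analysis would go further (for example, looking at time of day or clusters of contacts), but even this small summary is built without touching a single voice record.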
Let’s drill further into the analytic richness of metadata.
Tagging is a concept with which most web analytics users are familiar. Tagging is a method of tracking visitor activity on each page of the website (see Figure 1).
Figure 1: Web analytics tagging process
The advantages of tagging include:
Sometimes it’s hard to imagine what metadata is and why it’s important. Let’s look at an example of the metadata associated with a 140-character tweet. 140 characters wouldn’t seem to be much data, even across a voluminous number of tweets. However, data volumes explode when you start coupling the tweet with all the metadata necessary to understand those 140 characters in the context of the conversation (see Figure 2).
Figure 2: Metadata associated with a tweet
Here is some of the metadata associated with a 140-character tweet:
It’s easy to see how the volume of metadata quickly dwarfs the amount of raw data. The same thing happens when organizations start tagging more of their transactions and interactions in order to gain additional insight into the nature and context of each dialogue and interaction.
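A rough way to see this effect is to compare the size of a tweet's text with the size of the metadata that travels with it. The Python sketch below uses a made-up tweet record; the field names and values are assumptions chosen for illustration and are not the actual Twitter data format.

```python
import json

# Hypothetical tweet record. Field names and values are illustrative only.
tweet = {
    "text": "Metadata tells the story behind the data.",
    "metadata": {
        "id": 1234567890,
        "created_at": "2014-04-17T10:30:00Z",
        "lang": "en",
        "source": "web",
        "in_reply_to_status_id": None,
        "retweet_count": 3,
        "hashtags": ["metadata", "bigdata"],
        "user": {
            "screen_name": "example_user",
            "followers_count": 250,
            "location": "Palo Alto, CA",
            "time_zone": "Pacific Time (US & Canada)",
        },
    },
}


def size_in_bytes(value):
    """Size of a value once serialized to UTF-8 encoded JSON."""
    return len(json.dumps(value).encode("utf-8"))


text_bytes = len(tweet["text"].encode("utf-8"))
metadata_bytes = size_in_bytes(tweet["metadata"])
ratio = metadata_bytes / text_bytes

print(f"text: {text_bytes} bytes, metadata: {metadata_bytes} bytes")
print(f"metadata is about {ratio:.1f}x the size of the text")
```

Even with only a handful of fields, the serialized metadata is several times larger than the message itself, and the gap widens as more context is attached to each tweet.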
Not all data is necessarily useful for Big Data analytics. However, some data types are particularly ripe for analysis, such as:
These are in addition to the transactional data that already runs through enterprise systems in the normal course of data processing.
An IDC study titled “Discover the Digital Universe of Opportunities” states that from 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 trillion gigabytes to 44 trillion gigabytes. However, the IDC study estimates that only 3% of the potentially useful data will be tagged.
Call this the Big Data gap: information that is untapped and waiting for enterprising digital explorers to extract the value hidden within it. The bad news is that it will take extra work and investment to tag all of these new data sources. The good news is that, as the digital universe expands, so does the amount of useful data it contains, along with the invaluable insights about your customers, products, markets, and operations that can be used to optimize key business processes and uncover new monetization opportunities.
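As a quick back-of-the-envelope check on those figures, the Python sketch below works out the overall growth factor and the compound annual growth rate it implies. The start and end values come from the IDC numbers quoted above; the assumption of steady compound growth over the seven years, and the variable names, are added here for illustration.

```python
# Figures quoted from the IDC study, in trillions of gigabytes.
start_year, end_year = 2013, 2020
start_volume, end_volume = 4.4, 44.0

years = end_year - start_year
growth_factor = end_volume / start_volume

# Assumption: growth compounds at a steady rate each year.
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor over {years} years: {growth_factor:.1f}x")
print(f"implied compound annual growth: {annual_rate:.1%}")
```

Under that assumption, a tenfold increase over seven years works out to growth of a little under 40% per year.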
Researcher9999 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=44197627