Updated from original posted on April 17, 2014
The importance of metadata only continues to grow as organizations are realizing that to fully exploit the business and operational potential of machine learning, deep learning and artificial intelligence requires that the raw data be enhanced with metadata. And while we have growing volumes of actual data, there is even more data, or metadata, around the usage and source of the actual data.
Metadata is defined as a set of data that describes and gives information about other data. The phone call illustrates the insights that can be mined just from the metadata. Research from the Stanford University has shown that metadata of phone calls discloses a significant amount of personal information without accessing actual voice records. Graph analyses of phone call metadata can reveal frequency, recency, strength and the nature of relationships among people[1].
Let’s drill further into the analytic richness of metadata.
Tagging is a concept with which most web analytics users are familiar. Tagging is a method of tracking visitor activity on each page of the website (see Figure 1).
Figure 1: Web analytics tagging process[2] (source for picture: Hosting Canada)
Source: https://en.wikipedia.org/wiki/Web_analytics
As each web page is requested, the web server returns the HTML page with the embedded JavaScript page code. The JavaScript page code sets the values for analytic data that you are collecting and calls functions and global variables in the JavaScript library file. The JavaScript code builds an image request for a 1x1 pixel image, also called a web beacon that concatenates a query string of name/value pairs of analytics data that is sent to a data center for reporting and analysis.
The advantages of tagging include:
Sometimes it’s hard to imagine what metadata is and why it’s important. Let’s look at an example of the metadata associated with a 140-character tweet. 140 characters wouldn’t seem to be much data, even with a voluminous number of tweets. However, data volumes explode when you start coupling the tweet with all the metadata necessary to understand the 140-characters in context of the conversation (see Figure 2).
Figure 2: Metadata associated with a tweet
Here is some of the metadata associated with a 140-character tweet[3]:
It’s quick to see how the volume of metadata quickly dwarfs the amount of raw data, and this is what happens when organizations start tagging more of their transactions and interactions in order to gain additional insight into the nature and context of the dialogue and interaction.
Not all data is necessarily useful for Big Data analytics. However, some data types are particularly ripe for analysis, such as:
These are in addition to the normal transactional data running through the enterprise systems in the course of normal data processing today.
An IDC study titled “Discover the Digital Universe of Opportunities” states that from 2013 to 2020, the digital universe will grow by a factor of 10x—from 4.4 trillion gigabytes to 44 trillion. However, the IDC study estimates that only 3% of the potentially useful data will be tagged.
Call this the Big Data gap: information that is untapped and waiting for enterprising digital explorers to extract the value hidden within it. The bad news is that this will take extra work and investment to tag all of these new data sources. The good news is that, as the digital universe expands, so does the amount of useful data it contains, and the invaluable insights about your customers, products, markets, and operations that can be used to optimize key business processes and uncover new monetization opportunities.
[1]Source: Eckerson Group “If Data is the New Oil, Metadata is the New Gold”
[2]Researcher9999 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=44197627
[3]http://readwrite.com/2010/04/19/this_is_what_a_tweet_looks_like#awe...
© 2021 TechTarget, Inc.
Powered by
Badges | Report an Issue | Privacy Policy | Terms of Service
Most Popular Content on DSC
To not miss this type of content in the future, subscribe to our newsletter.
Other popular resources
Archives: 2008-2014 | 2015-2016 | 2017-2019 | Book 1 | Book 2 | More
Most popular articles
You need to be a member of Data Science Central to add comments!
Join Data Science Central