Metadata is organised information that describes, locates or otherwise makes it easier to retrieve information. It started as catalogue cards in libraries and is now mainly digital. Metadata is everywhere: every webpage, file, picture and piece of software has metadata that describes what it is, when it was created, what size it is, and generally everything you or a computer needs to know to find information efficiently.

 

There are two main types of metadata: descriptive and structural. Descriptive metadata is information used to identify or discover a resource. It can be a title, abstract, author or keywords. Structural metadata usually refers to the properties of an object, such as its format, size, media, when it was created and so on.
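As a minimal sketch of the distinction, the Python snippet below builds one record holding both kinds of metadata for a small file. The field names, the sample title and author, and the temporary CSV file are all illustrative assumptions. The file system reports a portable "last modified" time rather than a creation time, so that is what the sketch records.

```python
import os
import tempfile
from datetime import datetime, timezone


def structural_metadata(path):
    """Collect properties of the file itself, as reported by the file system."""
    info = os.stat(path)
    return {
        "format": os.path.splitext(path)[1].lstrip(".") or "unknown",
        "size_bytes": info.st_size,
        # Last-modified time; creation time is not available on every platform.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


# Descriptive metadata is supplied by people or cataloguing tools,
# not by the file system. These values are made up for illustration.
descriptive = {
    "title": "Quarterly sales report",
    "author": "A. Example",
    "keywords": ["sales", "revenue"],
}

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("region,revenue\nnorth,100\n")
    sample_path = handle.name

record = {
    "descriptive": descriptive,
    "structural": structural_metadata(sample_path),
}
os.remove(sample_path)
print(record)
```

The descriptive part helps someone find the file, while the structural part tells a person or a program how to handle it.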

 

Metadata provides important benefits to a business including:

 

  • Consistency – metadata holds information that helps business users understand the difference between business terms such as clients and consumers, or revenue and sales.
  • Understanding of relationships – metadata helps the business user resolve inconsistencies when determining whether business terms are associated throughout the data environment. If, say, the same entity is recorded as a delegate in one form and as a guest in another, metadata helps to resolve the discrepancy.
  • Clarity of data lineage – metadata usually contains the origins of a data set and helps determine where it came from and how it was created. Moreover, metadata can contain auditable information about who created, changed, deleted or moved data, with the exact timestamp.
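
Two of these benefits can be sketched in a few lines of Python: a glossary that maps differently named terms onto one agreed term, and a lineage record that keeps an audit trail. This is only an illustration under stated assumptions. The `GLOSSARY` contents, the agreed term "attendee", the `LineageRecord` class and the user names are all hypothetical and not taken from any particular metadata tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical glossary: maps each form-specific term to one agreed business term.
GLOSSARY = {
    "delegate": "attendee",
    "guest": "attendee",
}


def canonical_term(term):
    """Return the agreed business term, or the lower-cased term if none is defined."""
    key = term.lower()
    return GLOSSARY.get(key, key)


@dataclass
class LineageRecord:
    """Where a data set came from, plus an audit trail of who touched it."""
    dataset: str
    source: str
    events: list = field(default_factory=list)

    def log(self, user, action):
        """Append one audit event with a UTC timestamp."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.events.append({"user": user, "action": action, "at": stamp})


lineage = LineageRecord(dataset="event_attendees", source="registration_form_export")
lineage.log("alice", "created")
lineage.log("bob", "changed")

# Both forms resolve to the same agreed term.
print(canonical_term("Delegate") == canonical_term("guest"))
print([(event["user"], event["action"]) for event in lineage.events])
```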

 

To manage metadata on a large, enterprise scale it is common to create a metadata repository. There are three main approaches to building one: a centralised metadata repository, a distributed metadata repository, and a federated or hybrid metadata repository.

 

A centralised metadata repository is the traditional approach. It offers good scalability for capturing new metadata, good access to information and fairly high performance. However, it runs the risk of becoming a single point of failure and a performance bottleneck.

 

A distributed metadata repository allows a business user to access up-to-date metadata from all systems in real time. This approach offers better data quality because the data can be viewed in real time. However, because all of the systems need to be available in real time, a single system failure can potentially bring the metadata repository down.

 

Lastly, the hybrid approach tries to marry the best of both worlds. It can support real-time access to data from source systems while centrally maintaining metadata definitions, or holding a reference path to the locations of the accurate definitions, thus improving performance and quality.
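
The hybrid idea can be sketched roughly as follows. Definitions live in one central place, while current metadata is fetched from each source system on demand, and an unavailable source degrades the answer instead of failing the whole lookup. The class names `SourceSystem` and `HybridRepository`, the `describe` method, and the sample systems and values are hypothetical and serve only to illustrate the trade-off described above.

```python
class SourceSystem:
    """Stand-in for an operational system that can report its own metadata."""

    def __init__(self, name, metadata, available=True):
        self.name = name
        self.metadata = metadata
        self.available = available

    def fetch(self, key):
        """Return live metadata for a key, or fail if the system is down."""
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.metadata.get(key)


class HybridRepository:
    """Central store for definitions, live lookups for everything else."""

    def __init__(self, definitions, sources):
        self.definitions = definitions  # centrally maintained definitions
        self.sources = sources          # source systems queried in real time

    def describe(self, system, key):
        entry = {"definition": self.definitions.get(key)}
        try:
            entry["live"] = self.sources[system].fetch(key)
        except ConnectionError:
            # The central definition still answers when a source is down.
            entry["live"] = None
        return entry


crm = SourceSystem("crm", {"customer": {"row_count": 1200}})
billing = SourceSystem("billing", {"customer": {"row_count": 1180}}, available=False)

repo = HybridRepository(
    definitions={"customer": "A party that has bought at least one product"},
    sources={"crm": crm, "billing": billing},
)

print(repo.describe("crm", "customer"))
print(repo.describe("billing", "customer"))
```

A purely centralised design would keep only `definitions` and periodically copied metadata, and a purely distributed one would keep only `sources`. The sketch combines the two so that a failed source system costs freshness rather than availability.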

 

Metadata and its efficient management are crucial, especially for large businesses, which risk major costs if they have no strategy or solutions in place to maintain massive repositories. For more great articles, videos and events on data management, please visit our website.

Tags: Metadata, data, repository

Comment by Parker LAU on January 25, 2016 at 5:08pm

merely good article. Really brief intro, if you have some example that would be great.

Comment by Richard Ordowich on January 25, 2016 at 9:06am

Capturing the metadata to find and retrieve data is certainly helpful but capturing the meaning(s) and interpretations of the data or semantics is very challenging.

People have varied meanings and interpretations of the same data and trying to capture these requires discipline and techniques that are not commonly applied to metadata or data such as ontology and taxonomy.

There are tools to help such as Protege but learning how to use and apply these tools is difficult and few organizations have the patience and resources to create this level of metadata and more importantly, maintain it. Data is subjective and capturing the meanings is a challenge. Without this level of detail, the metadata is an approximation at best. 

© 2019 Data Science Central ®
