Data Virtualization: A Supermarket for Data

What is data virtualization? Here’s an analogy using a concept that we can all relate to: a supermarket.

Picture the scene: Shopping list in one hand, shopping basket in the other, you’re ready to tackle your weekly shopping in your local supermarket. Your items range from fruit and vegetables to washing detergent, perhaps with some free-range eggs thrown in for good measure. Quite the eclectic mix, but you know that you’ll be able to find all you need under one roof.

The fact that this is possible is in itself quite remarkable. Think about it: In the average fruit section, you might find oranges imported from Seville, bananas from Latin America, and apples from France. The origin of each fruit is different, yet they have all been imported to arrive in front of you, the customer.

And that is only the fruit. Consider the thousands of other products in the supermarket, manufactured or grown in many different countries, and the convenience of buying them all in one place becomes apparent. The idea of travelling to every country to purchase each individual product would be absurd: it would be expensive and extremely time-consuming. It would not even cross your mind as an option.

But how does this all apply to data?

In this analogy, if fruit represents data, then a supermarket is a highly efficient system for delivering data to consumers. Just as every package of fruit has its origin, every set of data has its source. Each of the different forms of fruit offered in a supermarket, such as fresh, frozen, or dried, is like one of the different ways that data is formatted. The different types of fruit, such as citrus fruits or berries, are like different types of data, and volume is a concept shared by retailer and data steward alike.

Just as supermarket customers don’t want to travel all the way to Seville for their oranges, data consumers don’t want to go to each of the different sources to access their data. It’s time-consuming and expensive.

Data consumers need a “data supermarket,” whereby all data, regardless of source, format, or volume, is easily accessible; what they need is data virtualization.

So, what is data virtualization?

Like a supermarket, data virtualization forms a virtual data layer that sits between the data sources and the consuming applications. This virtual data layer can connect to sources of all shapes and sizes (different brands, operating systems, etc.) and hides this complexity from the consuming applications (and the data analysts). Data virtualization lets you “shop” for all the data you need, in one place, without having to go directly to the sources.

This technology enables companies to implement a fast data strategy, which is difficult to achieve with ETL tools, since they integrate data in batches and the data is only as fresh as the last batch run. Also, if you change or move a source connected with an ETL script, you have to rewrite the script, which is time-consuming.

Data virtualization does not physically move or copy the data (unless there is a need to do so), and it does not require you to load the data into a warehouse and store it there. Instead of working with copies of the data itself, data virtualization works only with the metadata (the information needed to access each source) in a virtual data layer. This makes it easy for you to experiment with your data, making last-minute changes to reports without touching the raw data itself. Data virtualization, therefore, makes it easier, quicker, and cheaper to get value from your data.
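
To make the idea of a metadata-only layer concrete, here is a minimal sketch in Python. It is a toy illustration, not how any particular data virtualization product works: the `VirtualDataLayer` class, its `register` and `query` methods, and the two sample sources (an in-memory SQLite table and a CSV string) are all hypothetical names invented for this example. The layer stores only a description of each source and a function for reading it, and it reads from the source at query time.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


@dataclass
class SourceMetadata:
    """What the layer keeps per source: how to reach it, never its rows."""
    description: str
    fetch: Callable[[], Iterable[Row]]


class VirtualDataLayer:
    """Hypothetical toy layer: a catalog of metadata, with no stored data."""

    def __init__(self) -> None:
        self._catalog: Dict[str, SourceMetadata] = {}

    def register(self, name: str, description: str,
                 fetch: Callable[[], Iterable[Row]]) -> None:
        self._catalog[name] = SourceMetadata(description, fetch)

    def query(self, name: str, **filters: str) -> List[Row]:
        """Read the source now and return the rows matching every filter."""
        source = self._catalog[name]
        return [row for row in source.fetch()
                if all(row.get(key) == value for key, value in filters.items())]


# Assumed source 1: a relational table (in-memory SQLite for the demo).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE oranges (origin TEXT, variety TEXT)")
db.execute("INSERT INTO oranges VALUES ('Seville', 'bitter')")


def fetch_oranges() -> Iterable[Row]:
    cursor = db.execute("SELECT origin, variety FROM oranges")
    columns = [column[0] for column in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


# Assumed source 2: a CSV export, held in a string for the demo.
apples_csv = "origin,variety\nFrance,Gala\n"


def fetch_apples() -> Iterable[Row]:
    yield from csv.DictReader(io.StringIO(apples_csv))


layer = VirtualDataLayer()
layer.register("oranges", "SQLite table of oranges", fetch_oranges)
layer.register("apples", "CSV export of apples", fetch_apples)

# Consumers use one interface, whatever the source format.
seville_before = layer.query("oranges", origin="Seville")
apples = layer.query("apples")

# Nothing was copied, so a change at the source shows up on the next query.
db.execute("INSERT INTO oranges VALUES ('Seville', 'sweet')")
seville_after = layer.query("oranges", origin="Seville")
```

The consumer calls `query` the same way for both sources, and the second query on `oranges` sees the new row without any reload step, because the layer never held a copy of the data. A real virtual data layer would also push filters down to the sources instead of filtering in memory. This sketch leaves that out for brevity.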

In an increasingly data-driven world, fast access to data is key for making real-time business decisions, so why waste precious time, money, and resources using outdated data integration tools, when you can “shop” with ease using data virtualization?
