
‘Big data’ is often presented as the skeleton key that will unlock everything you want to know about your business, but there is more than meets the eye when it comes to understanding your data. Yes, clean data will unlock incredible value for your enterprise; inaccurate records, on the other hand, are a significant burden on productivity.

This is why we all seek the “Golden Record”.

The Golden Record is the ultimate prize in the data world. It is a fundamental concept within Master Data Management (MDM), defined as the single source of truth: one record that captures all the information we need to know about a member, a resource, or an item in our catalogue, and that is assumed to be 100% accurate.

Its power is undeniable. However, where we have multiple databases, it is hard to work out how to achieve such perfection. So we must first understand the benefits of a golden record.

So, let’s step back to your childhood and consider how imperfect information can cause havoc in any system.

The Power of the Golden Record

To explain, let’s go back to a very simple example. You’re 13 years old. Imagine sitting in your classroom. Everyone has arrived, and the teacher is about to run through the register.

They have drawn the names from various local databases and put them on paper without properly checking who is meant to be where.

After a few minutes, something isn’t right.

The teacher is repeating themselves. No one is sure why. The process is taking much longer than it should and the kids are becoming restless. We take a closer look at the register, then everything becomes clear.

Seemingly duplicated records – the bane of the Master Data Management world.

When building databases from disparate sources, we often run into the issue of duplication. Whether resulting from incomplete entries, changes that occur over time or some other reason, this is a significant issue for any enterprise that relies on vast volumes of information.

As you may imagine, if we were to expand the rollcall example to include hundreds of thousands of names, the overhead of duplication grows with it, and every process drains an increasing volume of resources. If we manage to compile a single entry, however (the “Golden Record”), every process becomes far more efficient, and we can begin to leverage the data at our fingertips.
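To make the duplication problem concrete, here is a minimal sketch of how candidate duplicates might be flagged in a register. It uses Python's standard `difflib` to score name similarity; the register contents, the surnames and the 0.7 threshold are all assumptions made for illustration, and real MDM tools typically use more sophisticated matching.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity score between 0 and 1 for two names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_duplicates(names: list[str], threshold: float = 0.7) -> list[tuple[str, str]]:
    """Return every pair of names whose similarity meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical register drawn from several sources.
register = ["Frederick Jones", "Fred Jones", "Amira Khan", "Tom Baker"]
print(candidate_duplicates(register))
```

Comparing every pair is fine for a classroom but becomes costly as the register grows, which is one reason duplication is so expensive at scale.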

Building the Golden Record

The complexity of implementing a Master Data Management solution stems from defining the workflow that will connect our disparate data sets.

First, we have to identify every data source that feeds into the dataset. Then, we must consider which fields we find to be the most reliable depending on their source. Finally, we must define the criteria that will determine when the data from one source should overwrite conflicting data from a secondary source in our MDM system.
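The three steps above can be sketched as a small configuration plus one rule. This is only an illustration: the source names (`enrolment_system`, `library_system`), the fields and the rankings are hypothetical and not taken from any particular MDM product.

```python
# Step 1: every source that feeds the master dataset (hypothetical names).
SOURCES = ["enrolment_system", "library_system"]

# Step 2: for each field, sources ranked from most to least reliable
# (an assumed ranking, for illustration only).
FIELD_PRIORITY = {
    "name": ["library_system", "enrolment_system"],
    "postcode": ["enrolment_system", "library_system"],
}


def should_overwrite(field: str, current_source: str, incoming_source: str) -> bool:
    """Step 3: incoming data wins only if its source ranks higher for this field."""
    ranking = FIELD_PRIORITY[field]
    return ranking.index(incoming_source) < ranking.index(current_source)


# The library system outranks the enrolment system for names, but not for postcodes.
print(should_overwrite("name", "enrolment_system", "library_system"))      # True
print(should_overwrite("postcode", "enrolment_system", "library_system"))  # False
```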

Back to the rollcall example.

We can see that Frederick’s name has two different entries, so which entry do we choose to prioritise (bearing in mind this will apply to every entry within the system, not just to Frederick’s)? To answer this, we must determine which system has the correct name most often, or review whether any other system captures that field more effectively.

In this instance, it appears to be the second row, in which case we would apply a rule that states we take the ‘Name’ from this source to build our golden record.
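One way to decide which system "has the correct name most often" is to compare each source against a small, hand-verified sample. The sketch below assumes such a sample exists; the pupil IDs, names and source names are invented for illustration.

```python
def accuracy_by_source(
    verified: dict[str, str],
    source_values: dict[str, dict[str, str]],
) -> dict[str, float]:
    """Share of verified pupils whose name each source records correctly."""
    scores = {}
    for source, values in source_values.items():
        correct = sum(
            1 for pupil_id, name in verified.items()
            if values.get(pupil_id) == name
        )
        scores[source] = correct / len(verified)
    return scores


# Hypothetical hand-checked sample of correct names.
verified_names = {"p1": "Frederick Jones", "p2": "Amira Khan", "p3": "Tom Baker"}

# Hypothetical values held by each source for the same pupils.
source_names = {
    "enrolment_system": {"p1": "Fred Jones", "p2": "Amira Khan", "p3": "Thomas Baker"},
    "library_system": {"p1": "Frederick Jones", "p2": "Amira Khan", "p3": "Tom Baker"},
}

scores = accuracy_by_source(verified_names, source_names)
best_source = max(scores, key=scores.get)
print(best_source)  # library_system
```

The source with the highest score then becomes the preferred source for the ‘Name’ field across every record, not just Frederick’s.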

Merge and Match Records

The critical question you will face in any MDM solution is how to merge and match apparently duplicate records.

With Frederick, there are two seemingly similar entries, so what is our process for creating the single golden record? There is crossover; however, specific differences mean this is not an automatic case to match and merge.

In such instances, the system must review the source of each field. If the first source is deemed more reliable for the postcode, whereas the second source is more reliable for the name and phone number, then we define rules that instruct the system to follow this approach.

Most MDM solutions offer effective merge functionality. So, you could define the above criteria for the system to review records and, where necessary, carry out the appropriate merge process.
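As a rough sketch of what such a merge does under the hood, the function below builds one golden record by taking each field from its preferred source and falling back to the next source when a value is missing. The source labels (`source_a`, `source_b`), the ranking and the sample values are assumptions for illustration; a real MDM product would expose this through its own configuration.

```python
from typing import Optional

# Assumed ranking: the first source is trusted for postcode,
# the second for name and phone number.
FIELD_PRIORITY = {
    "name": ["source_b", "source_a"],
    "phone": ["source_b", "source_a"],
    "postcode": ["source_a", "source_b"],
}

Record = dict[str, Optional[str]]


def merge_records(records: dict[str, Record]) -> Record:
    """Build one golden record, field by field, from the best available source."""
    golden: Record = {}
    for field, ranking in FIELD_PRIORITY.items():
        golden[field] = None
        for source in ranking:
            value = records.get(source, {}).get(field)
            if value:  # skip missing or empty values and try the next source
                golden[field] = value
                break
    return golden


# Hypothetical matched pair of records for the same pupil.
records = {
    "source_a": {"name": "Fred Jones", "phone": None, "postcode": "AB1 2CD"},
    "source_b": {"name": "Frederick Jones", "phone": "01234 567890", "postcode": "AB1 2DC"},
}

golden = merge_records(records)
print(golden)
```

Here the name and phone number come from the second source and the postcode from the first, mirroring the rule described above.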

Manual Intervention

Inevitably, problems still arise with data quality, particularly if we lack a reliable system for specific fields, such as date of birth.

The MDM toolkit will assign inconsistent records to a data steward for human review, so they can either follow up discrepancies or use past experience to inform their decision.

Further rules can be put in place to manage the final merge of the revised information, meaning we preserve the overall integrity of our Golden Records.
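The steward hand-off described above can be sketched as follows: fields with a trusted source are resolved automatically, while conflicting fields with no trusted source are queued for human review. The trusted-source mapping, source labels and sample values are hypothetical.

```python
from typing import Optional

# Assumed mapping of fields to their trusted source.
# There is deliberately no entry for date_of_birth.
TRUSTED_SOURCE = {"name": "source_b", "postcode": "source_a"}


def resolve_or_escalate(
    field: str, values_by_source: dict[str, Optional[str]]
) -> tuple[Optional[str], bool]:
    """Return (value, needs_review) for one field of a matched record."""
    distinct = {v for v in values_by_source.values() if v}
    if len(distinct) <= 1:
        # Sources agree (or only one has a value): nothing to resolve.
        return next(iter(distinct), None), False
    trusted = TRUSTED_SOURCE.get(field)
    if trusted and values_by_source.get(trusted):
        return values_by_source[trusted], False
    # Conflicting values and no trusted source: a human must decide.
    return None, True


# Hypothetical conflicting values for one pupil.
candidate = {
    "name": {"source_a": "Fred Jones", "source_b": "Frederick Jones"},
    "date_of_birth": {"source_a": "2005-03-01", "source_b": "2005-01-03"},
}

golden = {}
steward_queue = []
for field, values in candidate.items():
    value, needs_review = resolve_or_escalate(field, values)
    if needs_review:
        steward_queue.append((field, values))
    else:
        golden[field] = value

print(golden)
print(steward_queue)
```

Once the steward has chosen a value, it can be written into the golden record through the same merge rules as any other field.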

This blog was originally posted  here


© 2018   Data Science Central ®   Powered by
