We are living in a world dominated by data. Data overload and fatigue are drowning most corporations and individuals. There was a time when we had an opinion and would speak our minds to our friends and families. Now we instantaneously write our minds on Facebook, Instagram, Twitter and LinkedIn. Everyone has a strong opinion, and we don't hesitate to take our opinions public. In the last five years, we have generated more data than ever before. As a company, I want to make sense out of that nonsense and look for key-value-pair attributes to understand the hidden message lying beneath all that noise. Big Data plays an important part in mining such unstructured data. The Internet of Things is going to further aggravate the situation by adding many more data points. It's the wild wild west for data practitioners. Data governance therefore becomes a critical skill for individuals and organizations alike in the 21st century.



What is Data Governance?

Data governance is the process of owning a piece of data and moving it through the organization without losing its value. At every step of the way in the food chain, this piece of data can be enhanced or modified. Data governance is the sum total of all the processes, policies and technology that organizations use to store data in whatever native format they generate it in, process it, morph it into any form a user needs, protect that data as a custodian and, eventually, manage its shelf life.

Data governance has several key components:

1. Data Discovery

A data audit within the company routinely throws up the idiosyncrasies of the data at a business or department level. If I am in sales & marketing, I have several data points that I might routinely discover from my campaigns, regions, customers and so on. Some of these are blind spots that were previously unknown and for which there is now a definitive need. For example, within a community school context, with the amount of aid tied to student performance, I now need to fill in this blind spot with data that was previously not captured. Student performance indicators are a growing field, and many schools and colleges are looking at their data discovery processes to fill this gap. Likewise, after the collapse of companies like Enron, WorldCom and others, the Sarbanes-Oxley Act required companies to report on certain data points from 2002 onward. In the last 13 years, companies have become adept at reporting such data.

Data discovery is an ongoing process. The industry a company is in defines the extent and velocity with which discovery has to happen. For an industry that is perennially under pressure (such as telecom or retail), the need to stay abreast of that data is very significant; for others, the frequency of data collection isn't as high. Concepts like customer churn rate and sales per square foot of retail space have become commonplace now.
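Both metrics mentioned above are simple ratios. A minimal sketch, using made-up illustrative numbers:

```python
# Two common discovery-phase metrics: customer churn rate and
# sales per square foot. All figures below are hypothetical.

def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of customers lost over a period."""
    return customers_lost / customers_at_start

def sales_per_sq_ft(total_sales: float, floor_area_sq_ft: float) -> float:
    """Revenue generated per square foot of retail space."""
    return total_sales / floor_area_sq_ft

print(churn_rate(2000, 150))             # 0.075 -> 7.5% churn for the period
print(sales_per_sq_ft(1_200_000, 4000))  # 300.0 dollars per square foot
```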

2. Data custodian/steward/evangelist:

Every company needs data evangelists/custodians/stewards in each of its departments. These folks are responsible for owning the data pieces that belong to them. They define all the touch points they have with their customers (internal and external) and create a data dictionary and metadata for their departments. It is up to them to define the longitudinal data sets for their business units. They also keep the data current and define the periods at which audits need to happen.
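To make the steward's deliverable concrete, here is a minimal sketch of what one data dictionary entry might look like. The field names and values are illustrative assumptions, not a prescribed standard:

```python
# A toy data dictionary entry a steward might maintain for one data point.
# Every key and value here is a hypothetical example.

data_dictionary = {
    "customer_churn_rate": {
        "owner": "Sales & Marketing",      # the stewarding department
        "definition": "Customers lost / customers at period start",
        "type": "float",
        "unit": "ratio (0-1)",
        "refresh_frequency": "quarterly",  # how current the data is kept
        "source_system": "CRM",            # where the touch point lives
    },
}

entry = data_dictionary["customer_churn_rate"]
print(entry["owner"], "-", entry["refresh_frequency"])
```

The point is not the structure itself but that ownership, meaning, and refresh cadence are written down once and shared across the organization.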

3. Data production:

Once we have defined the metadata and dictionaries for the various types of data, we need to either produce that data or mold existing data stores into a format that can be universally accepted by the organization. Architecture is much easier now, and so is modeling. All touch points, from one department to another and from department to customer, are now obvious, and the data network can be visualized easily. By accepting the data definitions and the metadata, we create data repositories that can accommodate our users' needs. In this stage, we are setting the stage to do all the plumbing at the back end, accept the conceptual model we created in the discovery phase, and create the framework for Master Data Management. The questions posed to us here are:

  • Scalability - what is our projected data store?
  • Security - how do we build a perimeter around our data?
  • Integrity - how do we avoid GIGO (garbage in, garbage out)?
  • Accessibility - how do we provide for low latency and access across multiple devices?
  • Availability - how do we provide for timely access?
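The integrity question, in particular, is largely a plumbing decision: reject garbage at the repository's door rather than clean it up later. A minimal sketch, using Python's standard-library sqlite3 and a hypothetical customer table:

```python
# Illustrating "avoid GIGO" with schema-level constraints: the repository
# refuses rows that violate the agreed data definitions. Table and column
# names are assumptions for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,              -- uniqueness
        name        TEXT NOT NULL,                 -- completeness
        email       TEXT CHECK (email LIKE '%@%')  -- crude format check
    )
""")
conn.execute("INSERT INTO customer VALUES ('C001', 'Acme Corp', 'ops@acme.example')")

try:
    conn.execute("INSERT INTO customer VALUES ('C002', 'No Email Co', 'not-an-email')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # garbage stopped at the perimeter
```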

4. Master Data Management:

MDM is a systematic approach to managing all the data put into production. The departments we talked about could very well have their own data stores, data dictionaries and metadata. MDM is the integration of all these data stores where they exist; where they don't, MDM sets up the framework to manage all of these different data types in a way that makes the most sense to each of the businesses. Think of MDM as your airport: flights come in and go out all the time, yet there is no chaos. The air traffic control language is known to everyone, the landing and take-off strips are known to everyone, and so is the baggage handling. Similarly, within the MDM framework, many businesses can send their data to the airport (the MDM repository). The repository knows how to handle that data, where to send it, and what commands to give.
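The airport idea can be sketched in a few lines: two departmental stores send records to a central repository, which matches them on an agreed key and consolidates them into one "golden" record. The record shapes and the match key are assumptions for illustration; real MDM products do far more (fuzzy matching, survivorship rules, lineage).

```python
# A toy MDM consolidation: merge departmental records into a golden
# record per customer, matched on a shared key from the metadata.

sales_store = [{"customer_id": "C001", "name": "Acme Corp", "phone": "555-0100"}]
service_store = [{"customer_id": "C001", "email": "ops@acme.example"}]

def consolidate(*stores):
    golden = {}
    for store in stores:
        for record in store:
            key = record["customer_id"]  # the agreed match key
            golden.setdefault(key, {}).update(record)
    return golden

master = consolidate(sales_store, service_store)
print(master["C001"])
```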

5. Organization Process Assets:

Every company has process assets. Depending on the maturity level of the organization with respect to processes, the OPA can be anywhere from weak to strong. A progressive company with strategic goals will invest in OPA. OPAs are like the neural networks of the organization. Much like how neural networks emit signals, processes emit data. If the process is absent, so is the data. Invest in the process. Define how you will interact with your customer for sales and for service, using a multi-channel communication model. Define for each function (sales and service) how many data points you want captured. For sales, for example, you want to define processes for funnel management from suspect to prospect to customer. For service, you might want to define SLAs (service level agreements) and measure the process around them.

Many companies have the processes but do not have a way to look at the byproduct of those processes: the data. From my experience running IT shops and data departments, I have seen first-hand how companies discard their daily work. Everyone in the company works religiously every day, but a lot of that work is discarded as garbage; not enough is captured. For example, you may have a process to collect data around funnel management from suspect to prospect to customer, but you don't have a reliable system to collect that data and measure it. After a while, the pain sets in, and garbage recyclers like me are called in to fix the problem. We collect the garbage, recycle it and mine it. Before we get to it, we have to set the foundation: policies, procedures and the infrastructure. No build-out is ever initiated without a solid foundation. Your process is invaluable; it tells you why you are succeeding or why you are not. Set up data stewards, give them enough authority to map the neural network within the department or business unit, and allow for the free flow of data.
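Capturing the byproduct of a funnel process can start very small: log every stage transition so it can be measured later. A minimal sketch; the stage names and event shape are assumptions, and a real system would write to a durable store rather than a list:

```python
# Capturing funnel-management data: every stage transition becomes an
# event, so conversion can be measured instead of discarded.

from datetime import datetime, timezone

FUNNEL_STAGES = ["suspect", "prospect", "customer"]
events = []  # stand-in for a durable event store

def record_transition(lead_id: str, stage: str) -> None:
    if stage not in FUNNEL_STAGES:
        raise ValueError(f"unknown funnel stage: {stage}")
    events.append({"lead_id": lead_id, "stage": stage,
                   "at": datetime.now(timezone.utc).isoformat()})

record_transition("L-42", "suspect")
record_transition("L-42", "prospect")
record_transition("L-42", "customer")

# Measurement becomes possible only because the data was kept:
converted = sum(1 for e in events if e["stage"] == "customer")
print(converted)  # 1
```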


Data governance is not for the faint of heart, especially in these times, when data is in constant flux because of competitive pressures. Set up a war room to deal with the exponential growth, velocity and volume of data. Hire a Chief Information Officer or Chief Data Officer and give him or her the same latitude as others in your organization. Set up pilots with a couple of departments. Build those small neural networks and let the data flow through them. Speak the data language. Put up data posters all over the offices. It's a culture that is contagious, and eventually data will persist.

Rupen Shah has 25 years of experience in the software industry and in information technology consulting for government and commercial companies. He is a thought leader in data-driven customer analysis. He has excelled in providing advisory services to the middle and senior management of large and mid-size organizations on various aspects of technology implementation, with the eventual goal of providing an integrated 360-degree view of the customer. Rupen's specialty is in product management and customer analytics, and he acts as a key facilitator between technical resources, business managers, vice presidents and CXOs. He is skilled in management consulting, strategic planning, project/program management, software development methodology and deployment. Outside of his professional work, Rupen devotes a considerable amount of time to mentoring students, helping them navigate their personal and professional life questions. He is an adjunct professor at Southern New Hampshire University and George Mason, teaching undergraduate IT courses such as Advanced Database Concepts, Systems Analysis and Design, Health IT, and Object-Oriented Programming. At Marymount University, he teaches MBA students business strategy, marketing management, quantitative management and operations management.




© 2021 TechTarget, Inc.
