The Four Pillars of Data Governance
Well, another year has gone by, and we still struggle to build data governance programs. Recently someone asked me how I define data governance, and the answer was: I don't. Several thought-leading analyst organizations have weighed in on the definition, and a few of those definitions even work. I don't worry about defining data governance; what I worry about is how to DO data governance, and for that I have four pillars. You can think of them as objectives.
Increasing Data Usage
Look folks, I'm going to be honest with you: if there is anything that gets me really worked up about data governance functions, it's when they don't focus on usage. Usage is the only thing that really matters when we are building any kind of data function, and data governance should be no different. Increasing data usage will drive support for your data governance efforts, and that support will bring budget.
Improving Data Quality
There is no such thing as data governance without data quality, and there is no such thing as data quality without data governance. They are two sides of the same coin. The only tangible way to prove to your end users that data governance is working is through good data quality. On the other side of that coin, it is exceedingly difficult to know that you are doing data quality well if you don't have rules derived through the processes associated with data governance. If you want to get support for data governance, focus your efforts on data quality, because most organizations struggle to achieve good, or even good enough, data quality.
Identifying Data Lineage
A month or so back I was giving a presentation on the four pillars of data governance, and somebody in the audience asked, “Isn't data lineage just really about metadata?” The short answer is, well sure, sort of. If I had a dime for every time an organization told me that they were collecting metadata and therefore had data governance, I would be a very wealthy person right now. But metadata is not all I'm going after when I think about data lineage in the context of data governance. Data lineage encompasses metadata, but it also encompasses things like data catalogs, so that you know your source to target: in other words, the source of the data and where it ends up. The ability to do a regression analysis (for example) to isolate what happens to that data in support of data quality remediation is critically important (and yes, it requires metadata). All these things support data governance efforts, but lineage should not be about metadata alone.
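To make "source to target" a little more concrete, here is a minimal sketch of walking lineage backwards from a report to the data that feeds it. Everything in it is an assumption for illustration: the dataset names, the `LINEAGE` mapping, and the helper functions are hypothetical, and a real catalog or lineage tool would store and query this very differently.

```python
from collections import deque

# Hypothetical lineage edges: each target dataset maps to the sources it is built from.
LINEAGE = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
}


def upstream_sources(target, lineage):
    """Return every dataset that feeds `target`, nearest first (breadth-first walk)."""
    seen = []
    queue = deque(lineage.get(target, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


def root_sources(target, lineage):
    """Return only the original sources: upstream datasets with no recorded inputs."""
    return [name for name in upstream_sources(target, lineage) if name not in lineage]


print(upstream_sources("sales_dashboard", LINEAGE))
print(root_sources("sales_dashboard", LINEAGE))
```

When a number on the dashboard looks wrong, a walk like this narrows down which upstream datasets are worth inspecting first, which is the data quality remediation use I have in mind.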
Ensuring Data Protection
For way too long, data governance was almost solely defined by data protection or data security. Certainly, there is some responsibility here for data governance to support and ensure that our assets are protected. When you do any kind of data management function, you are responsible for that data. But the reality is that accountability for the protection of data and the associated data assets should clearly fall under your privacy, information security, and/or compliance team. Data governance should support this effort by helping to deploy rules about the data, to ensure that the actionable aspects of a policy or procedure exist within the data repository. But we are not the ones who should be driving the creation of the policies or the procedures. That is clearly the accountability of your information security, compliance, or privacy team. I encourage you to create a happy alliance with privacy, compliance, risk, and your information security team (or whatever variation your organization has) to ensure that what they are doing is well aligned with how you operationalize data governance into any kind of repository. But it is not your accountability.
How to Address the Pillars of Data Governance
You must deliver something in each of these four areas to be doing data governance well. The whole intention here is to operationalize all the efforts into a data repository to improve scale and increase repeatability. What I mean is applying the rules that you have created through data governance functions (whether those are definitions or data quality parameters) to a repository that supports automated functions for monitoring (among other things), so that it is a repeatable mechanism. Just creating definitions or just creating data quality parameters without actually encoding them into your repositories is not, I repeat not, data governance. That is pontification, and while sometimes it is fun, it's not going to get the support that your data governance team will need.
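As a rough illustration of what "encoding" a data quality parameter can look like, here is a minimal sketch under stated assumptions. The rule names, thresholds, columns, and sample rows are all hypothetical, and in practice these checks would run inside whatever repository or data quality tooling your organization already uses rather than in a standalone script.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityRule:
    """A data quality parameter agreed through governance, in executable form."""
    name: str
    column: str
    check: Callable[[object], bool]  # returns True when a single value passes
    min_pass_rate: float             # agreed threshold, 0.0 to 1.0


# Hypothetical rules; real ones come out of your governance process.
RULES = [
    QualityRule("email_present", "email", lambda v: bool(v), 0.95),
    QualityRule("age_in_range", "age",
                lambda v: isinstance(v, int) and 0 <= v <= 120, 1.0),
]


def monitor(rows, rules):
    """Apply every rule to every row and report the pass rate against its threshold."""
    results = {}
    for rule in rules:
        passed = sum(1 for row in rows if rule.check(row.get(rule.column)))
        rate = passed / len(rows) if rows else 1.0
        results[rule.name] = {"pass_rate": rate, "ok": rate >= rule.min_pass_rate}
    return results


# Hypothetical sample data standing in for a table in the repository.
SAMPLE_ROWS = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 41},
    {"email": "c@example.com", "age": 130},
    {"email": "d@example.com", "age": 27},
]

report = monitor(SAMPLE_ROWS, RULES)
for rule_name, outcome in report.items():
    print(rule_name, outcome)
```

The point of the sketch is the shape, not the specifics: the rule lives as something a scheduler can run every day, so the check is repeatable instead of sitting in a document nobody reads.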
At the end of the day, I don't really care how you define data governance, as long as the definition resonates with your organization. What is important is that you address these four pillars to ensure that data governance is functioning appropriately in your organization.
You made it to the end, thank you! Since you're here, I wanted to let you know that the audiobook version of "Disrupting Data Governance" will be released on January 10, 2023!