Data governance methods and ideas


What is data governance?

As any data organisation grows, data debt creeps in, and before you know it the organisation has piled up thousands of datasets. After a few months it becomes difficult to find out who created a dataset, who owns it and who manages it.

These questions are crucial: to make informed decisions from data we need to be able to answer them, so that dashboards are meaningful and do not rely on stale data.
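As a minimal sketch of what answering those questions could look like, the record below keeps the creator, owner and manager of a dataset together with its last refresh time, and flags datasets that have gone stale. The class name, field names, sample datasets and the 30-day threshold are all assumptions made for illustration, not part of any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DatasetRecord:
    """Hypothetical record of who created, owns and manages a dataset."""
    name: str
    created_by: str
    owner: str
    managed_by: str
    last_updated: datetime


def is_stale(record: DatasetRecord, now: datetime, max_age: timedelta) -> bool:
    """Return True when the dataset was last refreshed more than max_age ago."""
    return now - record.last_updated > max_age


# Fixed "current" time so the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

# Illustrative datasets; names and teams are made up.
records = [
    DatasetRecord("sales_daily", "alice", "sales-team", "data-platform",
                  datetime(2024, 5, 31, tzinfo=timezone.utc)),
    DatasetRecord("legacy_export", "bob", "unknown", "unknown",
                  datetime(2023, 11, 1, tzinfo=timezone.utc)),
]

# Assumed policy: anything not refreshed in 30 days counts as stale.
stale_names = [r.name for r in records if is_stale(r, now, timedelta(days=30))]
print(stale_names)
```

With this sample data only legacy_export is reported, and its "unknown" owner shows exactly the gap that governance is meant to close.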

To mitigate this data debt, we look at various data management strategies.

Cloud tools and tech for data governance

Cloud tools are a natural starting point, since industry leaders now offer many managed solutions that support a proper data governance strategy.

The preliminary areas to address for data governance are:

1. Data access

2. Data discovery

3. Data security

With these points in mind, we can choose a few technologies that help address them.

Databricks Unity Catalog

Unity Catalog makes data access and discovery easier through its polished UI. Its explorer lets us connect other data catalogs from AWS or GCP and manage their data in one place. It provides data security (access through service principals), a catalog (searchable tables), lineage (where the data comes from), which is crucial in an evolving organisation, and data access (as Delta tables).

This option suits evolving or even established organisations that already have another cataloguing mechanism and want to port it to Databricks in order to manage data more efficiently. It comes at an added cost, since it is a paid service.
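To illustrate the "access through service principal" point, here is a small sketch that builds the SQL GRANT statements for read access to one table. The privilege names (USE CATALOG, USE SCHEMA, SELECT) follow Unity Catalog's SQL syntax as I understand it; the helper function, the catalog, schema and table names, and the principal name are hypothetical. In practice a service principal is usually identified by its application ID.

```python
def build_read_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Build the GRANT statements a principal needs to read a single table.

    Reading a table requires access to its parent catalog and schema
    as well as SELECT on the table itself.
    """
    quoted = f"`{principal}`"  # backticks allow special characters in the name
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {quoted}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {quoted}",
    ]


# Hypothetical names, for illustration only.
grants = build_read_grants("main", "sales", "orders", "etl-service-principal")
for statement in grants:
    print(statement)
```

In a Databricks notebook each statement could then be executed with spark.sql(statement), assuming the caller has the rights to grant those privileges.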

AWS Glue

AWS Glue is another cataloguing solution, from AWS, which offers data discovery, data access and security similar to Databricks Unity Catalog but lacks a modern UI. It is useful if your team is made up of developers who are comfortable with AWS technologies.

The main advantage of this service is that you can create managed and non-managed tables on S3 (the data lake, which provides the security) and register them in a Hive-compatible metastore, the Glue Data Catalog, so that they become discoverable. It does not, however, offer lineage the way Databricks Unity Catalog does.

This is also a paid service, but it is more cost-effective than the Databricks offering.
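As a rough sketch of cataloguing a non-managed (external) table that lives on S3, the code below builds the TableInput structure that the Glue create_table API expects and wraps the boto3 call in a separate function. The bucket, database, table and column names are hypothetical, the Parquet format is an assumption, and register_table is not invoked here because it needs boto3 and valid AWS credentials.

```python
def build_external_table_input(name: str, s3_location: str,
                               columns: list[tuple[str, str]]) -> dict:
    """Build a Glue TableInput for an external Parquet table stored on S3."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            "Columns": [{"Name": col, "Type": typ} for col, typ in columns],
            "Location": s3_location,
            # Standard Hive classes for reading and writing Parquet.
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary":
                    "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
            },
        },
    }


def register_table(database: str, table_input: dict) -> None:
    """Register the table in the Glue Data Catalog (requires AWS credentials)."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    glue = boto3.client("glue")
    glue.create_table(DatabaseName=database, TableInput=table_input)


# Hypothetical table definition, for illustration only.
table_input = build_external_table_input(
    "orders",
    "s3://example-bucket/warehouse/orders/",
    [("order_id", "bigint"), ("amount", "double")],
)
print(table_input["StorageDescriptor"]["Location"])
```

Calling register_table("analytics", table_input) against a real account would make the table discoverable from services that read the Glue Data Catalog, with access to the underlying files still governed by S3 and IAM permissions.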

Built-in solution

As an organisation grows, it may find that an internally built tool is the best fit for a particular use case that open source or cloud offerings do not really solve.

That is when an organisation might decide to write its own cataloguing or data governance solution, taking inspiration from other tools on the market. This involves expert developers building and deploying an internal tool with technologies such as Java, Python and Kubernetes.

This takes a lot of effort at first, but once built it really helps with the organisation's specific use cases, although maintenance and ops time will increase for the team that developed the tool. It can be very cost-effective compared with the two paid services above.
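To give a feel for the core of such an internal tool, here is a deliberately small in-memory sketch covering ownership, discovery (keyword search) and lineage (upstream traversal). Every class, method and dataset name is hypothetical, and a real tool would add persistence, access control and an API on top of this.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One catalogued dataset with its owner and direct upstream sources."""
    name: str
    owner: str
    description: str
    upstream: list[str] = field(default_factory=list)


class Catalog:
    """Toy in-memory catalog: register, search and trace lineage."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def register(self, entry: Entry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Discovery: names of entries whose name or description matches."""
        kw = keyword.lower()
        return sorted(
            e.name for e in self._entries.values()
            if kw in e.name.lower() or kw in e.description.lower()
        )

    def lineage(self, name: str) -> list[str]:
        """Lineage: every dataset that `name` depends on, directly or not."""
        seen: list[str] = []
        stack = list(self._entries[name].upstream)
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # already visited, also guards against cycles
            seen.append(current)
            if current in self._entries:
                stack.extend(self._entries[current].upstream)
        return sorted(seen)


catalog = Catalog()
catalog.register(Entry("raw_orders", "ingest-team", "Raw order events"))
catalog.register(Entry("clean_orders", "data-eng", "Deduplicated orders",
                       ["raw_orders"]))
catalog.register(Entry("revenue_dashboard", "analytics",
                       "Daily revenue by region", ["clean_orders"]))

found = catalog.search("orders")
sources = catalog.lineage("revenue_dashboard")
print(found)
print(sources)
```

Both calls return clean_orders and raw_orders here: the search matches on the dataset names, and the lineage walk follows revenue_dashboard back through clean_orders to raw_orders.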

Choosing a solution

So there you have it: these are some sample solutions for introducing data governance within an organisation and using it to manage data effectively.

Choosing a technology requires analysing the use case, its severity and the cloud cost. If there is a need to introduce or develop data products faster, then Databricks solutions are the way to go, since they integrate well with data pipelines through notebooks.

But if the organisation has mature tooling and expert developers, it is worth considering building a data cataloguing solution within the team, so that complex use cases which other tools might not cover can be handled.

Ultimately it depends on the use case, but I hope this article gives a glimpse of what data governance is, what to consider when starting out, and a few solutions for implementing it.

vivek kumar 2