Data Governance: Why questions on how to handle data are important

Mr. Gaurav Aggarwal, VP & Global Lead - Everything on Azure Solution Strategy & GTM at Avanade


No matter what a business does, it is collecting some kind of data on its users. Microsoft CEO Satya Nadella aptly said: “When we talk about assets on the balance sheet, data deserves its row.” Data can be used to support your business, or it can be used to give your end users a better experience. The credibility of a company hinges on how much transparency it can offer its users and how the company secures, manages, and protects the data at hand.

As an organisation, you have a big question in front of you: “How should we handle users’ data?”

With enough data and a roadmap to use that data effectively, you can accelerate your company’s growth. But using data effectively is not possible without data governance. Here’s every “Why? How? Where?” you need to know about data governance and Azure Purview.

To begin with, let’s understand why data governance is imperative. Data is the new currency of the digital age, but data within organisations is growing at exponential rates. As much as 90% of the data in existence today was created in just the last two years, and by 2025, 80% of data will be unstructured. This influx of data has increased organisational challenges manyfold.

To get real business value from data an organisation needs to know:

  • What data exists within the organisation?
  • Who owns the data? Who can access the data? 
  • For what purposes can they use the data responsibly and ethically?
  • Data lineage (traceability of data flow and its usage in solutions)
  • Duplicate data 
  • Quality of data and common taxonomy
  • Security and compliance for the data captured
  • Where and how the data is stored or archived (and overall lifespan of data)

Lack of understanding of any of the above can create operational inefficiencies, confusion about the data and information being distributed internally and externally, and poor business decisions based on flawed or misunderstood data. That is only part of the problem: regulators are cracking down on companies over compliance with data privacy and data sovereignty requirements, and it won’t be surprising if we soon start seeing regulations around the ethical use of data.

In short, for companies to use data as an asset, they need to establish an enterprise data governance program, using appropriate technology platforms and solutions, to ensure visibility into the organisation’s data assets and their lifecycle.
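As a rough illustration, the questions listed earlier can be recorded once per data asset and checked for gaps. The sketch below is hypothetical: the `DataAsset` class, its field names, and the `governance_gaps` helper are assumptions made for illustration and do not come from any product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataAsset:
    """One inventory entry; every field name here is illustrative."""
    name: str
    owner: str                         # who owns the data
    allowed_readers: List[str]         # who can access it
    permitted_uses: List[str]          # purposes it may be used for
    upstream_sources: List[str] = field(default_factory=list)  # lineage
    classification: str = "unclassified"
    storage_location: str = ""         # where it is stored or archived
    retention_days: int = 0            # lifespan; 0 means not yet decided


def governance_gaps(asset: DataAsset) -> List[str]:
    """Return the governance questions still unanswered for an asset."""
    gaps = []
    if not asset.owner:
        gaps.append("owner")
    if not asset.allowed_readers:
        gaps.append("access")
    if not asset.permitted_uses:
        gaps.append("permitted uses")
    if asset.classification == "unclassified":
        gaps.append("classification")
    if not asset.storage_location:
        gaps.append("storage location")
    if asset.retention_days <= 0:
        gaps.append("retention")
    return gaps


orders = DataAsset(
    name="orders_2021",
    owner="sales-ops",
    allowed_readers=["analytics"],
    permitted_uses=["revenue reporting"],
)
print(governance_gaps(orders))
```

Here the asset has an owner and an agreed purpose, but the check still reports that its classification, storage location, and retention period are undecided.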

The next question is: what is data governance?

According to Gartner, “Data governance is the specification of decision rights and an accountability framework to ensure the appropriate behaviour in the valuation, creation, consumption, and control of data and analytics.”

Data governance helps ensure that data is usable, accessible, and protected. It also supports better analytics, because conclusions drawn from well-governed data are better informed. Data governance also improves the consistency of the data, removes redundancies, and helps make sense of garbage data, which can save an organisation from costly decision-making mistakes.

So what is Microsoft Azure Purview?

Microsoft Azure Purview is a fully managed, unified data governance service that helps you manage and govern your on-premises, multi-cloud, and SaaS data. 

Purview creates a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage. Purview empowers data consumers to find valuable, trustworthy data.

It’s built on Apache Atlas, an open-source project for metadata management and governance of data assets. Azure Purview also has a data share mechanism that securely shares data with external business partners without setting up extra FTP nodes or creating large redundant datasets.

Purview is available for public preview.  

There is currently no licensing cost associated with Purview; you pay for what you use. The pay-per-use model offered by Microsoft as part of the public preview is attractive for Microsoft customers looking to move quickly without having to create a business case to secure additional budget. Azure Purview reduces costs on multiple fronts, including cutting down on manual and custom efforts to discover and classify data and eliminating the hidden and explicit costs of maintaining home-grown systems and Excel-based solutions.

Critical Capabilities of Azure Purview

Azure Purview consists of the following main features.

1. Purview Data Map

Azure Purview Data Map provides the foundation for data discovery and effective data governance. It’s a cloud-native PaaS service that captures metadata about enterprise data held in analytics and operational systems, both on-premises and in the cloud. Purview Data Map is automatically kept up to date by a built-in automated scanning and classification system. Business users can configure and use the Purview Data Map through an intuitive UI, and developers can interact with the Data Map programmatically using the open-source Apache Atlas 2.0 APIs.
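To give a feel for what programmatic interaction might look like, here is a minimal sketch that builds a request body in the entity shape used by the Apache Atlas v2 REST API (`typeName`, `attributes`, `qualifiedName`). It makes no network call. The `build_atlas_entity` helper, the qualified name, and the "PII" classification are hypothetical; the Purview account endpoint and authentication are deliberately left out, and a classification type would have to exist in the catalog before it could be applied.

```python
import json
from typing import Dict, List, Optional


def build_atlas_entity(type_name: str, qualified_name: str, name: str,
                       classifications: Optional[List[str]] = None) -> Dict:
    """Build an Atlas v2 style request body describing one entity."""
    entity = {
        "typeName": type_name,
        "attributes": {
            "qualifiedName": qualified_name,  # unique identifier for the asset
            "name": name,
        },
    }
    if classifications:
        entity["classifications"] = [{"typeName": c} for c in classifications]
    return {"entity": entity}


payload = build_atlas_entity(
    type_name="DataSet",                        # generic Atlas base type
    qualified_name="example://sales/customers",  # hypothetical identifier
    name="customers",
    classifications=["PII"],                    # hypothetical classification
)
print(json.dumps(payload, indent=2))
```

In Apache Atlas, a body like this would be sent with an HTTP POST to the `/api/atlas/v2/entity` endpoint; whether a given Purview account accepts exactly this type and classification is an assumption to verify against the service documentation.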

Purview Data Map powers the Purview Data Catalog and Purview Data Insights as unified experiences within the Purview Studio.

Azure Purview creates an automated system for managing metadata from hybrid and miscellaneous sources, using built-in data classifiers and data protection to ensure sensitive data is not misused. It does that through Microsoft Information Protection sensitivity labels.

Data Map extracts metadata, lineage, and classifications from existing data stores. It lets you enrich your understanding of that data by classifying it at cloud scale, using 100+ built-in classifiers as well as your own custom classifiers. With Purview Data Map, organisations can centrally manage, publish, and inventory metadata at cloud scale and extend it further using the open Apache Atlas APIs.

Labelling of sensitive data is supported consistently across database servers, Azure, Microsoft 365, and Power BI. In addition, Purview lets you easily integrate all your data systems using the open-source Apache Atlas APIs.
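The idea behind pattern-based classification can be sketched in a few lines. This is a toy illustration, not Purview’s implementation: the two regular expressions, the label names, the `classify_column` helper, and the 60% match threshold are all assumptions chosen for the example.

```python
import re
from typing import List

# Hypothetical custom classifiers: label -> pattern (deliberately simplistic).
CLASSIFIERS = {
    "Email Address": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "Credit Card Number": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def classify_column(values: List[str], min_match_ratio: float = 0.6) -> List[str]:
    """Return each label whose pattern matches enough of the non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    labels = []
    for label, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= min_match_ratio:
            labels.append(label)
    return labels


sample = ["alice@example.com", "bob@example.org", "n/a", "carol@example.net"]
print(classify_column(sample))
```

Using a match ratio rather than requiring every value to match lets a column be labelled even when it contains a few placeholder or malformed entries.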

2. Purview Data Catalog

With Data Catalog, Purview enables rich data discovery: users can search by business and technical terms and understand data by browsing the associated technical, business, semantic, and operational metadata.

The Data Catalog lets you run a semantic search across your data effortlessly and presents the results so that they are quick and easy to understand. It also lets you verify that the data of interest originates from a trusted source, while respecting the sensitivity labels applied to it.

Purview helps companies to understand their data supply chain from raw data to business insights. 
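Tracing a data supply chain amounts to walking a lineage graph from a report back to its raw sources. The sketch below shows that idea on a small hand-written graph; the asset names and the `trace_sources` helper are hypothetical, and in practice the lineage would come from the catalog rather than a hard-coded dictionary.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each asset maps to the assets it is derived from.
UPSTREAM: Dict[str, List[str]] = {
    "sales_dashboard": ["sales_summary"],
    "sales_summary": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
}


def trace_sources(asset: str, upstream: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk returning every asset that feeds into `asset`."""
    seen: Set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(trace_sources("sales_dashboard", UPSTREAM)))
```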

3. Purview Data Insights

The data governance component gives users a bird’s-eye view of their organisation’s data landscape by quickly determining which analytics and reports are stored.

4. Purview Studio

Purview Studio is essentially the environment in which you work with the Azure Purview services after creating an account. The studio is a central control area that allows developers, administrators, and end users to work with Purview.

Challenges of Azure Purview

Azure Purview is in its early days and has some gaps that need to be addressed.

  • Purview supports only a limited list of data sources.
  • The user interface is missing basic data management capabilities in the Data Catalog. For example, once classified, assets can’t be deleted through the UI.
  • No support for the classification of zip file content.
  • No support for a Data Marketplace.
  • No support for automation and alerting.
  • Relations between assets are set manually, and it’s not possible to specify the type or nature of the relationship.
  • The maximum length of an asset name and classification name is just 4 KB.
  • Currently, Azure Purview only provides 10 GB of storage capacity for a platform with four capacity units and 40 GB for a platform with 16 capacity units.
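The two storage figures quoted in the last point work out to the same amount per capacity unit, as this small calculation shows. The `storage_per_capacity_unit` helper is just illustrative arithmetic; whether the same ratio applies to other platform sizes is an assumption, since only these two data points are given.

```python
def storage_per_capacity_unit(storage_gb: float, capacity_units: int) -> float:
    """Storage in GB provided per capacity unit."""
    return storage_gb / capacity_units


print(storage_per_capacity_unit(10, 4))   # four capacity units
print(storage_per_capacity_unit(40, 16))  # 16 capacity units
```

Both cases come to 2.5 GB per capacity unit.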

Based on the roadmap shared, it won’t be long before the Purview team closes these gaps and makes Azure Purview an enterprise-grade data governance suite.

Along with this, Azure Purview provides companies with services like a Data Catalog and a Business Glossary:

  • Data Catalog is a core element of any data governance software. It can scan all the registered data sources and identify, index, connect, and classify their data sets.
  • Business Glossary is a collection of terms with brief definitions, which connect to other terms. With Business Glossary, it’s possible to automate the process of classifying data sets and annotating them with the correct business terms so end users can understand them more easily. A business glossary is the foundation of the semantic layer that an organisation uses to define a common language for its business.
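Annotating data sets with business terms can be pictured as matching column names against a glossary. The following sketch is a simplified illustration under stated assumptions: the glossary contents, the name hints, and the `annotate_columns` helper are all hypothetical, and real catalogs use richer matching than splitting names on underscores.

```python
from typing import Dict, List

# Hypothetical glossary: business term -> definition and column-name hints.
GLOSSARY = {
    "Customer": {
        "definition": "A person or organisation that has bought from us.",
        "hints": ["cust", "customer", "client"],
    },
    "Revenue": {
        "definition": "Income from sales before costs are deducted.",
        "hints": ["rev", "revenue", "sales"],
    },
}


def annotate_columns(columns: List[str], glossary: Dict) -> Dict[str, List[str]]:
    """Map each column name to the glossary terms whose hints appear in it."""
    annotations = {}
    for column in columns:
        tokens = set(column.lower().split("_"))
        annotations[column] = [
            term for term, entry in glossary.items()
            if tokens & set(entry["hints"])
        ]
    return annotations


columns = ["cust_id", "client_name", "rev_2021", "created_at"]
for column, terms in annotate_columns(columns, GLOSSARY).items():
    print(column, "->", terms)
```

Matching on whole tokens rather than substrings avoids false hits such as "rev" inside "previous".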

With features like these, Microsoft Azure Purview allows your data to become a crucial asset. Data governance is complex, yet it is a foundational pillar in any enterprise’s data journey. It helps to democratise data responsibly through accessible, trusted, and connected enterprise data at scale. Microsoft Azure Purview provides a good starting point for cloud-native data governance. From a feature point of view, Azure Purview has the potential to be a game-changer, with capabilities like Data Catalog, Data Insights, Data Map, Business Glossary, and pipelines to manage your data sources and destinations. Indeed, Azure Purview has solid potential to shape a new Data Governance as a Service industry and open up new opportunities for businesses to explore.