Having fun isn't hard when you have a modern data catalog

Data Catalog and Data Fabric are enablers of any data architecture.

Whether your architecture is centralized or decentralized, a Data Catalog enables effective data management and makes it easier to interact with the data.

On closer inspection, the Data Catalog turns out to be one of the main technology pillars of Data Fabric, which takes a much wider approach: it also includes semantic enrichment of data, data preparation, data recommendation engines and various data orchestrators.

Data Fabric, empowered by a Data Catalog, is an abstraction layer that helps applications connect to data through built-in APIs, regardless of the database technology and the data server location.
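
To make the "abstraction layer" idea more tangible, here is a minimal sketch in Python. It is not how any real Data Fabric product is implemented; the names MiniFabric, CatalogEntry, read_csv and read_sqlite are all hypothetical, and local CSV and SQLite files stand in for real data servers. The point is only that the application asks for an asset by its logical name, and the catalog decides which technology and location are behind it.

```python
import csv
import os
import sqlite3
import tempfile
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: which technology holds an asset, and where."""
    technology: str        # e.g. "csv" or "sqlite"
    location: str          # a file path here; a real fabric would hold connection details
    object_name: str = ""  # table name for sqlite, unused for csv


def read_csv(entry: CatalogEntry) -> List[dict]:
    with open(entry.location, newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


def read_sqlite(entry: CatalogEntry) -> List[dict]:
    conn = sqlite3.connect(entry.location)
    try:
        conn.row_factory = sqlite3.Row
        # The table name comes from the catalog, not from end-user input.
        rows = conn.execute(f'SELECT * FROM "{entry.object_name}"').fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()


class MiniFabric:
    """Toy abstraction layer: callers only know logical asset names."""

    def __init__(self) -> None:
        self._catalog: Dict[str, CatalogEntry] = {}
        self._readers: Dict[str, Callable[[CatalogEntry], List[dict]]] = {}

    def register_reader(self, technology: str, reader: Callable[[CatalogEntry], List[dict]]) -> None:
        self._readers[technology] = reader

    def register_asset(self, name: str, entry: CatalogEntry) -> None:
        self._catalog[name] = entry

    def read(self, name: str) -> List[dict]:
        entry = self._catalog[name]
        return self._readers[entry.technology](entry)


def demo():
    """Create two tiny sources and read both through the same call."""
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "customers.csv")
        with open(csv_path, "w", newline="") as handle:
            csv.writer(handle).writerows([["id", "name"], ["1", "Ada"]])

        db_path = os.path.join(tmp, "sales.db")
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
        conn.execute("INSERT INTO orders VALUES (1, 9.5)")
        conn.commit()
        conn.close()

        fabric = MiniFabric()
        fabric.register_reader("csv", read_csv)
        fabric.register_reader("sqlite", read_sqlite)
        fabric.register_asset("customers", CatalogEntry("csv", csv_path))
        fabric.register_asset("orders", CatalogEntry("sqlite", db_path, "orders"))

        # The calling code is identical for both technologies.
        return fabric.read("customers"), fabric.read("orders")
```

Calling demo() returns the rows of both assets. If "customers" later moves from a CSV file into a database, only its catalog entry changes and the application code stays the same.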

However, a traditional, manually managed data catalog does not qualify as a Data Fabric component.

A modern Data Catalog is actively driven by metadata and scans data sources regularly, with no need for manual maintenance. Modern Data Catalogs usually have built-in, fully automated end-to-end data lineage, and they enforce governance procedures and audit data access.
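
As a rough illustration of metadata-driven scanning, the sketch below reads table and column metadata from a SQLite database and compares two scans to detect changes. It is a toy under stated assumptions: scan_sqlite_metadata, diff_scans and demo_scan are hypothetical helper names, SQLite stands in for a real source, and a real catalog would run such scans on a schedule against many source types.

```python
import sqlite3


def scan_sqlite_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for user tables."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


def diff_scans(previous, current):
    """Compare two scan results and report what changed between them."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "changed": sorted(t for t in shared if previous[t] != current[t]),
    }


def demo_scan():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    first = scan_sqlite_metadata(conn)

    # The source evolves; nobody updates the catalog by hand.
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
    second = scan_sqlite_metadata(conn)
    conn.close()

    return first, diff_scans(first, second)
```

The second scan picks up the new table and the new column on its own, which is the behavior that separates an actively scanned catalog from a manually maintained one.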

There are multiple products in the Data Catalog space in the Azure public cloud.

Azure Data Catalog: helps discover data assets automatically and control who can discover and use which data assets. Supported data sources: Gen1 Data Lake containers and Blob storage, HDFS files and Hive metastore, MySQL, PostgreSQL, Cassandra and MongoDB tables and views, Oracle database server and views, Azure Synapse, SQL Server, Teradata, SAP HANA, DB2, HTTP endpoints and more. Since the Microsoft Purview launch, no new Azure Data Catalog accounts can be created.

Microsoft Purview: the next generation of Azure Data Catalog, a unified data governance solution across on-premises, multi-cloud and SaaS data services. It helps map the entire company data landscape, classify and tag sensitive information, and establish security and governance procedures. The list of supported data sources is very wide and is published in the Microsoft Purview documentation.
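
To give a feel for what "classify and tag sensitive information" means in practice, here is a deliberately simplified Python sketch. It does not use the Purview API and does not reproduce Purview's classification rules; the PATTERNS dictionary, the classify_column function and the 0.8 threshold are assumptions made up for illustration.

```python
import re

# Hypothetical, intentionally naive patterns; real classifiers are far stricter.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "credit_card_like": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def classify_column(values, threshold=0.8):
    """Return the set of tags whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [value for value in values if value]
    if not non_empty:
        return set()
    tags = set()
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for value in non_empty if pattern.match(value))
        if hits / len(non_empty) >= threshold:
            tags.add(tag)
    return tags
```

A column sample such as ["a@example.com", "b@example.org"] would be tagged "email", while free text gets no tag. A catalog can then attach such tags to the column metadata and drive access policies from them.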


Please note that there is no upgrade path from Azure Data Catalog to Microsoft Purview, and the pricing model is different.

To add to this salad, there is also Microsoft Fabric, an Azure service whose name can be a little misleading. It is an analytics solution built on top of Azure Data Lake Storage, a data lakehouse foundation that helps manage and process Delta Parquet files. In the future, this product will most probably grow into a full-power Data Fabric abstraction layer, but first it will need to support a much broader range of data sources.

Microsoft Fabric items can be managed inside Microsoft Purview. The Purview Data Catalog shows the metadata of Microsoft Fabric data assets and helps classify and protect them. All Microsoft Fabric user activities are logged and available in the Purview audit log.
