Tuesday, 6 January 2026

Unity Catalog Governance – Databricks on AWS

Unity Catalog is the centralized governance solution for Databricks that provides unified data governance, fine-grained access control, auditing, lineage tracking, and data discovery across all workspaces.

Unity Catalog allows organizations to control access to data assets such as catalogs, schemas, tables, views, volumes, and machine learning models while enforcing enterprise security policies.


1. Authentication

Authentication determines who the user is. In enterprise environments, authentication is typically integrated with the organization's Identity Provider (IdP).

Typical Authentication Flow

  • User attempts to access Databricks workspace
  • User is redirected to corporate Identity Provider
  • Identity Provider validates credentials
  • Authentication token is issued
  • User is granted access to Databricks

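Supported Authentication and Provisioning Methods

  • Single Sign-On (SSO) through the corporate identity provider
  • SAML 2.0 (a protocol used to implement SSO)
  • OAuth tokens (for API and programmatic access)
  • SCIM user provisioning (synchronizes users and groups; it does not authenticate them)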

Example Enterprise Setup

Users authenticate through a corporate identity provider such as Okta or Microsoft Entra ID (formerly Azure Active Directory). Separately from the login flow, the identity provider synchronizes users and groups to Databricks through SCIM provisioning. Unity Catalog then uses these groups to manage data permissions.
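
The group synchronization described above can be pictured as a reconciliation step. The following is a minimal sketch in plain Python, not the SCIM protocol or any Databricks API: the function name plan_group_sync and the sample memberships are hypothetical, and in a real deployment the identity provider's SCIM connector does this work.

```python
from typing import Dict, List, Set, Tuple

Membership = Dict[str, Set[str]]  # group name -> set of user emails


def plan_group_sync(idp: Membership, workspace: Membership) -> List[Tuple[str, str, str]]:
    """Return (action, group, user) steps that make `workspace` match `idp`."""
    steps: List[Tuple[str, str, str]] = []
    for group in sorted(set(idp) | set(workspace)):
        wanted = idp.get(group, set())
        current = workspace.get(group, set())
        for user in sorted(wanted - current):
            steps.append(("add", group, user))
        for user in sorted(current - wanted):
            steps.append(("remove", group, user))
    return steps


# Hypothetical state: the identity provider is the source of truth.
idp_groups: Membership = {
    "DataEngineers": {"data.engineer@company.com"},
    "DataScientists": {"data.scientist@company.com"},
}
workspace_groups: Membership = {
    "DataEngineers": {"data.engineer@company.com", "analyst@company.com"},
}

for step in plan_group_sync(idp_groups, workspace_groups):
    print(step)
```

Treating the identity provider as the single source of truth means group membership is never edited by hand in Databricks, which keeps permissions aligned with the corporate directory.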


2. Authorization

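Authorization determines what the authenticated user can access. Unity Catalog manages permissions by granting privileges on securable objects to principals (users, groups, and service principals). Granting privileges to groups that correspond to job roles gives Role-Based Access Control (RBAC) in practice.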

Access control is applied to the following securable objects:

  • Catalogs
  • Schemas
  • Tables
  • Views
  • Volumes
  • Functions
  • Models

Example Permission Model

Role              Access Level
Data Engineer     Create tables and manage pipelines
Data Scientist    Read curated datasets
Business Analyst  Query Gold layer datasets
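
To make the permission model concrete, here is a minimal sketch of how a privilege check could resolve through group membership and the object hierarchy. It is a simplified illustration, not Databricks code: the names is_allowed and ancestors, the catalog and schema names, and the grants are all hypothetical. It models only downward inheritance of privileges and ignores the USE CATALOG and USE SCHEMA privileges that Unity Catalog also requires.

```python
from typing import Dict, Iterator, Set, Tuple

# (principal, securable name) -> privileges granted at that level
Grants = Dict[Tuple[str, str], Set[str]]


def ancestors(full_name: str) -> Iterator[str]:
    """Yield the object itself, then its schema, then its catalog."""
    parts = full_name.split(".")
    for depth in range(len(parts), 0, -1):
        yield ".".join(parts[:depth])


def is_allowed(user: str, groups: Set[str], privilege: str,
               full_name: str, grants: Grants) -> bool:
    """True if the user, or any of the user's groups, holds `privilege`
    on the object or on one of its parent containers."""
    principals = {user} | groups
    return any(
        privilege in grants.get((principal, name), set())
        for principal in principals
        for name in ancestors(full_name)
    )


# Hypothetical grants loosely mirroring the roles above.
grants: Grants = {
    ("BusinessAnalysts", "analytics.gold"): {"SELECT"},
    ("DataEngineers", "analytics"): {"SELECT", "MODIFY"},
}

table = "analytics.gold.monthly_revenue"
print(is_allowed("analyst@company.com", {"BusinessAnalysts"}, "SELECT", table, grants))  # True
print(is_allowed("analyst@company.com", {"BusinessAnalysts"}, "MODIFY", table, grants))  # False
```

The analyst can read the table only because the BusinessAnalysts group holds SELECT on the parent schema; no grant names the user directly.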

3. Unity Catalog Role Types

Unity Catalog uses administrative roles and permission-based roles to control governance.

Account Administrator

  • Highest level administrative role
  • Manages Databricks account settings
  • Creates Unity Catalog metastore
  • Assigns metastore administrators

Metastore Administrator

  • Manages catalogs and storage locations
  • Controls overall data governance policies
  • Grants permissions to catalogs

Catalog Owner

  • Full control of a catalog
  • Can create schemas
  • Can grant permissions within the catalog

Schema Owner

  • Manages objects inside schema
  • Creates tables and views
  • Manages schema permissions

Table Owner

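  • Full control of the table
  • Can modify the table's schema (columns and properties)
  • Can grant table privileges such as SELECT and MODIFY (Unity Catalog has no separate INSERT or UPDATE privileges; MODIFY covers writes)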

4. Identity Types in Unity Catalog

Unity Catalog supports multiple identity types for managing access control.

Users

Individual human identities authenticated through the enterprise identity provider.

Example Users:

  • data.engineer@company.com
  • data.scientist@company.com
  • analyst@company.com

Groups

Groups are collections of users synchronized from the Identity Provider. Permissions are assigned to groups instead of individual users to simplify governance.

Example Groups:

Group Name        Purpose
DataEngineers     Develop ETL pipelines
DataScientists    Access curated datasets
BusinessAnalysts  Query aggregated data

Service Principals

Service principals represent non-human identities used by applications, automation scripts, or CI/CD pipelines.

Examples:

  • ETL pipeline service principal
  • Airflow automation identity
  • CI/CD deployment identity

5. Unity Catalog Object Hierarchy

Unity Catalog organizes data assets in a hierarchical structure.

Metastore
   └── Catalog
         └── Schema
               └── Tables / Views / Volumes / Functions / Models

Example Hierarchy

Metastore: enterprise_metastore

Catalog: finance

Schema: transactions

Tables:
   daily_transactions
   monthly_revenue
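
Objects in this hierarchy are referenced by a three-level name of the form catalog.schema.table; the metastore is not part of the name. The sketch below builds those names for the example above. The TableRef class is a hypothetical helper, and its parser is deliberately naive: it does not handle backtick-quoted identifiers that contain dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    catalog: str
    schema: str
    table: str

    @property
    def full_name(self) -> str:
        """Three-level name: catalog.schema.table."""
        return f"{self.catalog}.{self.schema}.{self.table}"

    @classmethod
    def parse(cls, full_name: str) -> "TableRef":
        """Split a three-level name into its parts (no quoting support)."""
        parts = full_name.split(".")
        if len(parts) != 3:
            raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
        return cls(*parts)


tables = [
    TableRef("finance", "transactions", name)
    for name in ("daily_transactions", "monthly_revenue")
]
for ref in tables:
    print(ref.full_name)
```

A query would then address the first table as finance.transactions.daily_transactions.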

6. Example Governance Implementation

Below is an example of implementing governance using Unity Catalog.

Catalog Level

Catalog       Owner                Purpose
raw_data      DataEngineeringTeam  Raw ingested data
curated_data  DataEngineeringTeam  Clean and processed datasets
analytics     DataAnalyticsTeam    Business reporting data
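
As a sketch of how the ownership column above might be applied, the snippet below generates ALTER CATALOG ... OWNER TO statements from the table. It assumes the catalogs already exist and that the owner names are account-level groups. The helper name owner_statement is hypothetical, and the generated SQL would still need to be run by a sufficiently privileged user, for example in a notebook or the SQL editor.

```python
from typing import List, NamedTuple


class CatalogSpec(NamedTuple):
    name: str
    owner: str
    purpose: str


# Rows copied from the catalog table above.
CATALOGS: List[CatalogSpec] = [
    CatalogSpec("raw_data", "DataEngineeringTeam", "Raw ingested data"),
    CatalogSpec("curated_data", "DataEngineeringTeam", "Clean and processed datasets"),
    CatalogSpec("analytics", "DataAnalyticsTeam", "Business reporting data"),
]


def owner_statement(spec: CatalogSpec) -> str:
    """Build the SQL that transfers catalog ownership to a group."""
    # Backticks quote the principal name so special characters are safe.
    return f"ALTER CATALOG {spec.name} OWNER TO `{spec.owner}`;"


for spec in CATALOGS:
    print(owner_statement(spec))
```

Assigning ownership to a group rather than a person means the catalog does not become orphaned when an individual leaves the team.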

Schema Example

Schema  Purpose
bronze  Raw ingestion layer
silver  Cleaned and standardized data
gold    Business-ready datasets

7. Example Permission Assignments

Group             Object         Permission
DataEngineers     bronze schema  CREATE TABLE
DataScientists    silver schema  SELECT
BusinessAnalysts  gold schema    SELECT
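
The assignments above translate into GRANT statements. The sketch below generates them, adding the USE CATALOG and USE SCHEMA privileges a group also needs before it can exercise a privilege inside a schema. The catalog name lakehouse is a placeholder, because the post does not say which catalog holds the bronze, silver, and gold schemas, and the helper grant_statements is hypothetical.

```python
from typing import List, Tuple

CATALOG = "lakehouse"  # hypothetical catalog name; the source does not name one

# (group, schema, privilege) rows from the table above
ASSIGNMENTS: List[Tuple[str, str, str]] = [
    ("DataEngineers", "bronze", "CREATE TABLE"),
    ("DataScientists", "silver", "SELECT"),
    ("BusinessAnalysts", "gold", "SELECT"),
]


def grant_statements(group: str, schema: str, privilege: str,
                     catalog: str = CATALOG) -> List[str]:
    """Return the GRANTs a group needs to exercise `privilege` on a schema."""
    principal = f"`{group}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal};",
        f"GRANT {privilege} ON SCHEMA {catalog}.{schema} TO {principal};",
    ]


for row in ASSIGNMENTS:
    print("\n".join(grant_statements(*row)))
```

Granting SELECT at the schema level means tables later added to that schema are readable by the group without further grants, which is convenient but worth weighing against least privilege.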

8. Data Lineage and Auditing

Unity Catalog automatically captures data lineage for queries that run against its tables and records access activity in audit logs.

Capabilities

  • Column-level lineage
  • End-to-end pipeline visibility
  • Query history tracking
  • Audit logging

Example

If a Gold table is created from a Silver table using a transformation job, Unity Catalog automatically records the lineage between these datasets.
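
Conceptually, lineage is a directed graph from each table to the tables it was derived from. The sketch below walks such a graph to list everything upstream of a Gold table. It is an illustration only: the table names and the UPSTREAM mapping are made up, and it does not reflect how Unity Catalog stores or exposes lineage.

```python
from collections import deque
from typing import Dict, List, Set

# target table -> tables it was derived from (hypothetical names)
UPSTREAM: Dict[str, Set[str]] = {
    "analytics.gold.monthly_revenue": {"analytics.silver.transactions_clean"},
    "analytics.silver.transactions_clean": {"analytics.bronze.transactions_raw"},
}


def all_upstream(table: str, edges: Dict[str, Set[str]]) -> List[str]:
    """Breadth-first walk over lineage edges, nearest sources first."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for source in sorted(edges.get(current, set())):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


print(all_upstream("analytics.gold.monthly_revenue", UPSTREAM))
```

The same traversal run in the opposite direction answers the impact question: which downstream tables are affected if a Bronze table changes.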


9. Security Best Practices

  • Use groups instead of assigning permissions to individual users
  • Apply the principle of least privilege
  • Separate environments (Dev, Test, Production)
  • Enable audit logging
  • Use service principals for automation
  • Restrict raw data access
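
The first practice in this list can be checked mechanically. The sketch below flags grants made directly to individuals rather than to groups or service principals. It assumes the grants have already been exported into simple records; the Grant type, the function direct_user_grants, the service principal name, and the sample data are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Set


class Grant(NamedTuple):
    principal: str
    privilege: str
    securable: str


def direct_user_grants(grants: Iterable[Grant], groups: Set[str],
                       service_principals: Set[str]) -> List[Grant]:
    """Return grants whose principal is neither a known group nor a
    known service principal, i.e. grants made to individual users."""
    approved = groups | service_principals
    return [g for g in grants if g.principal not in approved]


# Hypothetical exported grants.
sample_grants = [
    Grant("DataScientists", "SELECT", "curated_data.silver"),
    Grant("analyst@company.com", "SELECT", "raw_data.bronze"),
    Grant("etl-pipeline-sp", "MODIFY", "raw_data.bronze"),
]

flagged = direct_user_grants(
    sample_grants,
    groups={"DataEngineers", "DataScientists", "BusinessAnalysts"},
    service_principals={"etl-pipeline-sp"},
)
for g in flagged:
    print(f"review: {g.principal} has {g.privilege} on {g.securable}")
```

Here the only flagged grant is the analyst's direct SELECT on raw data, which breaks both the group-based and the restricted raw data guidelines.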

10. Governance Architecture Summary

Component             Purpose
Identity Provider     User authentication
Unity Catalog         Centralized governance
Groups                Access management
Catalogs and Schemas  Logical data organization
Permissions           Fine-grained access control

This governance model ensures that enterprise data assets are securely managed, access is properly controlled, and compliance requirements are met while enabling scalable analytics workloads in Databricks on AWS.
