Tuesday, 6 January 2026

Databricks Unity Catalog Governance Example

5. Unity Catalog Object Hierarchy

Unity Catalog organizes data assets in a hierarchical structure:

Metastore
   └── Catalog
         └── Schema
               └── Tables / Views / Functions

Example Enterprise Structure:

Metastore: enterprise_metastore

Catalogs
 ├── raw_data
 │     └── bronze
 │          └── customer_raw
 │
 ├── curated_data
 │     └── silver
 │          └── customer_clean
 │
 └── analytics
       └── gold
            └── customer_revenue_summary

6. Example Governance Implementation

We have three teams and example users:

Team              | Role
------------------|----------------------
Data Engineers    | Build ETL pipelines
Data Scientists   | Build ML models
Business Analysts | Query reporting data

Users:

  • john.engineer@company.com
  • sara.scientist@company.com
  • mike.analyst@company.com

Groups:

  • data_engineers
  • data_scientists
  • business_analysts

Step 1: Create Groups (SQL)

Note: CREATE GROUP creates workspace-local groups. Unity Catalog grants require account-level groups, which are normally created in the account console or via SCIM provisioning; the SQL below is shown for a simple single-workspace setup.

CREATE GROUP data_engineers;
CREATE GROUP data_scientists;
CREATE GROUP business_analysts;

Step 2: Add Users to Groups (SQL)

ALTER GROUP data_engineers ADD USER `john.engineer@company.com`;
ALTER GROUP data_scientists ADD USER `sara.scientist@company.com`;
ALTER GROUP business_analysts ADD USER `mike.analyst@company.com`;

Step 3: Create Catalogs (SQL)

CREATE CATALOG raw_data COMMENT 'Raw ingestion data catalog';
CREATE CATALOG curated_data COMMENT 'Processed datasets catalog';
CREATE CATALOG analytics COMMENT 'Business reporting catalog';

Step 4: Assign Catalog Ownership (SQL)

ALTER CATALOG raw_data OWNER TO data_engineers;
ALTER CATALOG curated_data OWNER TO data_engineers;
ALTER CATALOG analytics OWNER TO data_scientists;

Step 5: Create Schemas (SQL)

USE CATALOG raw_data;
CREATE SCHEMA bronze COMMENT 'Raw ingestion layer';

USE CATALOG curated_data;
CREATE SCHEMA silver COMMENT 'Cleaned data layer';

USE CATALOG analytics;
CREATE SCHEMA gold COMMENT 'Business reporting layer';

Step 6: Create Tables (SQL)

-- Bronze Table
CREATE TABLE raw_data.bronze.customer_raw (
  customer_id STRING,
  name STRING,
  email STRING,
  created_date TIMESTAMP
) USING DELTA;

-- Silver Table
CREATE TABLE curated_data.silver.customer_clean (
  customer_id STRING,
  name STRING,
  email STRING,
  created_date TIMESTAMP
) USING DELTA;

-- Gold Table
CREATE TABLE analytics.gold.customer_revenue_summary (
  customer_id STRING,
  total_revenue DOUBLE,
  last_purchase DATE
) USING DELTA;

Step 7: Assign Permissions (SQL)

Data Engineers:

GRANT USE CATALOG ON CATALOG raw_data TO data_engineers;
GRANT USE SCHEMA ON SCHEMA raw_data.bronze TO data_engineers;
GRANT CREATE TABLE ON SCHEMA raw_data.bronze TO data_engineers;
GRANT MODIFY ON SCHEMA raw_data.bronze TO data_engineers;

Data Scientists:

GRANT USE CATALOG ON CATALOG curated_data TO data_scientists;
GRANT USE SCHEMA ON SCHEMA curated_data.silver TO data_scientists;
-- SELECT granted at the schema level is inherited by all current and future tables
GRANT SELECT ON SCHEMA curated_data.silver TO data_scientists;

Business Analysts:

GRANT USE CATALOG ON CATALOG analytics TO business_analysts;
GRANT USE SCHEMA ON SCHEMA analytics.gold TO business_analysts;
-- SELECT granted at the schema level is inherited by all current and future tables
GRANT SELECT ON SCHEMA analytics.gold TO business_analysts;
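Taken together, these grants form an access matrix: each group holds a small set of privileges on exactly one schema. A quick pure-Python sketch of that matrix (a hypothetical review helper, not a Databricks API) can make least-privilege reviews easier:

```python
# Hypothetical sketch of the grant matrix defined in Step 7.
# Group -> schema -> set of privileges held.
GRANTS = {
    "data_engineers": {
        "raw_data.bronze": {"USE SCHEMA", "CREATE TABLE", "MODIFY"},
    },
    "data_scientists": {
        "curated_data.silver": {"USE SCHEMA", "SELECT"},
    },
    "business_analysts": {
        "analytics.gold": {"USE SCHEMA", "SELECT"},
    },
}

def has_privilege(group: str, schema: str, privilege: str) -> bool:
    """Check whether a group holds a given privilege on a schema."""
    return privilege in GRANTS.get(group, {}).get(schema, set())

# Analysts can read gold, but have no path into raw data.
print(has_privilege("business_analysts", "analytics.gold", "SELECT"))
print(has_privilege("business_analysts", "raw_data.bronze", "SELECT"))
```

Note that this models only explicit grants; in Unity Catalog the catalog owner (assigned in Step 4) also holds implicit privileges on everything inside the catalog.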

Python API Examples

Using the databricks-sdk for managing Unity Catalog programmatically:

# Install SDK
pip install databricks-sdk

Create Catalog

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.catalogs.create(
    name="raw_data",
    comment="Raw ingestion data"
)

Create Schema

w.schemas.create(
    name="bronze",
    catalog_name="raw_data",
    comment="Raw ingestion layer"
)

Grant Permissions

from databricks.sdk.service import catalog

w.grants.update(
    securable_type=catalog.SecurableType.SCHEMA,
    full_name="raw_data.bronze",
    changes=[
        catalog.PermissionsChange(
            principal="data_engineers",
            add=[catalog.Privilege.CREATE_TABLE, catalog.Privilege.USE_SCHEMA],
        )
    ],
)

Create Table

spark.sql("""
CREATE TABLE raw_data.bronze.customer_raw (
  customer_id STRING,
  name STRING,
  email STRING
) USING DELTA
""")

Enterprise Governance Summary

Layer  | Access
-------|----------------------------------
Bronze | Data Engineers
Silver | Data Engineers + Data Scientists
Gold   | Business Analysts

Roles:

  • Metastore Admin: Manage governance
  • Data Engineer: ETL pipelines
  • Data Scientist: ML & modeling
  • Analyst: Reporting

Best Practices:

  • Use groups instead of individual users
  • Restrict raw data access
  • Enable audit logging
  • Use external locations for S3
  • Enforce least privilege access
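The "groups instead of individual users" practice can be checked mechanically: any grant whose principal looks like an email address was made to a user, not a group. A minimal sketch of such an audit (a hypothetical check, not a built-in Databricks tool):

```python
import re

# Hypothetical audit helper: flag grants made to individual users
# (email-like principals) instead of groups.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def user_grants(grants):
    """Return (principal, securable) pairs where the principal is a user email."""
    return [(p, s) for p, s in grants if EMAIL_RE.match(p)]

grants = [
    ("data_engineers", "raw_data.bronze"),
    ("mike.analyst@company.com", "analytics.gold"),  # violates the best practice
]
print(user_grants(grants))
```

In practice the input list would come from iterating `SHOW GRANTS` output (or `w.grants.get` via the SDK) over each securable.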
