Thursday, 15 January 2026

Unity Catalog Metastore & Data Isolation – Enterprise Deep Dive

Enterprise-Level Technical Deep Dive with Real Examples (AWS Databricks)


1. What a Unity Catalog Metastore Really Is

A Unity Catalog metastore is the central security and governance control plane for Databricks. It owns:

  • All metadata (catalogs, schemas, tables, views, functions)
  • All permissions (role-based grants, row-level security, column-level security)
  • Access to physical storage through storage credentials and external locations

The workspace is NOT the security boundary for data. The metastore is.

2. Metastore Scope & Design Decision

Enterprise Best Practice

One metastore per:

  • Cloud
  • Region
  • Compliance boundary

Why This Matters

  • Enables cross-workspace data sharing
  • Centralizes governance and audit
  • Prevents duplicated security logic

Anti-pattern: one metastore per workspace. This breaks data sharing and multiplies governance overhead.

3. Real Enterprise Architecture (AWS)

AWS Account
│
├── Unity Catalog Metastore (us-east-1)
│   ├── Storage Root
│   ├── Storage Credentials
│   ├── External Locations
│   ├── Catalog: prod
│   └── Catalog: dev
│
├── Databricks Workspace: dev
└── Databricks Workspace: prod

Both workspaces attach to the same metastore.
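
One way to confirm that both workspaces really share a metastore is to run the same check in each. A minimal sketch, assuming a SQL warehouse or a Unity Catalog-enabled cluster is available in both workspaces:

```sql
-- Run in the dev workspace, then again in the prod workspace.
-- The returned metastore ID should be identical in both.
SELECT current_metastore();

-- Both workspaces should also list the same catalogs,
-- subject to the caller's grants and any workspace bindings.
SHOW CATALOGS;
```

If the two IDs differ, the workspaces are attached to different metastores and the grants shown later in this post will not carry across.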


4. Metastore Storage Root

The storage root is the default location where Unity Catalog stores the data files of managed tables. Users never access it directly; all reads and writes go through Unity Catalog.

Example


s3://company-uc-root/

IAM Role Permissions

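  • s3:GetObject
  • s3:PutObject
  • s3:ListBucket

In practice the role typically also needs s3:DeleteObject (so files from dropped or vacuumed tables can be removed) and s3:GetBucketLocation.

Users and clusters do NOT get these permissions directly. Only the IAM role that Unity Catalog assumes holds them.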

5. Storage Credentials

A storage credential is a Unity Catalog object that wraps an IAM role.

Example


CREATE STORAGE CREDENTIAL prod_storage_cred
WITH IAM_ROLE 'arn:aws:iam::123456789012:role/dbx-prod-uc-role';

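Users are granted privileges on the storage credential, never on the IAM role itself, so cloud IAM is decoupled from end users completely. Treat the statement above as shorthand: on AWS, storage credentials are usually created through Catalog Explorer, the Databricks CLI or REST API, or Terraform rather than in SQL.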
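
Once the credential exists, it can be inspected and delegated like any other securable. A short sketch, assuming the credential name from the example above; `group_platform_admins` is a hypothetical group used only for illustration:

```sql
-- List the storage credentials visible to the caller.
SHOW STORAGE CREDENTIALS;

-- Inspect the credential, including the IAM role it wraps.
DESCRIBE STORAGE CREDENTIAL prod_storage_cred;

-- Let a (hypothetical) platform team define external locations on top of it.
GRANT CREATE EXTERNAL LOCATION
ON STORAGE CREDENTIAL prod_storage_cred
TO `group_platform_admins`;
```

Keeping this privilege with a small platform group limits who can map new S3 paths into the metastore.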


6. External Locations (Actual Data Isolation)

External locations bind:

  • S3 path
  • Storage credential

Example


CREATE EXTERNAL LOCATION prod_sales_loc
URL 's3://prod-sales-data/'
WITH STORAGE CREDENTIAL prod_storage_cred;

Without an external location, Unity Catalog blocks access to the path, even if the S3 bucket exists and the IAM role can reach it.
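
An external location carries its own privileges, separate from table grants. The sketch below assumes the location defined above; `group_sales_ingest` is a hypothetical group name:

```sql
-- Confirm the URL and the storage credential the location is bound to.
DESCRIBE EXTERNAL LOCATION prod_sales_loc;

-- Allow a (hypothetical) ingestion group to register external tables under this path.
GRANT CREATE EXTERNAL TABLE
ON EXTERNAL LOCATION prod_sales_loc
TO `group_sales_ingest`;

-- Optional: allow direct file reads on the path, without going through a table.
GRANT READ FILES
ON EXTERNAL LOCATION prod_sales_loc
TO `group_sales_ingest`;
```

Most consumers should need neither grant. They read through tables, and only ingestion or platform teams touch the location itself.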

7. Catalog-Level Isolation

Catalogs are the first logical isolation layer.

Example


CREATE CATALOG prod;
CREATE CATALOG dev;

Access Control


GRANT USE CATALOG ON CATALOG prod TO `group_prod_users`;
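
In Unity Catalog the catalog-level usage privilege is spelled USE CATALOG (USAGE is the legacy Hive metastore name). A brief sketch of granting it and then reviewing the result, run as the catalog owner or an admin:

```sql
-- USE CATALOG lets the group reference objects inside the catalog.
-- It does not expose any data by itself.
GRANT USE CATALOG ON CATALOG prod TO `group_prod_users`;

-- Review every grant on the catalog.
SHOW GRANTS ON CATALOG prod;

-- Narrow the output to a single group.
SHOW GRANTS `group_prod_users` ON CATALOG prod;
```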

8. Schema-Level Isolation

Schemas isolate teams or business domains.

Example


CREATE SCHEMA prod.sales;
CREATE SCHEMA prod.finance;

GRANT SELECT ON SCHEMA prod.sales
TO `group_sales_analytics`;
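
A SELECT grant on a schema is inherited by every current and future table and view in it, but the group can only use it when it also holds the usage privileges on the parent objects. A sketch of the full chain, run as an owner or admin:

```sql
-- Usage privileges on the parents; without these, SELECT on the schema has no effect.
GRANT USE CATALOG ON CATALOG prod TO `group_sales_analytics`;
GRANT USE SCHEMA ON SCHEMA prod.sales TO `group_sales_analytics`;

-- Nothing is granted on prod.finance, so it stays inaccessible to this group.
SHOW GRANTS `group_sales_analytics` ON SCHEMA prod.finance;
```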

9. Table-Level Isolation

Tables are where most security risk exists.

Example


GRANT SELECT, MODIFY
ON TABLE prod.sales.customers
TO `group_sales_engineers`;

Never grant table access to an all-users principal such as `account users`, the Unity Catalog counterpart of PUBLIC.
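
Table grants are worth reviewing regularly and tightening when a group no longer needs write access. A small sketch using the table and group from the example above:

```sql
-- Audit the current grants on the table.
SHOW GRANTS ON TABLE prod.sales.customers;

-- Remove write access while leaving read access in place.
REVOKE MODIFY
ON TABLE prod.sales.customers
FROM `group_sales_engineers`;
```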

10. Cross-Workspace Data Sharing

Scenario

  • Dev workspace needs read-only access to Prod data

Solution


GRANT SELECT
ON TABLE prod.sales.customers
TO `group_dev_engineers`;

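No S3 or IAM access is required for the dev group. Unity Catalog brokers access to the underlying files on the group's behalf, provided the group also holds USE CATALOG on prod and USE SCHEMA on prod.sales.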
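
The scenario can be checked end to end from the dev workspace. This is a sketch under the assumption that both workspaces share the metastore and that the SELECT grant above is already in place:

```sql
-- Run once by an owner or admin: usage privileges on the parent objects.
GRANT USE CATALOG ON CATALOG prod TO `group_dev_engineers`;
GRANT USE SCHEMA ON SCHEMA prod.sales TO `group_dev_engineers`;

-- Run from the dev workspace by a member of group_dev_engineers.
SELECT COUNT(*) AS customer_rows
FROM prod.sales.customers;

-- Expected to be rejected for the same user: the group holds SELECT but not MODIFY.
DELETE FROM prod.sales.customers WHERE 1 = 0;
```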


11. Row-Level Security (Dynamic Views)

Business Rule

Group               Country Access
group_us_analysts   USA
group_eu_analysts   EU

Dynamic View


CREATE VIEW prod.sales.customers_secure AS
SELECT *
FROM prod.sales.customers
WHERE
  (is_account_group_member('group_us_analysts') AND country = 'US')
  OR
  (is_account_group_member('group_eu_analysts') AND country = 'EU');
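
The view only protects data if analysts can reach the view and not the base table. A sketch of the matching grants, assuming the analyst groups already hold USE CATALOG on prod and USE SCHEMA on prod.sales:

```sql
-- Analysts get the filtered view only. Nothing is granted on prod.sales.customers.
-- The TABLE keyword is optional in GRANT, so the view is named directly.
GRANT SELECT ON prod.sales.customers_secure TO `group_us_analysts`;
GRANT SELECT ON prod.sales.customers_secure TO `group_eu_analysts`;

-- Run by a member of group_us_analysts: only rows with country = 'US' should come back.
SELECT country, COUNT(*) AS customer_rows
FROM prod.sales.customers_secure
GROUP BY country;
```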

12. Column-Level Security

Example


CREATE VIEW prod.sales.customers_masked AS
SELECT
  id,
  name,
  CASE
    WHEN is_account_group_member('group_pii_admins') THEN ssn
    ELSE 'XXX-XX-XXXX'
  END AS ssn
FROM prod.sales.customers;
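
Unity Catalog also offers column masks, which attach the same logic to the table itself instead of a separate view. This is a different technique from the dynamic view above, shown only as a sketch. It assumes `ssn` is a STRING column, and `mask_ssn` is a hypothetical function name. `is_account_group_member` is used because it evaluates account-level groups, which are the groups Unity Catalog grants refer to.

```sql
-- Masking function: real value for PII admins, a fixed placeholder for everyone else.
CREATE OR REPLACE FUNCTION prod.sales.mask_ssn(ssn STRING)
RETURNS STRING
RETURN CASE
  WHEN is_account_group_member('group_pii_admins') THEN ssn
  ELSE 'XXX-XX-XXXX'
END;

-- Attach the mask to the column. Queries against the table now pass through it.
ALTER TABLE prod.sales.customers
ALTER COLUMN ssn SET MASK prod.sales.mask_ssn;
```

With a mask in place, users can be granted SELECT on the table directly, since the sensitive column is protected wherever the table is queried.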

13. Managed vs External Tables

Type       Storage             Use Case
Managed    UC Root             Dev, sandbox
External   External Location   Prod, regulated data
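
The difference shows up in the DDL: a managed table has no LOCATION clause, while an external table points at a path covered by an external location. A sketch in which the `dev.sandbox` schema, both table names, and the `orders/` sub-path are hypothetical:

```sql
-- Managed table: no LOCATION, so Unity Catalog picks the path under managed storage.
CREATE SCHEMA IF NOT EXISTS dev.sandbox;
CREATE TABLE dev.sandbox.events (
  event_id BIGINT,
  event_ts TIMESTAMP
);

-- External table: data lives at a path covered by the external location prod_sales_loc.
CREATE TABLE prod.sales.orders (
  order_id BIGINT,
  amount   DECIMAL(12, 2)
)
LOCATION 's3://prod-sales-data/orders/';

-- The Type row in the output shows MANAGED or EXTERNAL.
DESCRIBE TABLE EXTENDED prod.sales.orders;
```

Dropping the managed table eventually removes its data files, whereas dropping the external table leaves the files in S3.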

14. How Security Is Actually Enforced

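Unity Catalog checks access at two points:

  • At query planning, when privileges on every referenced catalog, schema, table, and view are verified
  • At query execution, when compute receives short-lived credentials scoped to the data it is allowed to read or write

Even if a user knows the S3 path, Unity Catalog blocks access unless a grant covers it.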
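
A quick way to see this is to try reaching the data by path instead of by table name. The statements below are a sketch, and the `customers/` sub-path is a hypothetical location of the table's files. Both are expected to fail for a user who has no privileges on the table or on the external location that covers the path.

```sql
-- Path-based read that bypasses the table name.
SELECT * FROM delta.`s3://prod-sales-data/customers/`;

-- Listing files directly requires READ FILES on the covering external location.
LIST 's3://prod-sales-data/';
```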


15. Auditing & Lineage

Unity Catalog automatically captures:

  • Who accessed what
  • Which queries touched which tables
  • Downstream dependencies

Example Query


SELECT * FROM system.access.audit;
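
In practice the audit table is large, so queries usually filter by service and date. The sketch below assumes the `system.access` schema is enabled for the metastore and that the caller has SELECT on it. Column names follow the documented system table schemas as far as I know and are worth verifying in your environment.

```sql
-- Unity Catalog events from the last 7 days, newest first.
SELECT
  event_time,
  user_identity.email AS user_email,
  action_name,
  request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= date_sub(current_date(), 7)
ORDER BY event_time DESC
LIMIT 100;

-- Downstream dependencies: tables that were written from prod.sales.customers.
SELECT DISTINCT target_table_full_name
FROM system.access.table_lineage
WHERE source_table_full_name = 'prod.sales.customers'
  AND target_table_full_name IS NOT NULL;
```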

16. Common Enterprise Mistakes

  • Creating a separate metastore for each environment
  • Granting S3 access directly to users
  • Relying on workspace ACLs to protect data
  • No catalog separation between environments

17. Enterprise Golden Rules

  1. One metastore per region
  2. Always use groups
  3. Never grant to PUBLIC
  4. Use views for sensitive data
  5. Treat UC as a security firewall

18. End-to-End Access Example

User     Group                  Read      Write
User A   group_prod_engineers   All       Yes
User B   group_dev_engineers    All       No
User C   group_us_analysts      US only   No
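
One possible set of grants that produces this matrix is sketched below. It assumes "All" means every object in the prod catalog and that User C reads through the customers_secure view from section 11. Privileges granted on a catalog are inherited by its schemas and tables.

```sql
-- User A: read and write across prod.
GRANT USE CATALOG, USE SCHEMA, SELECT, MODIFY
ON CATALOG prod TO `group_prod_engineers`;

-- User B: read-only across prod.
GRANT USE CATALOG, USE SCHEMA, SELECT
ON CATALOG prod TO `group_dev_engineers`;

-- User C: only the row-filtered view, nothing on the base table.
GRANT USE CATALOG ON CATALOG prod TO `group_us_analysts`;
GRANT USE SCHEMA ON SCHEMA prod.sales TO `group_us_analysts`;
GRANT SELECT ON prod.sales.customers_secure TO `group_us_analysts`;
```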

Final Summary

Unity Catalog is not just metadata. It is your data firewall, governance engine, and compliance backbone.

If the metastore is designed correctly, everything else becomes simple.
