Thursday, 15 January 2026

Enterprise Databricks on AWS – Identity, Workspace Isolation, Unity Catalog & RBAC (Terraform-Only)


Enterprise Databricks on AWS – Terraform-First Architecture

This article explains how to build a fully automated, enterprise-grade Databricks platform on AWS using Terraform only, covering:

  • SCIM & Identity automation
  • Workspace creation and isolation
  • Unity Catalog metastore & data isolation
  • Catalog-, schema-, and table-level RBAC
  • Row-level security using dynamic views
  • Cross-account AWS data sharing (Delta Sharing)

High-Level Enterprise Architecture

AWS Account (Databricks Account)
│
├── Account Console
│   ├── SCIM Users & Groups (Terraform)
│   ├── Unity Catalog Metastore (Terraform)
│   └── Workspaces (Dev / QA / Prod)
│
├── AWS Account A (Prod Data)
│   ├── S3 UC Managed Location
│   └── IAM Role (External Location)
│
├── AWS Account B (Analytics)
│   └── Read-only access via Delta Sharing
│
└── Azure AD / Okta
    └── Identity Source (SSO + SCIM)
Design Principle: Identity, access, and data governance are controlled centrally at the Databricks Account level.

1. SCIM Group Automation with Terraform

Why SCIM Matters

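SCIM (System for Cross-domain Identity Management) ensures that Databricks users and groups are never created by hand. Azure AD (or Okta) remains the source of truth. The Terraform resources below call the same Databricks account-level SCIM API that the IdP's provisioning connector uses, so give each group exactly one owner (the connector or Terraform) to keep the two from overwriting each other.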

Terraform – Databricks Account Provider

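provider "databricks" {
  alias      = "account"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
  # Authenticates as an account-level service principal (OAuth M2M), read
  # from the DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET environment variables.
}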
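The snippets in this article also assume a `terraform` block that pulls the official Databricks provider and a declaration for the account ID variable used above. A minimal sketch; the version constraint is intentionally left to you:

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
      # Add a version constraint for the provider release you have tested.
    }
  }
}

# Used by the account-level provider and the mws_* resources.
variable "databricks_account_id" {
  type        = string
  description = "Databricks account ID, shown in the account console"
}
```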

Create Groups (Mirrors Azure AD)

resource "databricks_group" "data_engineers" {
  provider     = databricks.account
  display_name = "data-engineers"
}

resource "databricks_group" "data_scientists" {
  provider     = databricks.account
  display_name = "data-scientists"
}

Assign Users (SCIM)

resource "databricks_user" "alice" {
  provider  = databricks.account
  user_name = "alice@company.com"
}

resource "databricks_group_member" "alice_engineers" {
  provider  = databricks.account
  group_id  = databricks_group.data_engineers.id
  member_id = databricks_user.alice.id
}
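Result: users, groups, and memberships are declared in code and applied through the Databricks account SCIM API, with group names matching Azure AD. Membership stays hard-coded (as with alice) until it is sourced from Azure AD itself.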
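One way to source membership from Azure AD inside Terraform is to read the group with the separate `azuread` provider and mirror its users. This is a sketch under several assumptions: an Azure AD group named `data-engineers` exists, the Terraform identity can read Azure AD, and each user principal name equals the user's Databricks login. It would replace the hard-coded `alice` resources rather than sit next to them, since declaring the same user twice fails.

```hcl
# Azure AD authentication (Azure CLI, environment variables, or a managed
# identity) is configured outside this snippet.
provider "azuread" {}

# The Azure AD group that databricks_group.data_engineers mirrors.
data "azuread_group" "data_engineers" {
  display_name     = "data-engineers"
  security_enabled = true
}

# Resolve member object IDs to users; ignore_missing skips IDs that do not
# resolve to a user (for example nested groups or service principals).
data "azuread_users" "data_engineers" {
  object_ids     = data.azuread_group.data_engineers.members
  ignore_missing = true
}

# One Databricks account user per Azure AD member (assumes UPN == login).
resource "databricks_user" "data_engineers" {
  provider  = databricks.account
  for_each  = toset(data.azuread_users.data_engineers.user_principal_names)
  user_name = each.value
}

# Put every mirrored user into the Databricks group.
resource "databricks_group_member" "data_engineers" {
  provider  = databricks.account
  for_each  = databricks_user.data_engineers
  group_id  = databricks_group.data_engineers.id
  member_id = each.value.id
}
```

Because the `for_each` keys are user principal names, adding someone to the Azure AD group and re-running `terraform apply` adds them in Databricks, and removing them removes the Databricks user as well.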

2. Workspace Creation & Environment Isolation

Enterprise Workspace Strategy

  • One workspace per environment
  • Dev cannot modify Prod
  • Shared metastore across workspaces

Create Workspace (AWS)

resource "databricks_mws_workspaces" "prod" {
  provider       = databricks.account
  account_id     = var.databricks_account_id
  workspace_name = "prod-workspace"
  aws_region     = "us-east-1"

  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  # Optional: network_id from databricks_mws_networks for a customer-managed VPC.
}
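The workspace resource references `databricks_mws_credentials.this` and `databricks_mws_storage_configurations.this`, which are not defined above. A minimal sketch of both, assuming the cross-account IAM role and the workspace root S3 bucket already exist; `cross_account_role_arn` and `root_bucket_name` are hypothetical variables, and the configuration names are placeholders:

```hcl
variable "cross_account_role_arn" {
  type        = string
  description = "Hypothetical: ARN of the cross-account IAM role Databricks assumes"
}

variable "root_bucket_name" {
  type        = string
  description = "Hypothetical: S3 bucket used as the workspace root storage"
}

# Registers the cross-account IAM role with the Databricks account.
resource "databricks_mws_credentials" "this" {
  provider         = databricks.account
  credentials_name = "prod-credentials"
  role_arn         = var.cross_account_role_arn
}

# Registers the S3 bucket that backs the workspace root storage.
resource "databricks_mws_storage_configurations" "this" {
  provider                   = databricks.account
  account_id                 = var.databricks_account_id
  storage_configuration_name = "prod-storage"
  bucket_name                = var.root_bucket_name
}
```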

Attach Groups to Workspace

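resource "databricks_mws_permission_assignment" "prod_admins" {
  provider     = databricks.account
  workspace_id = databricks_mws_workspaces.prod.workspace_id
  principal_id = databricks_group.data_engineers.id
  permissions  = ["ADMIN"] # use ["USER"] for non-admin access

  # Assigning account groups requires identity federation, which is enabled
  # once a Unity Catalog metastore is assigned to the workspace.
  depends_on = [databricks_metastore_assignment.prod]
}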

3. Unity Catalog Metastore – Terraform-Only

Create Metastore

resource "databricks_metastore" "main" {
  provider     = databricks.account
  name         = "enterprise-metastore"
  region       = "us-east-1" # one metastore per region; shared by all workspaces here
  storage_root = "s3://uc-metastore-root/"
}

Attach Metastore to Workspace

resource "databricks_metastore_assignment" "prod" {
  provider     = databricks.account
  workspace_id = databricks_mws_workspaces.prod.workspace_id
  metastore_id = databricks_metastore.main.id
}
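The catalog, schema, and grant resources in the next section have no `provider` argument, so they use the default Databricks provider, and that provider must point at a workspace rather than at the account console. One way to wire it, assuming the same service principal has been assigned to the prod workspace:

```hcl
# Default (unaliased) provider: workspace-level API of the prod workspace.
# Unity Catalog objects such as catalogs, schemas, and grants go through it.
provider "databricks" {
  host = databricks_mws_workspaces.prod.workspace_url
  # Credentials again come from DATABRICKS_CLIENT_ID / DATABRICKS_CLIENT_SECRET;
  # the service principal needs access to this workspace.
}
```

Configuring a provider from a resource attribute works, but many teams split account-level and workspace-level resources into separate Terraform states so the workspace already exists when its provider is configured.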

4. Unity Catalog RBAC as Code (grants.tf)

Create Catalogs per Domain

resource "databricks_catalog" "finance" {
  name = "finance"
}

Create Schemas

resource "databricks_schema" "payments" {
  name         = "payments"
  catalog_name = databricks_catalog.finance.name
}

Grant Permissions

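resource "databricks_grants" "finance_read" {
  catalog = databricks_catalog.finance.name

  grant {
    # Referencing the group resource makes Terraform create it first.
    principal  = databricks_group.data_scientists.display_name
    # USE_CATALOG only lets the group enter the catalog. Reading data also
    # needs USE_SCHEMA and SELECT, granted here (inherited by everything in
    # the catalog) or on individual schemas and tables.
    privileges = ["USE_CATALOG"]
  }
}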
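All permissions are version-controlled and auditable. Note that `databricks_grants` is authoritative for its securable: on the next apply, Terraform removes any grant on `finance` that was made outside this resource.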
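The schema and table levels follow the same pattern. A sketch in which the privilege split (engineers build tables in `payments`, scientists read one table) is an illustrative assumption, and `finance.payments.daily_summary` is a hypothetical table. Data engineers would also need `USE_CATALOG` on `finance`; because `databricks_grants` is authoritative, that belongs as another `grant` block in `finance_read`, not in a second resource on the same catalog.

```hcl
# Schema level: privileges here apply to every table in finance.payments.
resource "databricks_grants" "payments_schema" {
  schema = databricks_schema.payments.id # "finance.payments"

  # Engineers read, write, and create tables in this schema.
  grant {
    principal  = databricks_group.data_engineers.display_name
    privileges = ["USE_SCHEMA", "SELECT", "MODIFY", "CREATE_TABLE"]
  }

  # Scientists can enter the schema but read only tables granted below.
  grant {
    principal  = databricks_group.data_scientists.display_name
    privileges = ["USE_SCHEMA"]
  }
}

# Table level: read access to a single (hypothetical) table.
resource "databricks_grants" "daily_summary" {
  table = "finance.payments.daily_summary"

  grant {
    principal  = databricks_group.data_scientists.display_name
    privileges = ["SELECT"]
  }
}
```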

5. Row-Level Security (Dynamic Views)

Use Case

  • US team sees US data
  • EU team sees EU data

Dynamic View

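CREATE OR REPLACE VIEW finance.payments.secure_payments AS
SELECT *
FROM finance.payments.raw
-- us-team / eu-team are example account-level groups
WHERE (region = 'US' AND is_account_group_member('us-team'))
   OR (region = 'EU' AND is_account_group_member('eu-team'));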
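No data duplication and no application-side filtering. Grant users SELECT on secure_payments, not on finance.payments.raw; anyone with direct access to the base table bypasses the filter.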
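To keep the view in the same Terraform workflow as the grants instead of running the DDL by hand, one option is `databricks_sql_table` with `table_type = "VIEW"`, which executes the statement on a SQL warehouse. A sketch assuming an existing warehouse whose ID is passed in the hypothetical variable `sql_warehouse_id`, hypothetical account groups `us-team` and `eu-team`, and a `region` column holding `'US'` or `'EU'`:

```hcl
variable "sql_warehouse_id" {
  type        = string
  description = "Hypothetical: ID of the SQL warehouse that runs view DDL"
}

# Dynamic view: each row is visible only to members of the matching group.
resource "databricks_sql_table" "secure_payments" {
  name         = "secure_payments"
  catalog_name = databricks_catalog.finance.name
  schema_name  = databricks_schema.payments.name
  table_type   = "VIEW"
  warehouse_id = var.sql_warehouse_id

  view_definition = <<-EOT
    SELECT *
    FROM finance.payments.raw
    WHERE (region = 'US' AND is_account_group_member('us-team'))
       OR (region = 'EU' AND is_account_group_member('eu-team'))
  EOT
}

# Grant SELECT on the view only, never on finance.payments.raw.
resource "databricks_grants" "secure_payments" {
  table = databricks_sql_table.secure_payments.id

  grant {
    principal  = databricks_group.data_scientists.display_name
    privileges = ["SELECT"]
  }
}
```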

6. Cross-Account AWS Sharing with Unity Catalog

Producer Account (Prod)

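CREATE SHARE finance_share;
ALTER SHARE finance_share ADD TABLE finance.payments.raw;
CREATE RECIPIENT analytics_recipient USING ID '<consumer-sharing-identifier>';
GRANT SELECT ON SHARE finance_share TO RECIPIENT analytics_recipient;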

Consumer Account

-- prod_provider: the producer's name as listed by SHOW PROVIDERS
CREATE CATALOG finance_shared
USING SHARE prod_provider.finance_share;
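The consumer never receives IAM users, access keys, or bucket policies for the producer's S3 data: Delta Sharing mediates access, and the producer can cut it off by revoking the recipient's SELECT on the share.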
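The share can also be managed as code. A sketch using the provider's Delta Sharing resources, assuming Databricks-to-Databricks sharing. `consumer_metastore_id` is a hypothetical variable holding the consumer's sharing identifier (form `aws:<region>:<uuid>`; running `SELECT CURRENT_METASTORE()` on the consumer side returns it), and `prod_provider` is a placeholder for the provider name the consumer sees.

```hcl
variable "consumer_metastore_id" {
  type        = string
  description = "Hypothetical: consumer sharing identifier (aws:<region>:<uuid>)"
}

# Producer: the share and the table it exposes.
resource "databricks_share" "finance" {
  name = "finance_share"

  object {
    name             = "finance.payments.raw"
    data_object_type = "TABLE"
  }
}

# Producer: the consumer metastore, identified by its sharing identifier.
resource "databricks_recipient" "analytics" {
  name                               = "analytics_recipient"
  authentication_type                = "DATABRICKS"
  data_recipient_global_metastore_id = var.consumer_metastore_id
}

# Producer: allow the recipient to read the share.
resource "databricks_grants" "finance_share" {
  share = databricks_share.finance.name

  grant {
    principal  = databricks_recipient.analytics.name
    privileges = ["SELECT"]
  }
}

# Consumer: belongs in the consumer's own Terraform configuration, whose
# default provider points at a workspace on the consumer metastore.
resource "databricks_catalog" "finance_shared" {
  name          = "finance_shared"
  provider_name = "prod_provider"
  share_name    = "finance_share"
}
```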

7. Decision Diagrams for Architects

Identity Decision

Azure AD
 ├── Manual Users ❌
 └── SCIM + SSO ✅

Data Access Decision

Per-user IAM Policies ❌
Unity Catalog Grants ✅
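The grants path does not remove IAM; it confines IAM to a single role that Unity Catalog assumes on behalf of users, matching the "IAM Role (External Location)" box in the architecture. A sketch assuming that role already carries the trust policy Unity Catalog requires; `uc_role_arn` is a hypothetical variable and `s3://prod-finance-data/` a hypothetical bucket:

```hcl
variable "uc_role_arn" {
  type        = string
  description = "Hypothetical: IAM role Unity Catalog assumes to reach S3"
}

# The only IAM identity that touches the bucket; users never get IAM access.
resource "databricks_storage_credential" "prod_data" {
  name = "prod-data-credential"

  aws_iam_role {
    role_arn = var.uc_role_arn
  }
}

# Maps an S3 prefix to that credential; access is then governed by UC grants.
resource "databricks_external_location" "finance" {
  name            = "finance-external"
  url             = "s3://prod-finance-data/"
  credential_name = databricks_storage_credential.prod_data.name
}

# Data engineers may read files and create external tables under the prefix.
resource "databricks_grants" "finance_location" {
  external_location = databricks_external_location.finance.id

  grant {
    principal  = databricks_group.data_engineers.display_name
    privileges = ["READ_FILES", "CREATE_EXTERNAL_TABLE"]
  }
}
```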

Security Model

Workspace ACLs → Compute
Unity Catalog → Data
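On the compute side of that split, workspace ACLs are Terraform resources too. A sketch of a cluster policy that data engineers can use; the policy name and its two fixed rules (a Unity Catalog-capable shared access mode and 60-minute auto-termination) are illustrative assumptions:

```hcl
resource "databricks_cluster_policy" "engineering" {
  name = "engineering-standard"

  # jsonencode turns this HCL object into the JSON policy definition.
  definition = jsonencode({
    # Clusters under this policy use shared (user isolation) access mode.
    data_security_mode = {
      type  = "fixed"
      value = "USER_ISOLATION"
    }
    # Clusters terminate after 60 idle minutes.
    autotermination_minutes = {
      type  = "fixed"
      value = 60
    }
  })
}

# Workspace ACL: data engineers may create clusters with this policy.
resource "databricks_permissions" "engineering_policy" {
  cluster_policy_id = databricks_cluster_policy.engineering.id

  access_control {
    group_name       = databricks_group.data_engineers.display_name
    permission_level = "CAN_USE"
  }
}
```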

What This Enables Next

  • Prod data read-only from Dev
  • Cluster RBAC enforced
  • Auditor-friendly access logs
  • Multi-account AWS sharing
Together, these pieces form the account-centric governance pattern commonly adopted by regulated enterprises.
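"Prod data read-only from Dev" maps to Unity Catalog workspace bindings. A sketch assuming the `finance` catalog has been switched to isolated mode (`isolation_mode = "ISOLATED"` on its `databricks_catalog` resource) and that the dev workspace ID arrives in the hypothetical variable `dev_workspace_id`:

```hcl
variable "dev_workspace_id" {
  type        = number
  description = "Hypothetical: numeric ID of the dev workspace"
}

# Prod keeps read/write access to the isolated finance catalog.
resource "databricks_workspace_binding" "finance_prod" {
  securable_name = databricks_catalog.finance.name
  workspace_id   = databricks_mws_workspaces.prod.workspace_id
  binding_type   = "BINDING_TYPE_READ_WRITE"
}

# Dev can query the finance catalog but cannot write to it.
resource "databricks_workspace_binding" "finance_dev" {
  securable_name = databricks_catalog.finance.name
  workspace_id   = var.dev_workspace_id
  binding_type   = "BINDING_TYPE_READ_ONLY"
}
```

Bindings decide which workspaces can see the catalog at all; the grants above still decide which users can read it inside those workspaces.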

Suggested Multi-Post Series

  1. Identity & SCIM Automation
  2. Workspace Isolation Strategy
  3. Unity Catalog Deep Dive
  4. RBAC & Data Security Patterns
  5. Cross-Account Data Sharing
