Enterprise Databricks Automation on AWS
SCIM, Unity Catalog RBAC, Row-Level Security & Security-as-Code
Where This Fits in the Enterprise Series
| Post | Topic |
|---|---|
| Step 0 | Identity setup (SSO + SCIM) |
| Step 1 | Workspace strategy & environment isolation |
| Step 2 | Unity Catalog metastore & data isolation |
| Step 3 | Identity, RBAC & data security as code (this post) |
| Step 4 | CI/CD & promotion pipelines |
1️⃣ SCIM Group Automation with Terraform
Why SCIM Automation Is Mandatory
- No manual user or group creation
- Identity source of truth = IdP
- Permissions change automatically with group membership
Provider Configuration (Account Level)
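provider "databricks" {
  host       = var.databricks_account_host # e.g. https://accounts.cloud.databricks.com
  account_id = var.databricks_account_id   # required for account-level APIs
  # Personal access tokens are workspace-scoped, so this assumes an OAuth
  # service principal instead. The variable names are illustrative.
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}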
Create Groups via Terraform
resource "databricks_group" "prod_engineers" {
  display_name = "group_prod_engineers"
}

resource "databricks_group" "dev_engineers" {
  display_name = "group_dev_engineers"
}
Add Users to Groups
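data "databricks_user" "user_a" {
  user_name = "user_a@example.com" # placeholder; the user is synced by the IdP
}

resource "databricks_group_member" "prod_user" {
  group_id  = databricks_group.prod_engineers.id
  member_id = data.databricks_user.user_a.id
}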
In most enterprises, users are synced automatically from the IdP via SCIM.
Terraform then manages only group-level logic, such as memberships and grants.
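When the IdP also syncs the groups themselves, Terraform can reference them without owning them. The sketch below looks up an existing group with the provider's `databricks_group` data source; the group name comes from the row-level security use case later in this post, and the output name is illustrative.

```hcl
# Look up a group that SCIM already created; Terraform does not manage it.
data "databricks_group" "us_analysts" {
  display_name = "group_us_analysts"
}

# Expose the ID so other modules can reference the group (illustrative output).
output "us_analysts_group_id" {
  value = data.databricks_group.us_analysts.id
}
```

A side effect of the lookup is that `terraform plan` fails early if the group has not been synced yet, rather than applying configuration that refers to a principal that does not exist.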
2️⃣ Unity Catalog RBAC as Code (grants.tf)
Why RBAC as Code Matters
- Auditable permissions
- No UI drift
- Consistent across environments
grants.tf – Catalog-Level Access
resource "databricks_grants" "catalog_usage" {
  catalog = "prod_catalog"

  # Unity Catalog uses USE_CATALOG; USAGE is the legacy Hive metastore privilege.
  grant {
    principal  = "group_prod_engineers"
    privileges = ["USE_CATALOG"]
  }
  grant {
    principal  = "group_dev_engineers"
    privileges = ["USE_CATALOG"]
  }
}
Schema-Level RBAC
resource "databricks_grants" "sales_schema" {
  schema = "prod_catalog.sales"

  # USE_SCHEMA is required before any object in the schema can be accessed.
  grant {
    principal  = "group_prod_engineers"
    privileges = ["USE_SCHEMA", "CREATE_TABLE", "SELECT"]
  }
  grant {
    principal  = "group_dev_engineers"
    privileges = ["USE_SCHEMA", "SELECT"]
  }
}
Table-Level RBAC
resource "databricks_grants" "customers_table" {
  table = "prod_catalog.sales.customers"

  grant {
    principal  = "group_prod_engineers"
    privileges = ["SELECT", "MODIFY"]
  }
  grant {
    principal  = "group_dev_engineers"
    privileges = ["SELECT"]
  }
}
Permissions are enforced by Unity Catalog at query execution time.
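To keep grants consistent across environments, the group-to-privilege mapping can be held in one place and expanded with a `dynamic` block. This is a minimal sketch, not part of the original setup: the local `customers_table_grants` and the resource name are hypothetical. Because `databricks_grants` is authoritative for its securable, this resource would replace the hand-written `customers_table` resource rather than sit beside it.

```hcl
locals {
  # Hypothetical map: group name -> privileges on the customers table.
  customers_table_grants = {
    group_prod_engineers = ["SELECT", "MODIFY"]
    group_dev_engineers  = ["SELECT"]
  }
}

resource "databricks_grants" "customers_table_from_map" {
  table = "prod_catalog.sales.customers"

  # One grant block is generated per map entry.
  dynamic "grant" {
    for_each = local.customers_table_grants
    content {
      principal  = grant.key
      privileges = grant.value
    }
  }
}
```

Changing access for an environment then becomes a one-line edit to the map, which is easy to review in a pull request.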
3️⃣ Row-Level Security (Dynamic Views)
Use Case
| User Group | Allowed Country |
|---|---|
| group_us_analysts | USA |
| group_eu_analysts | EU |
Base Table (Restricted)
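-- Unity Catalog has no PUBLIC principal; `account users` is the built-in all-users group.
REVOKE ALL PRIVILEGES ON TABLE prod_catalog.sales.customers FROM `account users`;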
Dynamic View with RLS
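-- is_account_group_member() checks account-level groups, which is what Unity Catalog uses;
-- is_member() only checks workspace-level groups.
CREATE VIEW prod_catalog.sales.customers_secure AS
SELECT *
FROM prod_catalog.sales.customers
WHERE
  (is_account_group_member('group_us_analysts') AND country = 'USA')
  OR
  (is_account_group_member('group_eu_analysts') AND country = 'EU');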
Grant Access Only to the View
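-- GRANT takes a single principal, so each group gets its own statement.
GRANT SELECT ON VIEW prod_catalog.sales.customers_secure TO `group_us_analysts`;
GRANT SELECT ON VIEW prod_catalog.sales.customers_secure TO `group_eu_analysts`;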
The row filter is evaluated on every query, based on the caller's group membership.
Applications need no filtering logic of their own; they only have to query customers_secure instead of the base table.
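The view grant can also live in Terraform next to the other grants. The sketch below is one way to express it, assuming (as the provider documentation describes) that views are addressed through the `table` argument of `databricks_grants`; the resource name is illustrative.

```hcl
# SELECT on the secure view only; the analyst groups get nothing on the base table.
resource "databricks_grants" "customers_secure_view" {
  table = "prod_catalog.sales.customers_secure"

  grant {
    principal  = "group_us_analysts"
    privileges = ["SELECT"]
  }
  grant {
    principal  = "group_eu_analysts"
    privileges = ["SELECT"]
  }
}
```

The analyst groups also need `USE_CATALOG` on `prod_catalog` and `USE_SCHEMA` on `prod_catalog.sales` before they can query the view. Because `databricks_grants` is authoritative per securable, those belong as additional `grant` blocks inside the existing catalog and schema resources, not in separate resources for the same catalog or schema.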
4️⃣ End-to-End Access Example
| User | Group | Result |
|---|---|---|
| User A | group_prod_engineers | Read + Write all rows |
| User B | group_dev_engineers | Read-only |
| User C | group_us_analysts | USA rows only |
5️⃣ CI/CD Flow (Security Included)
Git Commit → Terraform Apply → SCIM Groups + Workspaces + UC Grants → Users Log In → Access Automatically Enforced
6️⃣ Common Enterprise Anti-Patterns
- Granting permissions to users instead of groups
- Direct access to base tables (no views)
- Mixing Dev and Prod users in the same workspace
- Manual permission changes via UI
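The first anti-pattern can be caught at plan time. As a rough sketch, assuming the `group_` naming convention used throughout this post and a hypothetical input variable `grant_principals`, a Terraform `validation` block can reject any principal that does not look like a group:

```hcl
variable "grant_principals" {
  description = "Principals that receive Unity Catalog grants (hypothetical input)."
  type        = list(string)

  validation {
    # can(regex(...)) is false when a name does not start with "group_".
    condition = alltrue([
      for p in var.grant_principals : can(regex("^group_", p))
    ])
    error_message = "Grant privileges to groups (names starting with group_), not to individual users."
  }
}
```

With this in place, a value such as `["group_prod_engineers", "user_a@example.com"]` fails validation before anything is applied. The check only enforces a naming convention, so it complements code review rather than replacing it.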
7️⃣ Why Auditors Love This Setup
- All access is code-reviewed
- Clear separation of duties
- Full traceability in Git
- Zero manual overrides
8️⃣ Enterprise Databricks Blog Series Roadmap
| Post | Description |
|---|---|
| Part 1 | Identity, SSO & SCIM architecture |
| Part 2 | Workspace isolation & networking |
| Part 3 | Unity Catalog & RBAC as code |
| Part 4 | Row-level & column-level security |
| Part 5 | CI/CD promotion Dev → Prod |
| Part 6 | Operating Databricks at scale |
Final Takeaway
This approach gives you:
- Enterprise-grade security by design
- Zero-touch onboarding
- Strong compliance posture
- Infrastructure and data security as code
This is a common pattern for running Databricks in regulated enterprises.