Thursday, 15 January 2026

Enterprise Databricks Onboarding – Step 0: Identity Setup (Azure AD → Databricks)

Identity is the foundation of every enterprise Databricks deployment. Before you talk about clusters, Unity Catalog, or RBAC, you must first answer one question:

Who is the user, and how is their access controlled?

In this step, we integrate Azure Active Directory (Azure AD, since renamed Microsoft Entra ID) with Databricks, using SSO for sign-in and SCIM for user and group provisioning.

---

Objective of Step 0

  • Azure AD becomes the single source of truth
  • No manual users or groups in Databricks
  • All access is group-based and auditable
  • Identity lifecycle is fully automated
---

High-Level Identity Architecture

+--------------------+
|     Azure AD       |
|--------------------|
| Users              |
| Groups             |
| MFA / CA Policies  |
+---------+----------+
          |
          | 1) SSO (SAML)
          |
          v
+--------------------+
| Databricks Account |
|--------------------|
| Authentication     |
| (Login)            |
+---------+----------+
          |
          | 2) SCIM Provisioning
          |
          v
+-----------------------------+
| Databricks Identity Store   |
|-----------------------------|
| Users (Read-only)           |
| Groups (SCIM-managed)       |
| Memberships                 |
+-----------------------------+

Key Principle:
Azure AD authenticates users (SSO). SCIM provisions users and groups. Databricks never owns identity.

---

First-Time Setup (Greenfield Environment)

Step 1: Define Identity Model

Databricks must consume identities — not create them.

  • ❌ No local Databricks users
  • ❌ No Databricks-only groups
  • ✅ Azure AD is authoritative
---

Step 2: Create Azure AD Groups (RBAC-Oriented)

Create role-based groups, not user-specific ones.

dbx-admins
dbx-platform
dbx-data-engineers
dbx-data-analysts
dbx-ml-engineers
dbx-prod-users

Never assign permissions directly to users. All permissions must flow from groups.
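
If you prefer to script group creation rather than click through the portal, the sketch below builds the request bodies you would send to Microsoft Graph (`POST /groups`) for the six groups above. It only constructs the payloads and sends nothing. The `description` wording and the choice to reuse the group name as `mailNickname` are assumptions made for illustration.

```python
import json

# Role-based group names from the list above.
GROUP_NAMES = [
    "dbx-admins",
    "dbx-platform",
    "dbx-data-engineers",
    "dbx-data-analysts",
    "dbx-ml-engineers",
    "dbx-prod-users",
]


def graph_group_payload(name: str) -> dict:
    """Build a request body for creating a security group via Microsoft Graph."""
    return {
        "displayName": name,
        "description": f"Databricks role group: {name}",  # hypothetical wording
        "mailEnabled": False,      # plain security group, not a mail group
        "mailNickname": name,      # assumption: reuse the group name
        "securityEnabled": True,
    }


payloads = [graph_group_payload(name) for name in GROUP_NAMES]
print(json.dumps(payloads[0], indent=2))
```

Keeping the names in one list makes it easy to reuse the same list later when checking which groups were actually assigned to the enterprise application.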
---

Step 3: Create Azure Databricks Enterprise Application

  1. Azure Portal → Azure Active Directory
  2. Enterprise Applications → New Application
  3. Search for Azure Databricks (the gallery entry used for provisioning is named Azure Databricks SCIM Provisioning Connector)
  4. Create the application

This application handles both SSO and SCIM provisioning.

---

Step 4: Configure SSO (Authentication)

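SSO answers the question: Who are you?

Azure Databricks signs users in with Azure AD natively, so check the current Databricks documentation to confirm that the SAML values below apply to your account before entering them.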

SAML Configuration

Entity ID (Identifier):
https://accounts.azuredatabricks.net

Reply URL (ACS):
https://accounts.azuredatabricks.net/login/saml

User attributes:

email  → user.mail
name   → user.userprincipalname
---

SSO Login Flow

User Browser
     |
     v
Azure AD Login (MFA, CA)
     |
     v
SAML Assertion
     |
     v
Databricks Account Console

After this, users authenticate using corporate credentials only.

---

Step 5: Configure SCIM Provisioning

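SCIM answers the question: Which users and groups exist in Databricks, and who belongs to which group? Access itself is granted to those groups inside Databricks.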

Generate SCIM Token

  1. Databricks Account Console
  2. User Management → Generate SCIM token (in recent account consoles this is under Settings → User provisioning)

Azure AD Provisioning Settings

Tenant URL:
https://accounts.azuredatabricks.net/api/2.0/accounts/<ACCOUNT_ID>/scim/v2

Authentication:
Bearer Token (SCIM Token)
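
Before pasting these values into Azure AD, it can help to check them from a script. The following is a minimal sketch that assembles the tenant URL and bearer header shown above into a request object. The account ID and token are placeholders, the helper names are made up for this example, and the network call is left commented out.

```python
import urllib.request

ACCOUNTS_HOST = "https://accounts.azuredatabricks.net"


def scim_base_url(account_id: str) -> str:
    """Return the account-level SCIM tenant URL in the form used in this post."""
    return f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/scim/v2"


def scim_request(account_id: str, token: str, resource: str = "Users") -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for a SCIM resource."""
    req = urllib.request.Request(f"{scim_base_url(account_id)}/{resource}", method="GET")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/scim+json")
    return req


# Placeholder values; substitute your own account ID and SCIM token.
req = scim_request("00000000-0000-0000-0000-000000000000", "dummy-token")
print(req.full_url)

# To send the request for real:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

A 200 response to the commented-out call would suggest that the URL and token are a valid pair. A 401 or 403 usually points to the token or to the permissions behind it.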
---

SCIM Provisioning Flow

Azure AD
  |
  | Users + Groups + Memberships
  |
  v
SCIM API
  |
  v
Databricks Account
  |
  v
Workspaces / Unity Catalog / Clusters
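
To make "Users + Groups + Memberships" concrete, here is a rough sketch of the resources that travel over SCIM, using the core schema URNs from the SCIM 2.0 specification (RFC 7643). The user name, display name, and member ID are invented for illustration, and real payloads from Azure AD carry more attributes than shown here.

```python
import json

USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
GROUP_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Group"


def scim_user(user_name: str, display_name: str, active: bool = True) -> dict:
    """Minimal SCIM User resource."""
    return {
        "schemas": [USER_SCHEMA],
        "userName": user_name,
        "displayName": display_name,
        "active": active,
    }


def scim_group(display_name: str, member_ids: list) -> dict:
    """Minimal SCIM Group resource; members are referenced by their SCIM IDs."""
    return {
        "schemas": [GROUP_SCHEMA],
        "displayName": display_name,
        "members": [{"value": member_id} for member_id in member_ids],
    }


# Hypothetical user and member ID, for illustration only.
user = scim_user("jane.doe@example.com", "Jane Doe")
group = scim_group("dbx-data-engineers", ["1234"])
print(json.dumps(group, indent=2))
```

Note that memberships are carried on the group resource rather than on the user. This is why group assignment in Azure AD is what drives access downstream.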
---

Step 6: Assign Groups to the Application

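Azure AD syncs only the users and groups assigned to the enterprise application. Nested groups are not expanded, so assign each group directly. For example:

Assigned Groups: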
- dbx-admins
- dbx-data-engineers
- dbx-data-analysts
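
Because only assigned groups are synced, it is worth comparing the groups defined in Step 2 with the groups actually assigned here. The sketch below does that comparison with plain sets, using the two lists from this post. The function names are made up for this example.

```python
# Groups created in Step 2.
DEFINED_GROUPS = {
    "dbx-admins",
    "dbx-platform",
    "dbx-data-engineers",
    "dbx-data-analysts",
    "dbx-ml-engineers",
    "dbx-prod-users",
}

# Groups assigned to the enterprise application in this step.
ASSIGNED_GROUPS = {
    "dbx-admins",
    "dbx-data-engineers",
    "dbx-data-analysts",
}


def unsynced_groups(defined: set, assigned: set) -> list:
    """Groups that exist in Azure AD but will not reach Databricks."""
    return sorted(defined - assigned)


def unknown_assignments(defined: set, assigned: set) -> list:
    """Assigned groups that are not part of the defined role model."""
    return sorted(assigned - defined)


print(unsynced_groups(DEFINED_GROUPS, ASSIGNED_GROUPS))
# → ['dbx-ml-engineers', 'dbx-platform', 'dbx-prod-users']
```

With the example assignment above, three of the six groups from Step 2 would never appear in Databricks until they are assigned as well.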
---

Day-2 Operations (After Go-Live)

Adding a New User

1. Create user in Azure AD
2. Add to dbx-data-engineers
3. SCIM sync runs on the next provisioning cycle (Azure AD runs incremental cycles roughly every 40 minutes)
4. User appears in Databricks automatically

Removing a User

1. Disable user in Azure AD
2. SCIM deactivates the user in Databricks on the next provisioning cycle
3. Access is revoked across all workspaces

Changing User Role

Remove: dbx-data-analysts
Add:    dbx-data-engineers

All permissions update automatically without Databricks admin intervention.
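
Under the hood, these Day-2 changes arrive as SCIM PATCH requests. The sketch below shows roughly what such request bodies look like according to the SCIM 2.0 protocol (RFC 7644). The user ID is hypothetical, and the exact operations Azure AD emits may differ in detail, so treat this as an illustration of the shape rather than a capture of real traffic.

```python
PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def patch(operations: list) -> dict:
    """Wrap a list of operations in a SCIM PatchOp envelope."""
    return {"schemas": [PATCH_SCHEMA], "Operations": operations}


def add_member(user_id: str) -> dict:
    """Body for PATCH /Groups/{group_id}: add one member."""
    return patch([{"op": "add", "path": "members", "value": [{"value": user_id}]}])


def remove_member(user_id: str) -> dict:
    """Body for PATCH /Groups/{group_id}: remove one member."""
    return patch([{"op": "remove", "path": f'members[value eq "{user_id}"]'}])


def deactivate_user() -> dict:
    """Body for PATCH /Users/{user_id}: mark the user inactive."""
    return patch([{"op": "replace", "path": "active", "value": False}])


def change_role(user_id: str, old_group: str, new_group: str) -> dict:
    """Map each affected group name to the PATCH body it would receive."""
    return {old_group: remove_member(user_id), new_group: add_member(user_id)}


# Hypothetical SCIM user ID "1234" moving from analysts to engineers.
bodies = change_role("1234", "dbx-data-analysts", "dbx-data-engineers")
print(bodies["dbx-data-analysts"])
```

In a real call, each body would be sent to the group's own SCIM ID rather than its display name. The names are used as dictionary keys here only to keep the example readable.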

---

Security & Compliance Benefits

  • Centralized identity management
  • Audit-friendly access controls
  • MFA and Conditional Access enforced by Azure AD at sign-in
  • Zero-trust compatible
  • SOC 2 / ISO aligned
---

Final Outcome of Step 0

Authentication → Azure AD
Authorization  → Groups
Provisioning   → SCIM
Databricks     → Identity Consumer

This identity foundation enables:

  • Unity Catalog RBAC
  • Cluster isolation
  • Workspace governance
  • Secure production onboarding
---

Next Blog: Step 1 – Workspace Strategy & Environment Isolation
