Enterprise Databricks Onboarding – Step 0: Identity Setup (Azure AD → Databricks)
Identity is the foundation of every enterprise Databricks deployment. Before you talk about clusters, Unity Catalog, or RBAC, you must first answer one question:
Who is the user, and how is their access controlled?
In this step, we integrate Azure Active Directory (Azure AD) with Databricks Enterprise using SSO and SCIM provisioning.
Objective of Step 0
- Azure AD becomes the single source of truth
- No manually created users or groups in Databricks
- All access is group-based and auditable
- Identity lifecycle is fully automated
High-Level Identity Architecture
+--------------------+
|      Azure AD      |
|--------------------|
| Users              |
| Groups             |
| MFA / CA Policies  |
+---------+----------+
          |
          | 1) SSO (SAML)
          |
          v
+--------------------+
| Databricks Account |
|--------------------|
| Authentication     |
| (Login)            |
+---------+----------+
          |
          | 2) SCIM Provisioning
          |
          v
+-----------------------------+
| Databricks Identity Store   |
|-----------------------------|
| Users (Read-only)           |
| Groups (SCIM-managed)       |
| Memberships                 |
+-----------------------------+
Azure AD authenticates users (SSO). SCIM provisions users and groups. Databricks never owns identity.
First-Time Setup (Greenfield Environment)
Step 1: Define Identity Model
Databricks must consume identities — not create them.
- ❌ No local Databricks users
- ❌ No Databricks-only groups
- ✅ Azure AD is authoritative
Step 2: Create Azure AD Groups (RBAC-Oriented)
Create role-based groups, not user-specific ones.
- dbx-admins
- dbx-platform
- dbx-data-engineers
- dbx-data-analysts
- dbx-ml-engineers
- dbx-prod-users
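If you script group creation instead of clicking through the portal, the list above maps naturally onto request bodies for the Microsoft Graph "create group" call (POST /groups). The following is a minimal sketch that only builds the payloads. The naming rule in `NAME_PATTERN` is an assumption for illustration, not something Azure AD or Databricks enforces.

```python
import re

# Role-based group names from the list above.
GROUP_NAMES = [
    "dbx-admins",
    "dbx-platform",
    "dbx-data-engineers",
    "dbx-data-analysts",
    "dbx-ml-engineers",
    "dbx-prod-users",
]

# Hypothetical naming rule: "dbx-" prefix, lowercase words joined by hyphens.
NAME_PATTERN = re.compile(r"^dbx-[a-z]+(-[a-z]+)*$")


def group_payload(name: str) -> dict:
    """Build a body for creating a security group via Microsoft Graph."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"group name does not follow the convention: {name!r}")
    return {
        "displayName": name,
        "mailNickname": name,     # required by Graph even for security groups
        "mailEnabled": False,     # not a mail-enabled group
        "securityEnabled": True,  # plain security group used for RBAC
    }


payloads = [group_payload(name) for name in GROUP_NAMES]
```

Rejecting names that break the convention early keeps user-specific or ad hoc groups out of the RBAC model.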
Step 3: Create Azure Databricks Enterprise Application
- Azure Portal → Azure Active Directory
- Enterprise Applications → New Application
- Search for Azure Databricks
- Create the application
This application handles both SSO and SCIM provisioning.
Step 4: Configure SSO (Authentication)
SSO answers the question: Who are you?
SAML Configuration
Entity ID (Identifier): https://accounts.azuredatabricks.net
Reply URL (ACS): https://accounts.azuredatabricks.net/login/saml
User attributes:
email → user.mail
name → user.userprincipalname
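A typo in these values is a common cause of failed logins, so it can help to keep them in version control and lint them before applying. Below is a minimal sketch of such a check. The dictionary layout and the `check_saml_settings` helper are hypothetical and not part of any Azure or Databricks tooling.

```python
from urllib.parse import urlparse

# Values copied from the SAML configuration above.
SAML_SETTINGS = {
    "entity_id": "https://accounts.azuredatabricks.net",
    "reply_url": "https://accounts.azuredatabricks.net/login/saml",
    "claims": {"email": "user.mail", "name": "user.userprincipalname"},
}


def check_saml_settings(settings: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    entity = urlparse(settings["entity_id"])
    reply = urlparse(settings["reply_url"])
    for label, url in (("entity_id", entity), ("reply_url", reply)):
        if url.scheme != "https":
            problems.append(f"{label} must use https")
    if entity.netloc != reply.netloc:
        problems.append("reply_url host differs from entity_id host")
    for claim in ("email", "name"):
        if not settings.get("claims", {}).get(claim):
            problems.append(f"missing claim mapping: {claim}")
    return problems


problems = check_saml_settings(SAML_SETTINGS)
```

These checks only catch structural mistakes; they do not confirm that the identity provider actually accepts the values.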
SSO Login Flow
User Browser
|
v
Azure AD Login (MFA, CA)
|
v
SAML Assertion
|
v
Databricks Account Console
After this, users authenticate using corporate credentials only.
Step 5: Configure SCIM Provisioning
SCIM answers the question: Which users and groups exist in Databricks, and who belongs to which group?
Generate SCIM Token
- Databricks Account Console
- User Management → Generate SCIM token
Azure AD Provisioning Settings
Tenant URL: https://accounts.azuredatabricks.net/api/2.0/accounts/<ACCOUNT_ID>/scim/v2
Authentication: Bearer Token (SCIM Token)
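To see how the Tenant URL and the bearer token fit together, here is a minimal sketch that builds (but does not send) a SCIM request. The account ID and token are placeholders, and the `/Users` path is the standard SCIM 2.0 resource name from RFC 7644, used here on the assumption that the endpoint follows the standard layout.

```python
import urllib.request

ACCOUNTS_HOST = "https://accounts.azuredatabricks.net"


def scim_base_url(account_id: str) -> str:
    """Tenant URL in the format shown above."""
    return f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/scim/v2"


def scim_request(account_id: str, token: str, resource: str = "Users") -> urllib.request.Request:
    """Prepare a GET request against a SCIM resource; nothing is sent here."""
    return urllib.request.Request(
        url=f"{scim_base_url(account_id)}/{resource}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/scim+json",
        },
        method="GET",
    )


# Placeholder values for illustration only.
req = scim_request("00000000-0000-0000-0000-000000000000", "example-scim-token")
```

Passing `req` to `urllib.request.urlopen` would issue the call; in practice, read the token from a secret store rather than hard-coding it.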
SCIM Provisioning Flow
Azure AD
|
| Users + Groups + Memberships
|
v
SCIM API
|
v
Databricks Account
|
v
Workspaces / Unity Catalog / Clusters
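What travels over the SCIM API in this flow is plain JSON. The sketch below shows roughly what a user and a group resource look like under the SCIM 2.0 core schemas (RFC 7643). The email address and IDs are made up, and the exact attribute set that Azure AD sends depends on your attribute mappings.

```python
USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
GROUP_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Group"


def scim_user(email: str, display_name: str) -> dict:
    """Minimal SCIM user resource keyed by email."""
    return {
        "schemas": [USER_SCHEMA],
        "userName": email,
        "displayName": display_name,
        "emails": [{"value": email, "primary": True}],
        "active": True,
    }


def scim_group(name: str, member_ids: list[str]) -> dict:
    """Minimal SCIM group resource; members are referenced by user ID."""
    return {
        "schemas": [GROUP_SCHEMA],
        "displayName": name,
        "members": [{"value": member_id} for member_id in member_ids],
    }


# Hypothetical example data.
user = scim_user("ana@example.com", "Ana Example")
group = scim_group("dbx-data-engineers", ["1001", "1002"])
```

Memberships are carried on the group resource, which is why group assignment in Azure AD is what drives access downstream.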
Step 6: Assign Groups to the Application
Only assigned groups are synced.
Assigned Groups:
- dbx-admins
- dbx-data-engineers
- dbx-data-analysts
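The scoping rule is easy to reason about as a set operation: a user is provisioned only if they belong to at least one assigned group. A minimal sketch follows, with hypothetical memberships. It ignores nested groups, which you should not assume are expanded.

```python
# Groups assigned to the enterprise application (from the list above).
ASSIGNED_GROUPS = {"dbx-admins", "dbx-data-engineers", "dbx-data-analysts"}

# Hypothetical Azure AD memberships for illustration.
MEMBERSHIPS = {
    "dbx-admins": {"ana@example.com"},
    "dbx-data-engineers": {"ben@example.com", "ana@example.com"},
    "dbx-ml-engineers": {"cho@example.com"},  # group exists but is not assigned
}


def users_in_scope(memberships: dict[str, set[str]], assigned: set[str]) -> set[str]:
    """Users who belong to at least one assigned group."""
    in_scope: set[str] = set()
    for group, members in memberships.items():
        if group in assigned:
            in_scope |= members
    return in_scope


synced = users_in_scope(MEMBERSHIPS, ASSIGNED_GROUPS)
```

Here the member of dbx-ml-engineers is left out of the result because that group was never assigned to the application.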
Day-2 Operations (After Go-Live)
Adding a New User
1. Create user in Azure AD
2. Add to dbx-data-engineers
3. SCIM sync runs
4. User appears in Databricks automatically
Removing a User
1. Disable user in Azure AD
2. SCIM deactivates the user in Databricks on the next sync
3. Access revoked everywhere
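On the wire, offboarding a disabled account is typically a SCIM deactivation rather than a hard delete. Below is a minimal sketch of that request body, following the PatchOp message format from RFC 7644. It would be sent as a PATCH to the user's SCIM resource, and the exact shape Azure AD emits may differ slightly.

```python
import json

PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def deactivate_user_patch() -> dict:
    """SCIM PatchOp that flips the user's 'active' flag to false."""
    return {
        "schemas": [PATCH_SCHEMA],
        "Operations": [
            {"op": "replace", "path": "active", "value": False},
        ],
    }


# Serialized form, as it would appear in the PATCH request body.
body = json.dumps(deactivate_user_patch())
```

Deactivating instead of deleting keeps the user's identity and object ownership intact, so re-enabling the account in Azure AD can restore access.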
Changing User Role
Remove: dbx-data-analysts
Add: dbx-data-engineers
On the next SCIM sync, the user's permissions follow the new group membership, with no Databricks admin intervention.
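A role change like this boils down to two membership patches, one per group. Here is a minimal sketch using the SCIM PatchOp format (RFC 7644). The user ID is hypothetical, and the helper name `role_change_patches` is an illustration rather than part of any SDK.

```python
PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def role_change_patches(user_id: str, remove_from: str, add_to: str) -> dict[str, dict]:
    """Map each affected group name to the PatchOp body that updates it."""
    return {
        remove_from: {
            "schemas": [PATCH_SCHEMA],
            "Operations": [
                # Remove only this member, selected by a SCIM path filter.
                {"op": "remove", "path": f'members[value eq "{user_id}"]'},
            ],
        },
        add_to: {
            "schemas": [PATCH_SCHEMA],
            "Operations": [
                {"op": "add", "path": "members", "value": [{"value": user_id}]},
            ],
        },
    }


# Hypothetical user ID for illustration.
patches = role_change_patches("1001", "dbx-data-analysts", "dbx-data-engineers")
```

Because permissions are granted to groups rather than to individuals, nothing on the Databricks side needs to be edited for the role change to take effect.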
Security & Compliance Benefits
- Centralized identity management
- Audit-friendly access controls
- MFA and Conditional Access enforced
- Zero-trust compatible
- SOC2 / ISO aligned
Final Outcome of Step 0
Authentication → Azure AD
Authorization → Groups
Provisioning → SCIM
Databricks → Identity Consumer
This identity foundation enables:
- Unity Catalog RBAC
- Cluster isolation
- Workspace governance
- Secure production onboarding
Next Blog: Step 1 – Workspace Strategy & Environment Isolation