Friday, 13 March 2026

Databricks Roles – Full Reference Matrix

This reference covers Workspace Roles, Account Roles, and Unity Catalog Roles with their exact capabilities.

Workspace Admin (Workspace)
  • Manage users and groups
  • Assign workspace roles
  • Create/manage clusters
  • Restart/terminate all clusters
  • Create/manage jobs and workflows
  • Create SQL warehouses
  • Manage secrets, libraries, instance profiles
  • Access DBFS (read/write)
  • Run notebooks and jobs
  Notes: Full control of the workspace; does NOT grant automatic data access in Unity Catalog

User (Workspace)
  • Create/edit/run own notebooks
  • Create/run jobs
  • Create clusters (if allowed by cluster policies)
  • Access DBFS (read/write)
  • Use SQL warehouses (if permitted)
  Notes: Cannot manage other users or workspace settings

Can Manage / Job Creator (Workspace)
  • Create/manage own jobs and clusters
  • Run notebooks
  • Upload files to DBFS
  Notes: Limited admin; cannot manage other users or workspace-wide settings

Viewer (Workspace)
  • Read-only access to notebooks and dashboards
  • View clusters and jobs
  • Read access to DBFS (if allowed)
  Notes: No write permissions

Account Admin (Account)
  • Create and delete workspaces
  • Assign workspace admins
  • Manage metastore assignments
  • Access account-wide audit logs
  • Manage billing/usage
  Notes: Full control over the account; workspace-level roles must still be respected

Billing / Support Roles (Account)
  • View usage and billing
  • Access technical support
  Notes: Cannot manage workspaces or data; read-only account permissions

Metastore Admin (Unity Catalog)
  • Create catalogs and schemas
  • Create storage credentials and external locations
  • Assign catalog-level permissions
  • Grant/revoke data access
  Notes: Full control over UC metadata; does NOT grant workspace admin rights

Catalog Owner (Unity Catalog)
  • Manage a catalog and its contained schemas
  • Grant/revoke access at the catalog level
  Notes: Limited to one catalog; cannot manage other catalogs

Schema Owner (Unity Catalog)
  • Manage a schema and its contained tables/views
  • Grant/revoke access at the schema level
  Notes: Cannot manage catalog-level permissions

Volume Owner (Unity Catalog)
  • Manage managed volumes (file storage)
  • Grant/revoke access to volumes
  Notes: Access to volume paths only

Data Access Roles: SELECT / MODIFY / USAGE (Unity Catalog)
  • Read/write/query specific tables, views, and volumes
  • Granular privileges granted per object
  Notes: Applied per object; separate from workspace admin rights
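
The data-access roles above are applied as per-object grants. A minimal sketch of how such grants can be composed; the principal names and three-level object names below are hypothetical examples, not from this document:

```python
# Sketch: composing Unity Catalog GRANT statements for per-object data access.
# Principals ("analysts", "etl_service") and objects are hypothetical.

def grant_statement(privilege: str, securable_type: str, securable: str, principal: str) -> str:
    """Render one Unity Catalog GRANT statement as SQL text."""
    return f"GRANT {privilege} ON {securable_type} {securable} TO `{principal}`"

# Grant read on one table, write on the same table for the ETL identity,
# and read on one volume -- nothing broader.
grants = [
    grant_statement("SELECT", "TABLE", "sales.finance.transactions", "analysts"),
    grant_statement("MODIFY", "TABLE", "sales.finance.transactions", "etl_service"),
    grant_statement("READ VOLUME", "VOLUME", "sales.finance.raw_files", "analysts"),
]

for g in grants:
    print(g)
```

Each statement would be run by an owner of the securable or a metastore admin; none of them confers any workspace-level rights, which is exactly the separation the matrix describes.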

Databricks on AWS – Least Privilege Permission Matrix

This matrix describes the minimal AWS and Databricks permissions required to create or manage common platform resources when using Databricks on AWS. The goal is to follow enterprise least-privilege security principles.

Resource | Primary Owner Role | Required AWS Permissions | Required Databricks Permissions | Purpose | Security / Least Privilege Notes
Workspace | Platform Admin | iam:CreateRole, iam:AttachRolePolicy, ec2:CreateVpc, s3:CreateBucket | Account Admin | Create Databricks workspace | Automate using Terraform and restrict to platform team
Cross-Account IAM Role | AWS Cloud Admin | iam:CreateRole, iam:PutRolePolicy, sts:AssumeRole | None | Allows Databricks control plane access | Trust only the Databricks account
Root Storage (DBFS) | AWS Cloud Admin | s3:CreateBucket, s3:PutBucketPolicy | None | Workspace default storage | Enable encryption and versioning
Unity Catalog Metastore | Data Platform Admin | s3:GetObject, s3:PutObject, s3:ListBucket | Metastore Admin | Central governance metadata store | Dedicated metastore bucket
Metastore Assignment | Platform Admin | None | Account Admin | Attach metastore to workspace | Single metastore per region recommended
Storage Credential | Data Platform Admin | iam:PassRole, sts:AssumeRole | CREATE STORAGE CREDENTIAL | Connect Unity Catalog to S3 | IAM role should allow only a specific S3 path
External Location | Data Governance Admin | s3:GetObject, s3:PutObject | CREATE EXTERNAL LOCATION | Expose S3 path to Unity Catalog | Use path-level permissions
Catalog | Data Governance Admin | Access to storage location | CREATE CATALOG | Top governance layer | One catalog per domain recommended
Schema | Data Owner | None | CREATE SCHEMA | Database container | Grant schema-level privileges
Delta Table | Data Engineer | S3 read/write | CREATE TABLE | Structured table storage | Use Unity Catalog governance
External Table | Data Engineer | S3 read | CREATE TABLE | Reference external dataset | Avoid direct S3 access
Notebook | Data Engineer / Analyst | None | Workspace Editor | Analytics code | Store production code in Git
Git Repo Integration | Developer | None | Workspace Editor | Version control integration | Use GitHub / GitLab PAT
Job / Workflow | Data Engineer | None | CREATE JOB | Automated pipelines | Define jobs as code
Cluster | Platform Admin | ec2:RunInstances, iam:PassRole | CREATE CLUSTER | Compute resource | Restrict using cluster policies
SQL Warehouse | Data Engineer | None | CREATE SQL WAREHOUSE | Serverless SQL analytics | Limit compute size via policies
Cluster Policy | Platform Admin | None | CREATE CLUSTER POLICY | Restrict compute usage | Important governance control
Feature Store Table | ML Engineer | S3 read/write | CREATE TABLE | Machine learning features | Stored as Delta tables
ML Model Registry | ML Engineer | S3 artifact storage | CREATE MODEL | Track ML model versions | Store artifacts in a secure bucket
Streaming Checkpoints | Data Engineer | s3:PutObject, s3:GetObject | Job permission | Streaming progress tracking | Separate checkpoint directory
Unity Catalog Volume | Data Platform Admin | S3 access | CREATE VOLUME | File storage governance | Alternative to DBFS
Audit Logs | Security Team | S3 write | Account Admin | Security auditing | Send logs to SIEM
PrivateLink Networking | AWS Cloud Admin | ec2:CreateVpcEndpoint | Account Admin | Private connectivity | Required for highly secure environments
DBFS File Upload | User | s3:PutObject | Workspace User | Temporary file storage | Avoid for production data
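
The Storage Credential row above requires an IAM role that allows only a specific S3 path. A sketch of what such a least-privilege policy document can look like; the bucket name and prefix are hypothetical:

```python
import json

# Sketch: a least-privilege S3 policy for a Unity Catalog storage credential
# role, scoped to a single bucket prefix. Names below are hypothetical.
BUCKET = "company-uc-data"
PREFIX = "finance/"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Object access only under the agreed prefix.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*",
        },
        {   # Listing is restricted to the same prefix via a condition.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": [f"{PREFIX}*"]}},
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Attaching this policy (rather than a bucket-wide or account-wide one) to the role behind the storage credential keeps the credential's blast radius to one path, which is the point of the matrix's least-privilege notes.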

Databricks Architecture Matrix with DR and Terraform Resources

Databricks Architecture Matrix (Serverless on AWS)

This document shows where major Databricks components live when using Serverless on AWS, including Disaster Recovery strategies and Terraform resources used for automation.

Control Plane Components

Component | Purpose | Where It Runs | Plane | DR Best Practice | Terraform Resource
Workspace | Main analytics workspace | Databricks SaaS | Control Plane | Create secondary workspace in another region | databricks_mws_workspaces
Users | User identity | Databricks account | Control Plane | Use centralized IdP | databricks_user
Groups | Access management | Databricks account | Control Plane | Manage via SCIM | databricks_group
Group Membership | User-group association | Databricks account | Control Plane | Recreate from IaC | databricks_group_member
Notebook Source Code | Notebook scripts | Workspace storage | Control Plane | Store notebooks in Git | databricks_notebook
Repos (Git Integration) | Source code integration | Workspace metadata | Control Plane | Keep Git remote as source of truth | databricks_repo
Job Scheduler | Pipeline scheduling | Databricks control services | Control Plane | Define jobs as code | databricks_job
Cluster Configuration | Compute definition | Databricks control services | Control Plane | Recreate clusters via IaC | databricks_cluster
SQL Warehouse | Serverless SQL endpoint | Databricks control services | Control Plane | Recreate warehouse in DR region | databricks_sql_endpoint
Unity Catalog Metastore | Metadata store | Databricks metadata service | Control Plane | Replicate configuration | databricks_metastore
Unity Catalog Catalog | Top-level data container | Databricks governance service | Control Plane | Recreate catalogs | databricks_catalog
Unity Catalog Schema | Database layer | Databricks governance service | Control Plane | Recreate schema structure | databricks_schema
Permissions | Access control policies | Databricks governance service | Control Plane | Store as code | databricks_grants
Model Registry | ML model version tracking | Databricks metadata services | Control Plane | Replicate model metadata | databricks_mlflow_model
Feature Store Metadata | ML feature definitions | Databricks metadata services | Control Plane | Store definitions in Git | databricks_feature_table

Data Plane Components (AWS)

Component | Purpose | Where It Runs | Plane | DR Best Practice | Terraform Resource
S3 Data Lake | Primary storage | AWS S3 | Data Plane | Enable cross-region replication | aws_s3_bucket
Delta Tables | Structured data storage | S3 | Data Plane | Replicate bucket | aws_s3_bucket
DBFS Root Storage | Databricks filesystem | S3 | Data Plane | Enable bucket versioning | aws_s3_bucket
MLflow Artifact Storage | Stores ML models | S3 | Data Plane | Replicate artifact bucket | aws_s3_bucket
Streaming Checkpoints | Streaming progress tracking | S3 | Data Plane | Replicate checkpoint folders | aws_s3_bucket
Feature Store Data | ML training features | S3 | Data Plane | Enable replication | aws_s3_bucket
Execution Logs | Spark logs | S3 | Data Plane | Central logging system | aws_s3_bucket
Serverless Spark Compute | Job execution | AWS compute | Data Plane | Use multi-region workspace | N/A (managed by Databricks)
Temporary Spark Shuffle Data | Intermediate processing | Compute disk | Data Plane | No DR required | N/A
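
Several rows above recommend cross-region bucket replication as the DR practice. A sketch of the configuration that implements it; the bucket names, role ARN, and account ID are hypothetical, and the dict follows the shape expected by boto3's put_bucket_replication:

```python
# Sketch: S3 cross-region replication configuration for a data-lake bucket.
# All names and ARNs below are hypothetical examples.

SOURCE_BUCKET = "company-datalake-us-east-1"
DEST_BUCKET_ARN = "arn:aws:s3:::company-datalake-us-west-2"
REPLICATION_ROLE_ARN = "arn:aws:iam::123456789012:role/s3-replication"

replication_config = {
    "Role": REPLICATION_ROLE_ARN,
    "Rules": [
        {
            "ID": "dr-replicate-all",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {},                              # empty filter = whole bucket
            "DeleteMarkerReplication": {"Status": "Enabled"},
            "Destination": {"Bucket": DEST_BUCKET_ARN},
        }
    ],
}

# Applying it would look roughly like this (versioning must already be
# enabled on both buckets before S3 accepts the configuration):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_replication(
#     Bucket=SOURCE_BUCKET,
#     ReplicationConfiguration=replication_config,
# )
```

In practice this is usually declared in Terraform alongside the aws_s3_bucket resources listed in the table, rather than applied imperatively.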

Architecture Flow

Databricks Control Plane
    Workspace, Unity Catalog Metastore, Jobs, Clusters, SQL Warehouses
        |
        |  Secure API
        v
AWS Data Plane
    Serverless Spark Compute, Delta Tables, ML Models, Streaming Checkpoints
        |
        v
Amazon S3 Data Lake

Databricks Architecture Matrix (Serverless on AWS) with DR Best Practices

This document explains where major Databricks components reside when running Databricks Serverless on AWS and the recommended Disaster Recovery (DR) strategy for each component.

Control Plane Components

Component | Purpose | Where It Runs | Plane | DR Best Practice
Workspace UI | User interface for notebooks and jobs | Databricks SaaS | Control Plane | Create secondary workspace in another region
Workspace APIs | Automation APIs | Databricks SaaS | Control Plane | Automate infrastructure using Terraform
Users & Groups | User identity management | Databricks account services | Control Plane | Use centralized IdP like Okta/Azure AD
Authentication / SSO | Login via external identity provider | Databricks account services | Control Plane | Configure SSO redundancy at IdP level
Permissions / RBAC | Access control policies | Databricks control services | Control Plane | Store policies as code using Terraform
Notebook Source Code | Notebook scripts | Workspace storage | Control Plane | Sync notebooks with Git repositories
Notebook Outputs | Charts and query results | Workspace storage | Control Plane | Do not rely on outputs; regenerate from data
Workspace Files | Files uploaded to workspace | Workspace storage | Control Plane | Store important files in S3 or Git
Repos (Git Integration) | Git source control integration | Workspace metadata | Control Plane | Maintain source code in GitHub/GitLab
Job Scheduler | Schedules workflows | Databricks orchestration service | Control Plane | Define jobs using Infrastructure-as-Code
Workflows | Pipeline orchestration | Databricks orchestration service | Control Plane | Export workflows via API and Terraform
SQL Query Planner | SQL optimization engine | Databricks query services | Control Plane | No DR needed (managed by Databricks)
SQL Warehouse Management | Serverless SQL management | Databricks control services | Control Plane | Recreate warehouses in secondary region
Unity Catalog | Central governance system | Databricks governance service | Control Plane | Replicate catalog configuration using scripts
Metastore | Metadata storage | Databricks metadata services | Control Plane | Export metadata periodically
Data Lineage | Tracks data relationships | Databricks governance services | Control Plane | Export lineage metadata via APIs
Audit Logs | Security logs | Databricks governance services | Control Plane | Send logs to centralized SIEM storage
Cluster Management | Compute lifecycle management | Databricks control services | Control Plane | Recreate clusters via automation
Feature Store Metadata | Feature definitions | Databricks metadata services | Control Plane | Back up definitions in Git
Model Registry Metadata | ML model tracking | Databricks metadata services | Control Plane | Replicate registry configuration
Lakehouse Monitoring Metadata | Dataset monitoring metrics | Databricks monitoring services | Control Plane | Export monitoring metrics
Vector Search Metadata | Vector index configuration | Databricks control services | Control Plane | Recreate vector indexes from embeddings
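
The "export metadata periodically" and "define jobs as code" practices above can be scripted. A sketch that snapshots job definitions so they can be recreated in a DR workspace; the workspace URL and token handling are assumptions, and only the pure extraction step runs here:

```python
import json

# Sketch: backing up control-plane job definitions for DR. The extraction
# function is pure; the (commented) fetch would use the Databricks Jobs API.

def job_backup_payload(jobs_list_response: dict) -> str:
    """Keep only job_id and settings -- enough to recreate each job later."""
    backup = [
        {"job_id": j["job_id"], "settings": j.get("settings", {})}
        for j in jobs_list_response.get("jobs", [])
    ]
    return json.dumps(backup, indent=2, sort_keys=True)

# Fetching the live job list would look roughly like (URL/token hypothetical):
# import requests
# resp = requests.get(
#     "https://<workspace-url>/api/2.1/jobs/list",
#     headers={"Authorization": f"Bearer {token}"},
# )
# with open("jobs_backup.json", "w") as f:
#     f.write(job_backup_payload(resp.json()))

sample = {"jobs": [{"job_id": 101, "settings": {"name": "nightly_etl"}}]}
print(job_backup_payload(sample))
```

Writing the snapshot to a replicated S3 bucket on a schedule gives the DR workspace something to restore from, complementing the Terraform definitions that are already source-controlled.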

Data Plane Components (Customer AWS)

Component | Purpose | Where It Runs | Plane | DR Best Practice
Serverless Spark Compute | Executes jobs | AWS compute | Data Plane | Deploy in multi-region workspace
SQL Warehouse Compute | SQL query execution | AWS compute | Data Plane | Provision warehouses in secondary region
Delta Table Data | Table storage | S3 | Data Plane | Enable S3 cross-region replication
Managed Tables | Managed table storage | S3 | Data Plane | Use versioned S3 buckets
External Tables | External dataset storage | S3 | Data Plane | Replicate underlying S3 storage
DBFS Root | Databricks filesystem | S3 | Data Plane | Enable bucket replication
Unity Catalog Managed Storage | Catalog table storage | S3 | Data Plane | Cross-region replication
Unity Catalog Volumes | Governed file storage | S3 | Data Plane | Replicate S3 buckets
MLflow Model Artifacts | ML models | S3 | Data Plane | Replicate artifact bucket
Feature Store Data | ML feature datasets | S3 | Data Plane | S3 replication and versioning
Vector Search Index Data | Embedding storage | S3 | Data Plane | Rebuild indexes from replicated embeddings
Streaming Checkpoints | Streaming progress tracking | S3 | Data Plane | Replicate checkpoint directories
Temporary Spark Shuffle Data | Intermediate processing | Compute disk | Data Plane | No DR required (recomputed)
Job Execution Logs | Spark logs | S3 | Data Plane | Send logs to centralized logging system
ML Training Data | Training datasets | S3 | Data Plane | Multi-region S3 replication
Delta Transaction Logs | Table version metadata | S3 | Data Plane | Protect using S3 versioning
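
The last row recommends protecting Delta transaction logs with S3 versioning. A minimal sketch of what enabling it looks like; the bucket name is hypothetical and the live call is left commented, since versioning is also a precondition for the replication rules discussed elsewhere in this document:

```python
# Sketch: enabling S3 versioning on the bucket that holds Delta tables
# (and therefore their _delta_log/ transaction logs). Name is hypothetical.

versioning_config = {"Status": "Enabled"}

# import boto3
# boto3.client("s3").put_bucket_versioning(
#     Bucket="company-datalake-us-east-1",
#     VersioningConfiguration=versioning_config,
# )
```

With versioning on, an accidental overwrite or delete of a transaction-log object can be rolled back to the previous object version instead of losing table history.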

Architecture Flow

Databricks Control Plane (Managed by Databricks)
    Workspace UI, Authentication, Unity Catalog, Metastore Metadata, Query Planner, Job Scheduler
        |
        |  Secure API
        v
AWS Data Plane (Customer Account)
    Serverless Spark Compute, SQL Warehouses, Delta Tables, ML Models, Spark Temp Storage
        |
        v
Amazon S3 Data Lake

Databricks Architecture Matrix (Serverless on AWS)

This document explains where major Databricks components reside when running Databricks Serverless on AWS. Databricks architecture is divided into two planes:

  • Control Plane – Managed by Databricks
  • Data Plane – Runs in the customer AWS account

Control Plane Components

Component | Purpose / What It Does | Where It Runs or Is Stored | Plane
Workspace UI | Web interface to access notebooks, jobs, dashboards | Databricks SaaS infrastructure | Control Plane
Workspace APIs | REST APIs for automation, Terraform, CLI | Databricks SaaS | Control Plane
Users & Groups | Identity and user management | Databricks account services | Control Plane
Authentication / SSO | Integrates with IdP such as Okta or Azure AD | Databricks account services | Control Plane
Permissions / RBAC | Access control policies | Databricks control services | Control Plane
Notebook Source Code | Python / SQL / Scala notebooks | Workspace storage | Control Plane
Notebook Outputs | Charts and result previews | Workspace storage | Control Plane
Workspace Files | Files uploaded to workspace | Workspace storage | Control Plane
Repos (Git Integration) | GitHub / Git integration | Workspace metadata | Control Plane
Job Scheduler | Schedules pipelines and jobs | Databricks orchestration service | Control Plane
Workflows | Pipeline orchestration | Databricks orchestration service | Control Plane
SQL Query Planner | Optimizes SQL queries | Databricks query services | Control Plane
SQL Warehouse Management | Manages serverless SQL endpoints | Databricks control services | Control Plane
Unity Catalog | Central governance system | Databricks governance service | Control Plane
Metastore | Stores catalog and table metadata | Databricks metadata services | Control Plane
Data Lineage | Tracks data dependencies | Databricks governance services | Control Plane
Audit Logs | Security and governance logs | Databricks governance services | Control Plane
Cluster Management | Manages compute lifecycle | Databricks control services | Control Plane
Feature Store Metadata | ML feature definitions | Databricks metadata services | Control Plane
Model Registry Metadata | Tracks ML model versions | Databricks metadata services | Control Plane
Lakehouse Monitoring Metadata | Tracks dataset quality | Databricks monitoring services | Control Plane
Vector Search Metadata | Vector index configuration | Databricks control services | Control Plane

Data Plane Components (Customer AWS Account)

Component | Purpose / What It Does | Where It Runs or Is Stored | Plane
Serverless Spark Compute | Runs notebooks and jobs | AWS compute instances | Data Plane
Databricks SQL Warehouse Compute | Executes SQL queries | AWS compute instances | Data Plane
Delta Table Data | Actual table data | S3 | Data Plane
Managed Tables | Databricks managed tables | S3 | Data Plane
External Tables | Tables referencing external datasets | S3 | Data Plane
DBFS Root | Databricks File System root | S3 bucket | Data Plane
Unity Catalog Managed Storage | Table storage governed by Unity Catalog | S3 | Data Plane
Unity Catalog Volumes | Governed file storage | S3 | Data Plane
MLflow Model Artifacts | ML models and artifacts | S3 | Data Plane
Feature Store Data | ML feature datasets | S3 | Data Plane
Vector Search Index Data | Vector embeddings | S3 | Data Plane
Streaming Checkpoints | Streaming job progress | S3 | Data Plane
Temporary Spark Shuffle Data | Intermediate processing data | Local disk / S3 | Data Plane
Job Execution Logs | Spark execution logs | S3 | Data Plane
ML Training Data | Training datasets | S3 | Data Plane
Delta Transaction Logs | Table versioning metadata | S3 | Data Plane

Architecture Flow

Databricks Control Plane (Managed by Databricks)
    Workspace UI, Authentication, Unity Catalog, Metastore Metadata, Query Planner, Job Scheduler, Notebook Code
        |
        |  Secure API
        v
AWS Data Plane (Customer AWS Account)
    Serverless Spark Compute, SQL Warehouses, Delta Tables, DBFS Storage, ML Models, Spark Temp Storage
        |
        v
Amazon S3 (Customer Data Lake)

Key Architecture Rule

Type | Location
Metadata | Databricks Control Plane
Data | Customer AWS S3
Compute | Customer AWS
Governance Policies | Unity Catalog (Control Plane)