Friday, 13 March 2026

Databricks on AWS – Least Privilege Permission Matrix

This matrix lists the minimum AWS and Databricks permissions required to create or manage common platform resources when using Databricks on AWS. The goal is to follow enterprise least-privilege security principles: each owner role receives only the permissions its resources actually require.

| Resource | Primary Owner Role | Required AWS Permissions | Required Databricks Permissions | Purpose | Security / Least Privilege Notes |
|---|---|---|---|---|---|
| Workspace | Platform Admin | iam:CreateRole, iam:AttachRolePolicy, ec2:CreateVpc, s3:CreateBucket | Account Admin | Create Databricks workspace | Automate using Terraform and restrict to platform team |
| Cross-Account IAM Role | AWS Cloud Admin | iam:CreateRole, iam:PutRolePolicy, sts:AssumeRole | None | Allows Databricks control plane access | Trust only the Databricks account |
| Root Storage (DBFS) | AWS Cloud Admin | s3:CreateBucket, s3:PutBucketPolicy | None | Workspace default storage | Enable encryption and versioning |
| Unity Catalog Metastore | Data Platform Admin | s3:GetObject, s3:PutObject, s3:ListBucket | Metastore Admin | Central governance metadata store | Dedicated metastore bucket |
| Metastore Assignment | Platform Admin | None | Account Admin | Attach metastore to workspace | Single metastore per region recommended |
| Storage Credential | Data Platform Admin | iam:PassRole, sts:AssumeRole | CREATE STORAGE CREDENTIAL | Connect Unity Catalog to S3 | IAM role should allow only specific S3 paths |
| External Location | Data Governance Admin | s3:GetObject, s3:PutObject | CREATE EXTERNAL LOCATION | Expose S3 path to Unity Catalog | Use path-level permissions |
| Catalog | Data Governance Admin | Access to storage location | CREATE CATALOG | Top governance layer | One catalog per domain recommended |
| Schema | Data Owner | None | CREATE SCHEMA | Database container | Grant schema-level privileges |
| Delta Table | Data Engineer | S3 read/write | CREATE TABLE | Structured table storage | Use Unity Catalog governance |
| External Table | Data Engineer | S3 read | CREATE TABLE | Reference external dataset | Avoid direct S3 access |
| Notebook | Data Engineer / Analyst | None | Workspace Editor | Analytics code | Store production code in Git |
| Git Repo Integration | Developer | None | Workspace Editor | Version control integration | Use GitHub / GitLab PAT |
| Job / Workflow | Data Engineer | None | CREATE JOB | Automated pipelines | Define jobs as code |
| Cluster | Platform Admin | ec2:RunInstances, iam:PassRole | CREATE CLUSTER | Compute resource | Restrict using cluster policies |
| SQL Warehouse | Data Engineer | None | CREATE SQL WAREHOUSE | Serverless SQL analytics | Limit compute size via policies |
| Cluster Policy | Platform Admin | None | CREATE CLUSTER POLICY | Restrict compute usage | Important governance control |
| Feature Store Table | ML Engineer | S3 read/write | CREATE TABLE | Machine learning features | Stored as Delta tables |
| ML Model Registry | ML Engineer | S3 artifact storage | CREATE MODEL | Track ML model versions | Store artifacts in a secure bucket |
| Streaming Checkpoints | Data Engineer | s3:PutObject, s3:GetObject | Job permission | Streaming progress tracking | Separate checkpoint directory |
| Unity Catalog Volume | Data Platform Admin | S3 access | CREATE VOLUME | File storage governance | Alternative to DBFS |
| Audit Logs | Security Team | S3 write | Account Admin | Security auditing | Send logs to SIEM |
| PrivateLink Networking | AWS Cloud Admin | ec2:CreateVpcEndpoint | Account Admin | Private connectivity | Required for highly secure environments |
| DBFS File Upload | User | s3:PutObject | Workspace User | Temporary file storage | Avoid for production data |
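To illustrate the "Trust only the Databricks account" note on the cross-account IAM role, the trust policy can be sketched as below. The control-plane AWS account ID and the ExternalId (your Databricks account ID) are placeholders here; take the real values from the Databricks account console.

```python
import json

# Placeholder values: substitute the Databricks control-plane AWS account ID
# and the ExternalId (your Databricks account ID) shown in the account console.
DATABRICKS_AWS_ACCOUNT_ID = "111111111111"
DATABRICKS_ACCOUNT_ID = "00000000-0000-0000-0000-000000000000"

# Trust policy for the cross-account role: only the Databricks control-plane
# account may assume it, and only when it presents the expected ExternalId.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{DATABRICKS_AWS_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"sts:ExternalId": DATABRICKS_ACCOUNT_ID}
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

Creating the role itself needs only the iam:CreateRole and iam:PutRolePolicy permissions from the matrix; the ExternalId condition guards against the confused-deputy problem.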
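Similarly, the "IAM role should allow only specific S3 paths" note for storage credentials can be sketched as a policy scoped to a single prefix. The bucket and prefix names below are made up for illustration; the actions match the matrix rows for storage credentials and external locations.

```python
import json

# Hypothetical bucket and prefix; replace with the path the external
# location is meant to expose.
BUCKET = "example-lakehouse-bucket"
PREFIX = "gold/sales/"

# Least-privilege policy for the Unity Catalog storage credential role:
# object access is limited to one prefix, and bucket listing is limited
# to that prefix via a condition rather than granted on the whole bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ObjectAccessUnderPrefix",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*",
        },
        {
            "Sid": "ListOnlyThePrefix",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": f"{PREFIX}*"}},
        },
    ],
}

print(json.dumps(policy, indent=2))
```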
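Finally, the cluster policy row ("Restrict compute usage") can be sketched as a policy definition. The attribute names follow the Databricks cluster-policy format (allowlist, range), while the concrete instance types and limits are illustrative assumptions.

```python
import json

# Illustrative cluster policy: pins instance types to an allowlist, caps
# autoscaling, and forces auto-termination so idle clusters are reclaimed.
cluster_policy = {
    "node_type_id": {
        "type": "allowlist",
        "values": ["m5.xlarge", "m5.2xlarge"],
    },
    "autoscale.max_workers": {
        "type": "range",
        "maxValue": 10,
    },
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 60,
        "defaultValue": 30,
    },
}

print(json.dumps(cluster_policy, indent=2))
```

Attaching such a policy to the CREATE CLUSTER permission lets data engineers self-serve compute without the Platform Admin approving every cluster.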
