Databricks on AWS – Least Privilege Permission Matrix
This matrix lists the minimum AWS and Databricks permissions required to create and manage common platform resources on Databricks for AWS, so that each role holds only the access it needs, in line with enterprise least-privilege principles.
| Resource | Primary Owner Role | Required AWS Permissions | Required Databricks Permissions | Purpose | Security / Least Privilege Notes |
|---|---|---|---|---|---|
| Workspace | Platform Admin | iam:CreateRole, iam:AttachRolePolicy, ec2:CreateVpc, s3:CreateBucket | Account Admin | Create Databricks workspace | Automate using Terraform and restrict to platform team |
| Cross-Account IAM Role | AWS Cloud Admin | iam:CreateRole, iam:PutRolePolicy, sts:AssumeRole | None | Allows the Databricks control plane to launch compute in your account | Trust policy limited to the Databricks account, with an external ID condition |
| Root Storage (DBFS) | AWS Cloud Admin | s3:CreateBucket, s3:PutBucketPolicy | None | Workspace default storage | Enable encryption and versioning |
| Unity Catalog Metastore | Data Platform Admin | s3:GetObject, s3:PutObject, s3:ListBucket | Metastore Admin | Central governance metadata store | Dedicated metastore bucket |
| Metastore Assignment | Platform Admin | None | Account Admin | Attach metastore to workspace | Single metastore per region recommended |
| Storage Credential | Data Platform Admin | iam:PassRole, sts:AssumeRole | CREATE STORAGE CREDENTIAL | Connect Unity Catalog to S3 | IAM role should allow only specific S3 path |
| External Location | Data Governance Admin | s3:GetObject, s3:PutObject | CREATE EXTERNAL LOCATION | Expose S3 path to Unity Catalog | Use path-level permissions |
| Catalog | Data Governance Admin | None (S3 access flows through the storage credential) | CREATE CATALOG | Top-level governance layer | One catalog per domain recommended |
| Schema | Data Owner | None | CREATE SCHEMA | Database container | Grant schema-level privileges |
| Delta Table | Data Engineer | s3:GetObject, s3:PutObject, s3:DeleteObject, s3:ListBucket | CREATE TABLE | Structured table storage | Use Unity Catalog governance |
| External Table | Data Engineer | s3:GetObject, s3:ListBucket | CREATE TABLE | Reference external dataset | Avoid granting users direct S3 access |
| Notebook | Data Engineer / Analyst | None | Workspace Editor | Analytics code | Store production code in Git |
| Git Repo Integration | Developer | None | Workspace Editor | Version control integration | Use GitHub / GitLab PAT |
| Job / Workflow | Data Engineer | None | CREATE JOB | Automated pipelines | Define jobs as code |
| Cluster | Platform Admin | ec2:RunInstances, iam:PassRole | CREATE CLUSTER | Compute resource | Restrict using cluster policies |
| SQL Warehouse | Data Engineer | None | CREATE SQL WAREHOUSE | SQL analytics compute | Limit warehouse size via policies |
| Cluster Policy | Platform Admin | None | CREATE CLUSTER POLICY | Restrict compute usage | Important governance control |
| Feature Store Table | ML Engineer | S3 read/write | CREATE TABLE | Machine learning features | Stored as Delta tables |
| ML Model Registry | ML Engineer | S3 artifact storage | CREATE MODEL | Track ML model versions | Store artifacts in secure bucket |
| Streaming Checkpoints | Data Engineer | s3:PutObject, s3:GetObject | Job permission | Streaming progress tracking | Separate checkpoint directory |
| Unity Catalog Volume | Data Platform Admin | S3 access via storage credential | CREATE VOLUME | Governed file storage | Preferred alternative to DBFS |
| Audit Logs | Security Team | S3 write | Account Admin | Security auditing | Send logs to SIEM |
| PrivateLink Networking | AWS Cloud Admin | ec2:CreateVpcEndpoint | Account Admin | Private connectivity | Required for highly secure environments |
| DBFS File Upload | User | s3:PutObject | Workspace User | Temporary file storage | Avoid for production data |
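The "trust only the Databricks account" note for the cross-account IAM role can be made concrete with a trust policy document. Below is a minimal sketch in Python that builds such a policy; the principal account ID and external ID are placeholders, not values from this matrix — substitute the account ID Databricks documents for your deployment and your own Databricks account ID.

```python
import json

# Placeholder values -- replace with the Databricks-published AWS account ID
# and your own Databricks account ID (used as the sts:ExternalId).
DATABRICKS_AWS_ACCOUNT = "111111111111"
MY_DATABRICKS_ACCOUNT_ID = "00000000-0000-0000-0000-000000000000"

def cross_account_trust_policy(principal_account: str, external_id: str) -> dict:
    """Trust policy allowing only the named account, with a matching
    external ID, to assume this role via sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{principal_account}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }

policy = cross_account_trust_policy(DATABRICKS_AWS_ACCOUNT, MY_DATABRICKS_ACCOUNT_ID)
print(json.dumps(policy, indent=2))
```

The external ID condition is what prevents a confused-deputy attack: even if another party knows the role ARN, they cannot assume it without also presenting your account ID.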
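The storage-credential row notes that the IAM role should allow only a specific S3 path. A sketch of such a path-scoped S3 policy, with a hypothetical bucket and prefix, might look like this:

```python
def path_scoped_s3_policy(bucket: str, prefix: str) -> dict:
    """Least-privilege S3 policy: object actions are limited to one prefix,
    and listing is conditioned on that same prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # object-level access, restricted to the governed prefix
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/{prefix}/*",
            },
            {   # listing the bucket, restricted to the same prefix
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }

# Hypothetical bucket/prefix names for illustration only.
s3_policy = path_scoped_s3_policy("acme-lakehouse-data", "unity/prod")
```

Scoping both the object statement and the `s3:prefix` condition keeps a credential created for one external location from reading sibling prefixes in the same bucket.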
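The cluster-policy row calls policies an important governance control. As a rough illustration of the idea — not an exact rendering of the Databricks policy schema — the sketch below builds a policy-style dictionary that pins down the attributes the matrix says should be restricted: instance types, worker count, and auto-termination. Attribute names and limits here are example assumptions.

```python
import json

# Example policy definition: each key constrains one cluster attribute.
# Instance types, limits, and defaults below are illustrative assumptions.
cluster_policy = {
    "node_type_id": {
        "type": "allowlist",
        "values": ["m5.xlarge", "m5.2xlarge"],   # only modest instance sizes
    },
    "autoscale.max_workers": {
        "type": "range",
        "maxValue": 8,                            # cap cluster fan-out
    },
    "autotermination_minutes": {
        "type": "range",
        "maxValue": 60,                           # force idle clusters to stop
        "defaultValue": 30,
    },
}

print(json.dumps(cluster_policy, indent=2))
```

Because users who hold only CREATE CLUSTER under a policy cannot exceed these bounds, the platform team controls cost and instance-profile usage without approving every cluster by hand.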