Databricks Serverless on AWS – IAM Roles, Policies, and Security Best Practices
Overview
Databricks Serverless on AWS uses IAM roles to give the Databricks control plane secure cross-account access to resources in your AWS account. Unlike traditional clusters, serverless compute does not use instance profiles; it relies instead on Unity Catalog and cross-account roles.
1. Cross-Account Role (Control Plane Role)
Purpose
- Allows the Databricks control plane to access AWS resources
- Used for workspace validation, metadata access, and configuration
- Does NOT perform data modifications
Trust Policy (External ID Required)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatabricksAssumeRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<DATABRICKS_ACCOUNT_ID>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<UNIQUE_EXTERNAL_ID>"
        }
      }
    }
  ]
}
Why External ID is Required
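- Prevents the confused deputy problem, in which another party tricks Databricks into assuming your role on its behalf
- Ensures the role can be assumed only by requests that present the external ID tied to your Databricks account
- Mandatory for secure cross-account access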
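The trust policy and the external ID requirement can also be expressed and checked in code. The following is a minimal sketch using only the Python standard library. The function names `build_trust_policy` and `requires_external_id`, along with the sample account ID and external ID, are illustrative assumptions and not part of any Databricks or AWS tooling.

```python
import json


def build_trust_policy(databricks_account_id: str, external_id: str) -> dict:
    """Return a trust policy that lets the given AWS account assume the role
    only when it presents the expected external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DatabricksAssumeRole",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{databricks_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def requires_external_id(policy: dict) -> bool:
    """True if every Allow statement granting sts:AssumeRole also carries
    a non-empty sts:ExternalId condition."""
    for stmt in policy.get("Statement", []):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if stmt.get("Effect") == "Allow" and "sts:AssumeRole" in actions:
            condition = stmt.get("Condition", {}).get("StringEquals", {})
            if not condition.get("sts:ExternalId"):
                return False
    return True


# Placeholder values, for illustration only.
policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
print(requires_external_id(policy))
```

A check like `requires_external_id` could run in a review step before a role is created, so that a trust policy without the condition is caught early.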
Permissions Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3ReadAccessForValidation",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::my-data-bucket"
    },
    {
      "Sid": "GlueReadAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchReadLogs",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": "*"
    }
  ]
}
Why These Permissions
- s3:ListBucket – Validate bucket existence
- s3:GetBucketLocation – Ensure region alignment
- glue:Get* – Read metadata for tables
- logs:Describe* – Optional monitoring and debugging
Security Note: This policy grants no write access, in keeping with least privilege.
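One way to confirm that a policy stays read-only is to scan its allowed actions. The sketch below is a naming-based heuristic, not an authoritative classification of IAM actions: it assumes that read-only operations start with `Get`, `List`, or `Describe`, which holds for the actions in the policy above. The function name `non_read_only_actions` is hypothetical.

```python
# Assumption: read-only operations are recognizable by these verb prefixes.
READ_ONLY_PREFIXES = ("Get", "List", "Describe")


def non_read_only_actions(policy: dict) -> list[str]:
    """Return allowed actions whose operation name does not start with a
    read-only verb. Wildcards such as "s3:*" are flagged as well."""
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            # "s3:ListBucket" -> "ListBucket"; a bare "*" is kept as "*".
            operation = action.split(":", 1)[-1]
            if not operation.startswith(READ_ONLY_PREFIXES):
                flagged.append(action)
    return flagged


cross_account_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket", "s3:GetBucketLocation"]},
        {"Effect": "Allow", "Action": ["glue:GetTable", "glue:GetPartitions"]},
        {"Effect": "Allow", "Action": ["logs:DescribeLogGroups"]},
    ],
}
print(non_read_only_actions(cross_account_policy))
```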
2. Unity Catalog Storage Credential Role (Data Access Role)
Purpose
- Provides data access for serverless compute
- Used by Unity Catalog for governance
- Replaces instance profile roles
Trust Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatabricksUnityCatalogAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<DATABRICKS_ACCOUNT_ID>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<UNIQUE_EXTERNAL_ID>"
        }
      }
    }
  ]
}
Permissions Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3DataAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-data-bucket/*"
    },
    {
      "Sid": "S3ListAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::my-data-bucket"
    }
  ]
}
Why These Permissions
- s3:GetObject – Read data
- s3:PutObject – Write data
- s3:DeleteObject – Cleanup/overwrite
- s3:ListBucket – List objects in the bucket, which is needed to discover data files during query planning
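The permissions above split into two resource scopes: object-level actions apply to `arn:aws:s3:::<bucket>/*`, while `s3:ListBucket` applies to the bucket ARN itself. The following sketch generates the policy for a given bucket so that the two scopes are not mixed up. The function name `build_storage_policy` and its `read_only` flag are illustrative assumptions, not something the original setup defines.

```python
import json


def build_storage_policy(bucket: str, read_only: bool = False) -> dict:
    """Return an S3 permissions policy for one bucket.

    Object actions target the objects ("<bucket>/*"); ListBucket targets
    the bucket itself. With read_only=True, write and delete are omitted.
    """
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3DataAccess",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "S3ListAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


print(json.dumps(build_storage_policy("my-data-bucket"), indent=2))
```

The `read_only` variant may suit storage locations that Unity Catalog should only read from, where `s3:PutObject` and `s3:DeleteObject` would exceed least privilege.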
Optional: KMS Permissions
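If your S3 bucket is encrypted with a KMS key (SSE-KMS), add the following statement to the storage credential role's permissions policy: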
{
  "Effect": "Allow",
  "Action": [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:GenerateDataKey"
  ],
  "Resource": "<KMS_KEY_ARN>"
}
Why Needed
- kms:Decrypt – Decrypt data during reads
- kms:Encrypt, kms:GenerateDataKey – Encrypt data during writes
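Because the KMS statement is optional, it can be attached conditionally. The sketch below returns a copy of a permissions policy with the KMS statement appended only when a key ARN is supplied. The function name `with_kms_access`, the `Sid` value, and the sample key ARN are hypothetical.

```python
import copy
from typing import Optional


def with_kms_access(policy: dict, kms_key_arn: Optional[str]) -> dict:
    """Return a copy of the policy, adding a KMS statement only when a
    key ARN is given. The input policy is left unmodified."""
    updated = copy.deepcopy(policy)
    if kms_key_arn:
        updated["Statement"].append(
            {
                "Sid": "KmsAccessForS3",  # hypothetical Sid, for illustration
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            }
        )
    return updated


base = {"Version": "2012-10-17", "Statement": []}
# Hypothetical key ARN, for illustration only.
encrypted = with_kms_access(base, "arn:aws:kms:us-east-1:111122223333:key/example-key-id")
print(len(base["Statement"]), len(encrypted["Statement"]))
```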
What You Should NOT Include
- s3:* – Too broad
- iam:* – Security risk
- ec2:* – Not required in serverless
- glue:* (or any Glue write action) – Allows schema tampering
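The exclusions above can be enforced with a small lint step over any policy document. This is a minimal sketch that flags a bare `*` and service-wide wildcards for the four services listed. The name `find_broad_grants` is hypothetical, and the check does not cover partial wildcards such as `s3:Put*`.

```python
# Services for which a service-wide wildcard ("<service>:*") is rejected.
BLOCKED_SERVICES = {"s3", "iam", "ec2", "glue"}


def find_broad_grants(policy: dict) -> list[str]:
    """Return allowed actions that are a bare "*" or a service-wide
    wildcard for one of the blocked services."""
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            # IAM action names are case-insensitive, so compare in lowercase.
            service, _, operation = action.lower().partition(":")
            if action == "*" or (service in BLOCKED_SERVICES and operation == "*"):
                findings.append(action)
    return findings


sample = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:*"]},
        {"Effect": "Allow", "Action": "iam:*"},
    ],
}
print(find_broad_grants(sample))
```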
Architecture Summary
Databricks Serverless Compute
│
▼
Assume Role (with External ID)
│
├── Cross-Account Role → Metadata access
└── Storage Credential Role → S3 data access
Key Takeaways
- External ID is mandatory for security
- No instance profile role in serverless
- Cross-account role = control plane access
- Unity Catalog role = data plane access
- Strict least privilege must be enforced
Final Summary
- Cross-account role (read-only, control plane)
- Unity Catalog storage credential role (data access)
- Optional KMS permissions
This setup ensures a secure, scalable, and enterprise-ready Databricks Serverless architecture on AWS.