Wednesday, 18 March 2026

Databricks Serverless on AWS – IAM Roles, Policies, and Security Best Practices

Overview

In Databricks Serverless on AWS, IAM roles give Databricks secure cross-account access to resources in your AWS account. Unlike classic clusters, serverless compute does not use instance profiles. Data access instead goes through Unity Catalog storage credentials, alongside a cross-account role for the control plane.


1. Cross-Account Role (Control Plane Role)

Purpose

  • Allows the Databricks control plane to access AWS resources
  • Used for workspace validation, metadata access, and configuration
  • Does NOT perform data modifications

Trust Policy (External ID Required)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatabricksAssumeRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<DATABRICKS_ACCOUNT_ID>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<UNIQUE_EXTERNAL_ID>"
        }
      }
    }
  ]
}
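The trust policy above can be generated rather than hand-edited, which helps avoid shipping it with a placeholder left unfilled. The following is a minimal Python sketch using only the standard library. The function name `build_trust_policy` and the sample values are hypothetical; the real Databricks account ID and External ID must come from your own Databricks account setup.

```python
import json


def build_trust_policy(databricks_account_id: str, external_id: str) -> dict:
    """Return a trust policy shaped like the one above (hypothetical helper)."""
    if not external_id:
        # Refuse to emit a trust policy that has no External ID condition.
        raise ValueError("external_id must not be empty")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DatabricksAssumeRole",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{databricks_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = build_trust_policy("111122223333", "example-external-id")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the role's trust relationship or passed to whatever tooling you use to create the role.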

Why External ID is Required

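  • Prevents the confused deputy problem, where a third party tricks Databricks into assuming your role on its behalf
  • Restricts role assumption to requests that present your External ID, so other Databricks customers cannot use the role
  • Should be treated as mandatory for any cross-account trust policy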
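To make the External ID check concrete, here is a toy Python model of how a trust policy with an `sts:ExternalId` condition accepts or rejects an assume-role request. It is a deliberate simplification, not how AWS evaluates policies internally: it matches the principal by exact string and only understands the `StringEquals` condition used in this post. The function name and sample values are hypothetical.

```python
from typing import Optional


def is_assume_role_allowed(
    trust_policy: dict, caller_principal: str, external_id: Optional[str]
) -> bool:
    """Toy evaluation of a trust policy (simplified; illustration only)."""
    for stmt in trust_policy["Statement"]:
        if stmt.get("Effect") != "Allow" or stmt.get("Action") != "sts:AssumeRole":
            continue
        if stmt["Principal"].get("AWS") != caller_principal:
            continue
        required = (
            stmt.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        )
        # With no condition, any caller from the trusted principal is accepted.
        # That missing check is what opens the door to a confused deputy.
        if required is None or required == external_id:
            return True
    return False


if __name__ == "__main__":
    principal = "arn:aws:iam::111122223333:root"  # placeholder account
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": "my-id"}},
            }
        ],
    }
    print(is_assume_role_allowed(policy, principal, "my-id"))
    print(is_assume_role_allowed(policy, principal, "someone-elses-id"))
```

In this model, the same trusted principal is accepted only when it presents the expected External ID, which is the property the bullets above describe.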

Permissions Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3ReadAccessForValidation",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::my-data-bucket"
    },
    {
      "Sid": "GlueReadAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchReadLogs",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": "*"
    }
  ]
}

Why These Permissions

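  • s3:ListBucket – Validate that the bucket exists and is reachable
  • s3:GetBucketLocation – Confirm the bucket's region matches the workspace
  • glue:Get* (shorthand for the five Get actions listed above) – Read table metadata
  • logs:Describe* (shorthand for DescribeLogGroups and DescribeLogStreams) – Optional monitoring and debugging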

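Security Note: No write actions are granted, in keeping with least privilege. The Glue and CloudWatch Logs statements still use "Resource": "*"; scope them to specific ARNs where you can.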
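The read-only claim can be checked mechanically before the policy is attached. The sketch below is a simple heuristic in Python, not an official AWS tool: it assumes that actions whose names start with Get, List, or Describe are reads, which holds for the actions in this policy but is not a general guarantee. The function name `non_read_actions` is hypothetical.

```python
# Heuristic: action names with these prefixes are treated as read-only.
READ_ONLY_PREFIXES = ("Get", "List", "Describe")


def non_read_actions(policy: dict) -> list:
    """Return every action in the policy that does not look read-only."""
    flagged = []
    for stmt in policy["Statement"]:
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            # Split "s3:ListBucket" into service "s3" and name "ListBucket".
            _service, _, name = action.partition(":")
            if not name.startswith(READ_ONLY_PREFIXES):
                flagged.append(action)
    return flagged


if __name__ == "__main__":
    cross_account_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
             "Resource": "arn:aws:s3:::my-data-bucket"},
            {"Effect": "Allow",
             "Action": ["glue:GetDatabase", "glue:GetTables"],
             "Resource": "*"},
        ],
    }
    print(non_read_actions(cross_account_policy))
```

An empty result means every action matched a read-only prefix; wildcards such as `s3:*` are flagged because they match none of them.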


2. Unity Catalog Storage Credential Role (Data Access Role)

Purpose

  • Provides data access for serverless compute
  • Used by Unity Catalog for governance
  • Replaces the instance profile roles used by classic clusters

Trust Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatabricksUnityCatalogAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<DATABRICKS_ACCOUNT_ID>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<UNIQUE_EXTERNAL_ID>"
        }
      }
    }
  ]
}

Permissions Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3DataAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-data-bucket/*"
    },
    {
      "Sid": "S3ListAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::my-data-bucket"
    }
  ]
}

Why These Permissions

  • s3:GetObject – Read data
  • s3:PutObject – Write data
  • s3:DeleteObject – Cleanup/overwrite
  • s3:ListBucket – List objects in the bucket, which query planning needs to discover data files
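One detail in the policy above is easy to get wrong: object-level actions (GetObject, PutObject, DeleteObject) apply to the object ARN ending in `/*`, while s3:ListBucket applies to the bucket ARN itself. The Python sketch below generates the policy with that split. The helper name `build_storage_policy` and the optional `prefix` parameter are assumptions added for illustration; the original policy covers the whole bucket.

```python
import json


def build_storage_policy(bucket: str, prefix: str = "") -> dict:
    """Build an S3 data-access policy like the one above (hypothetical helper)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    clean_prefix = prefix.strip("/")
    # Object-level actions target objects, optionally under a prefix.
    object_arn = f"{bucket_arn}/{clean_prefix}/*" if clean_prefix else f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3DataAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arn,
            },
            {
                "Sid": "S3ListAccess",
                "Effect": "Allow",
                # ListBucket is a bucket-level action, so it uses the bucket ARN.
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_storage_policy("my-data-bucket"), indent=2))
```

Passing a prefix narrows only the object-level statement in this sketch; restricting listing to a prefix would need an additional condition on the ListBucket statement, which is not shown here.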

Optional: KMS Permissions

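If your S3 bucket is encrypted with a customer-managed KMS key (SSE-KMS), add the following statement to the permissions policy. Buckets using S3-managed keys (SSE-S3) do not need it.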

{
  "Effect": "Allow",
  "Action": [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:GenerateDataKey"
  ],
  "Resource": "<KMS_KEY_ARN>"
}

Why Needed

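  • kms:Decrypt – Decrypt data during reads
  • kms:Encrypt, kms:GenerateDataKey – Encrypt data during writes (S3 requests data keys from KMS to encrypt new objects)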
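Because the KMS statement is optional, it is convenient to add it conditionally instead of maintaining two copies of the policy. A minimal Python sketch follows; the helper name `with_kms` and the Sid `KmsForS3Encryption` are hypothetical, and the key ARN in the example is a placeholder.

```python
import copy
from typing import Optional


def with_kms(policy: dict, kms_key_arn: Optional[str]) -> dict:
    """Return a copy of the policy, adding the KMS statement if a key is given."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    if kms_key_arn:
        updated["Statement"].append(
            {
                "Sid": "KmsForS3Encryption",  # hypothetical Sid
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            }
        )
    return updated


if __name__ == "__main__":
    base = {"Version": "2012-10-17", "Statement": []}
    # Placeholder key ARN for illustration only.
    key_arn = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
    print(len(with_kms(base, key_arn)["Statement"]))
    print(len(with_kms(base, None)["Statement"]))
```

Scoping "Resource" to a single key ARN, as the statement above does, keeps the role from using any other key in the account.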

What You Should NOT Include

  • s3:* – Too broad
  • iam:* – Security risk
  • ec2:* – Not required in serverless
  • Glue write actions (for example glue:CreateTable, glue:UpdateTable, glue:DeleteTable) – Leaving them out prevents schema tampering
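The exclusions above can be enforced with a small lint step, for example in a review script or CI job. This Python sketch flags the service-wide wildcards named in the list and Glue actions that look like writes. The prefix list used to recognise Glue write actions is an assumption and is not exhaustive, and the function name `find_violations` is hypothetical.

```python
# Wildcards ruled out by the list above, plus the catch-all "*".
FORBIDDEN_WILDCARDS = {"*", "s3:*", "iam:*", "ec2:*", "glue:*"}
# Assumed name prefixes for Glue write actions, e.g. glue:CreateTable.
GLUE_WRITE_PREFIXES = ("Create", "Update", "Delete", "BatchCreate", "BatchDelete")


def find_violations(policy: dict) -> list:
    """Return the Allow-ed actions that the guidance above says to avoid."""
    violations = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            service, _, name = action.partition(":")
            if action in FORBIDDEN_WILDCARDS:
                violations.append(action)
            elif service == "glue" and name.startswith(GLUE_WRITE_PREFIXES):
                violations.append(action)
    return violations


if __name__ == "__main__":
    risky = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["s3:GetObject", "s3:*", "glue:UpdateTable"],
             "Resource": "*"},
        ],
    }
    print(find_violations(risky))
```

A non-empty result is a signal to review the policy by hand; an empty one only means none of these specific patterns were found.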

Architecture Summary


Databricks Serverless Compute
        │
        ▼
Assume Role (with External ID)
        │
        ├── Cross-Account Role → Metadata access
        └── Storage Credential Role → S3 data access

Key Takeaways

  • External ID is mandatory for security
  • No instance profile role in serverless
  • Cross-account role = control plane access
  • Unity Catalog role = data plane access
  • Strict least privilege must be enforced

Final Summary

  • Cross-account role (read-only, control plane)
  • Unity Catalog storage credential role (data access)
  • Optional KMS permissions

Together, these roles give Databricks Serverless the access it needs on AWS while keeping each role's permissions scoped to least privilege.
