Friday, 16 January 2026

Databricks on AWS – Networking, Security & PrivateLink Architecture (Deep Dive)


Databricks on AWS – Complete Networking & Security Architecture Guide

This document explains how Databricks is deployed securely on AWS, focusing on:

  • VPC & subnet design
  • Control plane vs data plane
  • IAM roles & instance profiles
  • Security groups & traffic flow
  • PrivateLink (frontend & backend)

1️⃣ Databricks Architecture Overview


Control Plane vs Data Plane

Plane | Owned By | What Runs Here
Control Plane | Databricks | UI, REST APIs, Jobs scheduler, notebook metadata
Data Plane | Customer AWS account | Clusters, Spark executors, DBFS root, data access
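Key rule: with classic compute, the clusters that read and process your data run inside your own AWS account; the control plane holds workspace metadata such as notebooks and job definitions.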

2️⃣ VPC Design (Customer-Managed)

Why Customer-Managed VPC?

  • Network isolation (you control CIDR ranges, routing, and egress)
  • PrivateLink support (PrivateLink requires a customer-managed VPC)
  • Compliance (SOC 2, PCI DSS, HIPAA)

Recommended VPC Layout

VPC (10.0.0.0/16)
│
├── Private Subnet A (10.0.1.0/24, AZ 1)
│   └── Databricks cluster nodes (driver + workers)
│
├── Private Subnet B (10.0.2.0/24, AZ 2)
│   └── Databricks cluster nodes (driver + workers)
│
├── Public Subnet (optional)
│   └── NAT Gateway
│
└── VPC Endpoints
    ├── S3
    ├── STS
    ├── Kinesis (optional)
    └── Databricks PrivateLink (workspace REST API + SCC relay)
Databricks clusters should never be in public subnets.
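The layout above can be sanity-checked before any Terraform or console work. The sketch below is a minimal check using only Python's standard `ipaddress` module. The AZ names are hypothetical, and the /17 to /26 netmask range and the two-AZ rule are assumptions based on Databricks' customer-managed VPC guidance, so confirm them against the current documentation.

```python
import ipaddress

# Layout from the tree above; the AZ names are hypothetical examples.
VPC_CIDR = "10.0.0.0/16"
SUBNETS = {
    "private-a": ("10.0.1.0/24", "us-east-1a"),
    "private-b": ("10.0.2.0/24", "us-east-1b"),
}

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet


def validate_layout(vpc_cidr, subnets, min_prefix=17, max_prefix=26):
    """Return a list of problems; an empty list means every check passed."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, (cidr, _) in subnets.items()}
    problems = []
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name}: {net} is outside {vpc}")
        # Assumed netmask range for Databricks workspace subnets
        if not min_prefix <= net.prefixlen <= max_prefix:
            problems.append(f"{name}: /{net.prefixlen} is outside /{min_prefix}-/{max_prefix}")
    names = sorted(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"{a} overlaps {b}")
    # Assumed rule: subnets must sit in at least two availability zones
    if len({az for _, az in subnets.values()}) < 2:
        problems.append("subnets must span at least two availability zones")
    return problems


def usable_ips(cidr):
    """Assignable addresses after AWS's reserved ones are subtracted."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


print(validate_layout(VPC_CIDR, SUBNETS))  # []
print(usable_ips("10.0.1.0/24"))           # 251
```

A /24 therefore leaves 251 addresses per subnet, which caps how many cluster nodes (plus any endpoint ENIs) each subnet can hold.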

3️⃣ Subnets & Routing

Private Subnets

  • No public IPs
  • Route to NAT Gateway (only if needed)
  • Preferred: VPC endpoints instead of NAT

Route Table (Private Subnet)

10.0.0.0/16 → local (also reaches interface endpoints, incl. Databricks PrivateLink)
pl-xxxxxx (S3 prefix list) → S3 Gateway Endpoint
0.0.0.0/0 → NAT Gateway (optional)
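To see why interface endpoints need no route entry of their own, here is a minimal longest-prefix-match sketch of a private-subnet route table. The S3 prefix list is a hypothetical stand-in (the documentation range 203.0.113.0/24); the real pl-xxxxxx list is managed by AWS and holds many regional ranges. Target names such as `vpce-s3-gateway` are illustrative labels, not real resource IDs.

```python
import ipaddress

# Hypothetical route table for a private subnet (targets are illustrative labels).
# 203.0.113.0/24 is a documentation range standing in for the AWS-managed S3
# prefix list (pl-xxxxxx), which in reality contains many regional ranges.
ROUTES = [
    ("10.0.0.0/16", "local"),               # whole VPC, incl. interface endpoint ENIs
    ("203.0.113.0/24", "vpce-s3-gateway"),  # S3 gateway endpoint via prefix list
    ("0.0.0.0/0", "nat-gateway"),           # optional default route
]


def next_hop(dest_ip, routes=ROUTES):
    """Pick the route with the longest matching prefix, as VPC routing does."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no matching route: the packet is dropped
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop("10.0.1.25"))     # local (e.g. a PrivateLink endpoint ENI)
print(next_hop("203.0.113.10"))  # vpce-s3-gateway
print(next_hop("8.8.8.8"))       # nat-gateway
```

Dropping the 0.0.0.0/0 entry (no NAT) makes `next_hop("8.8.8.8")` return `None`: anything not covered by the local route or an endpoint simply has no path out.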

4️⃣ Security Groups (CRITICAL)

Databricks Cluster Security Group

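Direction | Protocol / Port | Source / Destination | Purpose
Inbound | All TCP and UDP | Self (same security group) | Worker ↔ worker communication
Outbound | All TCP and UDP | Self (same security group) | Worker ↔ worker communication
Outbound | TCP 443 | 0.0.0.0/0 or VPC endpoints | Control plane, S3, APIs
Outbound | TCP 6666 | 0.0.0.0/0 or the SCC relay endpoint | Secure cluster connectivity relay (backend PrivateLink only)
Databricks requires unrestricted TCP and UDP traffic between cluster nodes in both directions, so the self-referencing rules must not be narrowed.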
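Security groups are allow-only: a flow passes if any rule matches it, and there are no deny rules. The sketch below is a hypothetical, simplified evaluator for a rule set like the one in the table; it checks one side of a flow only and ignores the stateful return-traffic handling that real security groups perform.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    direction: str  # "inbound" or "outbound"
    protocol: str   # "tcp", "udp", or "all"
    from_port: int
    to_port: int
    peer: str       # "self" (same security group) or a CIDR block


# Hypothetical rule set mirroring the cluster security group table.
CLUSTER_SG = [
    Rule("inbound", "all", 0, 65535, "self"),
    Rule("outbound", "all", 0, 65535, "self"),
    Rule("outbound", "tcp", 443, 443, "0.0.0.0/0"),
    Rule("outbound", "tcp", 6666, 6666, "0.0.0.0/0"),
]


def allowed(rules, direction, protocol, port, peer_ip, peer_in_same_sg):
    """Allow-only semantics: the flow passes if any rule matches, else it is denied."""
    for r in rules:
        if r.direction != direction:
            continue
        if r.protocol not in ("all", protocol):
            continue
        if not r.from_port <= port <= r.to_port:
            continue
        if r.peer == "self":
            if peer_in_same_sg:
                return True
        elif ipaddress.ip_address(peer_ip) in ipaddress.ip_network(r.peer):
            return True
    return False


# Worker-to-worker traffic on an arbitrary port is allowed...
print(allowed(CLUSTER_SG, "inbound", "tcp", 40000, "10.0.2.15", True))      # True
# ...but nothing outside the security group can reach the nodes.
print(allowed(CLUSTER_SG, "inbound", "tcp", 22, "198.51.100.7", False))     # False
print(allowed(CLUSTER_SG, "outbound", "tcp", 443, "198.51.100.7", False))   # True
print(allowed(CLUSTER_SG, "outbound", "tcp", 8080, "198.51.100.7", False))  # False
```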

5️⃣ IAM Roles & Instance Profiles

Why IAM Roles?

  • No access keys on clusters
  • Least privilege data access
  • Auditable via CloudTrail

Databricks EC2 Role

Trust Policy:
Service: ec2.amazonaws.com
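Written out in full, this is the standard IAM trust document that lets the EC2 service assume the role. A minimal sketch that builds and prints it with Python's `json` module:

```python
import json


def ec2_trust_policy():
    """IAM trust policy allowing the EC2 service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(ec2_trust_policy(), indent=2))
```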

Permissions Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::prod-data"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::prod-data/*"
    }
  ]
}
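To reason about what a least-privilege policy actually grants, the sketch below evaluates requests against a version of the policy in which `s3:ListBucket` is scoped to the bucket ARN and the object actions to the object ARN. It is a deliberate simplification of IAM evaluation (Allow statements only, no Deny, no Condition keys, no bucket policies or SCPs) and uses `fnmatch.fnmatchcase` as a stand-in for IAM's `*` and `?` wildcards; `prod-data` is the example bucket from the text.

```python
from fnmatch import fnmatchcase

# Permissions policy with ListBucket on the bucket ARN and object actions
# on the object ARN ("prod-data" is the example bucket from the text).
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::prod-data"},
        {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"],
         "Resource": "arn:aws:s3:::prod-data/*"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource):
    """Simplified check: Allow statements only; no Deny, Conditions, or other policies."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        # Action names compared case-insensitively; ARNs compared case-sensitively
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
        resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False  # implicit deny


print(is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::prod-data/raw/events.json"))     # True
print(is_allowed(POLICY, "s3:DeleteObject", "arn:aws:s3:::prod-data/raw/events.json"))  # False
print(is_allowed(POLICY, "s3:GetObject", "arn:aws:s3:::other-bucket/file.csv"))         # False
```

The `DeleteObject` denial follows directly from the example policy; workloads that remove files (for example Delta Lake `VACUUM`) would need that action granted as well.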

Instance Profile

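  • The IAM role is wrapped in an instance profile
  • The instance profile is registered in the Databricks workspace and attached to clusters, so each node receives temporary credentials for the role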

6️⃣ PrivateLink Architecture


Frontend PrivateLink

  • Users access the Databricks UI and REST APIs privately
  • No public internet exposure

Backend PrivateLink

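  • Clusters talk to the control plane (REST API and SCC relay) privately
  • No NAT gateway needed for control-plane traffic; other egress (e.g. package repositories) still needs VPC endpoints or NAT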
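A quick way to confirm PrivateLink is wired up is to check that the workspace hostname resolves to private IP addresses from inside the VPC (or from the corporate network, for frontend access). A minimal sketch using `socket.getaddrinfo` and `ipaddress`; a public answer usually points at missing or misconfigured private DNS.

```python
import ipaddress
import socket


def all_private(addresses):
    """True if there is at least one address and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


def resolves_privately(hostname, port=443):
    """Resolve hostname and report whether PrivateLink DNS appears to be in effect."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return all_private(addresses), addresses
```

Usage, from a host inside the VPC, with a placeholder hostname: `ok, addrs = resolves_privately("<your-workspace>.cloud.databricks.com")`. `ok` is `False` when any returned address is public.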

Required VPC Endpoints

Endpoint | Type
Databricks workspace (REST API) | Interface
Databricks secure cluster connectivity (SCC) relay | Interface
S3 | Gateway
STS | Interface
CloudWatch | Interface
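The endpoint table can double as a checklist. The sketch below compares a hypothetical inventory (for example, assembled from `aws ec2 describe-vpc-endpoints` output) against the AWS-owned services in the table. The Databricks PrivateLink service names are region-specific values published by Databricks and are left out rather than guessed; CloudWatch metrics use the `monitoring` service name.

```python
# Required AWS-owned endpoints (short service name -> endpoint type).
# Databricks' own PrivateLink services use region-specific names and are omitted.
REQUIRED = {
    "s3": "Gateway",
    "sts": "Interface",
    "monitoring": "Interface",  # CloudWatch metrics
}


def endpoint_gaps(region, existing):
    """Compare existing {service_name: endpoint_type} against REQUIRED."""
    gaps = []
    for short_name, expected in REQUIRED.items():
        service = f"com.amazonaws.{region}.{short_name}"
        actual = existing.get(service)
        if actual is None:
            gaps.append(f"missing {service} ({expected})")
        elif actual != expected:
            gaps.append(f"{service} is {actual}, expected {expected}")
    return gaps


# Hypothetical inventory, e.g. assembled from `aws ec2 describe-vpc-endpoints`.
inventory = {
    "com.amazonaws.us-east-1.s3": "Gateway",
    "com.amazonaws.us-east-1.sts": "Interface",
}
print(endpoint_gaps("us-east-1", inventory))
# ['missing com.amazonaws.us-east-1.monitoring (Interface)']
```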

7️⃣ Traffic Flow (End-to-End)

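User Browser (corporate network with VPN or Direct Connect into AWS)
  ↓ (frontend PrivateLink)
Databricks Control Plane
  ↓ (backend PrivateLink; the cluster initiates this connection to the SCC relay)
Cluster Driver (Private Subnet)
  ↓
S3 via Gateway VPC Endpoint (same-region buckets)
With both PrivateLink connections in place and S3 reached through the gateway endpoint, none of this traffic traverses the public internet.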

8️⃣ Common Enterprise Decisions

Decision | Recommendation
Public vs private workspace | Private (PrivateLink)
NAT Gateway | Avoid if endpoints are available
IAM users | Never
Data access | IAM roles + Unity Catalog

9️⃣ What This Enables Next

  • Zero-trust Databricks deployment
  • Unity Catalog enforced security
  • Cross-account data sharing
  • Audit-ready architecture

10️⃣ Typical Enterprise Follow-Up Topics

  • Terraform modules for networking
  • Private DNS for Databricks
  • Multi-account AWS architecture
  • Cost & network optimization
This pattern is common among banks, healthcare providers, and other regulated enterprises.
