Databricks on AWS – Complete Networking & Security Architecture Guide
This document explains how Databricks is deployed securely on AWS, focusing on:
- VPC & subnet design
- Control plane vs data plane
- IAM roles & instance profiles
- Security groups & traffic flow
- PrivateLink (frontend & backend)
1️⃣ Databricks Architecture Overview
Control Plane vs Data Plane
| Plane | Owned By | What Runs Here |
|---|---|---|
| Control Plane | Databricks | UI, REST APIs, Jobs scheduler, notebooks metadata |
| Data Plane | Customer AWS Account | Clusters, Spark executors, DBFS root, data access |
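Key rule: Clusters and the data they process stay in your AWS account. The control plane stores workspace assets such as notebook source, job definitions, and cluster configurations.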
2️⃣ VPC Design (Customer-Managed)
Why Customer-Managed VPC?
- Network isolation
- PrivateLink support
- Compliance (SOC 2, PCI DSS, HIPAA)
Recommended VPC Layout
VPC (10.0.0.0/16)
│
├── Private Subnet A (10.0.1.0/24)
│   └── Databricks Workers
│
├── Private Subnet B (10.0.2.0/24)
│   └── Databricks Workers
│
├── Public Subnet (optional)
│   └── NAT Gateway
│
└── VPC Endpoints
    ├── S3
    ├── STS
    ├── Kinesis (optional)
    └── Databricks PrivateLink
Databricks clusters should never be in public subnets.
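The layout above can be sanity-checked before any infrastructure exists. The following is a minimal sketch using Python's standard `ipaddress` module. The function name `check_layout` is hypothetical, the CIDRs are the ones from the diagram, and the address count relies on AWS reserving five addresses in every subnet.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one


def check_layout(vpc_cidr, subnet_cidrs):
    """Return usable address counts per subnet; raise ValueError on a bad layout."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]

    # Every subnet must sit inside the VPC range.
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            raise ValueError(f"{subnet} is outside {vpc}")

    # No two subnets may share addresses.
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                raise ValueError(f"{first} overlaps {second}")

    return {str(s): s.num_addresses - AWS_RESERVED_PER_SUBNET for s in subnets}


usable = check_layout("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"])
print(usable)
```

Each /24 leaves 251 assignable addresses, which bounds how many cluster nodes a subnet can hold. Larger workloads may call for wider subnets than the ones in the diagram.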
3️⃣ Subnets & Routing
Private Subnets
- No public IPs
- Route to NAT Gateway (only if needed)
- Preferred: VPC endpoints instead of NAT
Route Table (Private Subnet)
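0.0.0.0/0 → NAT Gateway (optional)
pl-xxxxxx (S3 prefix list) → S3 Gateway Endpoint
Interface endpoints (Databricks PrivateLink, STS) need no route entry. They are reached through network interfaces inside the subnet and resolved via private DNS.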
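One simple check for a private subnet is that none of its routes target an internet gateway. The sketch below assumes routes have already been exported into plain dictionaries. The `destination`/`target` keys, the function name, and the resource IDs are placeholders for illustration. It relies only on the AWS convention that internet gateway IDs start with `igw-`.

```python
def internet_gateway_routes(routes):
    """Return the routes whose target is an internet gateway (ID prefix 'igw-')."""
    return [route for route in routes if route["target"].startswith("igw-")]


# Placeholder IDs; substitute the values from your own route table.
private_routes = [
    {"destination": "0.0.0.0/0", "target": "nat-0123456789abcdef0"},   # NAT gateway
    {"destination": "pl-xxxxxx", "target": "vpce-0123456789abcdef0"},  # S3 gateway endpoint
]

offending = internet_gateway_routes(private_routes)
print("private" if not offending else f"internet gateway routes found: {offending}")
```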
4️⃣ Security Groups (CRITICAL)
Databricks Cluster Security Group
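| Direction | Port | Source / Destination | Purpose |
|---|---|---|---|
| Inbound | All (TCP and UDP) | Self | Worker ↔ Worker communication |
| Outbound | All (TCP and UDP) | Self | Worker ↔ Worker communication |
| Outbound | 443 | 0.0.0.0/0 or VPC endpoints | Control plane, S3, APIs |
Databricks requires full intra-cluster communication, so the self-referencing rule must exist in both directions. Some deployments need further outbound ports (for example, 6666 for the secure cluster connectivity relay when backend PrivateLink is used); check the current Databricks documentation for your configuration.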
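The rules can be expressed as data before they are applied. This sketch builds rule dictionaries in the `IpPermissions` shape that the EC2 API accepts (for example through boto3's `authorize_security_group_ingress`), but makes no AWS call itself. The function name and the group ID are hypothetical. The sketch also allows outbound traffic to the group itself, which intra-cluster communication needs.

```python
def cluster_security_group_rules(group_id):
    """Build ingress and egress rule lists for the cluster security group."""
    # Self-referencing rule: any protocol and port, but only between group members.
    self_reference = {
        "IpProtocol": "-1",  # -1 means all protocols and all ports
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }
    # HTTPS out to the control plane, S3, and other APIs.
    https_out = {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }
    return {"ingress": [self_reference], "egress": [self_reference, https_out]}


rules = cluster_security_group_rules("sg-0123456789abcdef0")  # placeholder ID
print(rules["ingress"])
```

Narrowing the `0.0.0.0/0` range to endpoint prefixes is possible once all required VPC endpoints are in place.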
5️⃣ IAM Roles & Instance Profiles
Why IAM Roles?
- No access keys on clusters
- Least privilege data access
- Auditable via CloudTrail
Databricks EC2 Role
Trust Policy: the role trusts the EC2 service principal (ec2.amazonaws.com), so cluster instances can assume it.
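As an illustration, the trust relationship corresponds to a standard IAM trust policy document. The sketch below builds it as a Python dictionary and prints the JSON. The variable name is arbitrary, and the document is the generic EC2 trust policy, not something specific to this deployment.

```python
import json

# Standard IAM trust policy that lets EC2 instances assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```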
Permissions Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::prod-data"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::prod-data/*"
    }
  ]
}
Instance Profile
- IAM Role → Instance Profile
- Attached to Databricks clusters
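Attaching the profile happens in the cluster definition. The sketch below builds a request body with the `aws_attributes.instance_profile_arn` field of the Databricks Clusters API. The helper name, cluster name, account ID, and profile name are hypothetical, and the runtime version and node type are left as placeholders to fill in. It also guards against a common mistake, passing the role ARN instead of the instance profile ARN.

```python
import json


def cluster_spec(name, instance_profile_arn, num_workers=2):
    """Build a cluster definition that attaches an instance profile."""
    # The cluster needs the instance profile ARN, not the ARN of the underlying role.
    if ":instance-profile/" not in instance_profile_arn:
        raise ValueError("expected an instance profile ARN, not a role ARN")
    return {
        "cluster_name": name,
        "spark_version": "<runtime-version>",  # placeholder
        "node_type_id": "<node-type>",         # placeholder
        "num_workers": num_workers,
        "aws_attributes": {"instance_profile_arn": instance_profile_arn},
    }


spec = cluster_spec(
    "etl-prod",  # hypothetical cluster name
    "arn:aws:iam::123456789012:instance-profile/databricks-prod-data",
)
print(json.dumps(spec, indent=2))
```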
6️⃣ PrivateLink Architecture
Frontend PrivateLink
- Users access Databricks UI privately
- No public internet exposure
Backend PrivateLink
- Clusters talk to control plane privately
- No NAT gateway required for control plane traffic
Required VPC Endpoints
| Endpoint | Type |
|---|---|
| Databricks workspace (REST API) | Interface |
| Databricks secure cluster connectivity relay | Interface |
| S3 | Gateway |
| STS | Interface |
| CloudWatch | Interface |
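For the AWS-owned services, endpoint service names follow the pattern `com.amazonaws.<region>.<service>`. The sketch below generates an endpoint plan for a region. The function and dictionary names are hypothetical. The Databricks endpoint services are omitted because their names are region-specific values published by Databricks and must be looked up, not derived.

```python
# AWS service identifiers and the endpoint type used for each.
AWS_ENDPOINTS = {
    "s3": "Gateway",
    "sts": "Interface",
    "monitoring": "Interface",       # CloudWatch metrics
    "kinesis-streams": "Interface",  # optional, listed in the VPC layout
}


def endpoint_plan(region):
    """Return the AWS service endpoints to create in the given region."""
    return [
        {"service_name": f"com.amazonaws.{region}.{service}", "type": endpoint_type}
        for service, endpoint_type in AWS_ENDPOINTS.items()
    ]


plan = endpoint_plan("us-east-1")
for entry in plan:
    print(entry["type"], entry["service_name"])
```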
7️⃣ Traffic Flow (End-to-End)
User Browser
↓ (PrivateLink)
Databricks Control Plane
↓ (PrivateLink)
Cluster Driver (Private Subnet)
↓
S3 via VPC Endpoint
With frontend and backend PrivateLink and the S3 endpoint in place, none of this traffic traverses the public internet.
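One way to spot-check the private path is to resolve the workspace hostname from inside the VPC and confirm that every returned address is private. This assumes private DNS is enabled for the PrivateLink endpoint. The following is a rough sketch using only the standard library, and the function names are hypothetical.

```python
import ipaddress
import socket


def resolve(hostname, port=443):
    """Resolve a hostname to its unique IP addresses (requires network access)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """Return True only if every address is in a private range."""
    return all(ipaddress.ip_address(address).is_private for address in addresses)


# Offline illustration with literal addresses.
print(all_private(["10.0.1.25"]))
print(all_private(["10.0.1.25", "52.1.2.3"]))
```

From a host inside the VPC, `all_private(resolve("<workspace-hostname>"))` would apply the same check to the real workspace URL. A public address in the result suggests DNS is still pointing at the public endpoint.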
8️⃣ Common Enterprise Decisions
| Decision | Recommendation |
|---|---|
| Public vs Private workspace | Private (PrivateLink) |
| NAT Gateway | Avoid if endpoints available |
| IAM Users | Never |
| Data access | IAM Roles + Unity Catalog |
9️⃣ What This Enables Next
- Zero-trust Databricks deployment
- Unity Catalog enforced security
- Cross-account data sharing
- Audit-ready architecture
10️⃣ Typical Enterprise Follow-Up Topics
- Terraform modules for networking
- Private DNS for Databricks
- Multi-account AWS architecture
- Cost & network optimization
This architecture is common in banking, healthcare, and other regulated industries.