Databricks Architecture Matrix (Serverless on AWS)
This document explains where major Databricks components reside when running
Databricks Serverless on AWS. Databricks architecture is divided into two planes:
- Control Plane – Managed by Databricks
- Data Plane – Runs in the customer AWS account
Control Plane Components
| Component | Purpose / What It Does | Where It Runs or Is Stored | Plane |
| Workspace UI | Web interface to access notebooks, jobs, dashboards | Databricks SaaS infrastructure | Control Plane |
| Workspace APIs | REST APIs for automation, Terraform, CLI | Databricks SaaS | Control Plane |
| Users & Groups | Identity and user management | Databricks account services | Control Plane |
| Authentication / SSO | Integrates with an identity provider (IdP) such as Okta or Azure AD | Databricks account services | Control Plane |
| Permissions / RBAC | Access control policies | Databricks control services | Control Plane |
| Notebook Source Code | Python / SQL / Scala notebooks | Workspace storage | Control Plane |
| Notebook Outputs | Charts and result previews | Workspace storage | Control Plane |
| Workspace Files | Files uploaded to workspace | Workspace storage | Control Plane |
| Repos (Git Integration) | GitHub / Git integration | Workspace metadata | Control Plane |
| Job Scheduler | Schedules pipelines and jobs | Databricks orchestration service | Control Plane |
| Workflows | Pipeline orchestration | Databricks orchestration service | Control Plane |
| SQL Query Planner | Optimizes SQL queries | Databricks query services | Control Plane |
| SQL Warehouse Management | Manages serverless SQL endpoints | Databricks control services | Control Plane |
| Unity Catalog | Central governance system | Databricks governance service | Control Plane |
| Metastore | Stores catalog and table metadata | Databricks metadata services | Control Plane |
| Data Lineage | Tracks data dependencies | Databricks governance services | Control Plane |
| Audit Logs | Security and governance logs | Databricks governance services | Control Plane |
| Cluster Management | Manages compute lifecycle | Databricks control services | Control Plane |
| Feature Store Metadata | ML feature definitions | Databricks metadata services | Control Plane |
| Model Registry Metadata | Tracks ML model versions | Databricks metadata services | Control Plane |
| Lakehouse Monitoring Metadata | Tracks dataset quality | Databricks monitoring services | Control Plane |
| Vector Search Metadata | Vector index configuration | Databricks control services | Control Plane |
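As a rough illustration of the Workspace APIs row, the sketch below builds (but does not send) a request to the Jobs API list endpoint, which is answered by the control plane. The workspace host and token are placeholders, and `build_jobs_list_request` is a hypothetical helper name, not part of any Databricks library.

```python
import urllib.request


def build_jobs_list_request(host: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Jobs API list endpoint without sending it."""
    url = f"https://{host}/api/2.1/jobs/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values: substitute a real workspace host and access token.
request = build_jobs_list_request(
    "example.cloud.databricks.com", "<personal-access-token>"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call. It is left out here so the sketch runs without network access or credentials.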
Data Plane Components (Customer AWS Account)
| Component | Purpose / What It Does | Where It Runs or Is Stored | Plane |
| Serverless Spark Compute | Runs notebooks and jobs | AWS compute instances | Data Plane |
| Databricks SQL Warehouse Compute | Executes SQL queries | AWS compute instances | Data Plane |
| Delta Table Data | Actual table data | S3 | Data Plane |
| Managed Tables | Tables whose data files are managed by Databricks | S3 | Data Plane |
| External Tables | Tables referencing external datasets | S3 | Data Plane |
| DBFS Root | Databricks File System root | S3 bucket | Data Plane |
| Unity Catalog Managed Storage | Table storage governed by Unity Catalog | S3 | Data Plane |
| Unity Catalog Volumes | Governed storage for non-tabular files | S3 | Data Plane |
| MLflow Model Artifacts | ML models and artifacts | S3 | Data Plane |
| Feature Store Data | ML feature datasets | S3 | Data Plane |
| Vector Search Index Data | Vector embeddings | S3 | Data Plane |
| Streaming Checkpoints | Streaming job progress | S3 | Data Plane |
| Temporary Spark Shuffle Data | Intermediate processing data | Local disk / S3 | Data Plane |
| Job Execution Logs | Spark execution logs | S3 | Data Plane |
| ML Training Data | Training datasets | S3 | Data Plane |
| Delta Transaction Logs | Table versioning metadata | S3 | Data Plane |
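To illustrate the Delta Table Data and Delta Transaction Logs rows: in the open Delta Lake format, a table's commit log sits in a `_delta_log/` directory beside its data files, so both end up under the same S3 prefix. The sketch below derives the path of a commit file from a table location. The bucket and table path are made up for illustration, and `commit_path` is a hypothetical helper, not a Databricks or Delta Lake API.

```python
def commit_path(table_location: str, version: int) -> str:
    """Return the path of the JSON commit file for a given table version.

    Delta Lake names commit files with the version zero-padded to 20 digits.
    """
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"{table_location.rstrip('/')}/_delta_log/{version:020d}.json"


# Hypothetical table location in a customer-owned bucket.
table = "s3://example-bucket/warehouse/sales/orders"
print(commit_path(table, 0))
print(commit_path(table, 12))
```

Listing that prefix in S3 is one way to confirm that table versioning metadata lives alongside the data rather than in the control plane.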
Architecture Flow
Databricks Control Plane (Managed by Databricks)
- Workspace UI
- Authentication
- Unity Catalog
- Metastore Metadata
- Query Planner
- Job Scheduler
- Notebook Code
    |
    | Secure API
    v
AWS Data Plane (Customer AWS Account)
- Serverless Spark Compute
- SQL Warehouses
- Delta Tables
- DBFS Storage
- ML Models
- Spark Temp Storage
    |
    v
Amazon S3 (Customer Data Lake)
Key Architecture Rule
| Type | Location |
| Metadata | Databricks Control Plane |
| Data | Customer AWS S3 |
| Compute | Customer AWS |
| Governance Policies | Unity Catalog (Control Plane) |
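The rule table can also be read as a simple lookup. The sketch below encodes the four rows exactly as this document states them and answers where a given type of asset lives. `PLACEMENT` and `where_is` are illustrative names only.

```python
# Rows copied from the Key Architecture Rule table in this document.
PLACEMENT = {
    "metadata": "Databricks Control Plane",
    "data": "Customer AWS S3",
    "compute": "Customer AWS",
    "governance policies": "Unity Catalog (Control Plane)",
}


def where_is(kind: str) -> str:
    """Look up the location for an asset type, ignoring case and padding."""
    key = kind.strip().lower()
    if key not in PLACEMENT:
        raise KeyError(f"unknown asset type: {kind!r}")
    return PLACEMENT[key]


for kind in ("Metadata", "Data", "Compute", "Governance Policies"):
    print(f"{kind}: {where_is(kind)}")
```

A lookup like this could back a quick check in a design review, for example to confirm that nothing classified as data is expected to sit in the control plane.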