Friday, 13 March 2026

Databricks Architecture Matrix

Databricks Architecture Matrix (Serverless on AWS)

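This document explains where the major Databricks components reside when running Databricks serverless compute on AWS. The Databricks architecture is divided into two planes:

  • Control Plane – Managed by Databricks and runs in the Databricks AWS account
  • Data Plane (called the compute plane in current Databricks documentation) – Processes and stores data. With serverless, the compute runs in a Databricks-managed serverless compute plane in the Databricks AWS account, while table and file data stay in S3 in the customer AWS account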

Control Plane Components

Component | Purpose / What It Does | Where It Runs or Is Stored | Plane
Workspace UI | Web interface for notebooks, jobs, and dashboards | Databricks SaaS infrastructure | Control Plane
Workspace APIs | REST APIs used by automation, Terraform, and the CLI | Databricks SaaS infrastructure | Control Plane
Users & Groups | Identity and user management | Databricks account services | Control Plane
Authentication / SSO | Integrates with an IdP such as Okta or Microsoft Entra ID (formerly Azure AD) | Databricks account services | Control Plane
Permissions / RBAC | Access control policies | Databricks control services | Control Plane
Notebook Source Code | Python / SQL / Scala notebooks | Workspace storage | Control Plane
Notebook Outputs | Charts and result previews | Workspace storage | Control Plane
Workspace Files | Files uploaded to the workspace | Workspace storage | Control Plane
Repos (Git Integration) | Syncs notebooks and files with Git providers such as GitHub | Workspace metadata | Control Plane
Job Scheduler | Schedules pipelines and jobs | Databricks orchestration service | Control Plane
Workflows | Pipeline orchestration | Databricks orchestration service | Control Plane
SQL Query History | Records SQL queries and their status (query planning and optimization run on the SQL warehouse compute) | Databricks query services | Control Plane
SQL Warehouse Management | Manages serverless SQL warehouses (formerly SQL endpoints) | Databricks control services | Control Plane
Unity Catalog | Central governance system | Databricks governance service | Control Plane
Metastore | Stores catalog and table metadata | Databricks metadata services | Control Plane
Data Lineage | Tracks data dependencies | Databricks governance services | Control Plane
Audit Logs | Security and governance logs | Databricks governance services | Control Plane
Cluster Management | Manages the compute lifecycle | Databricks control services | Control Plane
Feature Store Metadata | ML feature definitions | Databricks metadata services | Control Plane
Model Registry Metadata | Tracks ML model versions | Databricks metadata services | Control Plane
Lakehouse Monitoring Metadata | Tracks dataset quality | Databricks monitoring services | Control Plane
Vector Search Metadata | Vector index configuration | Databricks control services | Control Plane
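Requests to the Workspace APIs go to the workspace URL and are served by the control plane. What comes back is metadata such as job definitions, not table data. The minimal sketch below uses only the Python standard library to build (but not send) an authenticated call to the Jobs API list endpoint, `/api/2.1/jobs/list`. The workspace host and token are placeholders, not real values.

```python
import urllib.request

# Placeholder values (hypothetical): substitute your own workspace URL and token.
WORKSPACE_HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-EXAMPLE-TOKEN"


def build_jobs_list_request(host: str, token: str) -> urllib.request.Request:
    """Build, but do not send, a GET request to the Jobs API list endpoint.

    The workspace URL is served by the control plane, so this call returns
    job metadata only; no table data in S3 is read.
    """
    url = f"{host.rstrip('/')}/api/2.1/jobs/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_jobs_list_request(WORKSPACE_HOST, TOKEN)
print(req.get_method(), req.full_url)
# Sending it would require a real workspace and token:
#   with urllib.request.urlopen(req) as resp: print(resp.read())
```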

Data Plane Components (Serverless Compute and Customer S3 Storage)

Component | Purpose / What It Does | Where It Runs or Is Stored | Plane
Serverless Spark Compute | Runs notebooks and jobs | Serverless compute plane (Databricks-managed AWS instances in the Databricks account) | Data Plane
Databricks SQL Warehouse Compute | Plans and executes SQL queries | Serverless compute plane (Databricks-managed AWS instances in the Databricks account) | Data Plane
Delta Table Data | Table data files | Customer S3 | Data Plane
Managed Tables | Tables whose storage is managed by Databricks | Customer S3 | Data Plane
External Tables | Tables that reference datasets at an external location | Customer S3 | Data Plane
DBFS Root | Root of the Databricks File System | Customer S3 bucket | Data Plane
Unity Catalog Managed Storage | Table storage governed by Unity Catalog | Customer S3 | Data Plane
Unity Catalog Volumes | Governed storage for non-tabular files | Customer S3 | Data Plane
MLflow Model Artifacts | ML models and artifacts | Customer S3 | Data Plane
Feature Store Data | ML feature datasets | Customer S3 | Data Plane
Vector Search Index Data | Vector embeddings | Customer S3 | Data Plane
Streaming Checkpoints | Streaming job progress | Customer S3 | Data Plane
Temporary Spark Shuffle Data | Intermediate processing data | Local disk / S3 | Data Plane
Job Execution Logs | Spark execution logs | Customer S3 | Data Plane
ML Training Data | Training datasets | Customer S3 | Data Plane
Delta Transaction Logs | Table versioning metadata (the _delta_log directory) | Customer S3 | Data Plane
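The Delta transaction log explains why table versioning metadata sits in the data plane even though catalog metadata lives in the control plane. It is stored next to the data files, in the table's `_delta_log` directory, as numbered JSON commit files of `add` and `remove` actions. The simplified sketch below replays those commits to find a table's current data files. It uses a local temporary directory as a stand-in for the S3 prefix and ignores checkpoints, deletion vectors, and every action field except the file path, so it illustrates the log-replay idea rather than serving as a Delta reader. For real tables, use Spark or a Delta library.

```python
import json
import os
import tempfile


def write_commit(log_dir: str, version: int, actions: list[dict]) -> None:
    """Write one commit as newline-delimited JSON actions (simplified fields)."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def active_files(table_dir: str) -> set[str]:
    """Replay JSON commits in version order to get the current data files.

    Simplification: checkpoints, deletion vectors, and all action fields
    other than the file path are ignored.
    """
    log_dir = os.path.join(table_dir, "_delta_log")
    commits = sorted(n for n in os.listdir(log_dir) if n.endswith(".json"))
    files: set[str] = set()
    for name in commits:
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


# A local temporary directory stands in for the table's S3 prefix.
with tempfile.TemporaryDirectory() as table_dir:
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir)
    write_commit(log_dir, 0, [{"add": {"path": "part-000.parquet"}}])
    write_commit(log_dir, 1, [{"add": {"path": "part-001.parquet"}}])
    # A later commit that replaces the first file with a new one.
    write_commit(log_dir, 2, [
        {"remove": {"path": "part-000.parquet"}},
        {"add": {"path": "part-002.parquet"}},
    ])
    current = active_files(table_dir)
    print(sorted(current))  # ['part-001.parquet', 'part-002.parquet']
```

Commit file names are zero-padded to 20 digits, so a plain lexicographic sort returns them in version order.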

Architecture Flow

Databricks Control Plane (Managed by Databricks)
    Workspace UI, Authentication, Unity Catalog Metastore, Metadata, Job Scheduler, Notebook Code
        |
        |  Secure API
        v
Serverless Data Plane (Databricks-managed compute in the Databricks AWS account)
    Serverless Spark Compute, SQL Warehouses (query planning and execution), Spark Temp Storage
        |
        v
Amazon S3 (Customer AWS Account / Data Lake)
    Delta Tables, DBFS Storage, ML Models

Key Architecture Rule

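Type | Location
Metadata | Databricks Control Plane
Data | Customer AWS account (S3)
Compute | Databricks serverless compute plane (Databricks AWS account); classic compute runs in the customer AWS account
Governance Policies | Unity Catalog (Control Plane)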
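The key architecture rule can be encoded as a lookup table, for example to drive a data-residency checklist. This is a minimal sketch: the `Location` labels and the helper names are illustrative and not part of any Databricks API. The compute entry reflects serverless, where compute runs in the Databricks AWS account rather than the customer's.

```python
from enum import Enum


class Location(Enum):
    """Where a category of Databricks artifact lives (serverless on AWS)."""
    CONTROL_PLANE = "Databricks control plane (Databricks AWS account)"
    SERVERLESS_COMPUTE = "Serverless compute plane (Databricks AWS account)"
    CUSTOMER_S3 = "Customer AWS account (Amazon S3)"


# The key architecture rule as a lookup table; keys are lowercase.
KEY_RULE: dict[str, Location] = {
    "metadata": Location.CONTROL_PLANE,
    "data": Location.CUSTOMER_S3,
    "compute": Location.SERVERLESS_COMPUTE,
    "governance policies": Location.CONTROL_PLANE,
}


def where_is(kind: str) -> Location:
    """Return the location for an artifact type such as 'Metadata' or 'Data'."""
    key = kind.strip().lower()
    if key not in KEY_RULE:
        raise KeyError(f"unknown artifact type: {kind!r}")
    return KEY_RULE[key]


def stored_in_customer_account(kind: str) -> bool:
    """True if the artifact type is stored in the customer's own AWS account."""
    return where_is(kind) is Location.CUSTOMER_S3


for kind in ["Metadata", "Data", "Compute", "Governance Policies"]:
    print(f"{kind:20} -> {where_is(kind).value}")
```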
