Tips to Improve Knowledge
Monday, 13 April 2026
DBX Workspace and Job with GitHub Actions
The Job can only be created after the Workspace is up and running. To handle this dependency, we use two separate Databricks provider blocks: the first manages the Account level (creating the workspace), and the second uses the workspace_url generated by the first to create the Job.
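Everything below assumes both providers are declared and pinned. A minimal versions.tf sketch (the version constraints are illustrative, pin to whatever you have tested):

```terraform
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.0"
    }
  }
}
```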
1. variables.tf
Define your IDs and naming conventions here.
Terraform
variable "databricks_account_id" {
  description = "Databricks Account ID from the Account Console"
  type        = string
}

variable "region" {
  default = "us-east-1"
}

variable "workspace_name" {
  default = "private-prod-ws"
}
2. main.tf (VPC & IAM Foundation)
This creates the "Private Only" network.
Terraform
provider "aws" {
  region = var.region
}

# 1. VPC with no Internet Gateway
resource "aws_vpc" "dbx_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = { Name = "databricks-private-vpc" }
}
# 2. Subnets
resource "aws_subnet" "priv_a" {
  vpc_id            = aws_vpc.dbx_vpc.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "${var.region}a"
}

resource "aws_subnet" "priv_b" {
  vpc_id            = aws_vpc.dbx_vpc.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "${var.region}b"
}
# 3. IAM Cross-account Role
data "databricks_aws_assume_role_policy" "this" {
  provider    = databricks.mws
  external_id = var.databricks_account_id
}

resource "aws_iam_role" "cross_account" {
  name               = "databricks-cross-account"
  assume_role_policy = data.databricks_aws_assume_role_policy.this.json
}
# S3 Bucket for DBFS Root
resource "aws_s3_bucket" "root_storage" {
  bucket        = "dbx-root-${var.workspace_name}"
  force_destroy = true
}

resource "aws_s3_bucket_public_access_block" "root_storage" {
  bucket                  = aws_s3_bucket.root_storage.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
3. databricks_mws.tf (The Workspace)
We use the mws alias to talk to the Databricks Account API.
Terraform
provider "databricks" {
  alias      = "mws"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
}
resource "databricks_mws_credentials" "this" {
  provider         = databricks.mws
  role_arn         = aws_iam_role.cross_account.arn
  credentials_name = "${var.workspace_name}-creds"
}

resource "databricks_mws_storage_configurations" "this" {
  provider                   = databricks.mws
  bucket_name                = aws_s3_bucket.root_storage.bucket
  storage_configuration_name = "${var.workspace_name}-storage"
}
resource "databricks_mws_networks" "this" {
  provider           = databricks.mws
  network_name       = "${var.workspace_name}-net"
  security_group_ids = [aws_security_group.db_sg.id] # SG must allow 443
  subnet_ids         = [aws_subnet.priv_a.id, aws_subnet.priv_b.id]
  vpc_id             = aws_vpc.dbx_vpc.id
}
resource "databricks_mws_workspaces" "this" {
  provider                 = databricks.mws
  account_id               = var.databricks_account_id
  aws_region               = var.region
  workspace_name           = var.workspace_name
  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id               = databricks_mws_networks.this.network_id
  deployment_name          = var.workspace_name
}
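Once the workspace is created you will usually want its URL surfaced in the pipeline logs; an optional outputs.tf sketch:

```terraform
output "workspace_url" {
  description = "URL of the newly created Databricks workspace"
  value       = databricks_mws_workspaces.this.workspace_url
}
```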
4. databricks_job.tf (The Job)
Crucial: This provider waits for the workspace_url to be available.
Terraform
provider "databricks" {
  alias = "workspace"

  # This creates the dependency: the Job cannot be planned
  # until the Workspace URL is known
  host = databricks_mws_workspaces.this.workspace_url
}
resource "databricks_job" "daily_etl" {
  provider = databricks.workspace
  name     = "Daily_ETL_Job"

  job_cluster {
    job_cluster_key = "etl_cluster"

    new_cluster {
      spark_version = "14.3.x-scala2.12"
      node_type_id  = "m5.large"
      num_workers   = 2

      # Spot capacity with on-demand fallback; the cluster runs in the
      # private subnets registered with the workspace network
      aws_attributes {
        availability = "SPOT_WITH_FALLBACK"
      }
    }
  }

  task {
    task_key        = "run_notebook"
    job_cluster_key = "etl_cluster"

    notebook_task {
      notebook_path = "/Shared/ETL_Notebook"
    }
  }

  schedule {
    quartz_cron_expression = "0 0 1 * * ?" # 1 AM daily
    timezone_id            = "UTC"
  }
}
5. .github/workflows/deploy.yml
This automates the entire sequence.
YAML
name: "Databricks CI/CD"

on:
  push:
    branches: [ "main" ]

jobs:
  terraform:
    runs-on: ubuntu-latest
    env:
      # AWS Auth
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      # Databricks Auth (Service Principal recommended)
      DATABRICKS_ACCOUNT_ID: ${{ secrets.DATABRICKS_ACCOUNT_ID }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: TF Init
        run: terraform init

      - name: TF Apply
        run: terraform apply -auto-approve -var="databricks_account_id=${{ secrets.DATABRICKS_ACCOUNT_ID }}"