Databricks Architecture Matrix (Serverless on AWS)
This document maps where major Databricks components run when using Serverless on AWS, the disaster recovery (DR) best practice for each, and the Terraform resource used to automate it.
Control Plane Components
| Component | Purpose | Where It Runs | Plane | DR Best Practice | Terraform Resource |
|---|---|---|---|---|---|
| Workspace | Main analytics workspace | Databricks SaaS | Control Plane | Create secondary workspace in another region | databricks_mws_workspaces |
| Users | User identity | Databricks account | Control Plane | Use centralized IdP | databricks_user |
| Groups | Access management | Databricks account | Control Plane | Manage via SCIM | databricks_group |
| Group Membership | User-group association | Databricks account | Control Plane | Recreate from IaC | databricks_group_member |
| Notebook Source Code | Notebook scripts | Workspace storage | Control Plane | Store notebooks in Git | databricks_notebook |
| Repos (Git Integration) | Source code integration | Workspace metadata | Control Plane | Keep Git remote as source of truth | databricks_repo |
| Job Scheduler | Pipeline scheduling | Databricks control services | Control Plane | Define jobs as code | databricks_job |
| Cluster Configuration | Compute definition | Databricks control services | Control Plane | Recreate clusters via IaC | databricks_cluster |
| SQL Warehouse | Serverless SQL endpoint | Databricks control services | Control Plane | Recreate warehouse in DR region | databricks_sql_endpoint |
| Unity Catalog Metastore | Metadata store | Databricks metadata service | Control Plane | Replicate configuration | databricks_metastore |
| Unity Catalog Catalog | Top level data container | Databricks governance service | Control Plane | Recreate catalogs | databricks_catalog |
| Unity Catalog Schema | Database layer | Databricks governance service | Control Plane | Recreate schema structure | databricks_schema |
| Permissions | Access control policies | Databricks governance service | Control Plane | Store as code | databricks_grants |
| Model Registry | ML model version tracking | Databricks metadata services | Control Plane | Replicate model metadata | databricks_mlflow_model |
| Feature Store Metadata | ML feature definitions | Databricks metadata services | Control Plane | Store definitions in Git | databricks_feature_table |
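Several rows above share the same DR practice: define the object in Terraform so it can be recreated in a secondary region. As a minimal sketch (the job name, notebook path, and node type are hypothetical; the block assumes a configured Databricks provider), a job defined as code looks like:

```hcl
# Hypothetical example: a job defined in IaC can be re-applied
# against a DR workspace by switching the provider configuration.
resource "databricks_job" "nightly_etl" {
  name = "nightly-etl"

  task {
    task_key = "main"

    notebook_task {
      notebook_path = "/Repos/prod/etl/main" # assumed Repos path
    }

    new_cluster {
      spark_version = "14.3.x-scala2.12" # assumed LTS runtime
      node_type_id  = "i3.xlarge"
      num_workers   = 2
    }
  }
}
```

Because the definition lives in source control rather than only in the workspace, recovery is a `terraform apply` against the DR workspace instead of a manual rebuild.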
Data Plane Components (AWS)
| Component | Purpose | Where It Runs | Plane | DR Best Practice | Terraform Resource |
|---|---|---|---|---|---|
| S3 Data Lake | Primary storage | AWS S3 | Data Plane | Enable cross-region replication | aws_s3_bucket |
| Delta Tables | Structured data storage | S3 | Data Plane | Replicate bucket | aws_s3_bucket |
| DBFS Root Storage | Databricks filesystem | S3 | Data Plane | Enable bucket versioning | aws_s3_bucket |
| MLflow Artifact Storage | Stores ML models | S3 | Data Plane | Replicate artifact bucket | aws_s3_bucket |
| Streaming Checkpoints | Streaming progress tracking | S3 | Data Plane | Replicate checkpoint folders | aws_s3_bucket |
| Feature Store Data | ML training features | S3 | Data Plane | Enable replication | aws_s3_bucket |
| Execution Logs | Spark logs | S3 | Data Plane | Central logging system | aws_s3_bucket |
| Serverless Spark Compute | Job execution | Databricks-managed AWS account (serverless compute plane) | Data Plane | Use multi-region workspace | N/A (managed by Databricks) |
| Temporary Spark Shuffle Data | Intermediate processing | Compute disk | Data Plane | No DR required | N/A |
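The "enable cross-region replication" practice that recurs in the table can be sketched in Terraform as follows. Bucket names and the IAM role reference are hypothetical; replication requires versioning on the source bucket, and a destination bucket plus replication role (not shown) are assumed to exist:

```hcl
# Hypothetical primary data lake bucket.
resource "aws_s3_bucket" "primary" {
  bucket = "example-datalake-primary"
}

# Versioning must be enabled before replication can be configured.
resource "aws_s3_bucket_versioning" "primary" {
  bucket = aws_s3_bucket.primary.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Replicate all objects to a DR bucket in another region.
resource "aws_s3_bucket_replication_configuration" "dr" {
  bucket = aws_s3_bucket.primary.id
  role   = aws_iam_role.replication.arn # assumed replication role

  rule {
    id     = "dr-replication"
    status = "Enabled"

    destination {
      bucket = aws_s3_bucket.dr.arn # assumed DR bucket in secondary region
    }
  }

  depends_on = [aws_s3_bucket_versioning.primary]
}
```

The same pattern applies to the Delta, DBFS root, MLflow artifact, checkpoint, and feature store buckets listed above.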
Architecture Flow
```
+---------------------------------------------------+
|            Databricks Control Plane               |
|  Workspace | Unity Catalog | Metastore            |
|  Jobs | Clusters | SQL Warehouses                 |
+------------------------+--------------------------+
                         |
                         |  Secure API
                         v
+---------------------------------------------------+
|                  AWS Data Plane                   |
|  Serverless Spark Compute | Delta Tables          |
|  ML Models | Streaming Checkpoints                |
+------------------------+--------------------------+
                         |
                         v
                Amazon S3 Data Lake
```
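The "create secondary workspace in another region" practice from the control plane table can be sketched with `databricks_mws_workspaces`. This assumes an account-level Databricks provider alias and pre-existing credentials, storage, and network configurations for the DR region (all names here are hypothetical):

```hcl
# Hypothetical DR workspace in a secondary AWS region.
# Assumes databricks.mws is an account-level provider alias and the
# referenced credentials/storage/network resources are defined elsewhere.
resource "databricks_mws_workspaces" "dr" {
  provider       = databricks.mws
  account_id     = var.databricks_account_id
  workspace_name = "analytics-dr"
  aws_region     = "us-west-2"

  credentials_id           = databricks_mws_credentials.dr.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.dr.storage_configuration_id
  network_id               = databricks_mws_networks.dr.network_id
}
```

With the workspace itself in IaC, the remaining control plane objects (jobs, clusters, catalogs, grants) can be re-applied against it from the same configuration.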