Monday, 30 March 2026

Databricks Serverless Jobs with Terraform

🚀 Databricks Serverless Jobs using Terraform (Step-by-Step Guide)

Important: Workspace infrastructure prerequisites (the IAM cross-account role, S3 root bucket, and the corresponding Databricks credentials and storage configurations) are assumed to be pre-created outside Terraform.

🏢 Part 1: Create Databricks Workspace

Step 1: Account-Level Provider

# Account-level provider. Username/password (basic) auth is shown for
# simplicity; for automation, Databricks recommends OAuth service
# principal credentials instead.
provider "databricks" {
  alias      = "account"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.account_id
  username   = var.username
  password   = var.password
}
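
The provider above references several input variables. A minimal variables.tf sketch (variable names taken from the configuration used throughout this post; values are account-specific, so no defaults are set):

```hcl
# Input variables referenced by the account-level provider and workspace resource.
variable "account_id" {
  description = "Databricks account ID (from the account console)"
  type        = string
}

variable "username" {
  description = "Account admin username"
  type        = string
}

variable "password" {
  description = "Account admin password"
  type      = string
  sensitive = true
}

variable "aws_region" {
  description = "AWS region for the workspace, e.g. us-east-1"
  type        = string
}

variable "credentials_id" {
  description = "ID of a pre-created credentials configuration"
  type        = string
}

variable "storage_config_id" {
  description = "ID of a pre-created storage configuration"
  type        = string
}

variable "workspace_token" {
  description = "Personal access token for the workspace-level provider"
  type        = string
  sensitive   = true
}
```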

Step 2: Create Workspace

resource "databricks_mws_workspaces" "workspace" {
  provider = databricks.account

  account_id = var.account_id
  aws_region = var.aws_region

  workspace_name = "demo-workspace"

  credentials_id           = var.credentials_id
  storage_configuration_id = var.storage_config_id
}

Step 3: Configure Workspace Provider

provider "databricks" {
  host  = databricks_mws_workspaces.workspace.workspace_url
  token = var.workspace_token
}
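
After apply, the workspace URL can be exported for convenience (a small sketch):

```hcl
output "workspace_url" {
  description = "URL of the newly created Databricks workspace"
  value       = databricks_mws_workspaces.workspace.workspace_url
}
```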

📓 Part 2: Job WITH Notebook

Step 1: Create Notebook

resource "databricks_notebook" "notebook" {
  path     = "/Shared/demo-notebook"
  language = "PYTHON"

  content_base64 = base64encode(<<EOF
print("Hello from Notebook Job")
EOF
  )
}

Step 2: Create Job

resource "databricks_job" "notebook_job" {
  name = "notebook-job"

  task {
    task_key = "task1"

    notebook_task {
      notebook_path = databricks_notebook.notebook.path
    }

    environment_key = "serverless_env"
  }

  environment {
    key = "serverless_env"

    spec {
      client = "1"
    }
  }

  schedule {
    quartz_cron_expression = "0 0 0 * * ?"
    timezone_id            = "UTC"
  }
}
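
The Quartz expression above fires daily at 00:00 UTC. Note that Quartz cron uses seconds as the first field, unlike Unix cron. A few other common patterns, shown as illustrative fragments:

```hcl
# Alternative schedule blocks (pick one per job):
schedule {
  quartz_cron_expression = "0 0 */6 * * ?" # every 6 hours
  timezone_id            = "UTC"
}

# "0 30 2 ? * MON"      -> Mondays at 02:30
# "0 0 9 ? * MON-FRI"   -> weekdays at 09:00
```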

🐍 Part 3: Job WITHOUT Notebook (Python Script)

Step 1: Create Python Script

resource "databricks_workspace_file" "script" {
  path = "/Shared/demo-script.py"

  content_base64 = base64encode(<<EOF
print("Hello from Python Script Job")
EOF
  )
}

Step 2: Create Job

resource "databricks_job" "python_job" {
  name = "python-job"

  task {
    task_key = "task1"

    spark_python_task {
      # workspace_path includes the /Workspace prefix expected for
      # workspace files referenced by serverless jobs
      python_file = databricks_workspace_file.script.workspace_path
    }

    environment_key = "serverless_env"
  }

  environment {
    key = "serverless_env"

    spec {
      client = "1"
    }
  }

  schedule {
    quartz_cron_expression = "0 0 0 * * ?"
    timezone_id            = "UTC"
  }
}

▶️ Execution Steps

terraform init
terraform plan
terraform apply
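
Variable values can be supplied through a terraform.tfvars file instead of repeated -var flags. The values below are illustrative placeholders only; substitute your own, and prefer environment variables (TF_VAR_password) for secrets:

```hcl
# terraform.tfvars (illustrative placeholders only)
account_id        = "00000000-0000-0000-0000-000000000000"
username          = "admin@example.com"
aws_region        = "us-east-1"
credentials_id    = "<credentials-configuration-id>"
storage_config_id = "<storage-configuration-id>"
workspace_token   = "<workspace-pat>"
```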

🔥 Key Takeaways

  • Workspace is created at account level
  • Jobs and notebooks are workspace-level resources
  • Serverless jobs require an environment_key that references a matching environment block
  • Use Python scripts for production workloads

🚀 Pro Tip

For production workloads, avoid notebooks and use Python scripts or packaged jobs with CI/CD pipelines.
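
As a sketch of the packaged-job approach mentioned above: a serverless job can run an entry point from a Python wheel via python_wheel_task. The package name, entry point, and wheel path below are illustrative assumptions (the wheel is assumed to be uploaded to a Unity Catalog volume beforehand):

```hcl
resource "databricks_job" "wheel_job" {
  name = "wheel-job"

  task {
    task_key = "task1"

    python_wheel_task {
      package_name = "my_package" # illustrative package name
      entry_point  = "main"       # console-script entry point in the wheel
    }

    environment_key = "serverless_env"
  }

  environment {
    key = "serverless_env"

    spec {
      client = "1"
      # Illustrative volume path to the uploaded wheel
      dependencies = ["/Volumes/main/default/libs/my_package-0.1.0-py3-none-any.whl"]
    }
  }
}
```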
