🚀 Databricks Serverless Jobs using Terraform (Step-by-Step Guide)
Important: workspace infrastructure (the IAM cross-account role, S3 root bucket, and the corresponding `credentials_id` / `storage_config_id`) is assumed to be pre-created outside this Terraform configuration.
🏢 Part 1: Create Databricks Workspace
Step 1: Account-Level Provider
```hcl
# Account-level provider — used to create the workspace itself.
# Basic auth (username/password) is legacy; OAuth service principals
# (client_id / client_secret) are the current recommendation.
provider "databricks" {
  alias      = "account"
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.account_id
  username   = var.username
  password   = var.password
}
```
Step 2: Create Workspace
```hcl
resource "databricks_mws_workspaces" "workspace" {
  provider                 = databricks.account
  account_id               = var.account_id
  aws_region               = var.aws_region
  workspace_name           = "demo-workspace"
  credentials_id           = var.credentials_id
  storage_configuration_id = var.storage_config_id
}
```
Step 3: Configure Workspace Provider
```hcl
# Workspace-level provider — targets the workspace created above
provider "databricks" {
  host  = databricks_mws_workspaces.workspace.workspace_url
  token = var.workspace_token
}
```
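The variables referenced above are never declared in the snippets themselves; a minimal `variables.tf` might look like this (the names match the guide, but the actual values are yours to supply via `terraform.tfvars` or `-var` flags):

```hcl
# variables.tf — declarations for the inputs used in this guide (sketch)
variable "account_id" {
  type        = string
  description = "Databricks account ID"
}

variable "username" {
  type = string
}

variable "password" {
  type      = string
  sensitive = true
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "credentials_id" {
  type        = string
  description = "ID of a pre-created databricks_mws_credentials"
}

variable "storage_config_id" {
  type        = string
  description = "ID of a pre-created databricks_mws_storage_configurations"
}

variable "workspace_token" {
  type      = string
  sensitive = true
}
```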
📓 Part 2: Job WITH Notebook
Step 1: Create Notebook
```hcl
resource "databricks_notebook" "notebook" {
  path     = "/Shared/demo-notebook"
  language = "PYTHON"
  content_base64 = base64encode(<<-EOF
    print("Hello from Notebook Job")
  EOF
  )
}
```
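Inlining source through a heredoc works for a demo; for anything longer you can keep the notebook in its own file next to the Terraform code and load it with `filebase64()`. A sketch, assuming a `notebook.py` alongside your `.tf` files (the resource name here is hypothetical):

```hcl
resource "databricks_notebook" "notebook_from_file" {
  path     = "/Shared/demo-notebook"
  language = "PYTHON"
  # filebase64() reads the file and base64-encodes it in one step
  content_base64 = filebase64("${path.module}/notebook.py")
}
```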
Step 2: Create Job
```hcl
resource "databricks_job" "notebook_job" {
  name = "notebook-job"

  task {
    task_key = "task1"
    notebook_task {
      notebook_path = databricks_notebook.notebook.path
    }
    environment_key = "serverless_env"
  }

  # Declaring an environment (and referencing it from the task via
  # environment_key) is what makes the job run on serverless compute
  environment {
    key = "serverless_env"
    spec {
      client = "1"
    }
  }

  # Run daily at midnight UTC
  schedule {
    quartz_cron_expression = "0 0 0 * * ?"
    timezone_id            = "UTC"
  }
}
```
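After `terraform apply`, the job's workspace URL is handy to have on hand. The `databricks_job` resource exports a `url` attribute, so an output block can surface it:

```hcl
output "notebook_job_url" {
  value = databricks_job.notebook_job.url
}
```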
🐍 Part 3: Job WITHOUT Notebook (Python Script)
Step 1: Create Python Script
```hcl
resource "databricks_workspace_file" "script" {
  path = "/Shared/demo-script.py"
  content_base64 = base64encode(<<-EOF
    print("Hello from Python Script Job")
  EOF
  )
}
```
Step 2: Create Job
```hcl
resource "databricks_job" "python_job" {
  name = "python-job"

  task {
    task_key = "task1"
    spark_python_task {
      # workspace_path resolves to "/Workspace/Shared/demo-script.py",
      # the form spark_python_task expects for workspace files
      python_file = databricks_workspace_file.script.workspace_path
    }
    environment_key = "serverless_env"
  }

  environment {
    key = "serverless_env"
    spec {
      client = "1"
    }
  }

  # Run daily at midnight UTC
  schedule {
    quartz_cron_expression = "0 0 0 * * ?"
    timezone_id            = "UTC"
  }
}
```
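The serverless environment spec can also install pip packages for the task, which replaces task-level libraries on serverless compute. A hedged variant of the block above (the `requests` pin is just an example dependency):

```hcl
environment {
  key = "serverless_env"
  spec {
    client       = "1"
    # pip requirements installed into the serverless environment
    dependencies = ["requests==2.31.0"]
  }
}
```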
▶️ Execution Steps
```shell
terraform init    # download the Databricks provider
terraform plan    # preview the workspace, notebooks, and jobs
terraform apply   # create everything
```
🔥 Key Takeaways
- The workspace is created at the account level, with the account-scoped provider
- Jobs and notebooks are workspace-level resources
- Serverless jobs require an environment block plus an environment_key on each task
- Use Python scripts for production workloads
🚀 Pro Tip
For production workloads, avoid notebooks and use Python scripts or packaged jobs with CI/CD pipelines.
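As a sketch of the packaged-job approach: a `python_wheel_task` runs an entry point from a wheel your CI pipeline has built and uploaded. Everything below the task block is an assumption for illustration — the `demo_pkg` package, its `main` entry point, and the Unity Catalog volume path are all hypothetical:

```hcl
resource "databricks_job" "wheel_job" {
  name = "wheel-job"

  task {
    task_key = "task1"
    python_wheel_task {
      package_name = "demo_pkg"
      entry_point  = "main"
    }
    environment_key = "serverless_env"
  }

  environment {
    key = "serverless_env"
    spec {
      client = "1"
      # on serverless, the wheel is delivered as an environment dependency
      dependencies = ["/Volumes/main/default/artifacts/demo_pkg-0.1.0-py3-none-any.whl"]
    }
  }
}
```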