Thursday, 19 March 2026

Databricks Job with S3, Volume and JAR (Serverless)

Databricks Serverless Job with S3 + Volume + Notebook

🧩 Architecture

S3 (JAR / Libraries) → Databricks Volume → Notebook → Databricks Job
  • S3 stores JAR files and artifacts
  • Volume (Unity Catalog) provides governed access
  • Notebook runs logic
  • Job executes workload

🔐 IAM Role & Policy

IAM Policy

The role Databricks assumes needs read access to the artifact bucket and read/write access to the volume bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3ReadArtifacts",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-artifact-bucket",
        "arn:aws:s3:::my-artifact-bucket/*"
      ]
    },
    {
      "Sid": "S3VolumeAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-volume-bucket/*"
      ]
    }
  ]
}
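
If you manage AWS from code rather than the console, the policy above can be created with boto3 (a sketch; the policy name and file path are illustrative):

import json

import boto3

iam = boto3.client("iam")

# Create the managed policy from the JSON document above
with open("databricks-s3-policy.json") as f:
    policy_doc = f.read()

resp = iam.create_policy(
    PolicyName="databricks-s3-access",  # illustrative name
    PolicyDocument=policy_doc,
)
print(resp["Policy"]["Arn"])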

Trust Policy

The trust policy lets Databricks assume the role. The principal is the Databricks AWS account (414351767826 for Unity Catalog), and the external ID must be your Databricks account ID:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::414351767826:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<your-databricks-account-id>"
        }
      }
    }
  ]
}
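
The role itself can then be created with this trust policy and the access policy attached (again a sketch; role and policy names are illustrative):

import boto3

iam = boto3.client("iam")

# Create the role Databricks will assume, using the trust policy above
with open("databricks-trust-policy.json") as f:
    trust_doc = f.read()

iam.create_role(
    RoleName="databricks-role",
    AssumeRolePolicyDocument=trust_doc,
)

# Attach the S3 access policy created earlier
iam.attach_role_policy(
    RoleName="databricks-role",
    PolicyArn="arn:aws:iam::123456789012:policy/databricks-s3-access",
)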

🧱 Unity Catalog Setup

Create Storage Credential

CREATE STORAGE CREDENTIAL my_cred
WITH IAM_ROLE = 'arn:aws:iam::123456789012:role/databricks-role';
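
If SQL creation of storage credentials isn't available in your workspace, the same credential can be created through the Databricks Python SDK (a sketch, assuming databricks-sdk is installed and you are authenticated):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import catalog

w = WorkspaceClient()

# Register the IAM role as a Unity Catalog storage credential
cred = w.storage_credentials.create(
    name="my_cred",
    aws_iam_role=catalog.AwsIamRoleRequest(
        role_arn="arn:aws:iam::123456789012:role/databricks-role"
    ),
)
print(cred.name)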

Create External Location

CREATE EXTERNAL LOCATION my_ext_loc
URL 's3://my-volume-bucket/'
WITH (STORAGE CREDENTIAL my_cred);
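
A quick way to verify the external location works is to list the bucket from a notebook; any path under the governed URL should now be readable:

# Should succeed once the credential and external location are in place
display(dbutils.fs.ls("s3://my-volume-bucket/"))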

Create External Volume

A volume with its own LOCATION must be declared EXTERNAL:

CREATE EXTERNAL VOLUME my_catalog.my_schema.my_volume
LOCATION 's3://my-volume-bucket/vol/';
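
The job runs as a user or service principal, which needs privileges on the volume. A sketch of the grants (the data_engineers group is illustrative), issued from a notebook:

# Grant read/write on the volume to the principal that runs the job
spark.sql("""
    GRANT READ VOLUME, WRITE VOLUME
    ON VOLUME my_catalog.my_schema.my_volume
    TO `data_engineers`
""")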

📝 Notebook Example

volume_path = "/Volumes/my_catalog/my_schema/my_volume/"

# Copy JAR from S3 to Volume
dbutils.fs.cp(
    "s3://my-artifact-bucket/libs/my-app.jar",
    volume_path + "my-app.jar"
)

# List files
display(dbutils.fs.ls(volume_path))
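
On serverless compute, the s3:// source path itself typically also has to be governed by Unity Catalog (for example via another external location over my-artifact-bucket). Once copied, the Volume is exposed as a local path, so the JAR can be checked with plain Python:

import os

jar_path = "/Volumes/my_catalog/my_schema/my_volume/my-app.jar"

# Volumes are FUSE-mounted, so standard file APIs work
assert os.path.exists(jar_path), f"JAR not found at {jar_path}"
print(f"{os.path.getsize(jar_path)} bytes")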

⚙️ Using JAR in Job

Option 1: Add JAR in Notebook (classic compute only)

spark.sparkContext.addJar("/Volumes/my_catalog/my_schema/my_volume/my-app.jar")

Serverless compute runs on Spark Connect and does not expose sparkContext, so this option only works on classic clusters.

Option 2: Job Configuration (Recommended)

Attach the JAR as a task library, pointing at the Volume path (see the job JSON below):

/Volumes/my_catalog/my_schema/my_volume/my-app.jar

🚀 Databricks Job JSON

{
  "name": "jar-test-job",
  "tasks": [
    {
      "task_key": "run-notebook",
      "notebook_task": {
        "notebook_path": "/Workspace/Users/test/notebook"
      },
      "libraries": [
        {
          "jar": "/Volumes/my_catalog/my_schema/my_volume/my-app.jar"
        }
      ]
    }
  ]
}
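
The same job can be created programmatically. A sketch using the Databricks Python SDK, mirroring the JSON above (when no compute is specified, the task runs on serverless compute where that is enabled):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="jar-test-job",
    tasks=[
        jobs.Task(
            task_key="run-notebook",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Users/test/notebook"
            ),
            libraries=[
                compute.Library(
                    jar="/Volumes/my_catalog/my_schema/my_volume/my-app.jar"
                )
            ],
        )
    ],
)
print(job.job_id)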

🧪 Test Scenarios

Positive Tests

  • JAR loads successfully
  • Notebook executes without error
  • Volume is accessible

Negative Tests

  • No permission on volume → Access Denied
  • Invalid IAM role → Storage credential failure
  • Missing JAR → Job failure
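
The positive path can be automated by triggering the job and asserting on the final state (a sketch; use the job_id returned when the job was created):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
job_id = 123  # hypothetical: replace with the real job ID

# Trigger the job and block until it finishes
run = w.jobs.run_now(job_id=job_id).result()

# Fails if the JAR was missing or the volume was inaccessible
assert run.state.result_state == jobs.RunResultState.SUCCESS, run.state.state_message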

✅ Summary

Component          | Purpose
S3                 | Stores JAR and artifacts
IAM Role           | Grants access to S3
Storage Credential | Connects Databricks to AWS
External Location  | Maps S3 to Databricks
Volume             | Secure file access layer
Notebook           | Executes logic
Job                | Runs the workflow
