Thursday, 19 March 2026

Databricks Serverless Job with JAR from S3 via Volume

Databricks Serverless Job with JAR (S3 → Volume → Notebook → Job)

🎯 Goal

  • Upload JAR to S3
  • Create Databricks Volume
  • Copy JAR to Volume
  • Create Notebook
  • Create Job via UI
  • Run and validate

🧪 1️⃣ Sample Test JAR

HelloSpark.java

package com.example;

import org.apache.spark.sql.SparkSession;

public class HelloSpark {
    public static void main(String[] args) {

        SparkSession spark = SparkSession.builder()
                .appName("Test JAR Job")
                .getOrCreate();

        // range(start, end) is end-exclusive, so this counts 1..99 = 99 rows
        long count = spark.range(1, 100).count();

        System.out.println("Count is: " + count);

        spark.stop();
    }
}

pom.xml

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>hello-spark</artifactId>
  <version>1.0</version>

  <properties>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.5.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>

Build JAR

mvn clean package

Output: target/hello-spark-1.0.jar
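To sanity-check the build before uploading, the JAR's contents can be listed locally; a minimal sketch using Python's `zipfile` module (a JAR is just a ZIP archive), assuming the standard Maven output path:

```python
import zipfile

def jar_contains(jar_path, entry):
    """Return True if the given entry exists inside the JAR (a JAR is a ZIP)."""
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example: confirm the compiled class made it into the artifact
# jar_contains("target/hello-spark-1.0.jar", "com/example/HelloSpark.class")
```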


☁️ 2️⃣ Upload JAR to S3

aws s3 cp target/hello-spark-1.0.jar s3://my-artifact-bucket/libs/
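The same upload can be scripted in Python for CI pipelines; a hedged boto3 sketch (bucket and key follow the CLI command above, and `boto3` plus AWS credentials are assumed to be available):

```python
def s3_uri(bucket, key):
    """Build the s3:// URI that the Databricks copy step will reference."""
    return f"s3://{bucket}/{key}"

def upload_jar(local_path, bucket, key):
    """Upload the built JAR to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so s3_uri() stays dependency-free
    boto3.client("s3").upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)

# upload_jar("target/hello-spark-1.0.jar", "my-artifact-bucket", "libs/hello-spark-1.0.jar")
```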

🧱 3️⃣ Databricks UI Setup

Step 1: Create Storage Credential

  • Go to: Data → Credentials
  • Click: Create Credential
  • Name: my_cred
  • IAM Role ARN: your role ARN

Step 2: Create External Location

  • Go to: Data → External Locations
  • Name: my_ext_loc
  • URL: s3://my-volume-bucket/
  • Credential: my_cred

Step 3: Create Volume

  • Go to: Catalog → my_catalog → my_schema
  • Create Volume: my_volume (type: External, backed by my_ext_loc)
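Steps 2 and 3 also have SQL equivalents that can be run from a notebook; a hedged sketch of the Unity Catalog DDL (object names match the UI steps above; the volume's LOCATION sub-path is an assumption and must fall under the external location's URL):

```sql
-- Storage credentials are typically created in the UI, but the external
-- location and volume can be created with Unity Catalog DDL:
CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_loc
  URL 's3://my-volume-bucket/'
  WITH (STORAGE CREDENTIAL my_cred);

CREATE EXTERNAL VOLUME IF NOT EXISTS my_catalog.my_schema.my_volume
  LOCATION 's3://my-volume-bucket/volumes/my_volume';
```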

๐Ÿ“ 4️⃣ Copy JAR to Volume

volume_path = "/Volumes/my_catalog/my_schema/my_volume/"

dbutils.fs.cp(
    "s3://my-artifact-bucket/libs/hello-spark-1.0.jar",
    volume_path + "hello-spark.jar"
)

display(dbutils.fs.ls(volume_path))

📓 5️⃣ Notebook Example

print("Running Databricks Job with JAR")

df = spark.range(1, 10)
display(df)

⚙️ 6️⃣ Create Job (UI)

  • Go to: Workflows → Jobs → Create Job
  • Job Name: test-jar-job
  • Task Type: Notebook
  • Select Notebook

Add the JAR as a dependent library (Volume path)

/Volumes/my_catalog/my_schema/my_volume/hello-spark.jar

Compute

Serverless

▶️ 7️⃣ Run Job

  • Click Run Now
  • Check logs under Runs

✅ 8️⃣ Expected Output

Count is: 99
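The count is 99 rather than 100 because `spark.range(start, end)` is end-exclusive, the same convention as Python's built-in `range`; a quick sketch:

```python
# spark.range(1, 100) produces the values 1..99 (end-exclusive),
# matching Python's built-in range semantics.
count = len(range(1, 100))
print("Count is:", count)  # → Count is: 99
```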

🧪 9️⃣ Test Scenarios

Positive

  • JAR loads successfully
  • Notebook executes
  • Volume accessible

Negative

  • No Volume permission → Access Denied
  • Wrong IAM Role → S3 Access Denied
  • Missing JAR → File not found
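The "missing JAR" failure mode can be caught before the job ever runs; a minimal pre-flight sketch (the volume path mirrors step 4 and the catalog/schema names are assumptions — on Databricks, Volume contents appear as ordinary files under /Volumes/):

```python
import os

def preflight_jar(path):
    """Fail fast with a clear error if the JAR is absent from the Volume."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"JAR not found: {path}")
    return True

# preflight_jar("/Volumes/my_catalog/my_schema/my_volume/hello-spark.jar")
```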

๐Ÿ” ๐Ÿ”ฅ 10️⃣ Enterprise Best Practices

  • Use separate S3 bucket for artifacts
  • Use Unity Catalog Volumes for governance
  • Restrict S3 access to specific prefixes
  • Enable audit logging

🎯 Final Flow

S3 (JAR) → Volume → Notebook → Job → Output
