Databricks Serverless Job with JAR (S3 → Volume → Notebook → Job)
🎯 Goal
- Upload JAR to S3
- Create Databricks Volume
- Copy JAR to Volume
- Create Notebook
- Create Job via UI
- Run and validate
🧪 1️⃣ Sample Test JAR
HelloSpark.java
package com.example;

import org.apache.spark.sql.SparkSession;

public class HelloSpark {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("Test JAR Job")
            .getOrCreate();

        long count = spark.range(1, 100).count();
        System.out.println("Count is: " + count);

        spark.stop();
    }
}
pom.xml
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>hello-spark</artifactId>
    <version>1.0</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.5.0</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>
Build JAR
mvn clean package
Output: target/hello-spark-1.0.jar
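Before uploading, you can sanity-check that the compiled class actually made it into the JAR (a JAR is just a zip archive). The helper below is an illustrative sketch, not part of the build; the demo constructs a synthetic in-memory JAR so it runs anywhere, but in practice you would point it at target/hello-spark-1.0.jar.

```python
import io
import zipfile

def jar_contains_class(jar_bytes, class_name):
    """Return True if the JAR (a zip archive) contains the given class file."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return entry in jar.namelist()

# Demo on a synthetic in-memory JAR; with a real build, read the bytes of
# target/hello-spark-1.0.jar instead.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("com/example/HelloSpark.class", b"\xca\xfe\xba\xbe")

found = jar_contains_class(buf.getvalue(), "com.example.HelloSpark")
print(found)  # True
```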
☁️ 2️⃣ Upload JAR to S3
aws s3 cp target/hello-spark-1.0.jar s3://my-artifact-bucket/libs/
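If you would rather script the upload than use the CLI, boto3 can do the same thing. The bucket and prefix below mirror the command above and are assumptions about your environment; the helper function is purely illustrative.

```python
def artifact_key(prefix, jar_path):
    """Build the S3 object key for the JAR, e.g. libs/hello-spark-1.0.jar."""
    return prefix.rstrip("/") + "/" + jar_path.split("/")[-1]

bucket = "my-artifact-bucket"  # assumed bucket name from the command above
key = artifact_key("libs", "target/hello-spark-1.0.jar")
print(key)  # libs/hello-spark-1.0.jar

# Requires AWS credentials in the environment; uncomment to actually upload.
# import boto3
# boto3.client("s3").upload_file("target/hello-spark-1.0.jar", bucket, key)
```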
🧱 3️⃣ Databricks UI Setup
Step 1: Create Storage Credential
- Go to: Data → Credentials
- Click: Create Credential
- Name: my_cred
- IAM Role ARN: your role ARN
Step 2: Create External Location
- Go to: Data → External Locations
- Name: my_ext_loc
- URL: s3://my-volume-bucket/
- Credential: my_cred
Step 3: Create Volume
- Go to: Catalog → Schema
- Create Volume: my_volume
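If you prefer code over clicks, steps 2 and 3 also have SQL equivalents you can run from a notebook with spark.sql(...). The names below are the placeholders used in this walkthrough, and the exact DDL options depend on your Unity Catalog setup, so treat this as a sketch rather than copy-paste DDL.

```python
# SQL equivalents of UI steps 2 and 3 (placeholder names from this post).
create_external_location = """
CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_loc
URL 's3://my-volume-bucket/'
WITH (STORAGE CREDENTIAL my_cred)
"""

create_volume = """
CREATE EXTERNAL VOLUME IF NOT EXISTS my_catalog.my_schema.my_volume
LOCATION 's3://my-volume-bucket/volumes/my_volume'
"""

# In a Databricks notebook you would run, e.g.:
# spark.sql(create_external_location)
# spark.sql(create_volume)
```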
📁 4️⃣ Copy JAR to Volume
volume_path = "/Volumes/my_catalog/my_schema/my_volume/"
dbutils.fs.cp(
"s3://my-artifact-bucket/libs/hello-spark-1.0.jar",
volume_path + "hello-spark.jar"
)
display(dbutils.fs.ls(volume_path))
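Unity Catalog volume paths always follow the pattern /Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/..., and a typo in any segment surfaces later as a confusing "file not found". A tiny check like the one below (an illustrative helper, not a Databricks API) can catch that before the copy:

```python
def is_volume_path(path):
    """True if path looks like /Volumes/<catalog>/<schema>/<volume>/..."""
    parts = path.strip("/").split("/")
    return len(parts) >= 4 and parts[0] == "Volumes"

print(is_volume_path("/Volumes/my_catalog/my_schema/my_volume/hello-spark.jar"))  # True
print(is_volume_path("/tmp/hello-spark.jar"))  # False
```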
📓 5️⃣ Notebook Example
print("Running Databricks Job with JAR")
df = spark.range(1, 10)
display(df)
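Attaching the JAR as a job library makes its classes visible to the cluster's JVM, so the notebook can also call into it rather than just running its own Spark code. The py4j-style call below is only a sketch: whether spark._jvm access works, and how Python arguments are converted to Java arrays, varies by runtime, and the class name assumes the sample JAR from step 1.

```python
# Fully qualified main class from the sample JAR built in step 1.
main_class = "com.example.HelloSpark"

# In a Databricks notebook with the JAR attached, one possible invocation
# via the py4j gateway (availability and argument conversion vary by runtime):
# spark._jvm.com.example.HelloSpark.main([])

print(main_class.rsplit(".", 1))  # ['com.example', 'HelloSpark']
```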
⚙️ 6️⃣ Create Job (UI)
- Go to: Workflows → Jobs → Create Job
- Job Name:
test-jar-job - Task Type: Notebook
- Select Notebook
- Add Library: /Volumes/my_catalog/my_schema/my_volume/hello-spark.jar
- Compute: Serverless
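The same job can also be defined programmatically through the Databricks Jobs API (2.1) instead of the UI. The payload below is a sketch of the create-job request body: the names are the placeholders used throughout this post, the notebook path is hypothetical, and serverless jobs simply omit a cluster spec.

```python
import json

# Sketch of a Jobs API 2.1 create-job payload (placeholder names).
job_payload = {
    "name": "test-jar-job",
    "tasks": [
        {
            "task_key": "run_notebook",
            # Hypothetical workspace path to the notebook from step 5.
            "notebook_task": {"notebook_path": "/Users/me/test-jar-notebook"},
            "libraries": [
                {"jar": "/Volumes/my_catalog/my_schema/my_volume/hello-spark.jar"}
            ],
            # No cluster spec here: serverless compute is used.
        }
    ],
}
print(json.dumps(job_payload, indent=2))
```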
▶️ 7️⃣ Run Job
- Click Run Now
- Check logs under Runs
✅ 8️⃣ Expected Output
Count is: 99
(spark.range(1, 100) is exclusive of the upper bound, so the job counts 99 rows, not 100.)
🧪 9️⃣ Test Scenarios
Positive
- JAR loads successfully
- Notebook executes
- Volume accessible
Negative
- No Volume permission → Access Denied
- Wrong IAM Role → S3 Access Denied
- Missing JAR → File not found
🔥 10️⃣ Enterprise Best Practices
- Use separate S3 bucket for artifacts
- Use Unity Catalog Volumes for governance
- Restrict S3 access to specific prefixes
- Enable audit logging
🎯 Final Flow
S3 (JAR) → Volume → Notebook → Job → Output