Databricks Serverless Job with JAR (S3 → Volume → Notebook → Job)
🎯 Goal
- Upload JAR to S3
- Create Databricks Volume
- Copy JAR to Volume
- Create Notebook
- Create Job via UI
- Run and validate
🧪 1️⃣ Sample Test JAR
HelloSpark.java
package com.example;

import org.apache.spark.sql.SparkSession;

public class HelloSpark {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("Test JAR Job")
            .getOrCreate();

        long count = spark.range(1, 100).count();
        System.out.println("Count is: " + count);

        spark.stop();
    }
}
pom.xml
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>hello-spark</artifactId>
    <version>1.0</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.5.0</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>
Build JAR
mvn clean package
Output: target/hello-spark-1.0.jar
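Before uploading, you can sanity-check that the compiled class actually made it into the JAR (a JAR is just a zip archive). The helper below is an illustrative sketch, not part of the build; the demo constructs a synthetic in-memory JAR so it runs anywhere, but in practice you would point it at target/hello-spark-1.0.jar.

```python
import io
import zipfile

def jar_contains_class(jar_bytes, class_name):
    """Return True if the JAR (a zip archive) contains the given class file."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return entry in jar.namelist()

# Demo on a synthetic in-memory JAR; with a real build, read the bytes of
# target/hello-spark-1.0.jar instead.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("com/example/HelloSpark.class", b"\xca\xfe\xba\xbe")

found = jar_contains_class(buf.getvalue(), "com.example.HelloSpark")
print(found)  # True
```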
☁️ 2️⃣ Upload JAR to S3
aws s3 cp target/hello-spark-1.0.jar s3://my-artifact-bucket/libs/
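If you would rather script the upload than use the CLI, boto3 can do the same thing. The bucket and prefix below mirror the command above and are assumptions about your environment; the helper function is purely illustrative.

```python
def artifact_key(prefix, jar_path):
    """Build the S3 object key for the JAR, e.g. libs/hello-spark-1.0.jar."""
    return prefix.rstrip("/") + "/" + jar_path.split("/")[-1]

bucket = "my-artifact-bucket"  # assumed bucket name from the command above
key = artifact_key("libs", "target/hello-spark-1.0.jar")
print(key)  # libs/hello-spark-1.0.jar

# Requires AWS credentials in the environment; uncomment to actually upload.
# import boto3
# boto3.client("s3").upload_file("target/hello-spark-1.0.jar", bucket, key)
```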
🧱 3️⃣ Databricks UI Setup
Step 1: Create Storage Credential
- Go to: Data → Credentials
- Click: Create Credential
- Name: my_cred
- IAM Role ARN: your role ARN
Step 2: Create External Location
- Go to: Data → External Locations
- Name: my_ext_loc
- URL: s3://my-volume-bucket/
- Credential: my_cred
Step 3: Create Volume
- Go to: Catalog → Schema
- Create Volume: my_volume
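If you prefer code over clicks, steps 2 and 3 also have SQL equivalents you can run from a notebook with spark.sql(...). The names below are the placeholders used in this walkthrough, and the exact DDL options depend on your Unity Catalog setup, so treat this as a sketch rather than copy-paste DDL.

```python
# SQL equivalents of UI steps 2 and 3 (placeholder names from this post).
create_external_location = """
CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_loc
URL 's3://my-volume-bucket/'
WITH (STORAGE CREDENTIAL my_cred)
"""

create_volume = """
CREATE EXTERNAL VOLUME IF NOT EXISTS my_catalog.my_schema.my_volume
LOCATION 's3://my-volume-bucket/volumes/my_volume'
"""

# In a Databricks notebook you would run, e.g.:
# spark.sql(create_external_location)
# spark.sql(create_volume)
```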
📁 4️⃣ Copy JAR to Volume
volume_path = "/Volumes/my_catalog/my_schema/my_volume/"
dbutils.fs.cp(
"s3://my-artifact-bucket/libs/hello-spark-1.0.jar",
volume_path + "hello-spark.jar"
)
display(dbutils.fs.ls(volume_path))
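Unity Catalog volume paths always follow the pattern /Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/..., and a typo in any segment surfaces later as a confusing "file not found". A tiny check like the one below (an illustrative helper, not a Databricks API) can catch that before the copy:

```python
def is_volume_path(path):
    """True if path looks like /Volumes/<catalog>/<schema>/<volume>/..."""
    parts = path.strip("/").split("/")
    return len(parts) >= 4 and parts[0] == "Volumes"

print(is_volume_path("/Volumes/my_catalog/my_schema/my_volume/hello-spark.jar"))  # True
print(is_volume_path("/tmp/hello-spark.jar"))  # False
```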
📓 5️⃣ Notebook Example
print("Running Databricks Job with JAR")
df = spark.range(1, 10)
display(df)
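Attaching the JAR as a job library makes its classes visible to the cluster's JVM, so the notebook can also call into it rather than just running its own Spark code. The py4j-style call below is only a sketch: whether spark._jvm access works, and how Python arguments are converted to Java arrays, varies by runtime, and the class name assumes the sample JAR from step 1.

```python
# Fully qualified main class from the sample JAR built in step 1.
main_class = "com.example.HelloSpark"

# In a Databricks notebook with the JAR attached, one possible invocation
# via the py4j gateway (availability and argument conversion vary by runtime):
# spark._jvm.com.example.HelloSpark.main([])

print(main_class.rsplit(".", 1))  # ['com.example', 'HelloSpark']
```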
⚙️ 6️⃣ Create Job (UI)
- Go to: Workflows → Jobs → Create Job
- Job Name:
test-jar-job - Task Type: Notebook
- Select Notebook
- Add Library: /Volumes/my_catalog/my_schema/my_volume/hello-spark.jar
- Compute: Serverless
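The same job can also be defined programmatically through the Databricks Jobs API (2.1) instead of the UI. The payload below is a sketch of the create-job request body: the names are the placeholders used throughout this post, the notebook path is hypothetical, and serverless jobs simply omit a cluster spec.

```python
import json

# Sketch of a Jobs API 2.1 create-job payload (placeholder names).
job_payload = {
    "name": "test-jar-job",
    "tasks": [
        {
            "task_key": "run_notebook",
            # Hypothetical workspace path to the notebook from step 5.
            "notebook_task": {"notebook_path": "/Users/me/test-jar-notebook"},
            "libraries": [
                {"jar": "/Volumes/my_catalog/my_schema/my_volume/hello-spark.jar"}
            ],
            # No cluster spec here: serverless compute is used.
        }
    ],
}
print(json.dumps(job_payload, indent=2))
```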
▶️ 7️⃣ Run Job
- Click Run Now
- Check logs under Runs
✅ 8️⃣ Expected Output
Count is: 99
(spark.range(1, 100) is exclusive of the upper bound, so the job counts 99 rows, not 100.)
🧪 9️⃣ Test Scenarios
Positive
- JAR loads successfully
- Notebook executes
- Volume accessible
Negative
- No Volume permission → Access Denied
- Wrong IAM Role → S3 Access Denied
- Missing JAR → File not found
🔥 10️⃣ Enterprise Best Practices
- Use separate S3 bucket for artifacts
- Use Unity Catalog Volumes for governance
- Restrict S3 access to specific prefixes
- Enable audit logging
🎯 Final Flow
S3 (JAR) → Volume → Notebook → Job → Output