Databricks Serverless Job with S3 + Volume + JAR + Notebook
🧩 Architecture
S3 (JAR / Libraries) → Databricks Volume → Notebook → Databricks Job
- S3 stores the JAR files and other build artifacts
- Volume (Unity Catalog) provides governed access to those files
- Notebook contains the application logic
- Job runs the notebook on serverless compute
🔐 IAM Role & Policy
IAM Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3ReadArtifacts",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-artifact-bucket",
        "arn:aws:s3:::my-artifact-bucket/*"
      ]
    },
    {
      "Sid": "S3VolumeAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::my-volume-bucket",
        "arn:aws:s3:::my-volume-bucket/*"
      ]
    }
  ]
}
Unity Catalog needs s3:ListBucket and s3:GetBucketLocation on the volume bucket itself (not just the objects), so the second statement grants them on both the bucket and its contents.
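Before wiring the role into Unity Catalog, you can sanity-check the policy without assuming the role by using the IAM policy simulator. A minimal sketch with boto3; the role ARN and object key are illustrative, and your local credentials are assumed to have iam:SimulatePrincipalPolicy:

import boto3

# Simulate a GetObject call as the role, without actually assuming it
iam = boto3.client("iam")
result = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::123456789012:role/databricks-role",
    ActionNames=["s3:GetObject"],
    ResourceArns=["arn:aws:s3:::my-artifact-bucket/libs/my-app.jar"],
)
for r in result["EvaluationResults"]:
    print(r["EvalActionName"], "->", r["EvalDecision"])  # expect "allowed"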
Trust Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::414351767826:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<YOUR-DATABRICKS-ACCOUNT-ID>"
        }
      }
    }
  ]
}
Here 414351767826 is the Databricks AWS account that assumes the role on behalf of Unity Catalog, and the external ID is your own Databricks account ID.
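To confirm the trust policy that actually ended up on the role, you can read it back with boto3; the role name is illustrative, and your credentials are assumed to have iam:GetRole:

import boto3, json

# Print the assume-role (trust) policy document attached to the role
iam = boto3.client("iam")
role = iam.get_role(RoleName="databricks-role")["Role"]
print(json.dumps(role["AssumeRolePolicyDocument"], indent=2))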
🧱 Unity Catalog Setup
Create Storage Credential
The SQL below is illustrative; storage credentials are commonly created through Catalog Explorer or the Databricks CLI instead.
CREATE STORAGE CREDENTIAL my_cred
WITH IAM_ROLE = 'arn:aws:iam::123456789012:role/databricks-role';
Create External Location
CREATE EXTERNAL LOCATION my_ext_loc
URL 's3://my-volume-bucket/'
WITH (STORAGE CREDENTIAL my_cred);
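Once the external location exists, you can confirm it resolves correctly from a notebook (the same DESCRIBE statement also works in a SQL cell):

# Show the URL, credential, and owner of the external location
display(spark.sql("DESCRIBE EXTERNAL LOCATION my_ext_loc"))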
Create External Volume
CREATE EXTERNAL VOLUME my_catalog.my_schema.my_volume
LOCATION 's3://my-volume-bucket/vol/';
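Reading and writing through the volume also requires Unity Catalog privileges for whoever runs the job. A minimal sketch, assuming a data_engineers group (the group name is hypothetical):

# Hypothetical grant: adjust the group name to your workspace
spark.sql("""
  GRANT READ VOLUME, WRITE VOLUME
  ON VOLUME my_catalog.my_schema.my_volume
  TO `data_engineers`
""")
# The principal also needs USE CATALOG on my_catalog and USE SCHEMA on my_schema.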
📒 Notebook Example
volume_path = "/Volumes/my_catalog/my_schema/my_volume/"

# Copy the JAR from S3 into the Volume.
# Direct s3:// reads also require an external location covering my-artifact-bucket.
dbutils.fs.cp(
    "s3://my-artifact-bucket/libs/my-app.jar",
    volume_path + "my-app.jar"
)

# List the files now in the Volume
display(dbutils.fs.ls(volume_path))
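Because Volumes are FUSE-mounted under /Volumes, standard Python file APIs work on them as well. A small check that the copy landed intact (the checksum step is just for illustration):

import hashlib, os

local_jar = "/Volumes/my_catalog/my_schema/my_volume/my-app.jar"
print("size:", os.path.getsize(local_jar))                  # plain os call on the FUSE path
with open(local_jar, "rb") as f:
    print("sha256:", hashlib.sha256(f.read()).hexdigest())  # compare against the artifact in S3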
⚙️ Using JAR in Job
Option 1: Add JAR in Notebook (classic compute only)
spark.sparkContext.addJar("/Volumes/my_catalog/my_schema/my_volume/my-app.jar")
Note: serverless compute does not expose SparkContext, so this option only works on classic clusters.
Option 2: Job Library Configuration (Recommended)
Declare the Volume path as a task library in the job definition, as shown in the job JSON below:
/Volumes/my_catalog/my_schema/my_volume/my-app.jar
📄 Databricks Job JSON
{
  "name": "jar-test-job",
  "tasks": [
    {
      "task_key": "run-notebook",
      "notebook_task": {
        "notebook_path": "/Workspace/Users/test/notebook"
      },
      "libraries": [
        {
          "jar": "/Volumes/my_catalog/my_schema/my_volume/my-app.jar"
        }
      ]
    }
  ]
}
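If you prefer code over pasting JSON, the same job can be created with the Databricks Python SDK. This is a minimal sketch assuming databricks-sdk is installed and authenticated (e.g. via ~/.databrickscfg):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs, compute

w = WorkspaceClient()  # picks up auth from the environment or .databrickscfg

# Same task definition as the JSON above, expressed with SDK dataclasses
created = w.jobs.create(
    name="jar-test-job",
    tasks=[
        jobs.Task(
            task_key="run-notebook",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Users/test/notebook"
            ),
            libraries=[
                compute.Library(jar="/Volumes/my_catalog/my_schema/my_volume/my-app.jar")
            ],
        )
    ],
)
run = w.jobs.run_now(job_id=created.job_id)  # trigger an immediate run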
🧪 Test Scenarios
Positive Tests
- JAR loads successfully
- Notebook executes without error
- Volume is accessible
Negative Tests
- No permission on volume → Access Denied
- Invalid IAM role → Storage credential failure
- Missing JAR → Job failure
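The volume-access and missing-JAR cases can be exercised from a notebook as a pre-flight check before triggering the job; a small sketch using the paths from this post:

jar_path = "/Volumes/my_catalog/my_schema/my_volume/my-app.jar"
volume_path = "/Volumes/my_catalog/my_schema/my_volume/"

try:
    names = [f.name for f in dbutils.fs.ls(volume_path)]           # raises Access Denied if grants are missing
    assert "my-app.jar" in names, "JAR missing -> job would fail"  # mirrors the 'Missing JAR' negative test
    print("volume accessible, JAR present")
except Exception as e:
    print(f"pre-flight check failed: {e}")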
✅ Summary
| Component | Purpose |
| --- | --- |
| S3 | Stores JAR and artifacts |
| IAM Role | Grants access to S3 |
| Storage Credential | Connects Databricks to AWS |
| External Location | Maps S3 to Databricks |
| Volume | Secure file access layer |
| Notebook | Executes logic |
| Job | Runs the workflow |